Detect mentions of general medical terms (coarse)

Description

Extracts general medical terms from text, such as body parts, cells, genes, and symptoms, using a pretrained NER model.

Predicted Entities

Qualitative_Concept, Organization, Manufactured_Object, Amino_Acid,_Peptide,_or_Protein, Pharmacologic_Substance, Professional_or_Occupational_Group, Cell_Component, Neoplastic_Process, Substance, Laboratory_Procedure, Nucleic_Acid,_Nucleoside,_or_Nucleotide, Research_Activity, Gene_or_Genome, Indicator,_Reagent,_or_Diagnostic_Aid, Biologic_Function, Chemical, Mammal, Molecular_Function, Quantitative_Concept, Prokaryote, Mental_or_Behavioral_Dysfunction, Injury_or_Poisoning, Body_Location_or_Region, Spatial_Concept, Nucleotide_Sequence, Tissue, Pathologic_Function, Body_Substance, Fungus, Mental_Process, Medical_Device, Plant, Health_Care_Activity, Clinical_Attribute, Genetic_Function, Food, Therapeutic_or_Preventive_Procedure, Body_Part,_Organ,_or_Organ_Component, Geographic_Area, Virus, Biomedical_or_Dental_Material, Diagnostic_Procedure, Eukaryote, Anatomical_Structure, Organism_Attribute, Molecular_Biology_Research_Technique, Organic_Chemical, Cell, Daily_or_Recreational_Activity, Population_Group, Disease_or_Syndrome, Group, Sign_or_Symptom, Body_System
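
Several label names contain commas themselves (for example Amino_Acid,_Peptide,_or_Protein), so the list above is delimited by a comma followed by a space rather than by a bare comma. The following is a minimal Python sketch of splitting such a string safely. The variable names are illustrative, and only a handful of the labels are included.

```python
# Illustrative excerpt of the label list above (not the full set).
labels_text = (
    "Qualitative_Concept, Organization, Amino_Acid,_Peptide,_or_Protein, "
    "Nucleic_Acid,_Nucleoside,_or_Nucleotide, "
    "Body_Part,_Organ,_or_Organ_Component, Cell"
)

# Split on comma + space: a comma inside a label is always followed by "_".
labels = labels_text.split(", ")

# For comparison, a bare-comma split breaks multi-part labels into fragments.
naive = [part.strip() for part in labels_text.split(",")]

for label in labels:
    print(label)
```

Splitting on the two-character delimiter keeps multi-part labels such as Body_Part,_Organ,_or_Organ_Component intact, which matters when the labels are used as filter values or dictionary keys downstream.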


How to use


...
embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_medmentions_coarse", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")
...
nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
results = model.transform(spark.createDataFrame([["EXAMPLE_TEXT"]]).toDF("text"))

...
val embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_medmentions_coarse", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))
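// toDS requires spark.implicits._ to be in scope
val data = Seq("EXAMPLE_TEXT").toDS.toDF("text")
val result = pipeline.fit(Seq.empty[String].toDS.toDF("text")).transform(data)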
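
In the pipelines above, the NER stage writes one tag per token to the "ner" column, and the ner_converter stage then merges those tags into entity chunks. As a rough illustration of that merging step, here is a plain-Python sketch that assumes IOB-style tags (B-Label, I-Label, O). The function name iob_to_chunks is hypothetical, the tags are hand-written rather than actual model output, and the library's own converter may differ in detail.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB-tagged tokens into (chunk_text, label) pairs (illustrative only)."""
    chunks: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        # "B-Cell" -> ("B", "Cell"); "O" -> ("O", "")
        prefix, _, name = tag.partition("-")

        # An I- tag with the same label extends the chunk currently being built.
        if prefix == "I" and name == label and words:
            words.append(token)
            continue

        # Anything else closes the open chunk, if there is one.
        if words:
            chunks.append((" ".join(words), label))

        # B- (or a stray I-) starts a new chunk; O resets the state.
        if prefix in ("B", "I") and name:
            words, label = [token], name
        else:
            words, label = [], None

    # Flush a chunk that runs to the end of the sentence.
    if words:
        chunks.append((" ".join(words), label))
    return chunks


# Hand-written example tags, not output from ner_medmentions_coarse.
tokens = ["The", "patient", "has", "type", "2", "diabetes", "."]
tags = ["O", "B-Population_Group", "O",
        "B-Disease_or_Syndrome", "I-Disease_or_Syndrome", "I-Disease_or_Syndrome", "O"]

for text, label in iob_to_chunks(tokens, tags):
    print(f"{label}: {text}")
```

The sketch treats an I- tag that does not continue an open chunk as the start of a new one, which is a lenient design choice. A stricter variant could discard such tags instead.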

Model Information

Model Name: ner_medmentions_coarse
Compatibility: Spark NLP for Healthcare 3.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token, embeddings]
Output Labels: [ner]
Language: en