Detect mentions of general medical terms (coarse)

Description

Extract general medical terms from text, such as body parts, cells, genes, and symptoms, using a pretrained NER model.

Predicted Entities

Qualitative_Concept, Organization, Manufactured_Object, Amino_Acid,_Peptide,_or_Protein, Pharmacologic_Substance, Professional_or_Occupational_Group, Cell_Component, Neoplastic_Process, Substance, Laboratory_Procedure, Nucleic_Acid,_Nucleoside,_or_Nucleotide, Research_Activity, Gene_or_Genome, Indicator,_Reagent,_or_Diagnostic_Aid, Biologic_Function, Chemical, Mammal, Molecular_Function, Quantitative_Concept, Prokaryote, Mental_or_Behavioral_Dysfunction, Injury_or_Poisoning, Body_Location_or_Region, Spatial_Concept, Nucleotide_Sequence, Tissue, Pathologic_Function, Body_Substance, Fungus, Mental_Process, Medical_Device, Plant, Health_Care_Activity, Clinical_Attribute, Genetic_Function, Food, Therapeutic_or_Preventive_Procedure, Body_Part,_Organ,_or_Organ_Component, Geographic_Area, Virus, Biomedical_or_Dental_Material, Diagnostic_Procedure, Eukaryote, Anatomical_Structure, Organism_Attribute, Molecular_Biology_Research_Technique, Organic_Chemical, Cell, Daily_or_Recreational_Activity, Population_Group, Disease_or_Syndrome, Group, Sign_or_Symptom, Body_System


How to use

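# Python: Spark NLP, the licensed Healthcare NLP package (provides MedicalNerModel), and Spark ML
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from pyspark.ml import Pipeline

# Assumes an active SparkSession named `spark` with Healthcare NLP loaded
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")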

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_medmentions_coarse", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])

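# No stage learns from data here, so fitting on an empty DataFrame only assembles the PipelineModel
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))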

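results = model.transform(spark.createDataFrame([["EXAMPLE_TEXT"]]).toDF("text"))

// Scala: open-source Spark NLP imports; MedicalNerModel comes from the licensed Healthcare NLP library
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = new SentenceDetector()
    .setInputCols("document")
    .setOutputCol("sentence")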

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val ner = MedicalNerModel.pretrained("ner_medmentions_coarse", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))

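// Input DataFrame with a single "text" column; replace EXAMPLE_TEXT with your own text
import spark.implicits._
val data = Seq("EXAMPLE_TEXT").toDF("text")

val result = pipeline.fit(data).transform(data)

# NLU (Python)
import nlu
nlu.load("en.med_ner.medmentions").predict("""Put your text here.""")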
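
In the pipelines above, the NER model writes one tag per token to the `ner` column, and `NerConverter` merges those tags into entity chunks in `ner_chunk`. The following is a minimal pure-Python sketch of that merging idea, assuming the tags follow the usual IOB scheme (`B-`, `I-`, `O`). It is not the library's implementation. The helper name `iob_to_chunks`, the sample tokens and the sample tags are hypothetical and hand-written for illustration. They are not output from `ner_medmentions_coarse`.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB-tagged tokens into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the chunk that is currently open, if any.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the open chunk.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open chunk
            # (this sketch drops such stray I- tokens).
            flush()
    flush()
    return chunks


# Hand-written illustrative input, not real model output.
tokens = ["The", "patient", "reported", "chest", "pain", "."]
tags = ["O", "B-Population_Group", "O", "B-Sign_or_Symptom", "I-Sign_or_Symptom", "O"]
chunks = iob_to_chunks(tokens, tags)
print(chunks)
```

Each resulting pair corresponds conceptually to one row of `ner_chunk`: the chunk text plus one of the labels listed under Predicted Entities.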

Model Information

Model Name: ner_medmentions_coarse
Compatibility: Healthcare NLP 3.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token, embeddings]
Output Labels: [ner]
Language: en