Description
Extract general medical terms such as body parts, cells, genes, and symptoms from text using a pretrained NER model.
Predicted Entities
Qualitative_Concept, Organization, Manufactured_Object, Amino_Acid,_Peptide,_or_Protein, Pharmacologic_Substance, Professional_or_Occupational_Group, Cell_Component, Neoplastic_Process, Substance, Laboratory_Procedure, Nucleic_Acid,_Nucleoside,_or_Nucleotide, Research_Activity, Gene_or_Genome, Indicator,_Reagent,_or_Diagnostic_Aid, Biologic_Function, Chemical, Mammal, Molecular_Function, Quantitative_Concept, Prokaryote, Mental_or_Behavioral_Dysfunction, Injury_or_Poisoning, Body_Location_or_Region, Spatial_Concept, Nucleotide_Sequence, Tissue, Pathologic_Function, Body_Substance, Fungus, Mental_Process, Medical_Device, Plant, Health_Care_Activity, Clinical_Attribute, Genetic_Function, Food, Therapeutic_or_Preventive_Procedure, Body_Part,_Organ,_or_Organ_Component, Geographic_Area, Virus, Biomedical_or_Dental_Material, Diagnostic_Procedure, Eukaryote, Anatomical_Structure, Organism_Attribute, Molecular_Biology_Research_Technique, Organic_Chemical, Cell, Daily_or_Recreational_Activity, Population_Group, Disease_or_Syndrome, Group, Sign_or_Symptom, Body_System
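Several of the labels above contain commas themselves (for example `Amino_Acid,_Peptide,_or_Protein`), so splitting the list on a bare comma breaks those labels apart. The following is a minimal sketch, assuming the list has been copied as one string in which labels are separated by a comma followed by a space; `raw_labels` and `parse_labels` are illustrative names, not part of the library, and the string is only an excerpt of the full list.

```python
from typing import List

# Excerpt of the label list above, copied as a single string (assumption:
# labels are separated by ", ", while commas inside a label are followed by "_").
raw_labels = (
    "Qualitative_Concept, Organization, Manufactured_Object, "
    "Amino_Acid,_Peptide,_or_Protein, Pharmacologic_Substance"
)


def parse_labels(raw: str) -> List[str]:
    """Split on comma+space so that commas inside a label are preserved."""
    return [label.strip() for label in raw.split(", ") if label.strip()]


labels = parse_labels(raw_labels)
naive = raw_labels.split(",")  # splits the protein label into three pieces

print(len(labels), len(naive))
print(labels[3])
```

Splitting on the two-character separator keeps multi-part labels intact, which matters when the labels are later used to filter chunks by entity type.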
How to use
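# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

# `spark` is assumed to be an active Spark session with Healthcare NLP loaded.

# Wrap the raw text column as a document annotation.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences, then sentences into tokens.
sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Clinical word embeddings used as input features by the NER model.
embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Pretrained NER model: one tag per token.
clinical_ner = MedicalNerModel.pretrained("ner_medmentions_coarse", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks.
ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])
# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline.
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
results = model.transform(spark.createDataFrame([["EXAMPLE_TEXT"]]).toDF("text"))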
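The last pipeline stage, `NerConverter`, turns the token-level tags in the `ner` column into entity chunks in `ner_chunk`. The sketch below illustrates the idea of that merging step in plain Python, assuming the tags follow the IOB scheme (`B-<label>`, `I-<label>`, `O`). It is a simplified illustration, not the library's implementation; `merge_iob` is a hypothetical helper, and the tokens and tags are hand-written placeholders rather than real model output.

```python
from typing import List, Optional, Tuple


def merge_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level IOB tags into (chunk_text, label) pairs.

    Simplification: an I- tag that does not continue the current chunk
    is treated like O and dropped.
    """
    chunks: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        prefix, _, name = tag.partition("-")
        if prefix == "I" and name == label:
            # Continuation of the chunk in progress.
            words.append(token)
            continue
        # Any other tag closes the chunk in progress, if there is one.
        if words:
            chunks.append((" ".join(words), label))
        # A B- tag opens a new chunk; everything else resets the state.
        words, label = ([token], name) if prefix == "B" else ([], None)

    if words:
        chunks.append((" ".join(words), label))
    return chunks


# Hand-written placeholder input, not output of ner_medmentions_coarse.
tokens = ["Patient", "reports", "chest", "pain", "and", "fever"]
tags = ["O", "O", "B-Sign_or_Symptom", "I-Sign_or_Symptom", "O", "B-Sign_or_Symptom"]

for text, entity in merge_iob(tokens, tags):
    print(f"{entity}\t{text}")
```

In the actual pipeline this step is already handled by `NerConverter`, so the helper is only useful for understanding how the `ner` and `ner_chunk` columns relate.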
// Scala
// `spark` is assumed to be an active SparkSession with Healthcare NLP loaded.
import spark.implicits._

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentence_detector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")
val embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_medmentions_coarse", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
val ner_converter = new NerConverter()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))

// Input DataFrame with a single "text" column; replace EXAMPLE_TEXT with your own text.
val data = Seq("EXAMPLE_TEXT").toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.med_ner.medmentions").predict("""Put your text here.""")
Model Information
| Model Name: | ner_medmentions_coarse |
| Compatibility: | Healthcare NLP 3.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document, token, embeddings] |
| Output Labels: | [ner] |
| Language: | en |