Description
Pretrained named entity recognition (NER) deep learning model for clinical entities. The Spark NLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art model for NER: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNNs.
The training corpus is provided by the European Clinical Case Corpus (E3C) project, which offers a freely available, multilingual corpus of semantically annotated clinical narratives.
Predicted Entities
clinical_event, bodypart, clinical_condition, units_measurements, patient, date_time
How to use
Python
# Assumes an active Spark session `spark` started with the licensed Healthcare NLP library.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel
# Wrap the raw text column into document annotations.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
# Split documents into sentences with a pretrained deep learning sentence detector.
sentenceDetectorDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")
# Split sentences into tokens.
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")
# Clinical word embeddings that feed the NER model.
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")
# Predict a clinical entity tag for each token.
ner = MedicalNerModel.pretrained("ner_eu_clinical_case", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")
# Merge token-level tags into entity chunks.
ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")
pipeline = Pipeline(stages=[
    document_assembler,
    sentenceDetectorDL,
    tokenizer,
    word_embeddings,
    ner,
    ner_converter])
data = spark.createDataFrame([["""A 3-year-old boy with autistic disorder on hospital of pediatric ward A at university hospital. He has no family history of illness or autistic spectrum disorder. The child was diagnosed with a severe communication disorder, with social interaction difficulties and sensory processing delay. Blood work was normal (thyroid-stimulating hormone (TSH), hemoglobin, mean corpuscular volume (MCV), and ferritin). Upper endoscopy also showed a submucosal tumor causing subtotal obstruction of the gastric outlet. Because a gastrointestinal stromal tumor was suspected, distal gastrectomy was performed. Histopathological examination revealed spindle cell proliferation in the submucosal layer."""]]).toDF("text")
result = pipeline.fit(data).transform(data)
Scala
// Assumes an active SparkSession named `spark` with the licensed Healthcare NLP library on the classpath.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
import org.apache.spark.ml.Pipeline
import spark.implicits._
val documenter = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
  .setInputCols("document")
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val ner_model = MedicalNerModel.pretrained("ner_eu_clinical_case", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
val ner_converter = new NerConverter()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(documenter, sentenceDetector, tokenizer, word_embeddings, ner_model, ner_converter))
// One plain string per row yields a single string column named "text".
val data = Seq("""A 3-year-old boy with autistic disorder on hospital of pediatric ward A at university hospital. He has no family history of illness or autistic spectrum disorder. The child was diagnosed with a severe communication disorder, with social interaction difficulties and sensory processing delay. Blood work was normal (thyroid-stimulating hormone (TSH), hemoglobin, mean corpuscular volume (MCV), and ferritin). Upper endoscopy also showed a submucosal tumor causing subtotal obstruction of the gastric outlet. Because a gastrointestinal stromal tumor was suspected, distal gastrectomy was performed. Histopathological examination revealed spindle cell proliferation in the submucosal layer.""").toDF("text")
val result = pipeline.fit(data).transform(data)
NLU
import nlu
nlu.load("en.med_ner.clinical_case_eu").predict("""A 3-year-old boy with autistic disorder on hospital of pediatric ward A at university hospital. He has no family history of illness or autistic spectrum disorder. The child was diagnosed with a severe communication disorder, with social interaction difficulties and sensory processing delay. Blood work was normal (thyroid-stimulating hormone (TSH), hemoglobin, mean corpuscular volume (MCV), and ferritin). Upper endoscopy also showed a submucosal tumor causing subtotal obstruction of the gastric outlet. Because a gastrointestinal stromal tumor was suspected, distal gastrectomy was performed. Histopathological examination revealed spindle cell proliferation in the submucosal layer.""")
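In these pipelines, MedicalNerModel assigns a tag to each token and NerConverter merges consecutive tags into the chunks listed under Results. The pure-Python sketch below illustrates that merge, assuming the IOB scheme commonly used by Spark NLP NER models (B- starts an entity, I- continues it, O is outside any entity). It is not the NerConverter implementation: the helper `iob_to_chunks` and the sample tags are hypothetical, and the sketch simply joins tokens with spaces instead of tracking character offsets.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB-tagged tokens into (chunk_text, label) pairs.

    Hypothetical helper for illustration. An I- tag whose label differs from
    the chunk being built starts a new chunk (a lenient convention).
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Emit the chunk under construction, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        # "B-clinical_event" -> prefix "B", label "clinical_event"
        prefix, _, label = tag.partition("-")
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)
    flush()
    return chunks


# Hypothetical token tags for the start of the example text.
tokens = ["A", "3-year-old", "boy", "with", "autistic", "disorder", "."]
tags = ["B-patient", "I-patient", "I-patient", "O",
        "B-clinical_condition", "I-clinical_condition", "O"]
print(iob_to_chunks(tokens, tags))
# [('A 3-year-old boy', 'patient'), ('autistic disorder', 'clinical_condition')]
```

Treating a mismatched I- tag as the start of a new chunk keeps every tagged token in some chunk rather than silently dropping it; a stricter decoder could discard such tokens instead.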
Results
+------------------------------+------------------+
|chunk                         |ner_label         |
+------------------------------+------------------+
|A 3-year-old boy              |patient           |
|autistic disorder             |clinical_condition|
|He                            |patient           |
|illness                       |clinical_event    |
|autistic spectrum disorder    |clinical_condition|
|The child                     |patient           |
|diagnosed                     |clinical_event    |
|disorder                      |clinical_event    |
|difficulties                  |clinical_event    |
|Blood                         |bodypart          |
|work                          |clinical_event    |
|normal                        |units_measurements|
|hormone                       |clinical_event    |
|hemoglobin                    |clinical_event    |
|volume                        |clinical_event    |
|endoscopy                     |clinical_event    |
|showed                        |clinical_event    |
|tumor                         |clinical_condition|
|causing                       |clinical_event    |
|obstruction                   |clinical_event    |
|the gastric outlet            |bodypart          |
|gastrointestinal stromal tumor|clinical_condition|
|suspected                     |clinical_event    |
|gastrectomy                   |clinical_event    |
|examination                   |clinical_event    |
|revealed                      |clinical_event    |
|spindle cell proliferation    |clinical_condition|
|the submucosal layer          |bodypart          |
+------------------------------+------------------+
Model Information
Model Name: ner_eu_clinical_case
Compatibility: Healthcare NLP 4.2.7+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 849.0 KB
References
The training corpus is provided by the European Clinical Case Corpus (E3C) project, which offers a freely available, multilingual corpus of semantically annotated clinical narratives.
Benchmarking
label               tp     fp     fn     total  precision  recall  f1
date_time           54.0   7.0    15.0   69.0   0.8852     0.7826  0.8308
units_measurements  111.0  48.0   12.0   123.0  0.6981     0.9024  0.7872
clinical_condition  93.0   47.0   81.0   174.0  0.6643     0.5345  0.5924
patient             119.0  16.0   5.0    124.0  0.8815     0.9597  0.9189
clinical_event      331.0  126.0  89.0   420.0  0.7243     0.7881  0.7548
bodypart            171.0  58.0   84.0   255.0  0.7467     0.6706  0.7066
macro               -      -      -      -      -          -       0.7651
micro               -      -      -      -      -          -       0.7454
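The per-label columns follow from the tp/fp/fn counts in the standard way (precision = tp / (tp + fp), recall = tp / (tp + fn), F1 = their harmonic mean), and the macro F1 is consistent with the unweighted mean of the six per-label F1 scores. The sketch below recomputes the per-label scores and the macro F1 from the counts in the table; the names `COUNTS` and `prf` are illustrative only.

```python
# (label, tp, fp, fn) copied from the Benchmarking table above.
COUNTS = [
    ("date_time", 54, 7, 15),
    ("units_measurements", 111, 48, 12),
    ("clinical_condition", 93, 47, 81),
    ("patient", 119, 16, 5),
    ("clinical_event", 331, 126, 89),
    ("bodypart", 171, 58, 84),
]


def prf(tp, fp, fn):
    """Return (precision, recall, F1), guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


scores = {label: prf(tp, fp, fn) for label, tp, fp, fn in COUNTS}
# Macro average: every label counts equally, regardless of its support.
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)

for label, (p, r, f1) in scores.items():
    print(f"{label:<20} {p:.4f} {r:.4f} {f1:.4f}")
print(f"macro F1: {macro_f1:.4f}")  # macro F1: 0.7651
```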