Description
Pretrained named entity recognition (NER) deep learning model for clinical conditions. The Spark NLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art NER model: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNN.
The training corpus is provided by the European Clinical Case Corpus (E3C), a project that aims to offer a freely available multilingual corpus of semantically annotated clinical narratives.
Predicted Entities
clinical_condition
How to use
# Python (assumes an active, licensed Spark session named `spark`)
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

# Wrap the raw text column in a document annotation
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences
sentenceDetectorDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Clinical word embeddings consumed by the NER model
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner = MedicalNerModel.pretrained("ner_eu_clinical_condition", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level NER tags into entity chunks
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[
    document_assembler,
    sentenceDetectorDL,
    tokenizer,
    word_embeddings,
    ner,
    ner_converter])

data = spark.createDataFrame([["""Hyperparathyroidism was considered upon the fourth occasion. The history of weakness and generalized joint pains were present. He also had history of epigastric pain diagnosed informally as gastritis. He had previously had open reduction and internal fixation for the initial two fractures under general anesthesia. He sustained mandibular fracture."""]]).toDF("text")

result = pipeline.fit(data).transform(data)
// Scala (assumes an active, licensed Spark session named `spark`;
// MedicalNerModel and NerConverterInternal come from the licensed Healthcare NLP library)
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // required for toDS / toDF

val documenter = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_eu_clinical_condition", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(documenter, sentenceDetector, tokenizer, word_embeddings, ner_model, ner_converter))

// Use a Seq of plain strings so the "text" column is a string, not an array of strings
val data = Seq("""Hyperparathyroidism was considered upon the fourth occasion. The history of weakness and generalized joint pains were present. He also had history of epigastric pain diagnosed informally as gastritis. He had previously had open reduction and internal fixation for the initial two fractures under general anesthesia. He sustained mandibular fracture.""").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
Results
+-----------------------+------------------+
|chunk                  |ner_label         |
+-----------------------+------------------+
|Hyperparathyroidism    |clinical_condition|
|weakness               |clinical_condition|
|generalized joint pains|clinical_condition|
|epigastric pain        |clinical_condition|
|gastritis              |clinical_condition|
|fractures              |clinical_condition|
|anesthesia             |clinical_condition|
|mandibular fracture    |clinical_condition|
+-----------------------+------------------+
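The chunk and label pairs in this table come from the `ner_chunk` annotations. The sketch below shows one way to pull them out. It is a minimal illustration, not the library's own method: `chunk_rows` is a hypothetical helper, and it assumes each annotation exposes a `result` string and a `metadata` mapping with an `entity` key. The stand-in objects only mimic that shape so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def chunk_rows(annotations):
    """Return (chunk text, entity label) pairs from ner_chunk-style annotations.

    Assumes each item has `.result` and a `.metadata` dict with an "entity" key.
    """
    return [(a.result, a.metadata.get("entity")) for a in annotations]


# Stand-ins that mimic the assumed annotation shape (not real model output objects).
demo = [
    SimpleNamespace(result="Hyperparathyroidism", metadata={"entity": "clinical_condition"}),
    SimpleNamespace(result="gastritis", metadata={"entity": "clinical_condition"}),
]

for chunk, label in chunk_rows(demo):
    print(f"{chunk:<23} {label}")
```

With a fitted pipeline, the annotations could be obtained from a `LightPipeline`, for example `LightPipeline(pipeline.fit(data)).fullAnnotate(text)[0]["ner_chunk"]`, and passed to the helper unchanged.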
Model Information
| Model Name:    | ner_eu_clinical_condition     |
| Compatibility: | Healthcare NLP 4.2.8+         |
| License:       | Licensed                      |
| Edition:       | Official                      |
| Input Labels:  | [sentence, token, embeddings] |
| Output Labels: | [ner]                         |
| Language:      | en                            |
| Size:          | 851.3 KB                      |
References
The corpus used for model training is provided by the European Clinical Case Corpus (E3C), a project that aims to offer a freely available multilingual corpus of semantically annotated clinical narratives.
Benchmarking
label               tp     fp    fn    total  precision  recall  f1
clinical_condition  230.0  28.0  70.0  300.0  0.8915     0.7667  0.8244
macro               -      -     -     -      -          -       0.8244
micro               -      -     -     -      -          -       0.8244
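The precision, recall, and F1 values above follow from the tp, fp, and fn counts. A minimal sketch using the standard definitions, where `prf1` is a hypothetical helper name introduced for illustration:

```python
def prf1(tp, fp, fn):
    """Standard precision, recall, and F1 computed from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts taken from the benchmark row above.
precision, recall, f1 = prf1(tp=230, fp=28, fn=70)
print(f"{precision:.4f} {recall:.4f} {f1:.4f}")  # 0.8915 0.7667 0.8244
```

Because the model predicts a single label, the macro and micro averages coincide with the per-label F1, which is why all three rows report 0.8244. The total column equals tp + fn, the number of gold entities.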