Description
This model extracts mentions of different types of disease from medical text. It is a version of the ner_diseases model augmented with the langtest library. The table below compares langtest robustness test results before and after augmentation.
test_type | before fail_count | after fail_count | before pass_count | after pass_count | minimum pass_rate | before pass_rate | after pass_rate |
---|---|---|---|---|---|---|---|
add_ocr_typo | 266 | 55 | 541 | 752 | 70% | 67% | 93% |
lowercase | 152 | 107 | 665 | 710 | 70% | 81% | 87% |
swap_entities | 120 | 118 | 686 | 688 | 70% | 85% | 85% |
titlecase | 320 | 113 | 497 | 704 | 70% | 61% | 86% |
uppercase | 768 | 175 | 51 | 644 | 70% | 6% | 79% |
weighted average | 1626 | 568 | 2440 | 3498 | 70% | 60.01% | 86.03% |
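Each pass rate in the table is pass_count / (pass_count + fail_count), and the "weighted average" row pools the counts of all five test types. The Python sketch below recomputes these figures from the counts in the table. The dictionary layout and the helper names `pass_rate` and `overall` are illustrative choices, not part of langtest.

```python
# Counts copied from the table above, as (fail_count, pass_count).
COUNTS = {
    "add_ocr_typo":  {"before": (266, 541), "after": (55, 752)},
    "lowercase":     {"before": (152, 665), "after": (107, 710)},
    "swap_entities": {"before": (120, 686), "after": (118, 688)},
    "titlecase":     {"before": (320, 497), "after": (113, 704)},
    "uppercase":     {"before": (768, 51),  "after": (175, 644)},
}
MIN_PASS_RATE = 70.0  # percent, from the "minimum pass_rate" column


def pass_rate(fail_count: int, pass_count: int) -> float:
    """Return the pass rate in percent."""
    return 100.0 * pass_count / (fail_count + pass_count)


def overall(stage: str) -> float:
    """Pass rate with all test types pooled (the 'weighted average' row)."""
    fails = sum(c[stage][0] for c in COUNTS.values())
    passes = sum(c[stage][1] for c in COUNTS.values())
    return pass_rate(fails, passes)


# Print one line per test type, flagging rates below the minimum.
for test_type, stages in COUNTS.items():
    before = pass_rate(*stages["before"])
    after = pass_rate(*stages["after"])
    flag = "" if after >= MIN_PASS_RATE else "  (below minimum)"
    print(f"{test_type:<14} {before:6.2f}% -> {after:6.2f}%{flag}")

print(f"{'overall':<14} {overall('before'):6.2f}% -> {overall('after'):6.2f}%")
```

By this calculation, add_ocr_typo, titlecase, and uppercase fall below the 70% minimum before augmentation, and all five test types clear it afterwards.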
Predicted Entities
Disease
How to use
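# Assumes a licensed Spark NLP for Healthcare session is available as `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

# Wrap the raw text column in a document annotation
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences, then sentences into tokens
sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Clinical word embeddings used by the NER model
embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Token-level disease tagger
clinical_ner = MedicalNerModel.pretrained("ner_diseases_langtest", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

result = model.transform(spark.createDataFrame([["""POSTOPERATIVE DIAGNOSES:
1. Epidural fibrosis with nerve root entrapment.
OPERATION PERFORMED:
Left L4-L5 transforaminal neuroplasty with nerve root decompression and lysis of adherence followed by epidural steroid injection."""]], ["text"]))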

// Scala version of the same pipeline; assumes a SparkSession named `spark`.
import spark.implicits._ // needed for .toDS() below

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence_detector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val ner = MedicalNerModel.pretrained("ner_diseases_langtest", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")

val ner_converter = new NerConverter()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))

val data = Seq("""POSTOPERATIVE DIAGNOSES:
1. Epidural fibrosis with nerve root entrapment.
OPERATION PERFORMED:
Left L4-L5 transforaminal neuroplasty with nerve root decompression and lysis of adherence followed by epidural steroid injection.""").toDS().toDF("text")

val result = pipeline.fit(data).transform(data)
Results
+------------------------+---------+
|chunk                   |ner_label|
+------------------------+---------+
|fibrosis                |Disease  |
|nerve root decompression|Disease  |
+------------------------+---------+
Model Information
Model Name: | ner_diseases_langtest |
Compatibility: | Healthcare NLP 5.1.1+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 14.5 MB |
References
Trained on an augmented version of the i2b2 dataset with embeddings_clinical.
Benchmarking
label         precision  recall  f1-score  support
Disease       0.91       0.93    0.92      1348
micro-avg     0.91       0.93    0.92      1348
macro-avg     0.91       0.93    0.92      1348
weighted-avg  0.91       0.93    0.92      1348
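
The f1-score column is the harmonic mean of precision and recall. Because the model predicts a single label, the micro, macro, and weighted averages coincide with the Disease row. A minimal check in Python follows; `f1_score` is an illustrative helper name, and the inputs are the rounded values from the table, so the result is approximate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the Disease label, as reported in the table.
disease_f1 = f1_score(0.91, 0.93)
print(f"Disease f1-score: {disease_f1:.2f}")
```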