Description
This model is trained to extract clinical abbreviations and acronyms from text. It is a version of the ner_abbreviation_clinical model augmented with the langtest library.
| test_type | before fail_count | after fail_count | before pass_count | after pass_count | minimum pass_rate | before pass_rate | after pass_rate |
|---|---|---|---|---|---|---|---|
| lowercase | 351 | 78 | 223 | 496 | 90% | 39% | 86% |
| titlecase | 325 | 73 | 248 | 500 | 85% | 43% | 87% |
| uppercase | 117 | 47 | 382 | 452 | 90% | 77% | 91% |
| weighted average | 793 | 198 | 853 | 1448 | 88.33% | 51.82% | 87.97% |
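The pass rates in the table follow directly from the fail/pass counts (pass_rate = pass_count / (pass_count + fail_count)); a quick sanity check in plain Python:

```python
# Recompute the pass rates reported in the robustness table above.
tests = {
    "lowercase": {"before": (351, 223), "after": (78, 496)},
    "titlecase": {"before": (325, 248), "after": (73, 500)},
    "uppercase": {"before": (117, 382), "after": (47, 452)},
}

def pass_rate(fail, passed):
    return passed / (passed + fail)

for name, counts in tests.items():
    print(f"{name}: {pass_rate(*counts['before']):.0%} -> {pass_rate(*counts['after']):.0%}")

# Weighted average over all test cases (totals across the three tests).
fail_b = sum(c["before"][0] for c in tests.values())
pass_b = sum(c["before"][1] for c in tests.values())
fail_a = sum(c["after"][0] for c in tests.values())
pass_a = sum(c["after"][1] for c in tests.values())
print(f"weighted average: {pass_rate(fail_b, pass_b):.2%} -> {pass_rate(fail_a, pass_a):.2%}")
```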
Predicted Entities
ABBR
How to use
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")
embeddings = WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models') \
.setInputCols(['sentence', 'token']) \
.setOutputCol('embeddings')
ner_model = MedicalNerModel.pretrained('ner_clinical_abbreviation_langtest', 'en', 'clinical/models') \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")
ner_pipeline = Pipeline(
stages = [
document_assembler,
sentence_detector,
tokenizer,
embeddings,
ner_model,
ner_converter
])
text = """Gravid with an Estimated Fetal Weight of 6-6/12 Pounds. Lower Extremities: There are no signs of edema in the lower extremities. Laboratory Data: Laboratory tests revealed a normal cbc. Blood Type: The patient's blood type has been identified as AB Positive. Rubella Status: The patient has confirmed immunity to rub. VDRL Test: The vdrl test for syphilis is nonreactive. Hepatitis C Screening (anti-hcv): The screening for Hepatitis C surface antigen returned a negative result. Testing for hiv showed a negative outcome."""
data = spark.createDataFrame([[text]]).toDF("text")
result = ner_pipeline.fit(data).transform(data)
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
.setInputCols(Array("document"))
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols(Array("sentence"))
.setOutputCol("token")
val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val ner_model = MedicalNerModel.pretrained("ner_clinical_abbreviation_langtest", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val ner_pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings, ner_model, ner_converter))
val data = Seq("Gravid with an Estimated Fetal Weight of 6-6/12 Pounds. Lower Extremities: There are no signs of edema in the lower extremities. Laboratory Data: Laboratory tests revealed a normal cbc. Blood Type: The patient's blood type has been identified as AB Positive. Rubella Status: The patient has confirmed immunity to rub. VDRL Test: The vdrl test for syphilis is nonreactive. Hepatitis C Screening (anti-hcv): The screening for Hepatitis C surface antigen returned a negative result. Testing for hiv showed a negative outcome.").toDF("text")
val result = ner_pipeline.fit(data).transform(data)
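Both snippets leave the detected chunks in the `ner_chunk` column of `result`. As a purely illustrative post-processing sketch (the mocked dicts below stand in for Spark NLP Annotation structs, whose exact shape may differ), each chunk can be paired with its entity label like this:

```python
# Illustrative post-processing of the pipeline output. In a real run the
# rows come from result.select("ner_chunk"); these dicts are a mocked
# stand-in (assumption) mirroring the Results table.
mock_chunks = [
    {"result": "cbc", "metadata": {"entity": "ABBR"}},
    {"result": "VDRL", "metadata": {"entity": "ABBR"}},
    {"result": "vdrl", "metadata": {"entity": "ABBR"}},
    {"result": "anti-hcv", "metadata": {"entity": "ABBR"}},
    {"result": "hiv", "metadata": {"entity": "ABBR"}},
]

# Pair each detected chunk with its entity label.
pairs = [(c["result"], c["metadata"]["entity"]) for c in mock_chunks]
for chunk, label in pairs:
    print(f"{chunk:<8} {label}")
```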
Results
+--------+---------+
|chunk |ner_label|
+--------+---------+
|cbc |ABBR |
|VDRL |ABBR |
|vdrl |ABBR |
|anti-hcv|ABBR |
|hiv |ABBR |
+--------+---------+
Model Information
| Model Name: | ner_clinical_abbreviation_langtest |
|---|---|
| Compatibility: | Healthcare NLP 5.2.1+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sentence, token, embeddings] |
| Output Labels: | [ner] |
| Language: | en |
| Size: | 14.7 MB |
Sample text from the training dataset
Trained on an in-house annotated dataset.
Benchmarking
| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| ABBR | 0.90 | 0.94 | 0.92 | 683 |
| micro-avg | 0.90 | 0.94 | 0.92 | 683 |
| macro-avg | 0.90 | 0.94 | 0.92 | 683 |
| weighted-avg | 0.90 | 0.94 | 0.92 | 683 |
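The reported f1-score is consistent with the precision and recall via F1 = 2PR / (P + R); a one-line check:

```python
# Verify the reported F1 from precision and recall: F1 = 2PR / (P + R).
precision, recall = 0.90, 0.94
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))
```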