Extraction of Clinical Abbreviations and Acronyms (LangTest)


This model is trained to extract clinical abbreviations and acronyms from text. It is a version of the ner_abbreviation_clinical model augmented with the LangTest library.

|test_type       |before fail_count|after fail_count|before pass_count|after pass_count|minimum pass_rate|before pass_rate|after pass_rate|
|lowercase       |351              |78              |223              |496             |90%              |39%             |86%            |
|titlecase       |325              |73              |248              |500             |85%              |43%             |87%            |
|uppercase       |117              |47              |382              |452             |90%              |77%             |91%            |
|weighted average|793              |198             |853              |1448            |88.33%           |51.82%          |87.97%         |
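The before and after pass rates follow directly from the counts: pass_rate = pass_count / (pass_count + fail_count), and the weighted-average row pools the counts of all three test types before dividing. The sketch below recomputes them in Python from the numbers in the table; `counts`, `pass_rate` and `rows` are illustrative names, not part of the LangTest API.

```python
# Fail/pass counts copied from the robustness table:
# test_type -> (before_fail, after_fail, before_pass, after_pass, minimum_pass_rate_pct)
counts = {
    "lowercase": (351, 78, 223, 496, 90),
    "titlecase": (325, 73, 248, 500, 85),
    "uppercase": (117, 47, 382, 452, 90),
}


def pass_rate(passed: int, failed: int) -> float:
    """Percentage of test cases that passed."""
    return 100.0 * passed / (passed + failed)


rows = {}
for test_type, (bf, af, bp, ap, minimum) in counts.items():
    before = pass_rate(bp, bf)
    after = pass_rate(ap, af)
    # Store rounded rates plus whether the augmented model meets its threshold.
    rows[test_type] = (round(before), round(after), after >= minimum)
    print(f"{test_type:<10} before={before:.0f}% after={after:.0f}% meets_minimum={after >= minimum}")

# Weighted average: sum each count column over all test types, then divide once.
totals = [sum(v[i] for v in counts.values()) for i in range(4)]
before_fail, after_fail, before_pass, after_pass = totals
weighted_before = pass_rate(before_pass, before_fail)
weighted_after = pass_rate(after_pass, after_fail)
print(f"weighted   before={weighted_before:.2f}% after={weighted_after:.2f}%")
```

Recomputed this way, lowercase (86% after augmentation) is still below its 90% minimum, while titlecase and uppercase meet theirs. The 88.33% minimum pass rate in the weighted-average row appears to be the simple mean of the three thresholds, (90 + 85 + 90) / 3, rather than a count-weighted value.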

Predicted Entities

ABBR

How to use

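from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

# Assumes `spark` is an active SparkSession started with the licensed Healthcare NLP library.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models') \
    .setInputCols(['sentence', 'token']) \
    .setOutputCol('embeddings')

ner_model = MedicalNerModel.pretrained('ner_clinical_abbreviation_langtest', 'en', 'clinical/models') \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("abbr_ner")

ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "abbr_ner"]) \
    .setOutputCol("ner_chunk")

ner_pipeline = Pipeline(
    stages=[document_assembler, sentence_detector, tokenizer, embeddings, ner_model, ner_converter])

text = """Gravid with an Estimated Fetal Weight of 6-6/12 Pounds. Lower Extremities: There are no signs of edema in the lower extremities. Laboratory Data: Laboratory tests revealed a normal cbc. Blood Type: The patient's blood type has been identified as AB Positive. Rubella Status: The patient has confirmed immunity to rub. VDRL Test: The vdrl test for syphilis is nonreactive. Hepatitis C Screening (anti-hcv): The screening for Hepatitis C surface antigen returned a negative result. Testing for hiv showed a negative outcome."""

data = spark.createDataFrame([[text]]).toDF("text")

result = ner_pipeline.fit(data).transform(data)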

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
import org.apache.spark.ml.Pipeline
import spark.implicits._ // assumes a SparkSession named `spark` with the licensed library loaded

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")

val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_clinical_abbreviation_langtest", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val ner_pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings, ner_model, ner_converter))

val data = Seq("Gravid with an Estimated Fetal Weight of 6-6/12 Pounds. Lower Extremities: There are no signs of edema in the lower extremities. Laboratory Data: Laboratory tests revealed a normal cbc. Blood Type: The patient's blood type has been identified as AB Positive. Rubella Status: The patient has confirmed immunity to rub. VDRL Test: The vdrl test for syphilis is nonreactive. Hepatitis C Screening (anti-hcv): The screening for Hepatitis C surface antigen returned a negative result. Testing for hiv showed a negative outcome.").toDF("text")

val result = ner_pipeline.fit(data).transform(data)


|chunk   |ner_label|
|cbc     |ABBR     |
|VDRL    |ABBR     |
|vdrl    |ABBR     |
|anti-hcv|ABBR     |
|hiv     |ABBR     |
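The chunk table is produced by NerConverter, which turns the token-level IOB tags emitted by the NER model into whole entity chunks. The plain-Python sketch below illustrates that merging step under stated assumptions: the tokenization, the B-ABBR/I-ABBR tags and the `iob_to_chunks` helper are hypothetical and do not reproduce the library's implementation. Slicing the original text by character offsets, rather than joining tokens, keeps a multi-token abbreviation such as anti-hcv intact.

```python
import re
from typing import List, Optional, Tuple

# (begin, end, tag): character offsets with an inclusive end, plus an IOB tag.
Token = Tuple[int, int, str]


def iob_to_chunks(text: str, tokens: List[Token]) -> List[Tuple[str, str]]:
    """Merge consecutive IOB-tagged tokens into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    start: Optional[int] = None
    end: Optional[int] = None
    label: Optional[str] = None

    for begin, stop, tag in tokens:
        prefix, _, tag_label = tag.partition("-")
        # An I- tag with the same label extends the open chunk.
        continues = prefix == "I" and tag_label == label
        if label is not None and not continues:
            # Close the open chunk, slicing the original text by offsets.
            chunks.append((text[start:end + 1], label))
            label = None
        if prefix == "O":
            continue
        if not continues:
            # A B- tag (or a stray I- tag) opens a new chunk here.
            start, label = begin, tag_label
        end = stop

    if label is not None:
        chunks.append((text[start:end + 1], label))
    return chunks


text = "Hepatitis C Screening (anti-hcv): negative. Testing for hiv was negative."
# Hypothetical tokenization: words and single punctuation marks, with offsets.
spans = [(m.start(), m.end() - 1) for m in re.finditer(r"\w+|[^\w\s]", text)]
# Hypothetical tags for those 17 tokens ("anti", "-", "hcv" form one entity).
tags = ["O", "O", "O", "O", "B-ABBR", "I-ABBR", "I-ABBR", "O", "O", "O", "O",
        "O", "O", "B-ABBR", "O", "O", "O"]
assert len(spans) == len(tags)

chunks = iob_to_chunks(text, [(b, e, t) for (b, e), t in zip(spans, tags)])
for chunk, label in chunks:
    print(f"|{chunk:<8}|{label:<9}|")
```

With these hypothetical tags the loop prints `|anti-hcv|ABBR     |` and `|hiv     |ABBR     |`, in the same layout as the results table.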

Model Information

Model Name: ner_clinical_abbreviation_langtest
Compatibility: Healthcare NLP 5.2.1+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 14.7 MB
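Input Labels and Output Labels roughly correspond to the columns the model reads and the column it writes, so the stages placed before it must produce sentence, token and embeddings. The hypothetical `check_wiring` helper below sketches that dependency check on plain (stage, inputs, output) tuples; it is not a Spark NLP API, and the `document` and `ner_chunk` column names are assumptions modeled on the usage example.

```python
from typing import List, Sequence, Tuple

# (stage name, input columns, output column)
Stage = Tuple[str, Sequence[str], str]


def check_wiring(stages: List[Stage], source_columns: Sequence[str] = ("text",)) -> List[str]:
    """Return one message per input column that no earlier stage produces."""
    available = set(source_columns)
    problems = []
    for name, inputs, output in stages:
        for column in inputs:
            if column not in available:
                problems.append(f"{name}: missing input column '{column}'")
        # Later stages may consume this stage's output.
        available.add(output)
    return problems


pipeline = [
    ("document_assembler", ["text"], "document"),
    ("sentence_detector", ["document"], "sentence"),
    ("tokenizer", ["sentence"], "token"),
    ("embeddings", ["sentence", "token"], "embeddings"),
    ("ner_model", ["sentence", "token", "embeddings"], "ner"),
    ("ner_converter", ["sentence", "token", "ner"], "ner_chunk"),
]
print(check_wiring(pipeline))  # []

# A converter reading a column that the NER model never writes is flagged.
broken = pipeline[:-1] + [("ner_converter", ["sentence", "token", "abbr_ner"], "ner_chunk")]
print(check_wiring(broken))  # ["ner_converter: missing input column 'abbr_ner'"]
```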

Sample text from the training dataset

Trained on the in-house dataset.


Benchmarking

label         precision  recall  f1-score  support
ABBR          0.90       0.94    0.92      683
micro-avg     0.90       0.94    0.92      683
macro-avg     0.90       0.94    0.92      683
weighted-avg  0.90       0.94    0.92      683
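The f1-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Because ABBR is the only label, the micro, macro and weighted averages all reduce to the per-label figures, which is why the four rows are identical. A quick check in Python (the `f1_score` function here is a local helper, not a library call); since the reported precision and recall are already rounded to two decimals, the recomputed value is approximate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.90, 0.94)
print(f"{f1:.4f}")   # 0.9196
print(round(f1, 2))  # 0.92
```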