Extraction of Clinical Abbreviations and Acronyms


This model is trained to extract clinical abbreviations and acronyms from text.

Predicted Entities

ABBR

How to use

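# Python: assumes a running Spark session (`spark`) with Spark NLP for Healthcare installed.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

# "document" and "abbr_ner_chunk" are assumed column names; the rest follow from the input columns.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

abbr_ner = MedicalNerModel.pretrained("ner_abbreviation_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("abbr_ner")

abbr_converter = NerConverter() \
    .setInputCols(["sentence", "token", "abbr_ner"]) \
    .setOutputCol("abbr_ner_chunk")

ner_pipeline = Pipeline(stages=[
    documentAssembler, sentenceDetector, tokenizer,
    embeddings, abbr_ner, abbr_converter,
])

sample_df = spark.createDataFrame([["Gravid with estimated fetal weight of 6-6/12 pounds. LOWER EXTREMITIES: No edema. LABORATORY DATA: Laboratory tests include a CBC which is normal. Blood Type: AB positive. Rubella: Immune. VDRL: Nonreactive. Hepatitis C surface antigen: Negative. HIV: Negative. One-Hour Glucose: 117. Group B strep has not been done as yet."]]).toDF("text")
result = ner_pipeline.fit(sample_df).transform(sample_df)
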
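// Scala: assumes a SparkSession named `spark`; import paths may differ between library versions.
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
import spark.implicits._  // needed for Seq(...).toDF

// "document" and "abbr_ner_chunk" are assumed column names; the rest follow from the input columns.
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val abbr_ner = MedicalNerModel.pretrained("ner_abbreviation_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("abbr_ner")

val abbr_converter = new NerConverter()
  .setInputCols(Array("sentence", "token", "abbr_ner"))
  .setOutputCol("abbr_ner_chunk")

val ner_pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, embeddings, abbr_ner, abbr_converter))

val sample_df = Seq("Gravid with estimated fetal weight of 6-6/12 pounds. LOWER EXTREMITIES: No edema. LABORATORY DATA: Laboratory tests include a CBC which is normal. Blood Type: AB positive. Rubella: Immune. VDRL: Nonreactive. Hepatitis C surface antigen: Negative. HIV: Negative. One-Hour Glucose: 117. Group B strep has not been done as yet.").toDF("text")
val result = ner_pipeline.fit(sample_df).transform(sample_df)
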
# NLU (Python)
import nlu
nlu.load("en.med_ner.abbreviation_clinical").predict("""Gravid with estimated fetal weight of 6-6/12 pounds. LOWER EXTREMITIES: No edema. LABORATORY DATA: Laboratory tests include a CBC which is normal. Blood Type: AB positive. Rubella: Immune. VDRL: Nonreactive. Hepatitis C surface antigen: Negative. HIV: Negative. One-Hour Glucose: 117. Group B strep has not been done as yet.""")


Results

|chunk|ner_label|
|CBC  |ABBR     |
|AB   |ABBR     |
|VDRL |ABBR     |
|HIV  |ABBR     |
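
The rows above pair each detected chunk with its entity label. The NER stage emits token-level IOB tags (the benchmark reports the `B-ABBR` tag), and the converter stage groups them into chunks. The sketch below illustrates that grouping in plain Python; it is a simplified illustration, not the library's `NerConverter` implementation, and the function name `iob_to_chunks` and the token/tag lists are hypothetical.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2-tagged tokens into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current: List[str] = []          # tokens of the chunk being built
    label: Optional[str] = None      # entity label of the chunk being built

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk; flush any open one first.
            if current:
                chunks.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # An I- tag with the same label extends the open chunk.
            current.append(token)
        else:
            # "O" (or a stray I- tag) closes any open chunk.
            if current:
                chunks.append((" ".join(current), label))
            current, label = [], None

    if current:
        chunks.append((" ".join(current), label))
    return chunks


# Hypothetical token-level tags for one sentence of the sample text.
tokens = ["Laboratory", "tests", "include", "a", "CBC", "which", "is", "normal", "."]
tags = ["O", "O", "O", "O", "B-ABBR", "O", "O", "O", "O"]
print(iob_to_chunks(tokens, tags))  # [('CBC', 'ABBR')]
```

Most abbreviations are single tokens, so in practice each chunk here usually corresponds to one `B-ABBR` tag.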

Model Information

Model Name: ner_abbreviation_clinical
Compatibility: Healthcare NLP 3.3.4+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 14.6 MB

Data Source

Trained on an in-house dataset.


Benchmarking

Quality on validation dataset (20.0%), validation examples = 454
Time to finish evaluation: 5.34s

| Label |    tp|    fp|    fn| precision|recall|    f1|
| B-ABBR| 672.0|  42.0|  40.0|    0.9411|0.9438|0.9424|

|            | precision|  recall|      f1|
|       macro|    0.9411|  0.9438|  0.9424|
|       micro|    0.9411|  0.9438|  0.9424|
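
As a sanity check, the scores can be recomputed from the reported counts using the standard definitions of precision, recall, and F1. The sketch below is a minimal illustration; the helper name `prf` is hypothetical and is not part of the library.

```python
def prf(tp: float, fp: float, fn: float):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the B-ABBR row of the table above.
counts = {"B-ABBR": (672.0, 42.0, 40.0)}

# Macro average: mean of the per-label scores.
per_label = [prf(*c) for c in counts.values()]
macro = tuple(sum(s[i] for s in per_label) / len(per_label) for i in range(3))

# Micro average: scores computed from counts pooled over all labels.
micro = prf(*(sum(c[i] for c in counts.values()) for i in range(3)))

print("macro: precision={:.4f} recall={:.4f} f1={:.4f}".format(*macro))
print("micro: precision={:.4f} recall={:.4f} f1={:.4f}".format(*micro))
# Both lines print: precision=0.9412 recall=0.9438 f1=0.9425
```

With a single label, macro and micro averages coincide, which is why the two summary rows are identical. The recomputed values differ from the table only in the last digit (0.9412 vs 0.9411, 0.9425 vs 0.9424), which is consistent with the report truncating rather than rounding.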