Extraction of Clinical Abbreviations and Acronyms

Description

This model extracts clinical abbreviations and acronyms from text.

Predicted Entities

ABBR

How to use

# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import (
    SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
)
from sparknlp_jsl.annotator import MedicalNerModel

# Assumes `spark` is an active Spark session with Healthcare NLP loaded.

# Wrap the raw text column in a document annotation
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences
sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split sentences into tokens
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Clinical word embeddings consumed by the NER model
embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Token-level abbreviation tagger
abbr_ner = MedicalNerModel.pretrained("ner_abbreviation_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("abbr_ner")

# Merge token-level tags into entity chunks
abbr_converter = NerConverter()\
    .setInputCols(["sentence", "token", "abbr_ner"])\
    .setOutputCol("abbr_ner_chunk")

ner_pipeline = Pipeline(
    stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        embeddings,
        abbr_ner,
        abbr_converter
    ])

sample_df = spark.createDataFrame([["Gravid with estimated fetal weight of 6-6/12 pounds. LOWER EXTREMITIES: No edema. LABORATORY DATA: Laboratory tests include a CBC which is normal. Blood Type: AB positive. Rubella: Immune. VDRL: Nonreactive. Hepatitis C surface antigen: Negative. HIV: Negative. One-Hour Glucose: 117. Group B strep has not been done as yet."]]).toDF("text")
result = ner_pipeline.fit(sample_df).transform(sample_df)

// Scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
// Needed for Seq(...).toDF; assumes an active `spark` session
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val abbr_ner = MedicalNerModel.pretrained("ner_abbreviation_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("abbr_ner")

val abbr_converter = new NerConverter()
  .setInputCols(Array("sentence", "token", "abbr_ner"))
  .setOutputCol("abbr_ner_chunk")

val ner_pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, embeddings, abbr_ner, abbr_converter))

val sample_df = Seq("Gravid with estimated fetal weight of 6-6/12 pounds. LOWER EXTREMITIES: No edema. LABORATORY DATA: Laboratory tests include a CBC which is normal. Blood Type: AB positive. Rubella: Immune. VDRL: Nonreactive. Hepatitis C surface antigen: Negative. HIV: Negative. One-Hour Glucose: 117. Group B strep has not been done as yet.").toDF("text")
val result = ner_pipeline.fit(sample_df).transform(sample_df)

import nlu
nlu.load("en.med_ner.abbreviation_clinical").predict("""Gravid with estimated fetal weight of 6-6/12 pounds. LOWER EXTREMITIES: No edema. LABORATORY DATA: Laboratory tests include a CBC which is normal. Blood Type: AB positive. Rubella: Immune. VDRL: Nonreactive. Hepatitis C surface antigen: Negative. HIV: Negative. One-Hour Glucose: 117. Group B strep has not been done as yet.""")

Results

+-----+---------+
|chunk|ner_label|
+-----+---------+
|CBC  |ABBR     |
|AB   |ABBR     |
|VDRL |ABBR     |
|HIV  |ABBR     |
+-----+---------+
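
The chunks in the table above come from the converter stage, which merges token-level tags (such as the B-ABBR label reported under Benchmarking) into entity chunks. The following is a minimal plain-Python sketch of that kind of grouping, not the library's implementation. The function name `iob_to_chunks` is hypothetical, and the tags in the example are written by hand for illustration rather than taken from model output.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level IOB tags into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Emit the chunk collected so far, if any.
        if current_tokens and current_label is not None:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag extends the open chunk of the same label.
            current_tokens.append(token)
        else:
            # "O" (or a stray I- tag) closes any open chunk.
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


# Hand-written tags for one sentence of the sample text (illustrative only).
tokens = ["Laboratory", "tests", "include", "a", "CBC", "which", "is", "normal", "."]
tags = ["O", "O", "O", "O", "B-ABBR", "O", "O", "O", "O"]
chunks = iob_to_chunks(tokens, tags)
print(chunks)  # [('CBC', 'ABBR')]
```

Since abbreviations are usually single tokens, most chunks consist of one B-ABBR token, which is consistent with the benchmark reporting only the B-ABBR label.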

Model Information

Model Name:	ner_abbreviation_clinical
Compatibility:	Healthcare NLP 3.3.4+
License:	Licensed
Edition:	Official
Input Labels:	[sentence, token, embeddings]
Output Labels:	[ner]
Language:	en
Size:	14.6 MB

Data Source

Trained on an in-house dataset.

Benchmarking

Quality on validation dataset (20.0%), validation examples = 454
time to finish evaluation: 5.34s

+-------+------+------+------+----------+------+------+
| Label |    tp|    fp|    fn| precision|recall|    f1|
+-------+------+------+------+----------+------+------+
| B-ABBR| 672.0|  42.0|  40.0|    0.9411|0.9438|0.9424|
+-------+------+------+------+----------+------+------+

+------------+----------+--------+--------+
|            | precision|  recall|      f1|
+------------+----------+--------+--------+
|       macro|    0.9411|  0.9438|  0.9424|
+------------+----------+--------+--------+
|       micro|    0.9411|  0.9438|  0.9424|
+------------+----------+--------+--------+
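
The precision, recall, and F1 values above follow from the tp, fp, and fn counts in the per-label table. As a quick check, the sketch below recomputes them with the standard formulas. The helper names `prf` and `truncate` are illustrative. It assumes the reported figures are truncated, not rounded, to four decimal places, which is what the numbers suggest (rounding would give 0.9412 for precision and 0.9425 for F1).

```python
import math


def prf(tp: float, fp: float, fn: float):
    """Return (precision, recall, f1) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def truncate(value: float, digits: int = 4) -> float:
    """Cut a value to `digits` decimal places without rounding up."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


# Counts for B-ABBR taken from the benchmark table.
precision, recall, f1 = prf(tp=672, fp=42, fn=40)
print(truncate(precision), truncate(recall), truncate(f1))  # 0.9411 0.9438 0.9424
```

Because B-ABBR is the only label evaluated, the macro average (mean of per-label scores) and the micro average (scores from pooled counts) are computed over the same single set of counts, so they coincide with the per-label row.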
