Extraction of Clinical Abbreviations and Acronyms (LangTest)

Description

This model is trained to extract clinical abbreviations and acronyms from text. It is a version of the ner_abbreviation_clinical model augmented with the langtest library. The table below reports the robustness test results before and after augmentation.

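+----------------+-----------------+----------------+-----------------+----------------+-----------------+----------------+---------------+
|test_type       |before fail_count|after fail_count|before pass_count|after pass_count|minimum pass_rate|before pass_rate|after pass_rate|
+----------------+-----------------+----------------+-----------------+----------------+-----------------+----------------+---------------+
|lowercase       |351              |78              |223              |496             |90%              |39%             |86%            |
|titlecase       |325              |73              |248              |500             |85%              |43%             |87%            |
|uppercase       |117              |47              |382              |452             |90%              |77%             |91%            |
|weighted average|793              |198             |853              |1448            |88.33%           |51.82%          |87.97%         |
+----------------+-----------------+----------------+-----------------+----------------+-----------------+----------------+---------------+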
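
The pass rates in the table above can be re-derived from the raw counts. The sketch below assumes that a pass rate is simply pass_count / (pass_count + fail_count) and that the "weighted average" row pools the counts of all three test types; both assumptions reproduce the published figures, but they are an inference from the numbers rather than a description of how langtest computes them. The names `counts`, `pass_rate`, and `summarize` are illustrative only.

```python
# Counts copied from the table above:
# (before_fail, after_fail, before_pass, after_pass)
counts = {
    "lowercase": (351, 78, 223, 496),
    "titlecase": (325, 73, 248, 500),
    "uppercase": (117, 47, 382, 452),
}


def pass_rate(fail_count, pass_count):
    """Return the pass rate as a percentage of all test cases."""
    return 100.0 * pass_count / (fail_count + pass_count)


def summarize(counts):
    """Return {test_type: (before_rate, after_rate)} plus a pooled row."""
    rows = {}
    for test_type, (bf, af, bp, ap) in counts.items():
        rows[test_type] = (pass_rate(bf, bp), pass_rate(af, ap))
    # Pool the counts column by column (assumption: this is how the
    # "weighted average" row is obtained).
    tot_bf, tot_af, tot_bp, tot_ap = (sum(col) for col in zip(*counts.values()))
    rows["weighted average"] = (pass_rate(tot_bf, tot_bp), pass_rate(tot_af, tot_ap))
    return rows


summary = summarize(counts)
for name, (before, after) in summary.items():
    print(f"{name:<17} before={before:6.2f}%  after={after:6.2f}%")
```

Pooling the counts weights each test type by its number of test cases, which is why the pooled "before" rate sits closer to the lowercase and titlecase rates than to the uppercase one.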

Predicted Entities

ABBR

How to use

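# Assumes an active Spark session named `spark` with Healthcare NLP loaded.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")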

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models') \
    .setInputCols(['sentence', 'token']) \
    .setOutputCol('embeddings')

ner_model = MedicalNerModel.pretrained('ner_clinical_abbreviation_langtest', 'en', 'clinical/models') \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

# The converter must read the column written by ner_model ("ner").
ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")


ner_pipeline = Pipeline(
        stages = [
        document_assembler,
        sentence_detector,
        tokenizer,
        embeddings,
        ner_model,
        ner_converter
])

text = """Gravid with an Estimated Fetal Weight of 6-6/12 Pounds. Lower Extremities: There are no signs of edema in the lower extremities. Laboratory Data: Laboratory tests revealed a normal cbc. Blood Type: The patient's blood type has been identified as AB Positive. Rubella Status: The patient has confirmed immunity to rub. VDRL Test: The vdrl test for syphilis is nonreactive. Hepatitis C Screening (anti-hcv): The screening for Hepatitis C surface antigen returned a negative result. Testing for hiv showed a negative outcome."""

data = spark.createDataFrame([[text]]).toDF("text")

result = ner_pipeline.fit(data).transform(data)

// Scala
import spark.implicits._  // needed for .toDF on a Seq outside spark-shell

val document_assembler = DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val tokenizer = Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")

val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") 
    .setInputCols(Array("sentence", "token")) 
    .setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_clinical_abbreviation_langtest", "en", "clinical/models") 
    .setInputCols(Array("sentence", "token", "embeddings")) 
    .setOutputCol("ner")

val ner_converter = NerConverter() 
    .setInputCols(Array("sentence", "token", "ner")) 
    .setOutputCol("ner_chunk")


val ner_pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings, ner_model, ner_converter))

val data = Seq("Gravid with an Estimated Fetal Weight of 6-6/12 Pounds. Lower Extremities: There are no signs of edema in the lower extremities. Laboratory Data: Laboratory tests revealed a normal cbc. Blood Type: The patient's blood type has been identified as AB Positive. Rubella Status: The patient has confirmed immunity to rub. VDRL Test: The vdrl test for syphilis is nonreactive. Hepatitis C Screening (anti-hcv): The screening for Hepatitis C surface antigen returned a negative result. Testing for hiv showed a negative outcome.").toDF("text")

val result = ner_pipeline.fit(data).transform(data)

Results

+--------+---------+
|chunk   |ner_label|
+--------+---------+
|cbc     |ABBR     |
|VDRL    |ABBR     |
|vdrl    |ABBR     |
|anti-hcv|ABBR     |
|hiv     |ABBR     |
+--------+---------+
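
A table like the one above can be produced from the `ner_chunk` output column. The sketch below is a minimal, hypothetical helper, not part of the library: it assumes each chunk annotation exposes a `result` field (the chunk text) and a `metadata` map with an `entity` key (the label), which is how chunk annotations are normally laid out. The hand-written `sample` list stands in for collected annotations so the snippet runs without a Spark session.

```python
def chunks_to_rows(annotations):
    """Map chunk annotations to (chunk, ner_label) pairs.

    Each annotation is expected to support ann["result"] and
    ann["metadata"]["entity"] (assumption about the annotation layout).
    """
    return [(ann["result"], ann["metadata"]["entity"]) for ann in annotations]


def format_table(rows, headers=("chunk", "ner_label")):
    """Render rows as a plain-text box table with left-aligned cells."""
    widths = [
        max([len(header)] + [len(row[i]) for row in rows])
        for i, header in enumerate(headers)
    ]
    border = "+" + "+".join("-" * w for w in widths) + "+"

    def line(cells):
        return "|" + "|".join(c.ljust(w) for c, w in zip(cells, widths)) + "|"

    return "\n".join([border, line(headers), border, *[line(r) for r in rows], border])


# Stand-in for collected annotations; values mirror the table above.
sample = [
    {"result": "cbc", "metadata": {"entity": "ABBR"}},
    {"result": "VDRL", "metadata": {"entity": "ABBR"}},
    {"result": "vdrl", "metadata": {"entity": "ABBR"}},
    {"result": "anti-hcv", "metadata": {"entity": "ABBR"}},
    {"result": "hiv", "metadata": {"entity": "ABBR"}},
]

rows = chunks_to_rows(sample)
table = format_table(rows)
print(table)
```

With a real pipeline result, the annotations for the first row could be taken from something like `result.select("ner_chunk").first()["ner_chunk"]`, assuming the collected rows allow the same key-based access as the dictionaries used here.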

Model Information

Model Name: ner_clinical_abbreviation_langtest
Compatibility: Healthcare NLP 5.2.1+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 14.7 MB

Sample text from the training dataset

Trained on an in-house dataset.

Benchmarking

label         precision  recall  f1-score  support 
ABBR          0.90       0.94    0.92      683     
micro-avg     0.90       0.94    0.92      683     
macro-avg     0.90       0.94    0.92      683     
weighted-avg  0.90       0.94    0.92      683
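
As a quick consistency check on the benchmark above, the F1 score can be recomputed as the harmonic mean of precision and recall. This is only a sketch using the rounded values from the table, so it confirms agreement to two decimals and nothing more; the function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall for the ABBR label, copied from the table.
f1 = f1_score(0.90, 0.94)
print(round(f1, 2))
```

Because ABBR is the only label, the micro, macro, and weighted averages all coincide with the single per-label row.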