Description
This model is trained to extract clinical abbreviations and acronyms from text.
Predicted Entities
ABBR
How to use
Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel
# Assumes `spark` is an active SparkSession started with a licensed Healthcare NLP (sparknlp_jsl) installation.
# Convert raw text into document annotations
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
# Split documents into sentences with the clinical sentence detector
sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")
# Split sentences into tokens
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")
# Clinical word embeddings consumed by the NER model
embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")
# Token-level abbreviation tagging (IOB tags such as B-ABBR)
abbr_ner = MedicalNerModel.pretrained("ner_abbreviation_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("abbr_ner")
# Merge the IOB tags into ABBR chunks
abbr_converter = NerConverter()\
    .setInputCols(["sentence", "token", "abbr_ner"])\
    .setOutputCol("abbr_ner_chunk")
ner_pipeline = Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    abbr_ner,
    abbr_converter
])
sample_df = spark.createDataFrame([["Gravid with estimated fetal weight of 6-6/12 pounds. LOWER EXTREMITIES: No edema. LABORATORY DATA: Laboratory tests include a CBC which is normal. Blood Type: AB positive. Rubella: Immune. VDRL: Nonreactive. Hepatitis C surface antigen: Negative. HIV: Negative. One-Hour Glucose: 117. Group B strep has not been done as yet."]]).toDF("text")
result = ner_pipeline.fit(sample_df).transform(sample_df)
Scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
import org.apache.spark.ml.Pipeline
// Assumes `spark` is an active SparkSession with the licensed Healthcare NLP library on the classpath
import spark.implicits._
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")
val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val abbr_ner = MedicalNerModel.pretrained("ner_abbreviation_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("abbr_ner")
val abbr_converter = new NerConverter()
  .setInputCols(Array("sentence", "token", "abbr_ner"))
  .setOutputCol("abbr_ner_chunk")
val ner_pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, embeddings, abbr_ner, abbr_converter))
val sample_df = Seq("Gravid with estimated fetal weight of 6-6/12 pounds. LOWER EXTREMITIES: No edema. LABORATORY DATA: Laboratory tests include a CBC which is normal. Blood Type: AB positive. Rubella: Immune. VDRL: Nonreactive. Hepatitis C surface antigen: Negative. HIV: Negative. One-Hour Glucose: 117. Group B strep has not been done as yet.").toDF("text")
val result = ner_pipeline.fit(sample_df).transform(sample_df)
NLU
import nlu
nlu.load("en.med_ner.abbreviation_clinical").predict("""Gravid with estimated fetal weight of 6-6/12 pounds. LOWER EXTREMITIES: No edema. LABORATORY DATA: Laboratory tests include a CBC which is normal. Blood Type: AB positive. Rubella: Immune. VDRL: Nonreactive. Hepatitis C surface antigen: Negative. HIV: Negative. One-Hour Glucose: 117. Group B strep has not been done as yet.""")
Results
+-----+---------+
|chunk|ner_label|
+-----+---------+
|CBC |ABBR |
|AB |ABBR |
|VDRL |ABBR |
|HIV |ABBR |
+-----+---------+
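The Results table lists chunks rather than per-token tags: NerConverter groups the token-level IOB tags emitted by the NER model (such as B-ABBR) into spans. A minimal pure-Python sketch of that grouping is shown here. `iob_to_chunks` is a hypothetical helper, not the library's implementation, and the token/tag lists in the usage lines are illustrative inputs rather than actual model output. The real converter works with character offsets in the original text, whereas this sketch simply joins tokens with spaces.

```python
# Hypothetical helper (not the NerConverter implementation): merges
# token-level IOB tags such as B-ABBR / I-ABBR / O into (chunk, label) pairs.
def iob_to_chunks(tokens, tags):
    chunks = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            # A B- tag, or an I- tag with a different label, starts a new chunk
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            # An I- tag with the same label extends the open chunk
            current_tokens.append(token)
        else:
            # An "O" tag closes any open chunk
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    # Flush a chunk that runs to the end of the sequence
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Illustrative tokens and tags (assumed, not produced by the model)
tokens = ["a", "CBC", "which", "is", "normal", ".", "VDRL", ":", "Nonreactive"]
tags = ["O", "B-ABBR", "O", "O", "O", "O", "B-ABBR", "O", "O"]
print(iob_to_chunks(tokens, tags))
# [('CBC', 'ABBR'), ('VDRL', 'ABBR')]
```

Since the benchmark below reports only a B-ABBR label, most abbreviations in this model's output are presumably single-token chunks; the I- branch is kept so the sketch also handles multi-token spans.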
Model Information
Model Name: ner_abbreviation_clinical
Compatibility: Healthcare NLP 3.3.4+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 14.6 MB
Data Source
Trained on an in-house dataset.
Benchmarking
Quality on the validation split (20.0% of the data; 454 validation examples).
Time to finish evaluation: 5.34 s
+-------+------+------+------+----------+------+------+
| Label | tp| fp| fn| precision|recall| f1|
+-------+------+------+------+----------+------+------+
| B-ABBR| 672.0| 42.0| 40.0| 0.9411|0.9438|0.9424|
+-------+------+------+------+----------+------+------+
+------------+----------+--------+--------+
| | precision| recall| f1|
+------------+----------+--------+--------+
| macro| 0.9411| 0.9438| 0.9424|
+------------+----------+--------+--------+
| micro| 0.9411| 0.9438| 0.9424|
+------------+----------+--------+--------+
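With a single label, macro and micro averages both reduce to that label's scores, which is consistent with the identical B-ABBR, macro, and micro rows. As a sanity check, the scores can be recomputed from the tp/fp/fn counts. The reported values appear to be truncated to four decimals rather than rounded (rounding would give 0.9412 for precision and 0.9425 for F1), so the sketch assumes truncation; `prf` and `truncate` are illustrative helpers, not part of any library.

```python
import math


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def truncate(x, digits=4):
    """Cut (not round) x to the given number of decimals (assumed table convention)."""
    factor = 10 ** digits
    return math.floor(x * factor) / factor


# Counts from the B-ABBR row above
p, r, f1 = prf(672, 42, 40)
print([truncate(v) for v in (p, r, f1)])
# [0.9411, 0.9438, 0.9424]
```

Equivalently, F1 = 2·tp / (2·tp + fp + fn) = 1344 / 1426.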