Detect Radiology Concepts - WIP (biobert)

Description

Extract clinical entities from radiology reports using a pretrained NER model.

Predicted Entities

Test_Result, OtherFindings, BodyPart, ImagingFindings, Disease_Syndrome_Disorder, ImagingTest, Measurements, Procedure, Score, Test, Medical_Device, Direction, Symptom, Imaging_Technique, ManualFix, Units

How to use

...
# BioBERT token embeddings fed to the NER model
embeddings_clinical = BertEmbeddings.pretrained('biobert_pubmed_base_cased') \
    .setInputCols(['sentence', 'token']) \
    .setOutputCol('embeddings')
# Pretrained radiology NER model, loaded from clinical/models
clinical_ner = MedicalNerModel.pretrained("jsl_rd_ner_wip_greedy_biobert", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")
...
# Assemble the stages in order, then fit on an empty DataFrame (all stages are pretrained)
nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
# Run the pipeline on a sample radiology report
results = model.transform(spark.createDataFrame([["Bilateral breast ultrasound was subsequently performed, which demonstrated an ovoid mass measuring approximately 0.5 x 0.5 x 0.4 cm in diameter located within the anteromedial aspect of the left shoulder. This mass demonstrates isoechoic echotexture to the adjacent muscle, with no evidence of internal color flow. This may represent benign fibrous tissue or a lipoma."]], ["text"]))
...
// BioBERT token embeddings fed to the NER model
val embeddings_clinical = BertEmbeddings.pretrained("biobert_pubmed_base_cased")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
// Pretrained radiology NER model, loaded from clinical/models
val ner = MedicalNerModel.pretrained("jsl_rd_ner_wip_greedy_biobert", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))
val data = Seq("Bilateral breast ultrasound was subsequently performed, which demonstrated an ovoid mass measuring approximately 0.5 x 0.5 x 0.4 cm in diameter located within the anteromedial aspect of the left shoulder. This mass demonstrates isoechoic echotexture to the adjacent muscle, with no evidence of internal color flow. This may represent benign fibrous tissue or a lipoma.").toDF("text")
val result = pipeline.fit(data).transform(data)

Results

|    | chunk                 | entity                    |
|---:|:----------------------|:--------------------------|
|  0 | Bilateral             | Direction                 |
|  1 | breast                | BodyPart                  |
|  2 | ultrasound            | ImagingTest               |
|  3 | ovoid mass            | ImagingFindings           |
|  4 | 0.5 x 0.5 x 0.4       | Measurements              |
|  5 | cm                    | Units                     |
|  6 | left                  | Direction                 |
|  7 | shoulder              | BodyPart                  |
|  8 | mass                  | ImagingFindings           |
|  9 | isoechoic echotexture | ImagingFindings           |
| 10 | muscle                | BodyPart                  |
| 11 | internal color flow   | ImagingFindings           |
| 12 | benign fibrous tissue | ImagingFindings           |
| 13 | lipoma                | Disease_Syndrome_Disorder |
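
The table above lists merged chunks, while the Benchmarking section reports token-level labels with B- and I- prefixes. The following is a minimal sketch of how such tags can be merged into chunks, assuming standard IOB tagging. The function name `iob_to_chunks` and the hand-written sample tags are hypothetical illustrations; this is not the Spark NLP converter implementation, and the tags are not actual model output.

```python
def iob_to_chunks(tokens, tags):
    """Merge token-level IOB tags into (chunk, entity) pairs."""
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the chunk collected so far, if any.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag (or an I- tag with a different label) starts a new chunk.
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # An I- tag with the same label continues the current chunk.
            current_tokens.append(token)
        else:
            # "O" ends any open chunk.
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


# Hand-written tags for a fragment of the sample sentence (an assumption for illustration).
tokens = ["an", "ovoid", "mass", "measuring", "0.5", "x", "0.5", "x", "0.4", "cm"]
tags = ["O", "B-ImagingFindings", "I-ImagingFindings", "O",
        "B-Measurements", "I-Measurements", "I-Measurements",
        "I-Measurements", "I-Measurements", "B-Units"]
chunks = iob_to_chunks(tokens, tags)
print(chunks)
```

With these sample tags the sketch yields the chunks "ovoid mass", "0.5 x 0.5 x 0.4", and "cm", mirroring rows 3 to 5 of the table.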

Model Information

Model Name: jsl_rd_ner_wip_greedy_biobert
Compatibility: Spark NLP for Healthcare 3.1.3+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en

Data Source

Trained on a dataset annotated by John Snow Labs.

Benchmarking

| label                       |    tp |   fp |   fn | prec       | rec        | f1         |
|:----------------------------|------:|-----:|-----:|:-----------|:-----------|:-----------|
| B-Units                     |   253 |    7 |   11 | 0.97307694 | 0.9583333  | 0.96564883 |
| B-Medical_Device            |   382 |  109 |   74 | 0.77800405 | 0.8377193  | 0.80675817 |
| B-BodyPart                  |  2645 |  347 |  276 | 0.8840241  | 0.9055118  | 0.89463896 |
| I-BodyPart                  |   645 |  142 |  135 | 0.819568   | 0.8269231  | 0.8232291  |
| B-Imaging_Technique         |   137 |   36 |   33 | 0.7919075  | 0.80588233 | 0.79883385 |
| B-Procedure                 |   260 |   93 |  130 | 0.7365439  | 0.6666667  | 0.69986534 |
| B-Direction                 |  1573 |  136 |  123 | 0.9204213  | 0.9274764  | 0.92393535 |
| I-ImagingTest               |    30 |    9 |   32 | 0.7692308  | 0.48387095 | 0.5940594  |
| I-ManualFix                 |     0 |    0 |    2 | 0.0        | 0.0        | 0.0        |
| I-Test_Result               |     2 |    0 |    0 | 1.0        | 1.0        | 1.0        |
| B-Measurements              |   452 |   24 |   30 | 0.94957983 | 0.93775934 | 0.9436326  |
| B-OtherFindings             |     9 |   21 |   55 | 0.3        | 0.140625   | 0.19148935 |
| B-ImagingFindings           |  1929 |  679 |  542 | 0.7396472  | 0.7806556  | 0.7595984  |
| I-Units                     |     0 |    0 |    2 | 0.0        | 0.0        | 0.0        |
| B-Test_Result               |     3 |    7 |   14 | 0.3        | 0.1764706  | 0.22222224 |
| B-Test                      |   146 |   17 |   49 | 0.8957055  | 0.74871796 | 0.8156425  |
| I-OtherFindings             |     8 |    6 |   35 | 0.5714286  | 0.18604651 | 0.28070176 |
| B-ManualFix                 |     2 |    0 |    2 | 1.0        | 0.5        | 0.6666667  |
| I-Procedure                 |   147 |   91 |  106 | 0.61764705 | 0.5810277  | 0.598778   |
| I-Imaging_Technique         |    75 |   63 |   26 | 0.54347825 | 0.7425743  | 0.6276151  |
| I-Measurements              |    45 |    3 |    6 | 0.9375     | 0.88235295 | 0.90909094 |
| B-ImagingTest               |   328 |   36 |   85 | 0.9010989  | 0.79418886 | 0.84427285 |
| I-Test                      |    26 |    9 |   34 | 0.74285716 | 0.43333334 | 0.54736847 |
| I-Symptom                   |   138 |   62 |  142 | 0.69       | 0.49285713 | 0.575      |
| I-ImagingFindings           |  1348 |  617 |  662 | 0.6860051  | 0.6706468  | 0.678239   |
| B-Disease_Syndrome_Disorder |  1068 |  298 |  243 | 0.7818448  | 0.8146453  | 0.79790807 |
| B-Symptom                   |   523 |  110 |  190 | 0.8262243  | 0.7335203  | 0.77711743 |
| I-Disease_Syndrome_Disorder |   377 |  168 |  171 | 0.69174314 | 0.6879562  | 0.6898445  |
| I-Medical_Device            |   369 |   72 |   62 | 0.8367347  | 0.8561485  | 0.8463302  |
| I-Direction                 |   352 |   38 |   41 | 0.9025641  | 0.8956743  | 0.899106   |

Totals: tp: 13272, fp: 3200, fn: 3313, labels: 30
Macro-average: prec: 0.7195612, rec: 0.64891946, f1: 0.68241704
Micro-average: prec: 0.80573094, rec: 0.8002412, f1: 0.8029767
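
The precision, recall, and F1 columns follow from the tp, fp, and fn counts. The following is a minimal sketch of that arithmetic using the usual definitions; the helper name `prf` is hypothetical, and the evaluation code that produced the benchmark is not shown in the source.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts; 0.0 when a denominator is zero."""
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return prec, rec, f1


# Micro-average: apply the formulas to the summed counts over all 30 labels.
micro_prec, micro_rec, micro_f1 = prf(13272, 3200, 3313)

# Single-label example: the B-Units row.
units_prec, units_rec, units_f1 = prf(253, 7, 11)

# The reported macro F1 appears consistent with the harmonic mean of the
# reported macro precision and macro recall (an observation, not a
# statement about how the benchmark was actually computed).
macro_prec, macro_rec = 0.7195612, 0.64891946
macro_f1 = 2 * macro_prec * macro_rec / (macro_prec + macro_rec)

print(round(micro_prec, 4), round(micro_rec, 4), round(micro_f1, 4))
```

Labels with very few test examples, such as I-ManualFix or I-Units, contribute a 0.0 to the macro-average even though they barely affect the micro-average, which is one reason the two averages differ so much here.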