Detect Radiology Related Entities

Description

Pretrained deep learning model for named entity recognition (NER) in radiology-related texts and reports.

Predicted Entities

ImagingTest, Imaging_Technique, ImagingFindings, OtherFindings, BodyPart, Direction, Test, Symptom, Disease_Syndrome_Disorder, Medical_Device, Procedure, Measurements, Units

How to use

Use this model as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector (the examples here use SentenceDetectorDLModel), Tokenizer, WordEmbeddingsModel, NerDLModel. Add a NerConverter at the end of the pipeline to merge token-level entity tags into full entity chunks.

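# Python
# Assumes an active Spark NLP session is available as `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *

document_assembler = DocumentAssembler()\
	.setInputCol("text")\
	.setOutputCol("document")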

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \
	.setInputCols(["document"]) \
	.setOutputCol("sentence")

tokenizer = Tokenizer()\
	.setInputCols(["sentence"])\
	.setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
	.setInputCols(["sentence", "token"])\
	.setOutputCol("embeddings")

radiology_ner = NerDLModel.pretrained("ner_radiology", "en", "clinical/models") \
	.setInputCols(["sentence", "token", "embeddings"]) \
	.setOutputCol("ner")

ner_converter = NerConverter() \
	.setInputCols(["sentence", "token", "ner"]) \
	.setOutputCol("entities")

nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, radiology_ner, ner_converter])

data = spark.createDataFrame([["Bilateral breast ultrasound was subsequently performed, which demonstrated an ovoid mass measuring approximately 0.5 x 0.5 x 0.4 cm in diameter located within the anteromedial aspect of the left shoulder. This mass demonstrates isoechoic echotexture to the adjacent muscle, with no evidence of internal color flow. This may represent benign fibrous tissue or a lipoma."]]).toDF("text")

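results = nlpPipeline.fit(data).transform(data)

// Scala
// Assumes an active Spark NLP session is available as `spark`.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
	.setInputCol("text")
	.setOutputCol("document")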

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
	.setInputCols("document")
	.setOutputCol("sentence")

val tokenizer = new Tokenizer()
	.setInputCols("sentence")
	.setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
	.setInputCols(Array("sentence", "token"))
	.setOutputCol("embeddings")

val radiology_ner = NerDLModel.pretrained("ner_radiology", "en", "clinical/models")
	.setInputCols(Array("sentence", "token", "embeddings"))
	.setOutputCol("ner")

val ner_converter = new NerConverter() 
	.setInputCols(Array("sentence", "token", "ner"))
	.setOutputCol("entities")

val nlpPipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, radiology_ner, ner_converter))

val data = Seq("""Bilateral breast ultrasound was subsequently performed, which demonstrated an ovoid mass measuring approximately 0.5 x 0.5 x 0.4 cm in diameter located within the anteromedial aspect of the left shoulder. This mass demonstrates isoechoic echotexture to the adjacent muscle, with no evidence of internal color flow. This may represent benign fibrous tissue or a lipoma.""").toDS.toDF("text")

val result = nlpPipeline.fit(data).transform(data)

# NLU (Python)
import nlu
nlu.load("en.med_ner.radiology").predict("""Bilateral breast ultrasound was subsequently performed, which demonstrated an ovoid mass measuring approximately 0.5 x 0.5 x 0.4 cm in diameter located within the anteromedial aspect of the left shoulder. This mass demonstrates isoechoic echotexture to the adjacent muscle, with no evidence of internal color flow. This may represent benign fibrous tissue or a lipoma.""")

Results

|    | chunks                | entities                  |
|----|-----------------------|---------------------------|
| 0  | Bilateral             | Direction                 |
| 1  | breast                | BodyPart                  |
| 2  | ultrasound            | ImagingTest               |
| 3  | ovoid mass            | ImagingFindings           |
| 4  | 0.5 x 0.5 x 0.4       | Measurements              |
| 5  | cm                    | Units                     |
| 6  | anteromedial aspect   | Direction                 |
| 7  | left                  | Direction                 |
| 8  | shoulder              | BodyPart                  |
| 9  | mass                  | ImagingFindings           |
| 10 | isoechoic echotexture | ImagingFindings           |
| 11 | muscle                | BodyPart                  |
| 12 | internal color flow   | ImagingFindings           |
| 13 | benign fibrous tissue | ImagingFindings           |
| 14 | lipoma                | Disease_Syndrome_Disorder |
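
Each row in the table above is a chunk that NerConverter assembled from token-level tags. The following is a minimal pure-Python sketch of that merging idea, assuming the NER stage emits IOB-style tags (`B-`, `I-`, `O`). The function `iob_to_chunks` and the hand-written tags are hypothetical illustrations, not Spark NLP's implementation and not actual model output.

```python
def iob_to_chunks(tokens, tags):
    """Merge token-level IOB tags into (chunk_text, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the chunk collected so far, if any.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new entity starts here (an orphan I- tag is treated like B-).
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O": outside any entity, so close whatever was open.
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


# Hypothetical tags for a fragment of the example sentence.
tokens = ["an", "ovoid", "mass", "measuring", "approximately",
          "0.5", "x", "0.5", "x", "0.4", "cm"]
tags = ["O", "B-ImagingFindings", "I-ImagingFindings", "O", "O",
        "B-Measurements", "I-Measurements", "I-Measurements",
        "I-Measurements", "I-Measurements", "B-Units"]

chunks = iob_to_chunks(tokens, tags)
for text, label in chunks:
    print(f"{text} -> {label}")
```

With these assumed tags, the sketch yields the same three chunks as rows 3 to 5 of the table: `ovoid mass`, `0.5 x 0.5 x 0.4`, and `cm`.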

Model Information

Model Name: ner_radiology
Type: ner
Compatibility: Spark NLP 2.7.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Dependencies: embeddings_clinical

Data Source

Trained on a custom dataset comprising MIMIC-CXR and MT Radiology texts.

Benchmarking

               label     tp    fp    fn  total precision recall     f1
       OtherFindings    8.0  15.0  63.0   71.0    0.3478 0.1127 0.1702
        Measurements  481.0  30.0  15.0  496.0    0.9413 0.9698 0.9553
           Direction  650.0 137.0  94.0  744.0    0.8259 0.8737 0.8491
     ImagingFindings 1345.0 355.0 324.0 1669.0    0.7912 0.8059 0.7985
            BodyPart 1942.0 335.0 290.0 2232.0    0.8529 0.8701 0.8614
      Medical_Device  236.0  75.0  64.0  300.0    0.7588 0.7867 0.7725
                Test  222.0  41.0  48.0  270.0    0.8441 0.8222  0.833
           Procedure  269.0 117.0 116.0  385.0    0.6969 0.6987 0.6978
         ImagingTest  263.0  50.0  43.0  306.0    0.8403 0.8595 0.8498
             Symptom  498.0 101.0 132.0  630.0    0.8314 0.7905 0.8104
Disease_Syndrome_... 1180.0 258.0 200.0 1380.0    0.8206 0.8551 0.8375
               Units  269.0  10.0   2.0  271.0    0.9642 0.9926 0.9782
   Imaging_Technique  140.0  38.0  25.0  165.0    0.7865 0.8485 0.8163
               macro     -     -     -      -      -      -     0.7524
               micro     -     -     -      -      -      -     0.8315
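
As a reading aid, the per-label precision, recall, and f1 columns can be recomputed from the tp, fp, and fn counts, and the total column equals tp + fn. The sketch below does this for three rows copied from the table, using the standard definitions. The helper name `prf` is an illustrative assumption, and the macro and micro rows are not recomputed here because the table does not state how they were aggregated.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts, guarding against zero division."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# (tp, fp, fn) copied from the benchmarking table.
rows = {
    "Measurements": (481, 30, 15),
    "Test": (222, 41, 48),
    "OtherFindings": (8, 15, 63),
}

# Round to four decimals, as in the table.
scores = {
    label: tuple(round(value, 4) for value in prf(*counts))
    for label, counts in rows.items()
}

for label, (p, r, f) in scores.items():
    print(f"{label:>15}  precision={p:.4f}  recall={r:.4f}  f1={f:.4f}")
```

The low OtherFindings score follows directly from its counts: only 8 of the 71 gold mentions were recovered.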