Detect Radiology Concepts (WIP)

Description

Extract clinical entities from Radiology reports using pretrained NER model.

Predicted Entities

Kidney_Disease, HDL, Diet, Test, Imaging_Technique, Triglycerides, Obesity, Duration, Weight, Social_History_Header, ImagingTest, Labour_Delivery, Disease_Syndrome_Disorder, Communicable_Disease, Overweight, Units, Smoking, Score, Substance_Quantity, Form, Race_Ethnicity, Modifier, Hyperlipidemia, ImagingFindings, Psychological_Condition, OtherFindings, Cerebrovascular_Disease, Date, Test_Result, VS_Finding, Employment, Death_Entity, Gender, Oncological, Heart_Disease, Medical_Device, Total_Cholesterol, ManualFix, Time, Route, Pulse, Admission_Discharge, RelativeDate, O2_Saturation, Frequency, RelativeTime, Hypertension, Alcohol, Allergen, Fetus_NewBorn, Birth_Entity, Age, Respiration, Medical_History_Header, Oxygen_Therapy, Section_Header, LDL, Treatment, Vital_Signs_Header, Direction, BMI, Pregnancy, Sexually_Active_or_Sexual_Orientation, Symptom, Clinical_Dept, Measurements, Height, Family_History_Header, Substance, Strength, Injury_or_Poisoning, Relationship_Status, Blood_Pressure, Drug, Temperature, EKG_Findings, Diabetes, BodyPart, Vaccine, Procedure, Dosage

Open in Colab Download

How to use


...
embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")  .setInputCols(["sentence", "token"])  .setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("jsl_rd_ner_wip_greedy_clinical", "en", "clinical/models")   .setInputCols(["sentence", "token", "embeddings"])   .setOutputCol("ner")
...
nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
results = model.transform(spark.createDataFrame([["EXAMPLE_TEXT"]]).toDF("text"))

...
val embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("jsl_rd_ner_wip_greedy_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))
val result = pipeline.fit(Seq.empty[String]).transform(data)

Model Information

Model Name: jsl_rd_ner_wip_greedy_clinical
Compatibility: Spark NLP for Healthcare 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en

Benchmarking

+-------------------------------------+-------+------+------+-------+---------+------+------+
|                               entity|     tp|    fp|    fn|  total|precision|recall|    f1|
+-------------------------------------+-------+------+------+-------+---------+------+------+
|                           VS_Finding|  306.0| 129.0| 119.0|  425.0|   0.7034|  0.72|0.7116|
|                            Direction| 8717.0| 678.0| 616.0| 9333.0|   0.9278| 0.934|0.9309|
|                          Respiration|  224.0|  28.0|  18.0|  242.0|   0.8889|0.9256|0.9069|
|              Cerebrovascular_Disease|  149.0|  57.0|  64.0|  213.0|   0.7233|0.6995|0.7112|
|                Family_History_Header|  315.0|   1.0|   3.0|  318.0|   0.9968|0.9906|0.9937|
|                        Heart_Disease| 1087.0| 198.0| 141.0| 1228.0|   0.8459|0.8852|0.8651|
|                      ImagingFindings| 5568.0|1112.0|1627.0| 7195.0|   0.8335|0.7739|0.8026|
|                         RelativeTime|  422.0| 138.0| 100.0|  522.0|   0.7536|0.8084|  0.78|
|                             Strength|   96.0|  51.0|  54.0|  150.0|   0.6531|  0.64|0.6465|
|                             BodyPart|20155.0|1698.0|1860.0|22015.0|   0.9223|0.9155|0.9189|
|                              Smoking|  151.0|  16.0|   5.0|  156.0|   0.9042|0.9679| 0.935|
|                       Medical_Device| 8162.0| 885.0| 821.0| 8983.0|   0.9022|0.9086|0.9054|
|                         EKG_Findings|  131.0|  37.0|  83.0|  214.0|   0.7798|0.6121|0.6859|
|                                Pulse|  382.0|  44.0|  50.0|  432.0|   0.8967|0.8843|0.8904|
|              Psychological_Condition|  195.0|  32.0|  43.0|  238.0|    0.859|0.8193|0.8387|
|                        Triglycerides|   18.0|   0.0|   0.0|   18.0|      1.0|   1.0|   1.0|
|                           Overweight|    6.0|   2.0|   1.0|    7.0|     0.75|0.8571|   0.8|
|                              Obesity|   68.0|   3.0|   5.0|   73.0|   0.9577|0.9315|0.9444|
|                  Admission_Discharge|  376.0|  26.0|  24.0|  400.0|   0.9353|  0.94|0.9377|
|                                  HDL|   11.0|   0.0|   5.0|   16.0|      1.0|0.6875|0.8148|
|                             Diabetes|  227.0|   9.0|  12.0|  239.0|   0.9619|0.9498|0.9558|
|                       Section_Header|13630.0| 476.0| 413.0|14043.0|   0.9663|0.9706|0.9684|
|                                  Age| 1174.0| 129.0|  94.0| 1268.0|    0.901|0.9259|0.9133|
|                        O2_Saturation|  122.0|  34.0|  29.0|  151.0|   0.7821|0.8079|0.7948|
|                                 Drug| 9391.0|1505.0| 928.0|10319.0|   0.8619|0.9101|0.8853|
|                       Kidney_Disease|  296.0|  28.0|  53.0|  349.0|   0.9136|0.8481|0.8796|
|                                 Test| 3980.0| 721.0| 925.0| 4905.0|   0.8466|0.8114|0.8286|
|                 Communicable_Disease|   40.0|  18.0|  12.0|   52.0|   0.6897|0.7692|0.7273|
|                         Hypertension|  163.0|  16.0|  10.0|  173.0|   0.9106|0.9422|0.9261|
|                       Oxygen_Therapy|  123.0|  36.0|  27.0|  150.0|   0.7736|  0.82|0.7961|
|                          Test_Result| 1607.0| 374.0| 458.0| 2065.0|   0.8112|0.7782|0.7944|
|                             Modifier| 1229.0| 435.0| 593.0| 1822.0|   0.7386|0.6745|0.7051|
|                                  BMI|   21.0|   4.0|   7.0|   28.0|     0.84|  0.75|0.7925|
|                      Labour_Delivery|  117.0|  38.0|  62.0|  179.0|   0.7548|0.6536|0.7006|
|                           Employment|  414.0|  65.0|  93.0|  507.0|   0.8643|0.8166|0.8398|
|                        Fetus_NewBorn|  118.0|  68.0|  87.0|  205.0|   0.6344|0.5756|0.6036|
|                        Clinical_Dept| 1937.0| 189.0| 133.0| 2070.0|   0.9111|0.9357|0.9233|
|                                 Time|  637.0|  43.0|  27.0|  664.0|   0.9368|0.9593|0.9479|
|                            Procedure| 7578.0| 953.0|1088.0| 8666.0|   0.8883|0.8745|0.8813|
|                          ImagingTest| 1712.0| 213.0| 281.0| 1993.0|   0.8894| 0.859|0.8739|
|                                 Diet|   79.0|  44.0|  82.0|  161.0|   0.6423|0.4907|0.5563|
|                          Oncological| 1088.0| 188.0| 103.0| 1191.0|   0.8527|0.9135| 0.882|
|                                  LDL|   20.0|   7.0|   1.0|   21.0|   0.7407|0.9524|0.8333|
|                              Symptom|15940.0|3662.0|3035.0|18975.0|   0.8132|0.8401|0.8264|
|                          Temperature|  240.0|  28.0|  25.0|  265.0|   0.8955|0.9057|0.9006|
|                   Vital_Signs_Header|  850.0|  34.0|  52.0|  902.0|   0.9615|0.9424|0.9518|
|                    Total_Cholesterol|   43.0|   6.0|   7.0|   50.0|   0.8776|  0.86|0.8687|
|                  Relationship_Status|   51.0|   3.0|   9.0|   60.0|   0.9444|  0.85|0.8947|
|                       Blood_Pressure|  353.0|  18.0| 117.0|  470.0|   0.9515|0.7511|0.8395|
|                  Injury_or_Poisoning| 1003.0| 311.0| 241.0| 1244.0|   0.7633|0.8063|0.7842|
|                            Treatment|  335.0|  98.0|  91.0|  426.0|   0.7737|0.7864|  0.78|
|                            Pregnancy|  214.0|  99.0|  86.0|  300.0|   0.6837|0.7133|0.6982|
|                              Vaccine|   29.0|   3.0|  10.0|   39.0|   0.9063|0.7436|0.8169|
|                               Height|  105.0|  10.0|  45.0|  150.0|    0.913|   0.7|0.7925|
|            Disease_Syndrome_Disorder| 8466.0|1568.0|1533.0| 9999.0|   0.8437|0.8467|0.8452|
|                            Frequency| 1263.0| 237.0| 173.0| 1436.0|    0.842|0.8795|0.8604|
|                                Route|  219.0|  35.0| 144.0|  363.0|   0.8622|0.6033|0.7099|
|                             Duration|  978.0| 199.0| 338.0| 1316.0|   0.8309|0.7432|0.7846|
|                         Death_Entity|   35.0|  17.0|  16.0|   51.0|   0.6731|0.6863|0.6796|
|                              Alcohol|  102.0|  24.0|  21.0|  123.0|   0.8095|0.8293|0.8193|
|                                 Date|  840.0|  43.0|  13.0|  853.0|   0.9513|0.9848|0.9677|
|                       Hyperlipidemia|   44.0|   4.0|   1.0|   45.0|   0.9167|0.9778|0.9462|
|                Social_History_Header|  284.0|   6.0|  27.0|  311.0|   0.9793|0.9132|0.9451|
|                            ManualFix|   50.0|   2.0|   7.0|   57.0|   0.9615|0.8772|0.9174|
|                    Imaging_Technique|  845.0| 240.0|  98.0|  943.0|   0.7788|0.8961|0.8333|
|                       Race_Ethnicity|  141.0|   0.0|   5.0|  146.0|      1.0|0.9658|0.9826|
|                         RelativeDate| 1691.0| 394.0| 194.0| 1885.0|    0.811|0.8971|0.8519|
|                               Gender| 6800.0| 105.0| 130.0| 6930.0|   0.9848|0.9812| 0.983|
|                               Dosage|  122.0|  67.0|  81.0|  203.0|   0.6455| 0.601|0.6224|
|               Medical_History_Header|  486.0|  10.0|  19.0|  505.0|   0.9798|0.9624| 0.971|
|Sexually_Active_or_Sexual_Orientation|   12.0|   0.0|   5.0|   17.0|      1.0|0.7059|0.8276|
|                            Substance|  102.0|  11.0|  22.0|  124.0|   0.9027|0.8226|0.8608|
|                               Weight|  346.0|  26.0|  65.0|  411.0|   0.9301|0.8418|0.8838|
+-------------------------------------+-------+------+------+-------+---------+------+------+

+------------------+
|             macro|
+------------------+
|0.8037673358587357|
+------------------+

+------------------+
|             micro|
+------------------+
|0.8792985493206746|
+------------------+