Detect Radiology Concepts (WIP)

Description

Extract clinical entities from radiology reports using a pretrained NER model.

Predicted Entities

Kidney_Disease, HDL, Diet, Test, Imaging_Technique, Triglycerides, Obesity, Duration, Weight, Social_History_Header, ImagingTest, Labour_Delivery, Disease_Syndrome_Disorder, Communicable_Disease, Overweight, Units, Smoking, Score, Substance_Quantity, Form, Race_Ethnicity, Modifier, Hyperlipidemia, ImagingFindings, Psychological_Condition, OtherFindings, Cerebrovascular_Disease, Date, Test_Result, VS_Finding, Employment, Death_Entity, Gender, Oncological, Heart_Disease, Medical_Device, Total_Cholesterol, ManualFix, Time, Route, Pulse, Admission_Discharge, RelativeDate, O2_Saturation, Frequency, RelativeTime, Hypertension, Alcohol, Allergen, Fetus_NewBorn, Birth_Entity, Age, Respiration, Medical_History_Header, Oxygen_Therapy, Section_Header, LDL, Treatment, Vital_Signs_Header, Direction, BMI, Pregnancy, Sexually_Active_or_Sexual_Orientation, Symptom, Clinical_Dept, Measurements, Height, Family_History_Header, Substance, Strength, Injury_or_Poisoning, Relationship_Status, Blood_Pressure, Drug, Temperature, EKG_Findings, Diabetes, BodyPart, Vaccine, Procedure, Dosage


How to use

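Python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel
from pyspark.ml import Pipeline

# Assumes an active Spark session named `spark`, started with the licensed Healthcare NLP library.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
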
sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("jsl_rd_ner_wip_greedy_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])

model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

results = model.transform(spark.createDataFrame([["EXAMPLE_TEXT"]]).toDF("text"))
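
After the transform, each row's `ner_chunk` column holds one annotation per detected entity: the chunk text is in `result`, the character offsets are in `begin` and `end`, and the predicted label sits under the `entity` key of `metadata`. The sketch below flattens such annotations into plain tuples. The helper name `chunk_labels` and the sample annotations are hypothetical and hand-written for illustration; they are not output from this model.

```python
from collections import Counter


def chunk_labels(annotations):
    """Flatten ner_chunk-style annotations into (text, label, begin, end) tuples."""
    rows = []
    for ann in annotations:
        # NerConverter stores the predicted entity label in the chunk metadata.
        label = ann["metadata"].get("entity", "UNKNOWN")
        rows.append((ann["result"], label, ann["begin"], ann["end"]))
    return rows


# Hypothetical, hand-written annotations (NOT real model output). With a live
# pipeline, comparable dicts could be built from a collected row, for example
# [a.asDict(recursive=True) for a in row.ner_chunk].
sample = [
    {"result": "right kidney", "begin": 18, "end": 29, "metadata": {"entity": "BodyPart"}},
    {"result": "Ultrasound", "begin": 0, "end": 9, "metadata": {"entity": "ImagingTest"}},
    {"result": "cyst", "begin": 39, "end": 42, "metadata": {"entity": "ImagingFindings"}},
]

pairs = chunk_labels(sample)
label_counts = Counter(label for _, label, _, _ in pairs)

# Print the chunks in document order, sorted by start offset.
for text, label, begin, end in sorted(pairs, key=lambda p: p[2]):
    print(f"{begin:>3}-{end:<3} {label:<16} {text}")
```

Working on plain dicts keeps the helper independent of Spark, so the same function can be reused on annotations collected from any of the pipelines on this page.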
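
Scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// MedicalNerModel is part of the licensed Healthcare NLP library, which must also be on the classpath and imported.

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
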
val sentence_detector = new SentenceDetector()
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val ner = MedicalNerModel.pretrained("jsl_rd_ner_wip_greedy_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))

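// Assumed input: a one-row DataFrame with a "text" column (requires an active SparkSession named `spark`).
import spark.implicits._
val data = Seq("EXAMPLE_TEXT").toDF("text")

val result = pipeline.fit(data).transform(data)

NLU

import nlu
nlu.load("en.med_ner.jsl.wip.clinical.rd").predict("""Put your text here.""")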

Model Information

Model Name: jsl_rd_ner_wip_greedy_clinical
Compatibility: Healthcare NLP 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en

Benchmarking

entity                                    tp    fp    fn  total  precision  recall      f1
VS_Finding                               306   129   119    425     0.7034    0.72  0.7116
Direction                               8717   678   616   9333     0.9278   0.934  0.9309
Respiration                              224    28    18    242     0.8889  0.9256  0.9069
Cerebrovascular_Disease                  149    57    64    213     0.7233  0.6995  0.7112
Family_History_Header                    315     1     3    318     0.9968  0.9906  0.9937
Heart_Disease                           1087   198   141   1228     0.8459  0.8852  0.8651
ImagingFindings                         5568  1112  1627   7195     0.8335  0.7739  0.8026
RelativeTime                             422   138   100    522     0.7536  0.8084    0.78
Strength                                  96    51    54    150     0.6531    0.64  0.6465
BodyPart                               20155  1698  1860  22015     0.9223  0.9155  0.9189
Smoking                                  151    16     5    156     0.9042  0.9679   0.935
Medical_Device                          8162   885   821   8983     0.9022  0.9086  0.9054
EKG_Findings                             131    37    83    214     0.7798  0.6121  0.6859
Pulse                                    382    44    50    432     0.8967  0.8843  0.8904
Psychological_Condition                  195    32    43    238      0.859  0.8193  0.8387
Triglycerides                             18     0     0     18        1.0     1.0     1.0
Overweight                                 6     2     1      7       0.75  0.8571     0.8
Obesity                                   68     3     5     73     0.9577  0.9315  0.9444
Admission_Discharge                      376    26    24    400     0.9353    0.94  0.9377
HDL                                       11     0     5     16        1.0  0.6875  0.8148
Diabetes                                 227     9    12    239     0.9619  0.9498  0.9558
Section_Header                         13630   476   413  14043     0.9663  0.9706  0.9684
Age                                     1174   129    94   1268      0.901  0.9259  0.9133
O2_Saturation                            122    34    29    151     0.7821  0.8079  0.7948
Drug                                    9391  1505   928  10319     0.8619  0.9101  0.8853
Kidney_Disease                           296    28    53    349     0.9136  0.8481  0.8796
Test                                    3980   721   925   4905     0.8466  0.8114  0.8286
Communicable_Disease                      40    18    12     52     0.6897  0.7692  0.7273
Hypertension                             163    16    10    173     0.9106  0.9422  0.9261
Oxygen_Therapy                           123    36    27    150     0.7736    0.82  0.7961
Test_Result                             1607   374   458   2065     0.8112  0.7782  0.7944
Modifier                                1229   435   593   1822     0.7386  0.6745  0.7051
BMI                                       21     4     7     28       0.84    0.75  0.7925
Labour_Delivery                          117    38    62    179     0.7548  0.6536  0.7006
Employment                               414    65    93    507     0.8643  0.8166  0.8398
Fetus_NewBorn                            118    68    87    205     0.6344  0.5756  0.6036
Clinical_Dept                           1937   189   133   2070     0.9111  0.9357  0.9233
Time                                     637    43    27    664     0.9368  0.9593  0.9479
Procedure                               7578   953  1088   8666     0.8883  0.8745  0.8813
ImagingTest                             1712   213   281   1993     0.8894   0.859  0.8739
Diet                                      79    44    82    161     0.6423  0.4907  0.5563
Oncological                             1088   188   103   1191     0.8527  0.9135   0.882
LDL                                       20     7     1     21     0.7407  0.9524  0.8333
Symptom                                15940  3662  3035  18975     0.8132  0.8401  0.8264
Temperature                              240    28    25    265     0.8955  0.9057  0.9006
Vital_Signs_Header                       850    34    52    902     0.9615  0.9424  0.9518
Total_Cholesterol                         43     6     7     50     0.8776    0.86  0.8687
Relationship_Status                       51     3     9     60     0.9444    0.85  0.8947
Blood_Pressure                           353    18   117    470     0.9515  0.7511  0.8395
Injury_or_Poisoning                     1003   311   241   1244     0.7633  0.8063  0.7842
Treatment                                335    98    91    426     0.7737  0.7864    0.78
Pregnancy                                214    99    86    300     0.6837  0.7133  0.6982
Vaccine                                   29     3    10     39     0.9063  0.7436  0.8169
Height                                   105    10    45    150      0.913     0.7  0.7925
Disease_Syndrome_Disorder               8466  1568  1533   9999     0.8437  0.8467  0.8452
Frequency                               1263   237   173   1436      0.842  0.8795  0.8604
Route                                    219    35   144    363     0.8622  0.6033  0.7099
Duration                                 978   199   338   1316     0.8309  0.7432  0.7846
Death_Entity                              35    17    16     51     0.6731  0.6863  0.6796
Alcohol                                  102    24    21    123     0.8095  0.8293  0.8193
Date                                     840    43    13    853     0.9513  0.9848  0.9677
Hyperlipidemia                            44     4     1     45     0.9167  0.9778  0.9462
Social_History_Header                    284     6    27    311     0.9793  0.9132  0.9451
ManualFix                                 50     2     7     57     0.9615  0.8772  0.9174
Imaging_Technique                        845   240    98    943     0.7788  0.8961  0.8333
Race_Ethnicity                           141     0     5    146        1.0  0.9658  0.9826
RelativeDate                            1691   394   194   1885      0.811  0.8971  0.8519
Gender                                  6800   105   130   6930     0.9848  0.9812   0.983
Dosage                                   122    67    81    203     0.6455   0.601  0.6224
Medical_History_Header                   486    10    19    505     0.9798  0.9624   0.971
Sexually_Active_or_Sexual_Orientation     12     0     5     17        1.0  0.7059  0.8276
Substance                                102    11    22    124     0.9027  0.8226  0.8608
Weight                                   346    26    65    411     0.9301  0.8418  0.8838
macro                                      -     -     -      -          -       -  0.8038
micro                                      -     -     -      -          -       -  0.8793
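
In the table, `total` equals `tp + fn` for each entity, and the precision, recall, and f1 columns are consistent with the standard definitions computed from `tp`, `fp`, and `fn`. As a minimal sketch, assuming those standard definitions, the snippet below recomputes three rows from their counts and shows how macro and micro F1 differ. The function name `prf` is hypothetical. The averages computed here cover only the three sample rows, so they will not match the `macro` and `micro` figures above, which cover all entities.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# (tp, fp, fn) counts copied from three rows of the benchmarking table.
counts = {
    "VS_Finding": (306, 129, 119),
    "HDL": (11, 0, 5),
    "Diet": (79, 44, 82),
}

for entity, (tp, fp, fn) in counts.items():
    p, r, f = prf(tp, fp, fn)
    print(f"{entity:<12} precision={p:.4f} recall={r:.4f} f1={f:.4f}")

# Macro F1: unweighted mean of per-entity F1, so rare entities count as much as frequent ones.
macro_f1 = sum(prf(*c)[2] for c in counts.values()) / len(counts)

# Micro F1: pool the counts across entities first, so frequent entities dominate.
pooled = tuple(sum(c[i] for c in counts.values()) for i in range(3))
micro_f1 = prf(*pooled)[2]

print(f"macro_f1={macro_f1:.4f} micro_f1={micro_f1:.4f}")
```

The gap between the two averages on the full table (0.8038 macro versus 0.8793 micro) is what one would expect when low-support entities such as Diet or Fetus_NewBorn score below high-support ones such as BodyPart or Section_Header.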