Description
Pretrained named entity recognition deep learning model for clinical terminology. The Spark NLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art model for NER: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNN.
Predicted Entities
Test_Result, Relationship_Status, RelativeDate, Blood_Pressure, Triglycerides, Smoking, Pregnancy, Medical_History_Header, LDL, Hypertension, Hyperlipidemia, Frequency, BMI, Internal_organ_or_component, Allergen, Fetus_NewBorn, Substance_Quantity, Time, Temperature, Procedure, Strength, Treatment, HDL, Alcohol, Birth_Entity, Diet, Weight, Oxygen_Therapy, Injury_or_Poisoning, Section_Header, Obesity, EKG_Findings, Gender, Height, Social_History_Header, Diabetes, Route, Race_Ethnicity, Substance, Drug, External_body_part_or_region, RelativeTime, Admission_Discharge, Psychological_Condition, Total_Cholesterol, Labour_Delivery, Imaging_Technique, Date, Form, Overweight, Cerebrovascular_Disease, Vital_Signs_Header, Oncological, ImagingFindings, Communicable_Disease, Duration, Vaccine, Kidney_Disease, O2_Saturation, Heart_Disease, Employment, Sexually_Active_or_Sexual_Orientation, Test, Disease_Syndrome_Disorder, Respiration, Direction, Medical_Device, Clinical_Dept, Modifier, Symptom, Pulse, Age, Death_Entity, Dosage, Family_History_Header, VS_Finding
How to use
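# Python
# Assumes `spark` is an active session started with Spark NLP for Healthcare.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertEmbeddings, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

# Wrap the raw text column in a document annotation.
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences, then sentences into tokens.
sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# BioBERT token embeddings consumed by the NER model.
embeddings = BertEmbeddings.pretrained("biobert_pubmed_base_cased")\
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

jsl_ner = MedicalNerModel.pretrained("jsl_ner_wip_greedy_biobert", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("jsl_ner")

# Merge token-level tags into entity chunks.
jsl_ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "jsl_ner"]) \
    .setOutputCol("ner_chunk")

jsl_ner_pipeline = Pipeline().setStages([
    documentAssembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    jsl_ner,
    jsl_ner_converter])

# All stages are pretrained, so fitting on an empty DataFrame is sufficient.
jsl_ner_model = jsl_ner_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))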
data = spark.createDataFrame([["""The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. His bowel movements are yellow colored and soft in nature."""]]).toDF("text")
result = jsl_ner_model.transform(data)
// Scala
// Assumes the Spark NLP and Spark NLP for Healthcare classes are imported
// and that `import spark.implicits._` is in scope (needed for `toDS` below).
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("biobert_pubmed_base_cased")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val jsl_ner = MedicalNerModel.pretrained("jsl_ner_wip_greedy_biobert", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("jsl_ner")

val jsl_ner_converter = new NerConverter()
  .setInputCols(Array("sentence", "token", "jsl_ner"))
  .setOutputCol("ner_chunk")

val jsl_ner_pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  embeddings,
  jsl_ner,
  jsl_ner_converter))
val data = Seq("""The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. His bowel movements are yellow colored and soft in nature.""").toDS.toDF("text")
val result = jsl_ner_pipeline.fit(data).transform(data)
Results
| | chunk | entity |
|---:|:-----------------------------------------------|:-----------------------------|
| 0 | 21-day-old | Age |
| 1 | Caucasian | Race_Ethnicity |
| 2 | male | Gender |
| 3 | for 2 days | Duration |
| 4 | congestion | Symptom |
| 5 | mom | Gender |
| 6 | suctioning yellow discharge | Symptom |
| 7 | nares | External_body_part_or_region |
| 8 | she | Gender |
| 9 | mild problems with his breathing while feeding | Symptom |
| 10 | perioral cyanosis | Symptom |
| 11 | retractions | Symptom |
| 12 | One day ago | RelativeDate |
| 13 | mom | Gender |
| 14 | tactile temperature | Symptom |
| 15 | Tylenol | Drug |
| 16 | Baby | Age |
| 17 | decreased p.o. intake | Symptom |
| 18 | His | Gender |
| 19 | breast-feeding | External_body_part_or_region |
| 20 | q.2h | Frequency |
| 21 | to 5 to 10 minutes | Duration |
| 22 | his | Gender |
| 23 | respiratory congestion | Symptom |
| 24 | He | Gender |
| 25 | tired | Symptom |
| 26 | fussy | Symptom |
| 27 | over the past 2 days | RelativeDate |
| 28 | albuterol | Drug |
| 29 | ER | Clinical_Dept |
| 30 | His | Gender |
| 31 | urine output has also decreased | Symptom |
| 32 | he | Gender |
| 33 | per 24 hours | Frequency |
| 34 | he | Gender |
| 35 | per 24 hours | Frequency |
| 36 | Mom | Gender |
| 37 | diarrhea | Symptom |
| 38 | His | Gender |
| 39 | bowel | Internal_organ_or_component |
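The chunks in the table above are produced by the converter stage, which merges token-level tags into spans, while the Benchmarking section reports scores per `B-` and `I-` tag. The following is a minimal pure-Python sketch of that merging idea, not the library's implementation. The function name `iob_to_chunks` and the hand-written tag sequence are illustrative assumptions; joining tokens with a single space is a simplification, since a real converter would rely on character offsets.

```python
def iob_to_chunks(tokens, tags):
    """Merge IOB-tagged tokens into (chunk_text, entity) pairs.

    Hypothetical helper for illustration only; tokens are joined with a
    single space, which ignores the original character offsets.
    """
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the chunk collected so far, if any.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag, or an I- tag with a different label, starts a new chunk.
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the current chunk.
            current_tokens.append(token)
        else:
            # "O" (outside) closes any open chunk.
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


# Hand-written tags (an assumption, not actual model output) chosen to
# reproduce the first five rows of the table above.
tokens = ["21-day-old", "Caucasian", "male", "here", "for", "2", "days", "of", "congestion"]
tags = ["B-Age", "B-Race_Ethnicity", "B-Gender", "O",
        "B-Duration", "I-Duration", "I-Duration", "O", "B-Symptom"]
chunks = iob_to_chunks(tokens, tags)
for text, entity in chunks:
    print(f"{text:12s} {entity}")
```

Starting a new chunk on an `I-` tag whose label differs from the open chunk is one common way to tolerate inconsistent tag sequences; other conventions drop such tokens instead.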
Model Information
| Field          | Value                         |
|:---------------|:------------------------------|
| Model Name     | jsl_ner_wip_greedy_biobert    |
| Compatibility  | Healthcare NLP 3.1.3+         |
| License        | Licensed                      |
| Edition        | Official                      |
| Input Labels   | [sentence, token, embeddings] |
| Output Labels  | [ner]                         |
| Language       | en                            |
Data Source
Trained on data gathered and manually annotated by John Snow Labs. https://www.johnsnowlabs.com/data/
Benchmarking
label tp fp fn prec rec f1
B-Oxygen_Therapy 47 11 10 0.8103448 0.8245614 0.81739134
B-Cerebrovascular_Disease 43 20 21 0.6825397 0.671875 0.6771653
B-Triglycerides 5 0 0 1.0 1.0 1.0
I-Cerebrovascular_Disease 25 12 27 0.6756757 0.48076922 0.56179774
B-Medical_Device 2704 531 364 0.8358578 0.88135594 0.85800415
B-Labour_Delivery 43 16 29 0.7288136 0.5972222 0.6564886
I-Vaccine 5 0 5 1.0 0.5 0.6666667
I-Obesity 6 4 1 0.6 0.85714287 0.70588243
I-Smoking 3 1 2 0.75 0.6 0.6666667
B-RelativeTime 67 36 51 0.65048546 0.5677966 0.60633487
B-Imaging_Technique 33 12 19 0.73333335 0.63461536 0.68041235
B-Heart_Disease 285 55 68 0.8382353 0.8073654 0.82251084
B-Procedure 1876 303 384 0.8609454 0.8300885 0.84523547
I-RelativeTime 105 43 53 0.7094595 0.664557 0.6862745
B-Drug 1803 299 265 0.8577545 0.87185687 0.8647482
B-Obesity 29 9 5 0.7631579 0.85294116 0.8055555
I-RelativeDate 617 167 107 0.7869898 0.8522099 0.8183024
B-O2_Saturation 27 8 6 0.7714286 0.8181818 0.7941177
B-Direction 2856 390 326 0.8798521 0.89754874 0.88861233
I-Alcohol 4 4 4 0.5 0.5 0.5
I-Oxygen_Therapy 25 7 6 0.78125 0.8064516 0.79365087
B-Diet 23 14 32 0.6216216 0.4181818 0.5
B-Dosage 35 26 29 0.57377046 0.546875 0.55999994
B-Injury_or_Poisoning 308 52 83 0.85555553 0.7877238 0.82023966
B-Hypertension 80 9 2 0.8988764 0.9756098 0.9356726
I-Test_Result 124 73 156 0.6294416 0.44285715 0.5199161
B-Alcohol 54 11 12 0.8307692 0.8181818 0.8244275
B-Height 14 5 5 0.7368421 0.7368421 0.7368421
I-Substance 18 8 8 0.6923077 0.6923077 0.6923077
B-RelativeDate 372 109 93 0.7733888 0.8 0.78646934
B-Admission_Discharge 218 22 14 0.90833336 0.9396552 0.9237288
B-Date 345 24 26 0.93495935 0.9299191 0.9324324
B-Kidney_Disease 63 10 20 0.8630137 0.7590361 0.8076923
I-Strength 22 17 13 0.5641026 0.62857145 0.59459466
I-Injury_or_Poisoning 301 93 98 0.7639594 0.75438595 0.75914246
I-Time 28 11 17 0.71794873 0.62222224 0.6666667
B-Substance 48 11 10 0.8135593 0.82758623 0.8205129
B-Total_Cholesterol 6 3 0 0.6666667 1.0 0.8
I-Vital_Signs_Header 276 28 8 0.90789473 0.97183096 0.93877554
I-Internal_organ_or_component 2907 518 490 0.8487591 0.8557551 0.8522427
B-Hyperlipidemia 28 3 0 0.9032258 1.0 0.9491525
B-Overweight 3 0 3 1.0 0.5 0.6666667
I-Sexually_Active_or_Sexual_Orientation 2 0 3 1.0 0.4 0.5714286
B-Sexually_Active_or_Sexual_Orientation 2 0 2 1.0 0.5 0.6666667
I-Fetus_NewBorn 50 38 58 0.5681818 0.46296296 0.5102041
B-BMI 6 0 1 1.0 0.85714287 0.9230769
B-ImagingFindings 52 41 61 0.5591398 0.460177 0.5048544
B-Test_Result 714 135 212 0.8409894 0.7710583 0.8045071
B-Section_Header 2140 79 65 0.9643984 0.97052157 0.96745026
I-Treatment 85 21 29 0.8018868 0.74561405 0.7727273
B-Clinical_Dept 638 82 77 0.88611114 0.8923077 0.88919866
I-Kidney_Disease 114 7 18 0.94214875 0.8636364 0.90118575
I-Pulse 189 27 42 0.875 0.8181818 0.84563756
B-Test 1589 320 315 0.83237296 0.83455884 0.83346444
B-Weight 54 12 13 0.8181818 0.80597013 0.81203
I-Respiration 114 4 17 0.9661017 0.870229 0.91566265
I-EKG_Findings 68 34 52 0.6666667 0.56666666 0.6126126
I-Section_Header 3828 168 77 0.957958 0.9802817 0.9689913
B-Strength 27 13 23 0.675 0.54 0.6
I-Social_History_Header 137 4 4 0.9716312 0.9716312 0.9716312
B-Vital_Signs_Header 183 18 7 0.9104478 0.9631579 0.9360614
B-Death_Entity 28 9 6 0.7567568 0.8235294 0.7887324
B-Modifier 302 90 282 0.77040815 0.5171233 0.6188525
B-Blood_Pressure 93 14 21 0.86915886 0.81578946 0.84162897
I-O2_Saturation 49 19 23 0.7205882 0.6805556 0.7
B-Frequency 437 77 68 0.8501946 0.86534655 0.8577036
I-Triglycerides 5 0 0 1.0 1.0 1.0
I-Duration 513 254 47 0.66883963 0.9160714 0.77317256
I-Diabetes 50 4 6 0.9259259 0.89285713 0.90909094
B-Race_Ethnicity 78 3 2 0.962963 0.975 0.9689441
I-Gender 114 2 17 0.98275864 0.870229 0.9230769
I-Height 43 13 10 0.76785713 0.8113208 0.78899086
B-Communicable_Disease 10 5 9 0.6666667 0.5263158 0.5882354
I-Family_History_Header 134 1 0 0.9925926 1.0 0.9962825
B-LDL 2 2 2 0.5 0.5 0.5
I-Race_Ethnicity 6 0 0 1.0 1.0 1.0
B-Psychological_Condition 103 21 17 0.83064514 0.85833335 0.84426236
I-Age 116 14 50 0.8923077 0.6987952 0.78378385
B-EKG_Findings 33 18 32 0.64705884 0.50769234 0.56896555
B-Employment 168 29 44 0.8527919 0.7924528 0.8215159
I-Oncological 358 38 17 0.9040404 0.9546667 0.9286641
B-Time 27 7 18 0.7941176 0.6 0.68354434
B-Treatment 93 31 41 0.75 0.69402987 0.7209303
B-Temperature 69 5 8 0.9324324 0.8961039 0.9139073
I-Procedure 2437 379 501 0.86541194 0.8294758 0.84706295
B-Relationship_Status 30 3 1 0.90909094 0.9677419 0.9375
B-Pregnancy 56 17 30 0.7671233 0.6511628 0.7044025
I-Route 8 4 7 0.6666667 0.53333336 0.59259266
I-Medical_History_Header 151 4 15 0.9741936 0.9096386 0.94080997
I-Imaging_Technique 25 5 20 0.8333333 0.5555556 0.66666675
B-Smoking 74 6 4 0.925 0.94871795 0.93670887
I-Labour_Delivery 36 8 18 0.8181818 0.6666667 0.7346939
I-Death_Entity 3 0 2 1.0 0.6 0.75
B-Diabetes 77 9 5 0.89534885 0.9390244 0.9166666
B-Gender 4479 82 111 0.9820215 0.97581697 0.9789094
B-Vaccine 6 1 9 0.85714287 0.4 0.54545456
I-Heart_Disease 393 61 89 0.8656388 0.8153527 0.8397436
I-Dosage 31 27 22 0.5344828 0.5849057 0.5585586
B-Social_History_Header 78 2 3 0.975 0.962963 0.9689441
B-External_body_part_or_region 1640 402 311 0.8031342 0.8405946 0.8214376
I-Clinical_Dept 546 59 47 0.90247935 0.920742 0.91151917
I-Test 1195 320 402 0.7887789 0.748278 0.7679949
I-Frequency 340 97 120 0.77803206 0.73913044 0.75808245
B-Age 454 35 57 0.9284254 0.888454 0.908
B-Pulse 90 11 17 0.8910891 0.8411215 0.8653846
I-Symptom 4265 2050 1232 0.6753761 0.7758778 0.72214705
I-Pregnancy 39 28 42 0.58208954 0.4814815 0.527027
I-LDL 5 0 4 1.0 0.5555556 0.71428573
I-Diet 33 14 25 0.70212764 0.5689655 0.6285714
I-Blood_Pressure 198 54 27 0.78571427 0.88 0.83018863
I-ImagingFindings 136 99 85 0.57872343 0.61538464 0.5964913
I-Date 203 13 10 0.9398148 0.9530516 0.946387
B-Route 84 23 47 0.78504676 0.64122134 0.7058824
B-Duration 204 110 26 0.6496815 0.8869565 0.74999994
B-Medical_History_Header 56 1 7 0.98245615 0.8888889 0.93333334
B-Respiration 55 4 6 0.9322034 0.90163934 0.9166667
I-External_body_part_or_region 314 105 167 0.74940336 0.65280664 0.6977778
I-BMI 15 0 1 1.0 0.9375 0.9677419
B-Internal_organ_or_component 4349 886 761 0.8307545 0.8510763 0.8407926
I-Weight 150 22 23 0.872093 0.867052 0.8695652
B-Disease_Syndrome_Disorder 1698 375 358 0.81910276 0.82587546 0.8224752
B-Symptom 4358 1002 932 0.8130597 0.8238185 0.8184037
B-VS_Finding 138 36 37 0.79310346 0.7885714 0.79083097
I-Disease_Syndrome_Disorder 1723 372 451 0.82243437 0.7925483 0.8072148
I-Drug 3282 838 493 0.79660195 0.86940396 0.8314123
I-Medical_Device 1864 418 242 0.81682736 0.88509023 0.84958977
B-Oncological 278 22 22 0.9266667 0.9266667 0.9266667
I-Temperature 111 8 6 0.9327731 0.94871795 0.94067794
I-Employment 92 27 19 0.77310926 0.8288288 0.8
I-Psychological_Condition 32 7 19 0.82051283 0.627451 0.7111111
B-Family_History_Header 68 0 0 1.0 1.0 1.0
I-Direction 311 91 144 0.7736318 0.6835165 0.72578764
Macro-average 65035 12855 11898 0.761429 0.70630085 0.7328297
Micro-average 65035 12855 11898 0.83495957 0.845346 0.8401207
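The prec, rec and f1 columns follow from the tp, fp and fn counts in each row. The sketch below recomputes them with the standard definitions for one label and for the micro-average row, using counts copied from the table; the helper name `prf` is an illustrative assumption. The reported macro-average f1 also appears consistent with the harmonic mean of the macro precision and macro recall, which the last lines check.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts; 0.0 when undefined."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


# Counts copied from the benchmarking table.
oxygen = prf(47, 11, 10)            # B-Oxygen_Therapy row
micro = prf(65035, 12855, 11898)    # Micro-average row (pooled counts)

# Macro precision and recall as reported in the table; their harmonic
# mean is compared against the reported macro f1.
macro_prec, macro_rec = 0.761429, 0.70630085
macro_f1 = 2 * macro_prec * macro_rec / (macro_prec + macro_rec)

print("B-Oxygen_Therapy:", [round(x, 4) for x in oxygen])
print("Micro-average:   ", [round(x, 4) for x in micro])
print("Macro f1:        ", round(macro_f1, 4))
```

Micro-averaging pools the counts before computing the ratios, so frequent labels such as Symptom or Gender dominate it, whereas the macro-average weights every label equally and is pulled down by rare, low-scoring labels.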