Description
Pretrained named entity recognition deep learning model for clinical terminology. This NER model is trained with the embeddings_clinical word embeddings model, so be sure to use the same embeddings in the pipeline.. This model is augmented of ner_jsl
.
Predicted Entities
Injury_or_Poisoning
, Direction
, Test
, Admission_Discharge
, Death_Entity
, Relationship_Status
, Duration
, Respiration
, Hyperlipidemia
, Birth_Entity
, Age
, Labour_Delivery
, Family_History_Header
, BMI
, Temperature
, Alcohol
, Kidney_Disease
, Oncological
, Medical_History_Header
, Cerebrovascular_Disease
, Oxygen_Therapy
, O2_Saturation
, Psychological_Condition
, Heart_Disease
, Employment
, Obesity
, Disease_Syndrome_Disorder
, Pregnancy
, ImagingFindings
, Procedure
, Medical_Device
, Race_Ethnicity
, Section_Header
, Symptom
, Treatment
, Substance
, Route
, Drug_Ingredient
, Blood_Pressure
, Diet
, External_body_part_or_region
, LDL
, VS_Finding
, Allergen
, EKG_Findings
, Imaging_Technique
, Triglycerides
, RelativeTime
, Gender
, Pulse
, Social_History_Header
, Substance_Quantity
, Diabetes
, Modifier
, Internal_organ_or_component
, Clinical_Dept
, Form
, Drug_BrandName
, Strength
, Fetus_NewBorn
, RelativeDate
, Height
, Test_Result
, Sexually_Active_or_Sexual_Orientation
, Frequency
, Time
, Weight
, Vaccine
, Vaccine_Name
, Vital_Signs_Header
, Communicable_Disease
, Dosage
, Overweight
, Hypertension
, HDL
, Total_Cholesterol
, Smoking
, Date
Live Demo Open in Colab Copy S3 URI
How to use
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_jsl_langtest", "en", "clinical/models")\
.setInputCols(["sentence","token","embeddings"])\
.setOutputCol("ner")
ner_converter = NerConverterInternal()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")
nlp_pipeline = Pipeline(
stages=[
document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
clinical_ner,
ner_converter
])
text ="""The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). Additionally, there is no side effect observed after Influenza vaccine. One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. His bowel movements are yellow colored and soft in nature."""
data = spark.createDataFrame([[text]]).toDF("text")
result = nlp_pipeline.fit(data).transform(data)
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentence_detector = new SentenceDetector()
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models")\
.setInputCols(Array("sentence", "token"))\
.setOutputCol("embeddings")
val jsl_ner_model = MedicalNerModel.pretrained("ner_jsl_langtest", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("posology_ner")
val jsl_ner_converter = new NerConverterInternal()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("posology_ner_chunk")
val jsl_pipeline = new PipelineModel().setStages(Array(document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
jsl_ner_model,
jsl_ner_converter))
text = """The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). Additionally, there is no side effect observed after Influenza vaccine. One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. His bowel movements are yellow colored and soft in nature."""
val data = Seq(text).toDS.toDF("text")
val result = jsl_pipeline.fit(data).transform(data)
Results
+--------------------+-----+---+--------------------+
| chunk|begin|end| ner_label|
+--------------------+-----+---+--------------------+
| 21-day-old| 17| 26| Age|
| Caucasian| 28| 36| Race_Ethnicity|
| male| 38| 41| Gender|
| for 2 days| 48| 57| Duration|
| congestion| 62| 71| Symptom|
| mom| 75| 77| Gender|
| discharge| 106|114| Admission_Discharge|
| nares| 135|139|External_body_par...|
| she| 147|149| Gender|
| mild| 168|171| Modifier|
|problems with his...| 173|213| Symptom|
| perioral cyanosis| 237|253| Symptom|
| retractions| 258|268| Symptom|
| Influenza vaccine| 325|341| Vaccine_Name|
| One day ago| 344|354| RelativeDate|
| mom| 357|359| Gender|
| Tylenol| 417|423| Drug_BrandName|
| Baby| 426|429| Age|
| decreased p.o| 449|461| Symptom|
| His| 472|474| Gender|
+--------------------+-----+---+--------------------+
Model Information
Model Name: | ner_jsl_langtest |
Compatibility: | Healthcare NLP 5.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 3.2 MB |
References
trained by in-house dataset
Benchmarking
label tp fp fn total precision recall f1
VS_Finding 181.0 48.0 84.0 265.0 0.7904 0.683 0.7328
Direction 3422.0 522.0 560.0 3982.0 0.8676 0.8594 0.8635
Respiration 57.0 2.0 16.0 73.0 0.9661 0.7808 0.8636
Cerebrovascular_D... 74.0 36.0 29.0 103.0 0.6727 0.7184 0.6948
Family_History_He... 79.0 6.0 2.0 81.0 0.9294 0.9753 0.9518
Heart_Disease 395.0 87.0 133.0 528.0 0.8195 0.7481 0.7822
ImagingFindings 38.0 55.0 137.0 175.0 0.4086 0.2171 0.2836
RelativeTime 102.0 70.0 90.0 192.0 0.593 0.5313 0.5604
Strength 598.0 48.0 58.0 656.0 0.9257 0.9116 0.9186
Smoking 124.0 2.0 5.0 129.0 0.9841 0.9612 0.9725
Medical_Device 2714.0 482.0 471.0 3185.0 0.8492 0.8521 0.8507
Allergen 1.0 4.0 13.0 14.0 0.2 0.0714 0.1053
EKG_Findings 28.0 19.0 37.0 65.0 0.5957 0.4308 0.5
Pulse 102.0 29.0 15.0 117.0 0.7786 0.8718 0.8226
Psychological_Con... 88.0 12.0 20.0 108.0 0.88 0.8148 0.8462
Overweight 2.0 2.0 4.0 6.0 0.5 0.3333 0.4
Triglycerides 2.0 0.0 1.0 3.0 1.0 0.6667 0.8
Obesity 41.0 7.0 6.0 47.0 0.8542 0.8723 0.8632
Admission_Discharge 309.0 28.0 9.0 318.0 0.9169 0.9717 0.9435
HDL 2.0 2.0 1.0 3.0 0.5 0.6667 0.5714
Diabetes 91.0 17.0 12.0 103.0 0.8426 0.8835 0.8626
Section_Header 3272.0 167.0 167.0 3439.0 0.9514 0.9514 0.9514
Age 558.0 83.0 95.0 653.0 0.8705 0.8545 0.8624
O2_Saturation 30.0 8.0 12.0 42.0 0.7895 0.7143 0.75
Kidney_Disease 94.0 12.0 24.0 118.0 0.8868 0.7966 0.8393
Test 2145.0 623.0 497.0 2642.0 0.7749 0.8119 0.793
Communicable_Disease 16.0 15.0 32.0 48.0 0.5161 0.3333 0.4051
Hypertension 139.0 8.0 9.0 148.0 0.9456 0.9392 0.9424
External_body_par... 2008.0 455.0 528.0 2536.0 0.8153 0.7918 0.8034
Oxygen_Therapy 54.0 15.0 27.0 81.0 0.7826 0.6667 0.72
Modifier 2045.0 480.0 677.0 2722.0 0.8099 0.7513 0.7795
Test_Result 753.0 203.0 272.0 1025.0 0.7877 0.7346 0.7602
BMI 3.0 0.0 3.0 6.0 1.0 0.5 0.6667
Labour_Delivery 49.0 24.0 41.0 90.0 0.6712 0.5444 0.6012
Employment 207.0 31.0 64.0 271.0 0.8697 0.7638 0.8134
Fetus_NewBorn 40.0 25.0 64.0 104.0 0.6154 0.3846 0.4734
Clinical_Dept 806.0 136.0 95.0 901.0 0.8556 0.8946 0.8747
Time 20.0 12.0 21.0 41.0 0.625 0.4878 0.5479
Procedure 2455.0 575.0 663.0 3118.0 0.8102 0.7874 0.7986
Diet 29.0 6.0 50.0 79.0 0.8286 0.3671 0.5088
Oncological 344.0 82.0 95.0 439.0 0.8075 0.7836 0.7954
LDL 0.0 0.0 3.0 3.0 0.0 0.0 0.0
Symptom 6190.0 1617.0 1426.0 7616.0 0.7929 0.8128 0.8027
Temperature 70.0 11.0 16.0 86.0 0.8642 0.814 0.8383
Vital_Signs_Header 183.0 26.0 22.0 205.0 0.8756 0.8927 0.8841
Total_Cholesterol 8.0 1.0 1.0 9.0 0.8889 0.8889 0.8889
Relationship_Status 40.0 4.0 5.0 45.0 0.9091 0.8889 0.8989
Blood_Pressure 126.0 36.0 24.0 150.0 0.7778 0.84 0.8077
Injury_or_Poisoning 380.0 112.0 151.0 531.0 0.7724 0.7156 0.7429
Drug_Ingredient 1439.0 177.0 167.0 1606.0 0.8905 0.896 0.8932
Treatment 109.0 28.0 82.0 191.0 0.7956 0.5707 0.6646
Pregnancy 101.0 20.0 75.0 176.0 0.8347 0.5739 0.6801
Vaccine 0.0 0.0 9.0 9.0 0.0 0.0 0.0
Disease_Syndrome_... 2543.0 688.0 643.0 3186.0 0.7871 0.7982 0.7926
Height 18.0 3.0 9.0 27.0 0.8571 0.6667 0.75
Frequency 504.0 108.0 155.0 659.0 0.8235 0.7648 0.7931
Route 737.0 114.0 79.0 816.0 0.866 0.9032 0.8842
Duration 277.0 100.0 121.0 398.0 0.7347 0.696 0.7148
Death_Entity 37.0 17.0 3.0 40.0 0.6852 0.925 0.7872
Internal_organ_or... 5461.0 1326.0 1107.0 6568.0 0.8046 0.8315 0.8178
Vaccine_Name 0.0 0.0 12.0 12.0 0.0 0.0 0.0
Alcohol 76.0 12.0 9.0 85.0 0.8636 0.8941 0.8786
Substance_Quantity 0.0 0.0 6.0 6.0 0.0 0.0 0.0
Date 440.0 16.0 13.0 453.0 0.9649 0.9713 0.9681
Hyperlipidemia 30.0 0.0 4.0 34.0 1.0 0.8824 0.9375
Social_History_He... 78.0 9.0 8.0 86.0 0.8966 0.907 0.9017
Imaging_Technique 26.0 12.0 37.0 63.0 0.6842 0.4127 0.5149
Race_Ethnicity 102.0 2.0 5.0 107.0 0.9808 0.9533 0.9668
Drug_BrandName 886.0 81.0 86.0 972.0 0.9162 0.9115 0.9139
RelativeDate 502.0 153.0 137.0 639.0 0.7664 0.7856 0.7759
Gender 5536.0 88.0 66.0 5602.0 0.9844 0.9882 0.9863
Form 183.0 49.0 66.0 249.0 0.7888 0.7349 0.7609
Dosage 235.0 44.0 82.0 317.0 0.8423 0.7413 0.7886
Medical_History_H... 100.0 7.0 10.0 110.0 0.9346 0.9091 0.9217
Birth_Entity 0.0 0.0 6.0 6.0 0.0 0.0 0.0
Substance 64.0 13.0 23.0 87.0 0.8312 0.7356 0.7805
Sexually_Active_o... 4.0 0.0 0.0 4.0 1.0 1.0 1.0
Weight 64.0 18.0 29.0 93.0 0.7805 0.6882 0.7314
macro - - - - - - 0.7224
micro - - - - - - 0.8377