Detect Diagnosis, Symptoms, Drugs, Labs and Demographics (ner_jsl_enriched)

Description

Pretrained named entity recognition deep learning model for clinical terminology. The SparkNLP deep learning model (MedicalNerModel) is inspired by a former state of the art model for NER: Chiu & Nicols, Named Entity Recognition with Bidirectional LSTM-CNN.

Definitions of Predicted Entities:

  • Age: All mention of ages, past or present, related to the patient or with anybody else.
  • Dosage: Quantity prescribed by the physician for an active ingredient; measurement units are available described by FDA.
  • Drug_Name: Commercial labeling name chosen by the labeler or the drug manufacturer for a drug containing a single or multiple drug active ingredients.
  • Frequency: Frequency of administration for a dose prescribed.
  • Gender: Gender-specific nouns and pronouns.
  • Symptom: All the symptoms mentioned in the document, of a patient or someone else.
  • Allergen: Allergen related extractions mentioned in the document.
  • Blood_Pressure: Systemic blood pressure, mean arterial pressure, systolic and/or diastolic are extracted.
  • Modifier: Terms that modify the symptoms, diseases or risk factors. If a modifier is included in ICD-10 name of a specific disease, the respective modifier is not extracted separately.
  • O2_Saturation: Systemic arterial, venous or peripheral oxygen saturation measurements.
  • Procedure: All mentions of invasive medical or surgical procedures or treatments.
  • Pulse: Peripheral heart rate, without advanced information like measurement location.
  • Respiration: Number of breaths per minute.
  • Route: Drug and medication administration routes available described by FDA.
  • Section_Header: All the section headers present in the text (Medical History, Family History, Social History, Physical Examination and Vital signs Headers are extracted separately with their specific labels).
  • Temperature: All mentions that refer to body temperature.
  • Weight: All mentions related to a patients weight.
  • Substance: All mentions of substance use related to the patient or someone else (recreational drugs, illicit drugs).

Predicted Entities

Age, Name, Procedure, Pulse_Rate, Temperature, Gender, Frequency, Route, Diagnosis, Allergenic_substance, Weight, Respiratory_Rate, Symptom_Name, Causative_Agents(Virus_and_Bacteria), Modifier, Blood_Pressure, Dosage, Drug_Name, Negation, Lab_Result, Section_Name, Maybe, Substance_Name, Procedure_Name, Lab_Name, O2_Saturation

Live Demo Open in Colab Copy S3 URI

How to use

documentAssembler = DocumentAssembler()\
		.setInputCol("text")\
		.setOutputCol("document")

sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \
		.setInputCols(["document"]) \
		.setOutputCol("sentence")

tokenizer = Tokenizer()\
		.setInputCols(["sentence"])\
		.setOutputCol("token")
	
embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
		.setInputCols(["sentence", "token"])\
		.setOutputCol("embeddings")

jsl_ner = MedicalNerModel.pretrained("ner_jsl_enriched", "en", "clinical/models") \
		.setInputCols(["sentence", "token", "embeddings"]) \
		.setOutputCol("jsl_ner")

jsl_ner_converter = NerConverter() \
		.setInputCols(["sentence", "token", "jsl_ner"]) \
		.setOutputCol("ner_chunk")

jsl_ner_pipeline = Pipeline().setStages([
				documentAssembler,
				sentenceDetector,
				tokenizer,
				embeddings,
				jsl_ner,
				jsl_ner_converter])


jsl_ner_model = jsl_ner_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

data = spark.createDataFrame([["""The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. His bowel movements are yellow colored and soft in nature."""]]).toDF("text")

result = jsl_ner_model.transform(data)
val documentAssembler = new DocumentAssembler()
		.setInputCol("text")
		.setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
		.setInputCols("document") 
		.setOutputCol("sentence")

val tokenizer = new Tokenizer()
		.setInputCols("sentence")
		.setOutputCol("token")
	
val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
		.setInputCols(Array("sentence", "token"))
	    .setOutputCol("embeddings")

val jsl_ner = MedicalNerModel.pretrained("ner_jsl_enriched", "en", "clinical/models")
		.setInputCols(Array("sentence", "token", "embeddings"))
		.setOutputCol("jsl_ner")

val jsl_ner_converter = new NerConverter()
		.setInputCols(Array("sentence", "token", "jsl_ner"))
		.setOutputCol("ner_chunk")

val jsl_ner_pipeline = new Pipeline().setStages(Array(
					documentAssembler, 
					sentenceDetector, 
					tokenizer, 
					embeddings, 
					jsl_ner, 
					jsl_ner_converter))


val data = Seq("""The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. His bowel movements are yellow colored and soft in nature.""").toDS.toDF("text")

val result = jsl_ner_pipeline.fit(data).transform(data)
import nlu
nlu.load("en.med_ner.jsl.enriched").predict("""The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. His bowel movements are yellow colored and soft in nature.""")

Results

+---------------------------+------------+
|chunk                      |ner         |
+---------------------------+------------+
|21-day-old                 |Age         |
|male                       |Gender      |
|congestion                 |Symptom_Name|
|mom                        |Gender      |
|suctioning yellow discharge|Symptom_Name|
|she                        |Gender      |
|problems with his breathing|Symptom_Name|
|perioral cyanosis          |Symptom_Name|
|retractions                |Symptom_Name|
|mom                        |Gender      |
|Tylenol                    |Drug_Name   |
|His                        |Gender      |
|his                        |Gender      |
|respiratory congestion     |Symptom_Name|
|He                         |Gender      |
|tired                      |Symptom_Name|
|fussy                      |Symptom_Name|
|albuterol                  |Drug_Name   |
+---------------------------+------------+

Model Information

Model Name: ner_jsl_enriched
Compatibility: Healthcare NLP 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en

Data Source

Trained on data gathered and manually annotated by John Snow Labs. https://www.johnsnowlabs.com/data/

Benchmarking

label                                      tp     fp     fn      prec       rec        f1
B-Pulse_Rate                               80     26      9  0.754717  0.898876  0.820513
I-Diagnosis                              2341   1644   1129  0.587453  0.67464   0.628035
I-Procedure_Name                         2209   1128   1085  0.661972  0.670613  0.666265
B-Lab_Result                              432    107    263  0.801484  0.621583  0.700162
B-Dosage                                  465    179     81  0.72205   0.851648  0.781513
I-Causative_Agents_(Virus_and_Bacteria)     9      3     10  0.75      0.473684  0.580645
B-Name                                    648    295    510  0.687169  0.559585  0.616849
I-Name                                    917    427    665  0.682292  0.579646  0.626794
B-Weight                                   52     25      9  0.675325  0.852459  0.753623
B-Symptom_Name                           4244   1911   1776  0.689521  0.704983  0.697166
I-Maybe                                    25     15     63  0.625     0.284091  0.390625
I-Symptom_Name                           1920   1584   2503  0.547945  0.434095  0.48442 
B-Modifier                               1399    704    942  0.66524   0.597608  0.629613
B-Blood_Pressure                           82     21      7  0.796117  0.921348  0.854167
B-Frequency                               290     93     97  0.75718   0.749354  0.753247
I-Gender                                   29     19     25  0.604167  0.537037  0.568627
I-Age                                       3      6     11  0.333333  0.214286  0.26087 
B-Drug_Name                              1762    500    271  0.778957  0.866699  0.820489
B-Substance_Name                          143     32     53  0.817143  0.729592  0.770889
B-Temperature                              58     23     11  0.716049  0.84058   0.773333
B-Section_Name                           2700    294    177  0.901804  0.938478  0.919775
I-Route                                   131    165    177  0.442568  0.425325  0.433775
B-Maybe                                   108     47    164  0.696774  0.397059  0.505855
B-Gender                                 5156    685     68  0.882726  0.986983  0.931948
I-Dosage                                  435    182     87  0.705024  0.833333  0.763828
B-Causative_Agents_(Virus_and_Bacteria)    21     17      6  0.552632  0.777778  0.646154
I-Frequency                               278    131    191  0.679707  0.592751  0.633257
B-Age                                     352     34     21  0.911917  0.9437    0.927536
I-Lab_Result                               27     20    170  0.574468  0.137056  0.221311
B-Negation                               1501    311    341  0.828366  0.814875  0.821565
B-Diagnosis                              2657   1281   1049  0.674708  0.716945  0.695186
I-Section_Name                           3876   1304    188  0.748263  0.95374   0.838598
B-Route                                   466    286    123  0.619681  0.791172  0.695004
I-Negation                                 80    152    190  0.344828  0.296296  0.318725
B-Procedure_Name                         1453    739    562  0.662865  0.721092  0.690754
I-Allergenic_substance                      6      1      7  0.857143  0.461538  0.6     
B-Allergenic_substance                     74     31     23  0.704762  0.762887  0.732673
I-Weight                                   46     43     17  0.516854  0.730159  0.605263
B-Lab_Name                                639    189    287  0.771739  0.690065  0.72862 
I-Modifier                                104    156    417  0.4       0.199616  0.266325
I-Temperature                               2      7     13  0.222222  0.133333  0.166667
I-Drug_Name                               334    237    290  0.584939  0.535256  0.558996
I-Lab_Name                                271    157    140  0.633178  0.659367  0.646007
B-Respiratory_Rate                         46      6      5  0.884615  0.901961  0.893204
Macro-average                           37896  15237  14343  0.621144  0.562248  0.59023 
Micro-average                           37896  15237  14343  0.713229  0.725435  0.71928