Description
This is a pretrained named entity recognition deep learning model for clinical terminology. It is based on ner_jsl
model, but with more generalised entities.
Definitions of Predicted Entities:
Death_Entity
: Mentions that indicate the death of a patient.Medical_Device
: All mentions related to medical devices and supplies.Vital_Signs
: Identifies Vital Signs of a patient.Header
: Identifies section headers that correspond to Vital Signs of a patient.Allergen
: Allergen related extractions mentioned in the document.Drug_BrandName
: Commercial labeling name chosen by the labeler or the drug manufacturer for a drug containing a single or multiple drug active ingredients.Clinical_Dept
: Terms that indicate the medical and/or surgical departments.Symptom
: All the symptoms mentioned in the document, of a patient or someone else.External_body_part_or_region
: All mentions related to external body parts or organs that can be examined by naked eye.Admission_Discharge
: Terms that indicate the admission and/or the discharge of a patient.Age
: All mention of ages, past or present, related to the patient or with anybody else.Birth_Entity
: Mentions that indicate giving birth.Oncological
: All the cancer, tumor or metastasis related extractions mentioned in the document, of the patient or someone else.Substance_Quantity
: All mentions of substance quantity (quantitative information related to illicit/recreational drugs).Test_Result
: Terms related to all the test results present in the document (clinical tests results are included).Test
: Mentions of laboratory, pathology, and radiological tests.Procedure
: All mentions of invasive medical or surgical procedures or treatments.Treatment
: Includes therapeutic and minimally invasive treatment and procedures (invasive treatments or procedures are extracted as “Procedure”).Disease_Syndrome_Disorder
: All the diseases mentioned in the document, of the patient or someone else (excluding diseases that are extracted with their specific labels, such as “Heart_Disease” etc.).
Predicted Entities
Death_Entity
, Medical_Device
, Vital_Sign
, Alergen
, Drug
, Clinical_Dept
, Lifestyle
, Symptom
, Body_Part
, Physical_Measurement
, Admission_Discharge
, Date_Time
, Age
, Birth_Entity
, Header
, Oncological
, Substance_Quantity
, Test_Result
, Test
, Procedure
, Treatment
, Disease_Syndrome_Disorder
, Pregnancy_Newborn
, Demographics
Live Demo Open in Colab Copy S3 URI
How to use
documentAssembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \
.setInputCols(["document"]) \
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
jsl_ner = MedicalNerModel.pretrained("ner_jsl_slim", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("jsl_ner")
jsl_ner_converter = NerConverter() \
.setInputCols(["sentence", "token", "jsl_ner"]) \
.setOutputCol("ner_chunk")
jsl_ner_pipeline = Pipeline().setStages([
documentAssembler,
sentenceDetector,
tokenizer,
embeddings,
jsl_ner,
jsl_ner_converter])
jsl_ner_model = jsl_ner_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
data = spark.createDataFrame([["""HISTORY: 30-year-old female presents for digital bilateral mammography secondary to a soft tissue lump palpated by the patient in the upper right shoulder. The patient has a family history of breast cancer within her mother at age 58. Patient denies personal history of breast cancer."""]]).toDF("text")
result = jsl_ner_model.transform(data)
val documentAssembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val jsl_ner = MedicalNerModel.pretrained("ner_jsl_slim", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("jsl_ner")
val jsl_ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "jsl_ner"))
.setOutputCol("ner_chunk")
val jsl_ner_pipeline = new Pipeline().setStages(Array(
documentAssembler,
sentenceDetector,
tokenizer,
embeddings,
jsl_ner,
jsl_ner_converter))
val data = Seq("""HISTORY: 30-year-old female presents for digital bilateral mammography secondary to a soft tissue lump palpated by the patient in the upper right shoulder. The patient has a family history of breast cancer within her mother at age 58. Patient denies personal history of breast cancer.""").toDS.toDF("text")
val result = jsl_ner_pipeline.fit(data).transform(data)
import nlu
nlu.load("en.med_ner.jsl_slim").predict("""HISTORY: 30-year-old female presents for digital bilateral mammography secondary to a soft tissue lump palpated by the patient in the upper right shoulder. The patient has a family history of breast cancer within her mother at age 58. Patient denies personal history of breast cancer.""")
Results
| | chunk | entity |
|---:|:-----------------|:-------------|
| 0 | HISTORY: | Header |
| 1 | 30-year-old | Age |
| 2 | female | Demographics |
| 3 | mammography | Test |
| 4 | soft tissue lump | Symptom |
| 5 | shoulder | Body_Part |
| 6 | breast cancer | Oncological |
| 7 | her mother | Demographics |
| 8 | age 58 | Age |
| 9 | breast cancer | Oncological |
Model Information
Model Name: | ner_jsl_slim |
Compatibility: | Healthcare NLP 3.2.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Data Source
Trained on data annotated by JSL.
Benchmarking
label tp fp fn prec rec f1
B-Medical_Device 2696 444 282 0.8585987 0.9053055 0.8813337
I-Physical_Measurement 220 16 34 0.9322034 0.8661417 0.8979592
B-Procedure 1800 239 281 0.8827857 0.8649688 0.8737864
B-Drug 1865 218 237 0.8953432 0.8872502 0.8912784
I-Test_Result 289 203 292 0.5873983 0.4974182 0.5386766
I-Pregnancy_Newborn 150 41 104 0.7853403 0.5905512 0.6741573
B-Admission_Discharge 255 35 6 0.8793103 0.9770115 0.9255898
B-Demographics 4609 119 123 0.9748308 0.9740068 0.9744186
I-Lifestyle 71 49 20 0.5916666 0.7802198 0.6729857
B-Header 2463 53 122 0.9789348 0.9528046 0.965693
I-Date_Time 928 184 191 0.8345324 0.8293119 0.8319139
B-Test_Result 866 198 262 0.8139097 0.7677305 0.7901459
I-Treatment 114 37 46 0.7549669 0.7125 0.733119
B-Clinical_Dept 688 83 76 0.8923476 0.9005235 0.8964169
B-Test 1920 333 313 0.8521970 0.8598298 0.8559965
B-Death_Entity 36 9 2 0.8 0.9473684 0.8674699
B-Lifestyle 268 58 50 0.8220859 0.8427673 0.8322981
B-Date_Time 823 154 176 0.8423746 0.8238238 0.8329959
I-Age 136 34 49 0.8 0.7351351 0.7661972
I-Oncological 345 41 19 0.8937824 0.9478022 0.9199999
I-Body_Part 3717 720 424 0.8377282 0.8976093 0.8666356
B-Pregnancy_Newborn 153 51 104 0.75 0.5953307 0.6637744
B-Treatment 169 41 58 0.8047619 0.7444933 0.7734553
I-Procedure 2302 326 417 0.8759513 0.8466348 0.8610435
B-Birth_Entity 6 5 7 0.5454545 0.4615384 0.5
I-Vital_Sign 639 197 93 0.7643540 0.8729508 0.815051
I-Header 4451 111 216 0.9756685 0.9537176 0.9645682
I-Death_Entity 2 0 0 1 1 1
I-Clinical_Dept 621 54 39 0.92 0.9409091 0.9303371
I-Test 1593 378 353 0.8082192 0.8186022 0.8133775
B-Age 472 43 51 0.9165048 0.9024856 0.9094413
I-Symptom 4227 1271 1303 0.7688250 0.7643761 0.7665941
I-Demographics 321 53 53 0.8582887 0.8582887 0.8582887
B-Body_Part 6312 912 809 0.8737541 0.8863923 0.8800279
B-Physical_Measurement 91 10 17 0.9009901 0.8425926 0.8708134
B-Disease_Syndrome_Disorder 2817 336 433 0.8934348 0.8667692 0.8799001
B-Symptom 4522 830 747 0.8449178 0.8582274 0.8515206
I-Disease_Syndrome_Disorder 2814 386 530 0.879375 0.8415072 0.8600244
I-Drug 3737 612 517 0.859278 0.8784673 0.8687667
I-Medical_Device 1825 331 131 0.8464749 0.9330266 0.8876459
B-Oncological 276 28 27 0.9078947 0.9108911 0.9093904
B-Vital_Sign 429 97 79 0.8155893 0.8444882 0.8297872
Macro-average 62038 9340 9110 0.7678277 0.7648211 0.7663215
Micro-average 62038 9340 9110 0.8691473 0.8719570 0.87055