Description
Pretrained named entity recognition deep learning model for clinical terms. The Spark NLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art model for NER: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNNs.
Predicted Entities
PROBLEM, TEST, TREATMENT.
How to use
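# Imports: Spark ML, Spark NLP (open source), and Spark NLP for Healthcare (licensed).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

# Assumes `spark` is an active session started with the licensed Healthcare NLP library.
# Wrap the raw text column as a document annotation.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split each document into sentences.
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split each sentence into tokens.
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Look up clinical word embeddings for each token.
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Tag each token with a B-/I-/O label.
clinical_ner = MedicalNerModel.pretrained("ner_clinical_large", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

# Merge token-level tags into entity chunks.
ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(
    stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        word_embeddings,
        clinical_ner,
        ner_converter])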
data = spark.createDataFrame([["""Mr. ABC is a 60-year-old gentleman who had stress test earlier today in my office with severe chest pain after 5 minutes of exercise on the standard Bruce with horizontal ST depressions and moderate apical ischemia on stress imaging only. He required 3 sublingual nitroglycerin in total. The patient underwent cardiac catheterization with myself today which showed mild-to-moderate left main distal disease of 30%, a severe mid-LAD lesion of 99%, and a mid-left circumflex lesion of 80% with normal LV function and some mild luminal irregularities in the right coronary artery with some moderate stenosis seen in the mid to distal right PDA."""]]).toDF("text")
result = nlpPipeline.fit(data).transform(data)
// Scala version of the same pipeline.
val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
    .setInputCols("document")
    .setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_clinical_large", "en", "clinical/models")
.setInputCols("sentence", "token", "embeddings")
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(
Array(
document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
ner,
ner_converter))
val data = Seq("""Mr. ABC is a 60-year-old gentleman who had stress test earlier today in my office with severe chest pain after 5 minutes of exercise on the standard Bruce with horizontal ST depressions and moderate apical ischemia on stress imaging only. He required 3 sublingual nitroglycerin in total. The patient underwent cardiac catheterization with myself today which showed mild-to-moderate left main distal disease of 30%, a severe mid-LAD lesion of 99%, and a mid-left circumflex lesion of 80% with normal LV function and some mild luminal irregularities in the right coronary artery with some moderate stenosis seen in the mid to distal right PDA.""").toDS().toDF("text")
val result = pipeline.fit(data).transform(data)
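Both pipelines end with a NerConverter stage, which turns the token-level tags in the `ner` column into the entity chunks in `ner_chunk`. The idea behind that grouping can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the function name `bio_to_chunks`, the sample tokens, and the sample tags are hypothetical, and the handling of a stray `I-` tag (treated as the start of a new chunk) is an assumption.

```python
def bio_to_chunks(tokens, tags):
    """Group token-level B-/I-/O tags into (chunk_text, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        # "B-TEST" -> ("B", "-", "TEST"); "O" -> ("O", "", "")
        prefix, _, label = tag.partition("-")

        # An I- tag with the same label extends the open chunk.
        if prefix == "I" and label == current_label:
            current_tokens.append(token)
            continue

        # Anything else closes the open chunk, if there is one.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

        if prefix in ("B", "I"):
            # Start a new chunk (a stray I- tag is treated like B-).
            current_tokens, current_label = [token], label
        else:
            # "O": no chunk is open.
            current_tokens, current_label = [], None

    # Close a chunk that runs to the end of the sentence.
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Hypothetical token/tag sequence for illustration.
tokens = ["had", "stress", "test", "with", "severe", "chest", "pain"]
tags = ["O", "B-TEST", "I-TEST", "O", "B-PROBLEM", "I-PROBLEM", "I-PROBLEM"]
print(bio_to_chunks(tokens, tags))
# [('stress test', 'TEST'), ('severe chest pain', 'PROBLEM')]
```

Joining tokens with a single space is a simplification; the real chunks carry character offsets into the original text, so their surface form is taken from the text itself.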
Results
+-------------------------------------------------------------+-----+---+---------+
|chunk                                                        |begin|end|ner_label|
+-------------------------------------------------------------+-----+---+---------+
|stress test                                                  |43   |53 |TEST     |
|severe chest pain                                            |87   |103|PROBLEM  |
|horizontal ST depressions                                    |160  |184|PROBLEM  |
|moderate apical ischemia                                     |190  |213|PROBLEM  |
|stress imaging                                               |218  |231|TEST     |
|3 sublingual nitroglycerin                                   |251  |276|TREATMENT|
|cardiac catheterization                                      |310  |332|TEST     |
|mild-to-moderate left main distal disease of 30%             |365  |412|PROBLEM  |
|a severe mid-LAD lesion                                      |415  |437|PROBLEM  |
|a mid-left circumflex lesion                                 |451  |478|PROBLEM  |
|some mild luminal irregularities in the right coronary artery|515  |575|PROBLEM  |
|some moderate stenosis                                       |582  |603|PROBLEM  |
+-------------------------------------------------------------+-----+---+---------+
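The `begin` and `end` values in the table read as 0-based character offsets into the input text, with `end` inclusive. That reading is inferred from the numbers themselves rather than stated on this page, and the small check below confirms it for the first two rows. The helper name `chunk_at` is hypothetical, and `text` is only a prefix of the sample note.

```python
# Prefix of the sample note used in the pipeline example (truncated for brevity).
text = ("Mr. ABC is a 60-year-old gentleman who had stress test earlier today "
        "in my office with severe chest pain after 5 minutes of exercise")

# (chunk, begin, end, ner_label) rows copied from the Results table.
rows = [
    ("stress test", 43, 53, "TEST"),
    ("severe chest pain", 87, 103, "PROBLEM"),
]


def chunk_at(text, begin, end):
    """Return the substring for inclusive character offsets [begin, end]."""
    return text[begin:end + 1]


for chunk, begin, end, label in rows:
    # The slice recovered from the offsets should match the reported chunk.
    assert chunk_at(text, begin, end) == chunk
    print(f"{label:<9} {begin:>3}-{end:<3} {chunk_at(text, begin, end)!r}")
```

Because `end` is inclusive, a Python slice needs `end + 1` as its upper bound; using `text[begin:end]` would drop the last character of every chunk.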
Model Information
| Model Name:    | ner_clinical_large            |
| Compatibility: | Healthcare NLP 3.0.0+         |
| License:       | Licensed                      |
| Edition:       | Official                      |
| Input Labels:  | [sentence, token, embeddings] |
| Output Labels: | [ner]                         |
| Language:      | en                            |
Data Source
Trained on an augmented version of the 2010 i2b2 challenge data, using the 'embeddings_clinical' word embeddings. https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/
Benchmarking
| | label | tp | fp | fn | prec | rec | f1 |
|---:|--------------:|------:|------:|------:|---------:|---------:|---------:|
| 0 | I-TREATMENT | 6625 | 1187 | 1329 | 0.848054 | 0.832914 | 0.840416 |
| 1 | I-PROBLEM | 15142 | 1976 | 2542 | 0.884566 | 0.856254 | 0.87018 |
| 2 | B-PROBLEM | 11005 | 1065 | 1587 | 0.911765 | 0.873968 | 0.892466 |
| 3 | I-TEST | 6748 | 923 | 1264 | 0.879677 | 0.842237 | 0.86055 |
| 4 | B-TEST | 8196 | 942 | 1029 | 0.896914 | 0.888455 | 0.892665 |
| 5 | B-TREATMENT | 8271 | 1265 | 1073 | 0.867345 | 0.885167 | 0.876165 |
| 6 | Macro-average | 55987 | 7358 | 8824 | 0.881387 | 0.863166 | 0.872181 |
| 7 | Micro-average | 55987 | 7358 | 8824 | 0.883842 | 0.86385 | 0.873732 |
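The table does not say how the average rows were computed. The figures are consistent with the following reading, sketched here from the tp/fp/fn counts above: the micro-average pools the counts across labels before computing the metrics, while the macro-average takes the unweighted mean of the per-label precision and recall and then the harmonic mean of those two values for F1. The function name `prf` is hypothetical.

```python
# (tp, fp, fn) per label, copied from the Benchmarking table.
counts = {
    "I-TREATMENT": (6625, 1187, 1329),
    "I-PROBLEM": (15142, 1976, 2542),
    "B-PROBLEM": (11005, 1065, 1587),
    "I-TEST": (6748, 923, 1264),
    "B-TEST": (8196, 942, 1029),
    "B-TREATMENT": (8271, 1265, 1073),
}


def prf(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_label = {label: prf(*c) for label, c in counts.items()}

# Micro-average: pool the counts over all labels, then compute the metrics once.
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_p, micro_r, micro_f1 = prf(tp, fp, fn)

# Macro-average: unweighted mean of per-label precision and recall,
# with F1 taken as the harmonic mean of those two means.
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

print(f"micro  prec={micro_p:.6f} rec={micro_r:.6f} f1={micro_f1:.6f}")
print(f"macro  prec={macro_p:.6f} rec={macro_r:.6f} f1={macro_f1:.6f}")
```

Since both average rows list the same pooled tp/fp/fn totals, only the micro-average row's metrics follow directly from the counts shown in that row; the macro-average values come from the per-label rows.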