Description
Pretrained named entity recognition (NER) deep learning model for clinical terms. The Spark NLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art NER model: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNNs.
Predicted Entities
PROBLEM, TEST, TREATMENT
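At the token level, the NER stage emits IOB tags (the `B-` and `I-` prefixed labels listed under Benchmarking), and the converter stage groups them into chunks carrying one of the three entity labels above. The following is a minimal sketch of that grouping idea in plain Python. It is not the library's implementation: the function name `iob_to_chunks` is hypothetical, and the tokens and tags are hand-written for illustration rather than actual model output.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (chunk_text, label) pairs.

    Simplification: an I- tag that does not continue an open chunk of the
    same label is discarded instead of starting a new chunk.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the chunk that is currently open, if any.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open chunk.
            flush()
    flush()
    return chunks


# Hand-written illustration, not model output.
tokens = ["had", "stress", "test", "with", "severe", "chest", "pain"]
tags = ["O", "B-TEST", "I-TEST", "O", "O", "B-PROBLEM", "I-PROBLEM"]
print(iob_to_chunks(tokens, tags))
```

Joining tokens with a single space is a shortcut for readability; a real converter would rely on character offsets so that the original spacing is preserved.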
How to use
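# Spark NLP for Healthcare API; assumes an active licensed session named `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")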
sentence_detector = SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_clinical_large_v2", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = NerConverter() \
.setInputCols(["sentence", "token", "ner"]) \
.setOutputCol("ner_chunk")
nlpPipeline = Pipeline(stages=[
document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
clinical_ner,
ner_converter])
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
text = """
Mr. ABC is a 60-year-old gentleman who had stress test earlier today in my office with severe chest pain after 5 minutes of exercise on the standard Bruce with horizontal ST depressions and moderate apical ischemia on stress imaging only.
He required 3 sublingual nitroglycerin in total.
The patient underwent cardiac catheterization with myself today which showed mild-to-moderate left main distal disease of 30%, a severe mid-LAD lesion of 99%, and a mid-left circumflex lesion of 80% with normal LV function and some mild luminal irregularities in the right coronary artery with some moderate stenosis seen in the mid to distal right PDA.
"""
results = model.transform(spark.createDataFrame([[text]]).toDF("text"))
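# johnsnowlabs library API; assumes an active licensed session named `spark`.
from johnsnowlabs import nlp, medical

document_assembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")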
sentence_detector = nlp.SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
clinical_ner = medical.NerModel.pretrained("ner_clinical_large_v2", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = medical.NerConverterInternal()\
.setInputCols(["sentence","token","ner"])\
.setOutputCol("ner_chunk")
nlpPipeline = nlp.Pipeline(stages=[
document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
clinical_ner,
ner_converter])
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
text = """
Mr. ABC is a 60-year-old gentleman who had stress test earlier today in my office with severe chest pain after 5 minutes of exercise on the standard Bruce with horizontal ST depressions and moderate apical ischemia on stress imaging only.
He required 3 sublingual nitroglycerin in total.
The patient underwent cardiac catheterization with myself today which showed mild-to-moderate left main distal disease of 30%, a severe mid-LAD lesion of 99%, and a mid-left circumflex lesion of 80% with normal LV function and some mild luminal irregularities in the right coronary artery with some moderate stenosis seen in the mid to distal right PDA.
"""
results = model.transform(spark.createDataFrame([[text]]).toDF("text"))
// Scala API; assumes an active licensed SparkSession named `spark`.
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentence_detector = new SentenceDetector()
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val clinical_ner = MedicalNerModel.pretrained("ner_clinical_large_v2", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverterInternal()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(
document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
clinical_ner,
ner_converter
))
// Brings toDF into scope for local Seq collections.
import spark.implicits._
val data = Seq("""
Mr. ABC is a 60-year-old gentleman who had stress test earlier today in my office with severe chest pain after 5 minutes of exercise on the standard Bruce with horizontal ST depressions and moderate apical ischemia on stress imaging only.
He required 3 sublingual nitroglycerin in total.
The patient underwent cardiac catheterization with myself today which showed mild-to-moderate left main distal disease of 30%, a severe mid-LAD lesion of 99%, and a mid-left circumflex lesion of 80% with normal LV function and some mild luminal irregularities in the right coronary artery with some moderate stenosis seen in the mid to distal right PDA.
""").toDF("text")
val results = pipeline.fit(data).transform(data)
Results
|chunk |begin|end|ner_label|
|--------------------------:|:----|:--|:--------|
|stress test |44 |54 |TEST |
|chest pain |95 |104|PROBLEM |
|ST depressions |172 |185|PROBLEM |
|ischemia |207 |214|PROBLEM |
|stress imaging |219 |232|TEST |
|cardiac catheterization |313 |335|TEST |
|disease |402 |408|PROBLEM |
|severe mid-LAD lesion |420 |440|PROBLEM |
|mild luminal irregularities|523 |549|PROBLEM |
|moderate stenosis |590 |606|PROBLEM |
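The `begin` and `end` columns are character offsets into the input string, and `end` is inclusive, so a chunk can be recovered with a slice. The sketch below checks this against the first two rows of the table using a truncated excerpt of the example text. The helper name `chunk_at` is hypothetical and is not part of any library.

```python
# Truncated excerpt of the example input. The leading newline is kept because
# it occupies offset 0 in the original triple-quoted string.
text = """
Mr. ABC is a 60-year-old gentleman who had stress test earlier today in my office with severe chest pain after 5 minutes of exercise"""


def chunk_at(source: str, begin: int, end: int) -> str:
    """Return the substring for inclusive character offsets [begin, end]."""
    return source[begin:end + 1]


# (begin, end, ner_label) taken from the first two rows of the results table.
rows = [(44, 54, "TEST"), (95, 104, "PROBLEM")]
for begin, end, label in rows:
    print(chunk_at(text, begin, end), label)
# prints:
# stress test TEST
# chest pain PROBLEM
```

Because the offsets count the leading newline, stripping or normalizing the text before comparing it with the table would shift every position by one.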
Model Information
| Model Name: | ner_clinical_large_v2 |
| Compatibility: | Healthcare NLP 6.2.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sentence, token, embeddings] |
| Output Labels: | [ner] |
| Language: | en |
| Size: | 14.5 MB |
References
Trained on merged ner oncology, ner posology and ner clinical datasets.
Benchmarking
| Label | Precision | Recall | F1-Score | Support |
|----------------|-----------|--------|----------|---------|
| B-PROBLEM | 0.8871 | 0.8788 | 0.8829 | 17,793 |
| B-TEST | 0.8488 | 0.8439 | 0.8464 | 8,433 |
| B-TREATMENT | 0.8787 | 0.8589 | 0.8687 | 10,841 |
| I-PROBLEM | 0.8410 | 0.8500 | 0.8455 | 10,730 |
| I-TEST | 0.7968 | 0.8301 | 0.8131 | 5,773 |
| I-TREATMENT | 0.8191 | 0.8170 | 0.8181 | 4,967 |
| O | 0.9908 | 0.9910 | 0.9909 | 506,113 |
| Accuracy | | | 0.9769 | 564,650 |
| Macro Avg | 0.8661 | 0.8671 | 0.8665 | 564,650 |
| Weighted Avg | 0.9769 | 0.9769 | 0.9769 | 564,650 |
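The summary rows can be cross-checked against the per-label rows. The sketch below recomputes the total support, the macro average (unweighted mean over the seven labels), and the weighted average (mean weighted by support) from the figures in the table. The function names are illustrative; small differences in the fourth decimal place are expected because the inputs are already rounded.

```python
# Per-label rows copied from the benchmarking table:
# label -> (precision, recall, f1, support)
rows = {
    "B-PROBLEM":   (0.8871, 0.8788, 0.8829, 17793),
    "B-TEST":      (0.8488, 0.8439, 0.8464, 8433),
    "B-TREATMENT": (0.8787, 0.8589, 0.8687, 10841),
    "I-PROBLEM":   (0.8410, 0.8500, 0.8455, 10730),
    "I-TEST":      (0.7968, 0.8301, 0.8131, 5773),
    "I-TREATMENT": (0.8191, 0.8170, 0.8181, 4967),
    "O":           (0.9908, 0.9910, 0.9909, 506113),
}

SUPPORT = 3  # index of the support column in each tuple


def macro_avg(metric_idx: int) -> float:
    """Unweighted mean of one metric over all labels."""
    return sum(r[metric_idx] for r in rows.values()) / len(rows)


def weighted_avg(metric_idx: int) -> float:
    """Mean of one metric weighted by each label's support."""
    total = sum(r[SUPPORT] for r in rows.values())
    return sum(r[metric_idx] * r[SUPPORT] for r in rows.values()) / total


total_support = sum(r[SUPPORT] for r in rows.values())
print("total support:", total_support)  # 564650
for name, idx in (("precision", 0), ("recall", 1), ("f1", 2)):
    print(f"{name}: macro={macro_avg(idx):.4f} weighted={weighted_avg(idx):.4f}")
```

The weighted averages are dominated by the `O` label, which accounts for roughly 90% of the tokens, so the macro averages give a better picture of performance on the entity labels themselves.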