Description
This model assigns an assertion status to clinical entities using a deep learning architecture. For example, in the sentence "She denies pain", the entity "pain" is asserted as absent.
Predicted Entities
hypothetical, present, absent, possible, conditional, associated_with_someone_else.
How to use
Use as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, MedicalNerModel, NerConverter, AssertionDLModel.
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel, AssertionDLModel
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentenceDetector = SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_healthcare_100d", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_healthcare", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = NerConverter() \
.setInputCols(["sentence", "token", "ner"]) \
.setOutputCol("ner_chunk")
clinical_assertion = AssertionDLModel.pretrained("assertion_dl_healthcare", "en", "clinical/models") \
.setInputCols(["sentence", "ner_chunk", "embeddings"]) \
.setOutputCol("assertion")
nlpPipeline = Pipeline(stages=[
documentAssembler,
sentenceDetector,
tokenizer,
word_embeddings,
clinical_ner,
ner_converter,
clinical_assertion
])
data = spark.createDataFrame([["Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain"]]).toDF("text")
model = nlpPipeline.fit(data)
results = model.transform(data)
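To read the predictions back in Python, one option is to wrap the fitted model in a LightPipeline. The snippet below is a minimal sketch that assumes the pipeline and column names defined above (ner_chunk, assertion); it prints each detected chunk together with its NER label and assertion status.
from sparknlp.base import LightPipeline

# Annotate the sample text in memory and pair each NER chunk with its assertion label.
light_model = LightPipeline(model)
annotations = light_model.fullAnnotate("Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain")[0]
for chunk, assertion in zip(annotations["ner_chunk"], annotations["assertion"]):
    print(chunk.result, chunk.metadata["entity"], assertion.result)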
val documentAssembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentenceDetector = new SentenceDetector()
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_healthcare_100d", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val clinical_ner = MedicalNerModel.pretrained("ner_healthcare", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val clinical_assertion = AssertionDLModel.pretrained("assertion_dl_healthcare","en","clinical/models")
.setInputCols("document","ner_chunk","embeddings")
.setOutputCol("assertion")
val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter, clinical_assertion))
val data = Seq("Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain").toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.assert.healthcare").predict("""Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain""")
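The NLU call returns a pandas DataFrame with one row per prediction; the exact column names vary across NLU versions, so a quick way to explore the output is to print the columns before selecting the chunk and assertion fields. The sketch below assumes only the model reference shown above.
import nlu

# Minimal sketch: inspect the returned pandas DataFrame before picking columns,
# since NLU's column naming differs between releases.
df = nlu.load("en.assert.healthcare").predict(
    "Patient has a headache for the last 2 weeks and appears anxious when she walks fast. "
    "No alopecia noted. She denies pain"
)
print(df.columns)
print(df.head())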
Result
| | chunks | entities| assertion |
|--:|-----------:|--------:|------------:|
| 0 | a headache | PROBLEM | present |
| 1 | anxious | PROBLEM | conditional |
| 2 | alopecia | PROBLEM | absent |
| 3 | pain | PROBLEM | absent |
Model Information
| Name: | assertion_dl_healthcare |
|---|---|
| Type: | AssertionDLModel |
| Compatibility: | 2.6.0 |
| License: | Licensed |
| Edition: | Official |
| Input labels: | [document, chunk, word_embeddings] |
| Output labels: | [assertion] |
| Language: | en |
| Case sensitive: | False |
| Dependencies: | embeddings_healthcare_100d |
Data Source
Trained on an augmented version of the 2010 i2b2/VA challenge dataset on concepts, assertions, and relations in clinical text, using embeddings_healthcare_100d.
https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/
Benchmarking
| label | precision | recall | f1 |
|---|---|---|---|
| absent | 0.9289 | 0.9466 | 0.9377 |
| present | 0.9433 | 0.9559 | 0.9496 |
| conditional | 0.6888 | 0.5 | 0.5794 |
| associated_with_someone_else | 0.9285 | 0.9122 | 0.9203 |
| hypothetical | 0.9079 | 0.8654 | 0.8862 |
| possible | 0.7 | 0.6146 | 0.6545 |
| macro-avg | 0.8496 | 0.7991 | 0.8236 |
| micro-avg | 0.9245 | 0.9245 | 0.9245 |