Description
Assigns an assertion status (present, absent, or possible) to clinical entities extracted by an upstream NER model.
Predicted Entities
present, absent, possible
How to use
# Python
# Imports assume the open-source Spark NLP package and the licensed Healthcare NLP package are installed.
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

# `spark` is assumed to be an active SparkSession started with the licensed library.

# Convert raw text into document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

# Detect clinical entities and merge the token-level tags into chunks
clinical_ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverterInternal() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

# Classify the assertion status of each chunk
clinical_assertion = BertAssertionClassifier.pretrained("assertion_bert_classifier_jsl_slim", "en", "clinical/models") \
    .setInputCols(["sentence", "ner_chunk"]) \
    .setOutputCol("assertion")

pipeline = Pipeline().setStages([
    document_assembler,
    sentence_detector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    ner_converter,
    clinical_assertion
])

text = """Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural.
and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT,
lung tumor located at the right lower lobe. Father with Alzheimer."""

data = spark.createDataFrame([[text]]).toDF("text")

result_df = pipeline.fit(data).transform(data)

# One row per chunk: chunk text, character offsets, NER label, assertion status
result_df.selectExpr("explode(assertion) as result") \
    .select("result.metadata.ner_chunk", "result.begin", "result.end",
            "result.metadata.ner_label", "result.result") \
    .show(100, False)
// Scala
// The licensed annotators (MedicalNerModel, NerConverterInternal, BertAssertionClassifier)
// are assumed to be in scope from the Healthcare NLP library.
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import spark.implicits._ // needed for Seq(...).toDF; `spark` is assumed to be an active SparkSession

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence_detector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")

// Detect clinical entities and merge the token-level tags into chunks
val clinical_ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
  .setInputCols("sentence", "token", "ner")
  .setOutputCol("ner_chunk")

// Classify the assertion status of each chunk
val clinical_assertion = BertAssertionClassifier.pretrained("assertion_bert_classifier_jsl_slim", "en", "clinical/models")
  .setInputCols("sentence", "ner_chunk")
  .setOutputCol("assertion")

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  sentence_detector,
  tokenizer,
  word_embeddings,
  clinical_ner,
  ner_converter,
  clinical_assertion
))

val text = """Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural.
|and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT,
|lung tumor located at the right lower lobe. Father with Alzheimer.""".stripMargin

val data = Seq(text).toDF("text")

val result_df = pipeline.fit(data).transform(data)

// One row per chunk: chunk text, character offsets, NER label, assertion status
result_df.selectExpr("explode(assertion) as result")
  .select("result.metadata.ner_chunk", "result.begin", "result.end",
    "result.metadata.ner_label", "result.result")
  .show(100, false)
Results
+---------------+-----+---+---------+-------+
|ner_chunk      |begin|end|ner_label|result |
+---------------+-----+---+---------+-------+
|severe fever   |13   |24 |PROBLEM  |present|
|sore throat    |30   |40 |PROBLEM  |present|
|stomach pain   |55   |66 |PROBLEM  |absent |
|an epidural    |89   |99 |TREATMENT|present|
|PCA            |106  |108|TREATMENT|present|
|pain control   |114  |125|TREATMENT|present|
|short of breath|143  |157|PROBLEM  |present|
|CT             |199  |200|TEST     |present|
|lung tumor     |203  |212|PROBLEM  |present|
|Alzheimer      |259  |267|PROBLEM  |present|
+---------------+-----+---+---------+-------+
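The begin and end columns are character offsets into the input text, and end is inclusive. The following is a minimal sketch in plain Python (no Spark required) that recovers a few chunks from the sample text using offsets copied from the table. The helper name chunk_text and the rows list are illustrative only and are not part of the library.

```python
# Sample text copied from the example; each line break counts as one character.
text = """Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural.
and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT,
lung tumor located at the right lower lobe. Father with Alzheimer."""

# (begin, end, assertion status) triples copied from the Results table.
rows = [
    (13, 24, "present"),
    (55, 66, "absent"),
    (106, 108, "present"),
    (259, 267, "present"),
]


def chunk_text(source, begin, end):
    """Return the substring covered by inclusive [begin, end] character offsets."""
    # Python slices exclude the upper bound, so add 1 to include the last character.
    return source[begin:end + 1]


for begin, end, status in rows:
    print(f"{chunk_text(text, begin, end)!r} -> {status}")
```

Slicing with end + 1 is the only adjustment needed, so the same offsets can be used to highlight chunks in the original note or to join assertion results back to other annotations.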
Model Information
| Model Name:     | assertion_bert_classifier_jsl_slim |
| Compatibility:  | Healthcare NLP 5.5.3+              |
| License:        | Licensed                           |
| Edition:        | Official                           |
| Input Labels:   | [document, ner_chunk]              |
| Output Labels:  | [assertion]                        |
| Language:       | en                                 |
| Size:           | 406.3 MB                           |
| Case sensitive: | false                              |
Benchmarking
label         precision  recall  f1-score  support
absent        0.988      0.931   0.959     2594
possible      0.730      0.755   0.742     652
present       0.964      0.979   0.971     8622
accuracy      -          -       0.956     11868
macro avg     0.894      0.888   0.891     11868
weighted avg  0.957      0.956   0.956     11868
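The summary rows can be related to the per-label rows as follows. This is a rough sketch in plain Python, assuming the usual definitions: the macro average is the unweighted mean over labels, and the weighted average is the mean weighted by support. Because it starts from the rounded per-label figures shown in the table, the recomputed values may differ from the reported ones in the last digit. The names per_label, macro_avg and weighted_avg are illustrative.

```python
# Per-label rows copied from the benchmark table: (precision, recall, f1, support).
per_label = {
    "absent": (0.988, 0.931, 0.959, 2594),
    "possible": (0.730, 0.755, 0.742, 652),
    "present": (0.964, 0.979, 0.971, 8622),
}

# Total number of evaluated examples across all labels.
total_support = sum(row[3] for row in per_label.values())


def macro_avg(metric_index):
    """Unweighted mean of one metric across labels."""
    return sum(row[metric_index] for row in per_label.values()) / len(per_label)


def weighted_avg(metric_index):
    """Mean of one metric across labels, weighted by each label's support."""
    weighted_sum = sum(row[metric_index] * row[3] for row in per_label.values())
    return weighted_sum / total_support


for name, index in [("precision", 0), ("recall", 1), ("f1-score", 2)]:
    print(f"{name}: macro={macro_avg(index):.3f} weighted={weighted_avg(index):.3f}")
```

The gap between the macro and weighted averages reflects the class imbalance: the possible label has the lowest scores and the smallest support, so it pulls the macro average down more than the weighted one.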