Description
Logistic regression based assertion status detection model: it assigns an assertion label to each clinical named entity (NER chunk) found in the text.
Predicted Labels
Hypothetical, Present, Absent, Possible, Conditional, Associated_with_someone_else
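The card says only that the classifier is logistic regression based. Below is a minimal sketch of the prediction step of a multinomial (softmax) logistic regression over the six labels above. The feature vector, weights, and biases are hypothetical placeholders. The sketch does not reproduce the model's actual feature extraction (for example, how embeddings around a chunk are pooled) or its trained parameters.

```python
import math
from typing import Dict, List, Sequence, Tuple

# The six assertion labels listed on this card.
LABELS = ["Hypothetical", "Present", "Absent", "Possible",
          "Conditional", "Associated_with_someone_else"]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_assertion(features: Sequence[float],
                      weights: Sequence[Sequence[float]],
                      biases: Sequence[float]) -> Tuple[str, Dict[str, float]]:
    """Score one chunk's (hypothetical) feature vector against every label.

    weights holds one row per label and biases one value per label.
    Returns the highest-probability label and the full distribution.
    """
    if len(weights) != len(LABELS) or len(biases) != len(LABELS):
        raise ValueError("need one weight row and one bias per label")
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], dict(zip(LABELS, probs))


# Toy parameters: only the "Absent" row responds to the first feature.
toy_weights = [[0.0, 0.0] for _ in LABELS]
toy_weights[LABELS.index("Absent")] = [2.0, 0.0]
toy_biases = [0.0] * len(LABELS)

label, dist = predict_assertion([1.0, 0.0], toy_weights, toy_biases)
print(label)  # Absent
```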
How to use
Use as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, NerDLModel, NerConverter, AssertionLogRegModel.
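Each stage reads columns produced by earlier stages, so the order above matters. The following plain-Python sketch checks that wiring. It does not call Spark NLP. The input and output columns for DocumentAssembler, SentenceDetector, and Tokenizer ("text", "document", "sentence", "token") are assumptions, because the examples below elide those stage definitions. The other columns are taken from the examples.

```python
from typing import Iterable, List, Sequence, Tuple

Stage = Tuple[str, Sequence[str], str]  # (stage name, input columns, output column)

# Column names for the first three stages are assumed; the rest match the card.
STAGES: List[Stage] = [
    ("DocumentAssembler", ["text"], "document"),
    ("SentenceDetector", ["document"], "sentence"),
    ("Tokenizer", ["sentence"], "token"),
    ("WordEmbeddingsModel", ["sentence", "token"], "embeddings"),
    ("NerDLModel", ["sentence", "token", "embeddings"], "ner"),
    ("NerConverter", ["sentence", "token", "ner"], "ner_chunk"),
    ("AssertionLogRegModel", ["sentence", "ner_chunk", "embeddings"], "assertion"),
]


def missing_inputs(stages: Sequence[Stage],
                   source_cols: Iterable[str] = ("text",)) -> List[Tuple[str, str]]:
    """Return (stage, column) pairs where a stage reads a column not yet produced."""
    available = set(source_cols)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        available.add(output)  # later stages may consume this column
    return problems


print(missing_inputs(STAGES))  # []
```

Moving NerConverter ahead of NerDLModel, for example, makes the function report `("NerConverter", "ner")`, because the "ner" column would not exist yet at that point.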
...
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
clinical_ner = NerDLModel.pretrained("ner_clinical", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = NerConverter() \
.setInputCols(["sentence", "token", "ner"]) \
.setOutputCol("ner_chunk")
clinical_assertion = AssertionLogRegModel.pretrained("assertion_ml", "en", "clinical/models") \
.setInputCols(["sentence", "ner_chunk", "embeddings"]) \
.setOutputCol("assertion")
nlpPipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter, clinical_assertion])
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
light_result = LightPipeline(model).fullAnnotate('Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain')[0]
...
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val clinical_ner = NerDLModel.pretrained("ner_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val clinical_assertion_ml = AssertionLogRegModel.pretrained("assertion_ml", "en", "clinical/models")
.setInputCols("sentence", "ner_chunk", "embeddings")
.setOutputCol("assertion")
val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter, clinical_assertion_ml))
val data = Seq("Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain").toDF("text")
val result = pipeline.fit(data).transform(data)
Results
The output is a DataFrame with one row per input text and an "assertion" column that holds one assertion annotation per detected entity chunk. Besides the label, each annotation carries the chunk's character offsets and other metadata. To get only the entity chunks and their assertion labels, without the metadata, select "ner_chunk.result" and "assertion.result" from the output DataFrame.
| | chunks | entities | assertion |
|---|------------|----------|-------------|
| 0 | a headache | PROBLEM | present |
| 1 | anxious | PROBLEM | conditional |
| 2 | alopecia | PROBLEM | absent |
| 3 | pain | PROBLEM | absent |
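A table like the one above can also be built from the `light_result` dictionary returned by `LightPipeline(...).fullAnnotate(...)[0]` in the Python example, which maps each output column name to a list of annotations. The sketch below assumes one assertion annotation per chunk in the same order, and that the entity type is stored under the `"entity"` key of the chunk metadata. To stay runnable without Spark NLP, it uses a hypothetical `StubAnnotation` class in place of real annotation objects. The stub exposes only the attributes the function reads (`result`, `begin`, `end`, `metadata`). The sample values are copied from the results table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StubAnnotation:
    """Hypothetical stand-in for a Spark NLP annotation (only the fields used here)."""
    result: str
    begin: int
    end: int
    metadata: Dict[str, str] = field(default_factory=dict)


def chunk_assertion_rows(light_result) -> List[Tuple[str, str, str]]:
    """Pair each NER chunk with its assertion label: (chunk, entity, assertion)."""
    chunks = light_result["ner_chunk"]
    assertions = light_result["assertion"]
    if len(chunks) != len(assertions):
        raise ValueError("expected exactly one assertion per chunk")
    return [(c.result, c.metadata.get("entity", ""), a.result)
            for c, a in zip(chunks, assertions)]


text = ("Patient has a headache for the last 2 weeks and appears anxious when she "
        "walks fast. No alopecia noted. She denies pain")
pairs = [("a headache", "present"), ("anxious", "conditional"),
         ("alopecia", "absent"), ("pain", "absent")]


def span(chunk: str) -> Tuple[int, int]:
    """Character offsets of the first occurrence, with an inclusive end offset."""
    begin = text.find(chunk)
    return begin, begin + len(chunk) - 1


sample = {
    "ner_chunk": [StubAnnotation(c, *span(c), {"entity": "PROBLEM"}) for c, _ in pairs],
    "assertion": [StubAnnotation(label, *span(c)) for c, label in pairs],
}
for row in chunk_assertion_rows(sample):
    print(row)
```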
Model Information
Model Name: | assertion_ml_en_2.4.0_2.4 |
Type: | ner |
Compatibility: | Spark NLP 2.4.0+ |
Edition: | Official |
License: | Licensed |
Input Labels: | [sentence, ner_chunk, embeddings] |
Output Labels: | [assertion] |
Language: | [en] |
Case sensitive: | false |
Data Source
Trained on an augmented version of the 2010 i2b2/VA dataset on concepts, assertions, and relations in clinical text, using ‘embeddings_clinical’. https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/