Assertion ML


Logistic-regression-based model that assigns an assertion status to named entity chunks recognized in clinical text.

Predicted Labels

Hypothetical, Present, Absent, Possible, Conditional, Associated_with_someone_else


How to use

Use as the final stage of an NLP pipeline with the following stages, in this order: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, NerDLModel, NerConverter, AssertionLogRegModel.
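
As a rough illustration of how those stages connect, the sketch below models the pipeline in plain Python (no Spark required) and checks that every stage's input columns are produced by an earlier stage. Only the AssertionLogRegModel columns are taken from this page; the column names for the other stages are the conventional Spark NLP defaults and should be read as assumptions, and `STAGES` and `missing_inputs` are hypothetical names used only for this example.

```python
# Each entry: (stage name, input columns, output column).
# Only the AssertionLogRegModel wiring comes from this page; the other
# column names are assumed conventional defaults.
STAGES = [
    ("DocumentAssembler", ["text"], "document"),
    ("SentenceDetector", ["document"], "sentence"),
    ("Tokenizer", ["sentence"], "token"),
    ("WordEmbeddingsModel", ["sentence", "token"], "embeddings"),
    ("NerDLModel", ["sentence", "token", "embeddings"], "ner"),
    ("NerConverter", ["sentence", "token", "ner"], "ner_chunk"),
    ("AssertionLogRegModel", ["sentence", "ner_chunk", "embeddings"], "assertion"),
]


def missing_inputs(stages, initial_columns=("text",)):
    """Return (stage, column) pairs for inputs that no earlier stage produces."""
    available = set(initial_columns)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        available.add(output)
    return problems


# The full stage list is consistent, so no inputs are missing.
print(missing_inputs(STAGES))  # []

# The assertion stage on its own has none of its inputs available.
print(missing_inputs(STAGES[-1:]))
```

The second call shows why the assertion model cannot run as the only stage: its "sentence", "ner_chunk", and "embeddings" inputs must come from the upstream annotators.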

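# Imports assume the licensed Spark NLP for Healthcare package (sparknlp_jsl) is installed
from pyspark.ml import Pipeline
from sparknlp_jsl.annotator import AssertionLogRegModel

clinical_assertion_ml = AssertionLogRegModel.pretrained("assertion_ml", "en", "clinical/models") \
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

# Add the upstream stages listed above before clinical_assertion_ml
nlpPipeline = Pipeline(stages=[clinical_assertion_ml])

# Pretrained stages need no training data, so fit on an empty DataFrame
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)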

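// Scala version; toDS needs spark.implicits._, and the upstream stages listed above go first in setStages
import org.apache.spark.ml.Pipeline

val clinical_assertion_ml = AssertionLogRegModel.pretrained("assertion_ml", "en", "clinical/models")
    .setInputCols(Array("sentence", "ner_chunk", "embeddings"))
    .setOutputCol("assertion")

val pipeline = new Pipeline().setStages(Array(clinical_assertion_ml))

// `data` is the input DataFrame with a "text" column
val result = pipeline.fit(Seq.empty[String].toDS.toDF("text")).transform(data)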


The output is a DataFrame with one sentence per row and an “assertion” column containing every assertion label found in that sentence. The assertion column also holds the character indices of each assertion and other metadata. To get only the entity chunks and assertion labels, without the metadata, select “ner_chunk.result” and “assertion.result” from the output DataFrame.
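
The selection described above can be sketched as follows. This is a minimal example, not the library's prescribed method: the helper names `select_chunks_and_assertions` and `pair_chunks_with_labels` and the aliases `chunks` and `assertions` are hypothetical, and the pairing step assumes the model emits one assertion per chunk in the same order. The sample values are hand-written placeholders, not real model output.

```python
def select_chunks_and_assertions(result_df):
    """Keep only chunk texts and assertion labels from a transformed DataFrame.

    Uses Spark's DataFrame.selectExpr; the aliases avoid ending up with
    two columns that are both named "result".
    """
    return result_df.selectExpr(
        "ner_chunk.result AS chunks",
        "assertion.result AS assertions",
    )


def pair_chunks_with_labels(chunks, assertions):
    """Pair each chunk with its label (assumes one label per chunk, same order)."""
    if len(chunks) != len(assertions):
        raise ValueError("expected exactly one assertion label per chunk")
    return list(zip(chunks, assertions))


# Hand-written placeholder values standing in for one collected row.
example_pairs = pair_chunks_with_labels(
    ["chest pain", "fever"],
    ["Present", "Absent"],
)
print(example_pairs)  # [('chest pain', 'Present'), ('fever', 'Absent')]
```

With a real pipeline, `select_chunks_and_assertions(model.transform(data))` would return the two array columns, and each collected row's `chunks` and `assertions` lists could then be passed to `pair_chunks_with_labels`.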


Model Information

Model Name: assertion_ml_en_2.4.0_2.4
Type: ner
Compatibility: Spark NLP 2.4.0+
Edition: Official
License: Licensed
Input Labels: [sentence, ner_chunk, embeddings]
Output Labels: [assertion]
Language: [en]
Case sensitive: false

Data Source

Trained on data from the 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text, using the ‘embeddings_clinical’ word embeddings.