Detect Assertion Status from Menopause Entities

Description

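This model detects the assertion status of menopause-related entities. It takes the entity chunks extracted by a menopause NER model (ner_menopause_core in the example below) and assigns each chunk one of the labels listed under Predicted Entities.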

Predicted Entities

Present, Absent, Possible, Past, Hypothetical, Planned, Family, Menarche_Age, Menopause_Age

How to use

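# Python
# Assumes an active Spark session named `spark` started with Healthcare NLP
# (for example via sparknlp_jsl.start).
from pyspark.ml import Pipeline
from pyspark.sql.types import StringType
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal, AssertionDLModel

document_assembler = DocumentAssembler()\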
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_menopause_core", "en", "clinical/models")\
    .setInputCols(["sentence", "token","embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

assertion = AssertionDLModel.pretrained("assertion_menopause_wip", "en", "clinical/models")\
    .setInputCols(["sentence", "ner_chunk", "embeddings"])\
    .setOutputCol("assertion")

pipeline = Pipeline(stages=[
    document_assembler, 
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter,
    assertion
    ])

sample_texts = ["""A 50-year-old woman, G2P1, presents with symptoms of perimenopause including night sweats, irregular menstruation, and fatigue.She has previously been diagnosed with hypertension. She is taking hormone replacement therapy with estradiol and norethindrone acetate. Recent tests included a bone density scan, which confirmed osteoporosis and showed elevated FSH levels. She also underwent a vaginal swab test for routine screening. Her mother has a history of breast cancer. Her menarche age was 11."""]


data = spark.createDataFrame(sample_texts, StringType()).toDF("text")

result = pipeline.fit(data).transform(data)
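
// Scala
// Assumes an active SparkSession named `spark` with Healthcare NLP on the classpath.
// MedicalNerModel, NerConverterInternal and AssertionDLModel ship with the licensed
// Healthcare NLP library and must be imported from it.
import spark.implicits._  // needed for Seq(...).toDF
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._

val document_assembler = new DocumentAssembler()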
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_menopause_core", "en", "clinical/models")
    .setInputCols(Array("sentence", "token","embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val assertion = AssertionDLModel.pretrained("assertion_menopause_wip", "en", "clinical/models")
    .setInputCols(Array("sentence", "ner_chunk", "embeddings"))
    .setOutputCol("assertion")

val pipeline = new Pipeline().setStages(Array(
    document_assembler, 
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter,
    assertion
))

val sample_texts = Seq("""A 50-year-old woman, G2P1, presents with symptoms of perimenopause including night sweats, irregular menstruation, and fatigue.She has previously been diagnosed with hypertension. She is taking hormone replacement therapy with estradiol and norethindrone acetate. Recent tests included a bone density scan, which confirmed osteoporosis and showed elevated FSH levels. She also underwent a vaginal swab test for routine screening. Her mother has a history of breast cancer. Her menarche age was 11.""").toDF("text")


val result = pipeline.fit(sample_texts).transform(sample_texts)

Results

+---------------------------+-----+---+---------------------------+------------+----------+
|chunk                      |begin|end|ner_label                  |assertion   |confidence|
+---------------------------+-----+---+---------------------------+------------+----------+
|G2P1                       |21   |24 |G_P                        |Present     |0.9999    |
|perimenopause              |53   |65 |Perimenopause              |Present     |0.9999    |
|night sweats               |77   |88 |Other_Symptom              |Present     |0.9997    |
|irregular menstruation     |91   |112|Irregular_Menstruation     |Present     |0.9997    |
|fatigue                    |119  |125|Other_Symptom              |Present     |0.9954    |
|hypertension               |166  |177|Hypertension               |Past        |0.9916    |
|hormone replacement therapy|194  |220|Hormone_Replacement_Therapy|Present     |0.9988    |
|estradiol                  |227  |235|Hormone_Replacement_Therapy|Present     |0.9696    |
|norethindrone acetate      |241  |261|Hormone_Replacement_Therapy|Present     |0.9984    |
|osteoporosis               |323  |334|Osteoporosis               |Present     |1.0       |
|elevated                   |347  |354|Test_Result                |Present     |1.0       |
|FSH                        |356  |358|Hormone_Testing            |Present     |0.9999    |
|vaginal swab               |389  |400|Vaginal_Swab               |Present     |1.0       |
|breast cancer              |458  |470|Oncological                |Family      |0.9843    |
|11                         |494  |495|Age                        |Menarche_Age|0.9891    |
+---------------------------+-----+---+---------------------------+------------+----------+
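
The table above can be read programmatically. The following is a minimal sketch in plain Python, not part of the model's API: it copies the first six rows of the table and the first two sentences of the sample text by hand, checks that begin and end behave as inclusive character offsets, and groups the chunks by assertion status. The names text, rows, spans_match and by_status are illustrative assumptions.

```python
from collections import defaultdict

# First two sentences of the sample text, copied verbatim (including the
# missing space after "fatigue.", which the offsets in the table depend on).
text = (
    "A 50-year-old woman, G2P1, presents with symptoms of perimenopause "
    "including night sweats, irregular menstruation, and fatigue."
    "She has previously been diagnosed with hypertension."
)

# (chunk, begin, end, ner_label, assertion, confidence) rows from the table.
rows = [
    ("G2P1", 21, 24, "G_P", "Present", 0.9999),
    ("perimenopause", 53, 65, "Perimenopause", "Present", 0.9999),
    ("night sweats", 77, 88, "Other_Symptom", "Present", 0.9997),
    ("irregular menstruation", 91, 112, "Irregular_Menstruation", "Present", 0.9997),
    ("fatigue", 119, 125, "Other_Symptom", "Present", 0.9954),
    ("hypertension", 166, 177, "Hypertension", "Past", 0.9916),
]

# begin and end are inclusive character offsets, so the slice needs end + 1.
spans_match = all(text[b:e + 1] == chunk for chunk, b, e, *_ in rows)

# Group the chunks by their predicted assertion status.
by_status = defaultdict(list)
for chunk, _, _, _, status, _ in rows:
    by_status[status].append(chunk)

print(spans_match)
print(dict(by_status))
```

Grouping by status like this separates current findings from history, for example hypertension (Past) from the symptoms marked Present.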

Model Information

Model Name: assertion_menopause_wip
Compatibility: Healthcare NLP 5.3.3+
License: Licensed
Edition: Official
Input Labels: [document, ner_chunk, embeddings]
Output Labels: [assertion_pred]
Language: en
Size: 2.5 MB

References

The datasets used to train this model were curated and annotated in-house.

Benchmarking

        label  precision    recall  f1-score   support
       Absent       0.83      0.84      0.84       177
       Family       0.47      0.73      0.57        11
 Hypothetical       0.80      0.71      0.75       413
 Menarche_Age       1.00      1.00      1.00         5
Menopause_Age       0.85      0.94      0.89        18
         Past       0.46      0.41      0.43        44
      Planned       0.80      0.67      0.73         6
     Possible       0.59      0.46      0.52        28
      Present       0.89      0.92      0.90      1558
     accuracy         -         -       0.86      2260
    macro-avg       0.74      0.74      0.74      2260
 weighted-avg       0.86      0.86      0.86      2260
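
As a sanity check on the report above, the macro averages can be recomputed from the per-label rows. This is a minimal sketch in plain Python using the rounded values copied from the table, so it can only be expected to agree with the reported figures to two decimals. The variable names and the support threshold of 20 are arbitrary choices for illustration.

```python
# (label, precision, recall, f1, support) rows copied from the table above.
report = [
    ("Absent", 0.83, 0.84, 0.84, 177),
    ("Family", 0.47, 0.73, 0.57, 11),
    ("Hypothetical", 0.80, 0.71, 0.75, 413),
    ("Menarche_Age", 1.00, 1.00, 1.00, 5),
    ("Menopause_Age", 0.85, 0.94, 0.89, 18),
    ("Past", 0.46, 0.41, 0.43, 44),
    ("Planned", 0.80, 0.67, 0.73, 6),
    ("Possible", 0.59, 0.46, 0.52, 28),
    ("Present", 0.89, 0.92, 0.90, 1558),
]

n = len(report)

# Macro average: unweighted mean over labels, so rare labels count as much
# as frequent ones.
macro_precision = sum(r[1] for r in report) / n
macro_recall = sum(r[2] for r in report) / n
macro_f1 = sum(r[3] for r in report) / n

# Total number of test examples across all labels.
total_support = sum(r[4] for r in report)

# Labels with few test examples, whose scores are less reliable.
# The cutoff of 20 is an assumption made for this example.
low_support = [r[0] for r in report if r[4] < 20]

print(round(macro_precision, 2), round(macro_recall, 2), round(macro_f1, 2))
print(total_support)
print(low_support)
```

The low-support labels (Family, Menarche_Age, Menopause_Age, Planned) each have fewer than 20 test examples, so their per-label scores should be read with caution.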