SDOH Tobacco Usage For Classification

Description

This Generic Classifier model is intended for detecting tobacco use in clinical notes and trained by using GenericClassifierApproach annotator. Present: if the patient was a current consumer of tobacco. Past: the patient was a consumer in the past and had quit. Never: if the patient had never consumed tobacco. None: if there was no related text.

Predicted Entities

Present, Past, Never, None

Live Demo Open in Colab Copy S3 URI

How to use

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
        
sentence_embeddings = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", 'en','clinical/models')\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")

features_asm = FeaturesAssembler()\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("features")

generic_classifier = GenericClassifierModel.pretrained("genericclassifier_sdoh_tobacco_usage_sbiobert_cased_mli", 'en', 'clinical/models')\
    .setInputCols(["features"])\
    .setOutputCol("class")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_embeddings,
    features_asm,
    generic_classifier    
])

text_list = ["Retired schoolteacher, now substitutes. Lives with wife in location 1439. Has a 27 yo son and a 25 yo daughter. He uses alcohol and cigarettes",
             "The patient quit smoking approximately two years ago with an approximately a 40 pack year history, mostly cigar use. He also reports 'heavy alcohol use', quit 15 months ago.",
             "The patient denies any history of smoking or alcohol abuse. She lives with her one daughter.",
             "She was previously employed as a hairdresser, though says she hasnt worked in 4 years. Not reported by patient, but there is apparently a history of alochol abuse."]

df = spark.createDataFrame(text_list, StringType()).toDF("text")

result = pipeline.fit(df).transform(df)

result.select("text", "class.result").show(truncate=100)
val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
        
val sentence_embeddings = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
    .setInputCols("document")
    .setOutputCol("sentence_embeddings")

val features_asm = new FeaturesAssembler()
    .setInputCols("sentence_embeddings")
    .setOutputCol("features")

val generic_classifier = GenericClassifierModel.pretrained("genericclassifier_sdoh_tobacco_usage_sbiobert_cased_mli", "en", "clinical/models")
    .setInputCols("features")
    .setOutputCol("class")

val pipeline = new PipelineModel().setStages(Array(
    document_assembler,
    sentence_embeddings,
    features_asm,
    generic_classifier))

val data = Seq("Retired schoolteacher, now substitutes. Lives with wife in location 1439. Has a 27 yo son and a 25 yo daughter. He uses alcohol and cigarettes.").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.classify.generic.sdoh_tobacco_sbiobert_cased").predict("""The patient quit smoking approximately two years ago with an approximately a 40 pack year history, mostly cigar use. He also reports 'heavy alcohol use', quit 15 months ago.""")

Results


+----------------------------------------------------------------------------------------------------+---------+
|                                                                                                text|   result|
+----------------------------------------------------------------------------------------------------+---------+
|Retired schoolteacher, now substitutes. Lives with wife in location 1439. Has a 27 yo son and a 2...|[Present]|
|The patient quit smoking approximately two years ago with an approximately a 40 pack year history...|   [Past]|
|        The patient denies any history of smoking or alcohol abuse. She lives with her one daughter.|  [Never]|
|She was previously employed as a hairdresser, though says she hasnt worked in 4 years. Not report...|   [None]|
+----------------------------------------------------------------------------------------------------+---------+

Model Information

Model Name: genericclassifier_sdoh_tobacco_usage_sbiobert_cased_mli
Compatibility: Healthcare NLP 4.2.4+
License: Licensed
Edition: Official
Input Labels: [features]
Output Labels: [prediction]
Language: en
Size: 3.4 MB

Benchmarking


       label  precision    recall  f1-score   support
       Never       0.89      0.90      0.90       487
        None       0.86      0.78      0.82       269
        Past       0.87      0.79      0.83       415
     Present       0.63      0.82      0.71       203
    accuracy        -         -        0.83      1374
   macro-avg       0.81      0.82      0.81      1374
weighted-avg       0.84      0.83      0.83      1374