Description
This Generic Classifier model detects alcohol use in clinical notes. It was trained with the GenericClassifierApproach annotator and assigns one of three labels:
Present: the patient currently consumes alcohol, or consumed alcohol in the past and has since quit.
Never: the patient has never consumed alcohol.
None: the note contains no alcohol-related text.
Predicted Entities
Present, Never, None
How to use
# Python. Assumes a licensed Healthcare NLP Spark session is already running as `spark`.
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.base import *
from sparknlp_jsl.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.types import StringType

# Turn the raw text column into a document annotation.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Compute one sentence embedding per document.
sentence_embeddings = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", 'en', 'clinical/models')\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")

# Convert the embeddings into the feature vector the classifier expects.
features_asm = FeaturesAssembler()\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("features")

# Pretrained alcohol-usage classifier.
generic_classifier = GenericClassifierModel.pretrained("genericclassifier_sdoh_alcohol_usage_binary_sbiobert_cased_mli", 'en', 'clinical/models')\
    .setInputCols(["features"])\
    .setOutputCol("class")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_embeddings,
    features_asm,
    generic_classifier
])

text_list = ["Retired schoolteacher, now substitutes. Lives with wife in location 1439. Has a 27 yo son and a 25 yo daughter. He uses alcohol and cigarettes",
             "Employee in neuro departmentin at the Center Hospital 18. Widower since 2001. Current smoker since 20 years. No EtOH or illicits.",
             "Patient smoked 4 ppd x 37 years, quitting 22 years ago. He is widowed, lives alone, has three children."]

df = spark.createDataFrame(text_list, StringType()).toDF("text")

# All stages are pretrained, so fit() only assembles the pipeline model.
result = pipeline.fit(df).transform(df)

result.select("text", "class.result").show(truncate=100)
// Scala. Spark NLP and Healthcare NLP annotator imports are assumed to be in scope.
import org.apache.spark.ml.Pipeline
import spark.implicits._ // provides .toDS on a Seq

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence_embeddings = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
  .setInputCols("document")
  .setOutputCol("sentence_embeddings")

val features_asm = new FeaturesAssembler()
  .setInputCols("sentence_embeddings")
  .setOutputCol("features")

val generic_classifier = GenericClassifierModel.pretrained("genericclassifier_sdoh_alcohol_usage_binary_sbiobert_cased_mli", "en", "clinical/models")
  .setInputCols("features")
  .setOutputCol("class")

// Assemble the stages into a Pipeline estimator; fit() returns the PipelineModel.
val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  sentence_embeddings,
  features_asm,
  generic_classifier))

val data = Seq("Retired schoolteacher, now substitutes. Lives with wife in location 1439. Has a 27 yo son and a 25 yo daughter. He uses alcohol and cigarettes.").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.classify.generic.sdoh_alchol_binary_sbiobert_cased").predict("""Retired schoolteacher, now substitutes. Lives with wife in location 1439. Has a 27 yo son and a 25 yo daughter. He uses alcohol and cigarettes""")
Results
+----------------------------------------------------------------------------------------------------+---------+
| text| result|
+----------------------------------------------------------------------------------------------------+---------+
|Retired schoolteacher, now substitutes. Lives with wife in location 1439. Has a 27 yo son and a 2...|[Present]|
|Employee in neuro departmentin at the Center Hospital 18. Widower since 2001. Current smoker sinc...| [Never]|
|Patient smoked 4 ppd x 37 years, quitting 22 years ago. He is widowed, lives alone, has three chi...| [None]|
+----------------------------------------------------------------------------------------------------+---------+
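The result column holds an array per note, so downstream code usually needs to pull out a single label per row. The following is a minimal sketch in plain Python, assuming the rows have already been collected from Spark into (text, result) pairs like those in the table above. The names first_label and LABEL_MEANINGS are illustrative and not part of Spark NLP, and the texts are shortened by hand.

```python
from collections import Counter

# Label meanings, paraphrased from the model description.
LABEL_MEANINGS = {
    "Present": "current alcohol use, or past use that has since stopped",
    "Never": "patient has never consumed alcohol",
    "None": "no alcohol-related text in the note",
}

def first_label(result):
    """Return the single label from a class.result array.

    Returns Python None (not the string "None") when the array is empty.
    """
    return result[0] if result else None

# Hypothetical collected rows mirroring the results table (texts shortened).
rows = [
    ("Retired schoolteacher, now substitutes. ...", ["Present"]),
    ("Employee in neuro departmentin ...", ["Never"]),
    ("Patient smoked 4 ppd x 37 years, ...", ["None"]),
]

labels = [first_label(result) for _, result in rows]
counts = Counter(labels)

for label in labels:
    print(f"{label}: {LABEL_MEANINGS[label]}")
```

One thing to watch: the label None is the string "None", which is truthy and distinct from Python's None, so a check such as `if label:` will not filter out notes without alcohol-related text.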
Model Information
Model Name: genericclassifier_sdoh_alcohol_usage_binary_sbiobert_cased_mli
Compatibility: Healthcare NLP 4.2.4+
License: Licensed
Edition: Official
Input Labels: [features]
Output Labels: [prediction]
Language: en
Size: 3.4 MB
Benchmarking
label         precision  recall  f1-score  support
Never              0.85    0.86      0.85      523
None               0.81    0.82      0.81      341
Present            0.88    0.86      0.87      516
accuracy              -       -      0.85     1380
macro-avg          0.85    0.85      0.85     1380
weighted-avg       0.85    0.85      0.85     1380
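To make the macro-avg and weighted-avg rows concrete, the sketch below recomputes them from the per-class rows of the table. It assumes the usual definitions (macro is the unweighted mean across classes, weighted is the mean weighted by support). The helper names are illustrative, and because the inputs are already rounded to two decimals, the recomputed values can differ from the reported 0.85 by up to about 0.01.

```python
# Per-class figures copied from the benchmarking table:
# (precision, recall, f1-score, support)
per_class = {
    "Never": (0.85, 0.86, 0.85, 523),
    "None": (0.81, 0.82, 0.81, 341),
    "Present": (0.88, 0.86, 0.87, 516),
}

total_support = sum(row[3] for row in per_class.values())

def macro_avg(index):
    """Unweighted mean of one metric across the classes."""
    return sum(row[index] for row in per_class.values()) / len(per_class)

def weighted_avg(index):
    """Mean of one metric across the classes, weighted by class support."""
    return sum(row[index] * row[3] for row in per_class.values()) / total_support

print(f"total support: {total_support}")
for name, index in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name}: macro={macro_avg(index):.3f} weighted={weighted_avg(index):.3f}")
```

The three classes have different supports (341 to 523 notes), so the two averages are not guaranteed to agree, although here they land within rounding distance of each other.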