Description
This model is a PHS-BERT-based classifier that identifies public health mentions in social media text. Each text is assigned one of three labels: a personal health mention, a figurative mention, or another (general) mention. The classes are described in more detail below:
health_mention
: The text contains a health mention that specifically indicates someone's health situation, i.e., a particular person has a certain disease or symptom (including death). e.g., "My PCR test is positive. I have severe joint pain, muscle pain and a headache right now."
other_mention
: The text contains a health mention, but it does not state a specific person's situation. These are general health mentions, such as informative statements or discussions about a disease. e.g., "Aluminum is a light metal that causes dementia and Alzheimer's disease."
figurative_mention
: The text mentions a specific disease or symptom, but the term is used metaphorically and the text does not convey health-related information. e.g., "I don't wanna fall in love. If I ever did that, I think I'd have a heart attack."
Predicted Entities
figurative_mention, other_mention, health_mention
How to use
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings, SentenceEmbeddings, ClassifierDLModel

# Convert raw text into a document annotation (kept in a column named "sentence")
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("sentence")

# Split the document into tokens
tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

# PHS-BERT word embeddings
bert_embeddings = BertEmbeddings.pretrained("bert_embeddings_phs_bert", "en", "public/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

# Average the word embeddings into a single sentence embedding
embeddingsSentence = SentenceEmbeddings() \
    .setInputCols(["sentence", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

# Health-mention classifier trained on PHS-BERT sentence embeddings
classifierdl = ClassifierDLModel.pretrained("classifierdl_health_mentions", "en", "clinical/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("class")

clf_pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    bert_embeddings,
    embeddingsSentence,
    classifierdl
])

data = spark.createDataFrame([["I feel a bit drowsy & have a little blurred vision after taking an insulin."]]).toDF("text")

result = clf_pipeline.fit(data).transform(data)
result.select("text", "class.result").show(truncate=False)
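For quick tests on individual strings, the fitted pipeline can also be wrapped in a LightPipeline, which annotates plain Python strings without creating a Spark DataFrame. This is a minimal sketch reusing the clf_pipeline and data objects defined above; the expected label is inferred from the class definitions in the Description, not from a verified run:

from sparknlp.base import LightPipeline

# Wrap the fitted PipelineModel for fast in-memory inference on single texts
light_model = LightPipeline(clf_pipeline.fit(data))

annotations = light_model.annotate("My PCR test is positive. I have severe joint pain, muscle pain and a headache right now.")
print(annotations["class"])  # expected: ['health_mention']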
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.{BertEmbeddings, SentenceEmbeddings}
import com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLModel
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

// PHS-BERT word embeddings
val embeddings = BertEmbeddings.pretrained("bert_embeddings_phs_bert", "en", "public/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("word_embeddings")

// Average the word embeddings into a single sentence embedding
val sentence_embeddings = new SentenceEmbeddings()
    .setInputCols(Array("sentence", "word_embeddings"))
    .setOutputCol("sentence_embeddings")
    .setPoolingStrategy("AVERAGE")

// Health-mention classifier trained on PHS-BERT sentence embeddings
val classifier = ClassifierDLModel.pretrained("classifierdl_health_mentions", "en", "clinical/models")
    .setInputCols(Array("sentence_embeddings"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings, sentence_embeddings, classifier))

val data = Seq("I feel a bit drowsy & have a little blurred vision after taking an insulin.").toDF("text")

val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.classify.health").predict("""I feel a bit drowsy & have a little blurred vision after taking an insulin.""")
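The NLU one-liner also accepts a list of texts. As a rough illustration, the three example sentences from the Description section should map to the three labels; the expected labels in the comments follow the class definitions above and are not verified model output:

import nlu

examples = [
    "My PCR test is positive. I have severe joint pain, muscle pain and a headache right now.",  # expected: health_mention
    "Aluminum is a light metal that causes dementia and Alzheimer's disease.",  # expected: other_mention
    "I don't wanna fall in love. If I ever did that, I think I'd have a heart attack.",  # expected: figurative_mention
]

nlu.load("en.classify.health").predict(examples)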
Results
+---------------------------------------------------------------------------+----------------+
|text |class |
+---------------------------------------------------------------------------+----------------+
|I feel a bit drowsy & have a little blurred vision after taking an insulin.|[health_mention]|
+---------------------------------------------------------------------------+----------------+
Model Information
| Model Name:    | classifierdl_health_mentions |
| Compatibility: | Healthcare NLP 4.0.0+ |
| License:       | Licensed |
| Edition:       | Official |
| Input Labels:  | [sentence_embeddings] |
| Output Labels: | [class] |
| Language:      | en |
| Size:          | 24.1 MB |
References
Curated from several academic and in-house datasets.
Benchmarking
precision recall f1-score support
health_mention 0.77 0.83 0.80 1375
other_mention 0.84 0.81 0.83 2102
figurative_mention 0.79 0.78 0.79 1412
accuracy - - 0.81 4889
macro-avg 0.80 0.81 0.80 4889
weighted-avg 0.81 0.81 0.81 4889