Self Report Age Classifier (BioBERT)

Description

This model is a BioBERT-based classifier that detects whether a social media post (such as a tweet) contains a self-report of the author's exact age. Each post is labeled either self_report_age or no_report.

Predicted Entities

self_report_age, no_report


How to use

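# Python
from pyspark.ml import Pipeline
from pyspark.sql.types import StringType
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from sparknlp_jsl.annotator import MedicalBertForSequenceClassification

# Assumes `spark` is a SparkSession started with a valid Healthcare NLP license
# (for example via sparknlp_jsl.start(...)).

# Wrap each raw text in a "document" annotation
document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens; the classifier takes both as input
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained BioBERT classifier from the clinical/models repository
sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_self_reported_age_tweet", "en", "clinical/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier
])

data = spark.createDataFrame(["Who knew I would spend my Saturday mornings at 21 still watching Disney channel",
                              "My girl, Fancy, just turned 17. She’s getting up there, but she still has the energy of a puppy"], StringType()).toDF("text")

result = pipeline.fit(data).transform(data)

# Checking results
result.select("text", "class.result").show(truncate=False)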
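
// Scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.classification.MedicalBertForSequenceClassification
import org.apache.spark.ml.Pipeline
import spark.implicits._  // assumes `spark` is a SparkSession with a valid Healthcare NLP license

val documenter = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_self_reported_age_tweet", "en", "clinical/models")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documenter, tokenizer, sequenceClassifier))

// One row per post: a flat Seq of strings (wrapping them in an Array would yield a single array-typed row)
val data = Seq("Who knew I would spend my Saturday mornings at 21 still watching Disney channel",
               "My girl, Fancy, just turned 17. She’s getting up there, but she still has the energy of a puppy").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)

// Checking results
result.select("text", "class.result").show(false)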

# NLU
import nlu
nlu.load("en.classify.self_reported_age").predict("""My girl, Fancy, just turned 17. She’s getting up there, but she still has the energy of a puppy""")

Results

+-----------------------------------------------------------------------------------------------+-----------------+
|text                                                                                           |result           |
+-----------------------------------------------------------------------------------------------+-----------------+
|Who knew I would spend my Saturday mornings at 21 still watching Disney channel                |[self_report_age]|
|My girl, Fancy, just turned 17. She’s getting up there, but she still has the energy of a puppy|[no_report]      |
+-----------------------------------------------------------------------------------------------+-----------------+
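
Downstream code usually needs plain label values rather than the one-element arrays in the result column. The sketch below assumes the rows have been pulled into Python as (text, labels) pairs, for example with result.select("text", "class.result").collect(); the helper split_by_label is hypothetical and not part of Healthcare NLP.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def split_by_label(
    rows: Iterable[Tuple[str, Sequence[str]]],
    positive: str = "self_report_age",
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split posts into (flagged, not flagged) lists.

    Each row is assumed to be a (text, labels) pair, where labels is the
    list shown in the "result" column, e.g. ["self_report_age"].
    """
    flagged: List[str] = []
    others: List[str] = []
    for text, labels in rows:
        # One label per document is expected; rows with no label go to `others`.
        label: Optional[str] = labels[0] if labels else None
        if label == positive:
            flagged.append(text)
        else:
            others.append(text)
    return flagged, others


# The two rows from the Results table above.
rows = [
    ("Who knew I would spend my Saturday mornings at 21 still watching Disney channel",
     ["self_report_age"]),
    ("My girl, Fancy, just turned 17. She’s getting up there, but she still has "
     "the energy of a puppy", ["no_report"]),
]
reported, not_reported = split_by_label(rows)
print(len(reported), len(not_reported))  # 1 1
```

PySpark Row objects are tuples, so the rows returned by collect() unpack the same way as the hand-written pairs here.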

Model Information

Model Name: bert_sequence_classifier_self_reported_age_tweet
Compatibility: Healthcare NLP 4.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 406.5 MB
Case sensitive: true
Max sentence length: 128

Benchmarking

          label  precision    recall  f1-score  support
      no_report   0.939016  0.900332  0.919267     1505
self_report_age   0.801849  0.873381  0.836088      695
       accuracy   -         -         0.891818     2200
      macro-avg   0.870433  0.886857  0.877678     2200
   weighted-avg   0.895684  0.891818  0.892990     2200
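
The rows of this table are mutually consistent: recall times support gives the number of correct predictions per class (1355 for no_report, 607 for self_report_age), which pins down a 2x2 confusion matrix. The sketch below reconstructs that implied matrix (derived from the reported figures, not published with the model) and recomputes precision, recall, F1, accuracy, and the macro and weighted averages from it; every recomputed value agrees with the table to six decimals.

```python
# Confusion matrix implied by the Benchmarking table (reconstructed, not published):
#   no_report:       0.900332 * 1505 ~ 1355 correct, 150 predicted as self_report_age
#   self_report_age: 0.873381 *  695 ~  607 correct,  88 predicted as no_report
confusion = {
    # (true label, predicted label): count
    ("no_report", "no_report"): 1355,
    ("no_report", "self_report_age"): 150,
    ("self_report_age", "self_report_age"): 607,
    ("self_report_age", "no_report"): 88,
}
labels = ["no_report", "self_report_age"]


def per_class(label):
    """Return (precision, recall, f1, support) for one label."""
    tp = confusion[(label, label)]
    support = sum(v for (t, _), v in confusion.items() if t == label)
    predicted = sum(v for (_, p), v in confusion.items() if p == label)
    precision = tp / predicted
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, support


stats = {lab: per_class(lab) for lab in labels}
total = sum(confusion.values())
accuracy = sum(confusion[(lab, lab)] for lab in labels) / total

# Macro average: plain mean over the classes.
macro = [sum(stats[lab][i] for lab in labels) / len(labels) for i in range(3)]
# Weighted average: mean weighted by each class's support.
weighted = [sum(stats[lab][i] * stats[lab][3] for lab in labels) / total for i in range(3)]

for lab in labels:
    p, r, f, s = stats[lab]
    print(f"{lab:>15}  {p:.6f}  {r:.6f}  {f:.6f}  {s}")
print(f"{'accuracy':>15}  {accuracy:.6f}  {total}")
print(f"{'macro-avg':>15}  " + "  ".join(f"{x:.6f}" for x in macro))
print(f"{'weighted-avg':>15}  " + "  ".join(f"{x:.6f}" for x in weighted))
```

The weighted-average recall always equals accuracy, which is why both rows report 0.891818. The macro average gives the smaller self_report_age class equal weight, so it sits below the weighted average.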