Self Report Age Classifier (BioBERT - Reddit)


This model is a BioBERT-based classifier that detects self-reports of exact age in social media forum (Reddit) posts.

Predicted Entities

self_report_age, no_report


How to use

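from pyspark.ml import Pipeline
from pyspark.sql.types import StringType
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from sparknlp_jsl.annotator import MedicalBertForSequenceClassification

document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_exact_age_reddit", "en", "clinical/models")\
    .setInputCols(['document', 'token'])\
    .setOutputCol('class')

pipeline = Pipeline(stages=[document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame(["Is it bad for a 19 year old it's been getting worser.",
                              "I was about 10. So not quite as young as you but young."], StringType()).toDF("text")

result = pipeline.fit(data).transform(data)
result.select("text", "class.result").show(truncate=False)
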
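import spark.implicits._

val documenter = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_exact_age_reddit", "en", "clinical/models")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documenter, tokenizer, sequenceClassifier))

// One string per row; wrapping the strings in an Array would yield a single array-typed row.
val data = Seq("Is it bad for a 19 year old it's been getting worser.",
               "I was about 10. So not quite as young as you but young.").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
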
import nlu
nlu.load("en.classify.exact_age").predict("""I was about 10. So not quite as young as you but young.""")


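Results

+-------------------------------------------------------+-----------------+
|text                                                   |result           |
+-------------------------------------------------------+-----------------+
|Is it bad for a 19 year old it's been getting worser.  |[self_report_age]|
|I was about 10. So not quite as young as you but young.|[no_report]      |
+-------------------------------------------------------+-----------------+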
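
The result column holds a list of labels per row. As a minimal sketch of post-processing, assuming the rows have already been collected from Spark into plain Python tuples (the `rows` list below is a hypothetical stand-in built from the table above, not output captured from a run), the labels can be tallied and the self-report posts filtered like this:

```python
from collections import Counter

# Hypothetical stand-in for rows collected from the result DataFrame,
# written as plain (text, labels) tuples so the sketch runs without Spark.
rows = [
    ("Is it bad for a 19 year old it's been getting worser.", ["self_report_age"]),
    ("I was about 10. So not quite as young as you but young.", ["no_report"]),
]

# Each row carries a list of labels; with one document per row it holds one label.
label_counts = Counter(labels[0] for _, labels in rows if labels)

# Keep only the posts flagged as containing a self-reported age.
self_reports = [text for text, labels in rows if "self_report_age" in labels]

print(label_counts)
print(self_reports)
```

On a large dataset the same filtering would more likely be done on the Spark side before collecting; the plain-Python form is only meant to show the shape of the output.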

Model Information

Model Name: bert_sequence_classifier_exact_age_reddit
Compatibility: Healthcare NLP 4.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 406.5 MB
Case sensitive: true
Max sentence length: 128


The dataset is disease-specific and consists of posts collected via a series of keywords associated with dry eye disease.


Benchmarking

          label  precision    recall  f1-score   support
      no_report     0.9324    0.9577    0.9449      1325
self_report_age     0.9124    0.8637    0.8874       675
       accuracy     -         -         0.9260      2000
      macro-avg     0.9224    0.9107    0.9161      2000
   weighted-avg     0.9256    0.9260    0.9255      2000
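
The aggregate rows can be cross-checked against the per-class rows. The sketch below assumes the usual definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean over classes, weighted average as the support-weighted mean). It recomputes the values from the rounded figures in the table, so small differences in the last decimal place are possible. The helper names `f1`, `macro`, and `weighted` are illustrative only.

```python
# Per-class precision, recall and support copied from the table above.
per_class = {
    "no_report":       {"precision": 0.9324, "recall": 0.9577, "support": 1325},
    "self_report_age": {"precision": 0.9124, "recall": 0.8637, "support": 675},
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Derive the per-class F1 scores from precision and recall.
for metrics in per_class.values():
    metrics["f1"] = f1(metrics["precision"], metrics["recall"])

total = sum(m["support"] for m in per_class.values())


def macro(metric):
    """Unweighted mean of a metric over the classes."""
    return sum(m[metric] for m in per_class.values()) / len(per_class)


def weighted(metric):
    """Mean of a metric over the classes, weighted by class support."""
    return sum(m[metric] * m["support"] for m in per_class.values()) / total


for name in ("precision", "recall", "f1"):
    print(f"{name:9s} macro={macro(name):.4f} weighted={weighted(name):.4f}")
```

For a single-label classifier the support-weighted recall equals the overall accuracy, which is why both rows in the table show 0.9260.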