Description
The Frailty classifier is a MedicalBertForSequenceClassification model, i.e. a BERT-based text classifier. Trained on an in-house dataset, it assigns a label to each input text and reports confidence scores for its predictions. Its primary goal is to categorize text into one of two labels: Frailty_Vulnerability and No_Or_Unknown.
- Frailty_Vulnerability: Statements that highlight concerns, signs, or symptoms associated with frailty and/or vulnerability conditions.
- No_Or_Unknown: Statements that either present no identifiable concern related to frailty/vulnerability or in which the presence or extent of frailty/vulnerability is indeterminate.
Predicted Entities
Frailty_Vulnerability, No_Or_Unknown
How to use
Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from sparknlp_jsl.annotator import MedicalBertForSequenceClassification
# Assumes `spark` is an active SparkSession started with a valid Healthcare NLP license (e.g. via sparknlp_jsl.start).
# Wrap raw text into document annotations
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
# Split each document into tokens
tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")
# Load the pretrained frailty/vulnerability classifier
sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_frailty_vulnerability", "en", "clinical/models")\
    .setInputCols(["document", "token"])\
    .setOutputCol("prediction")
pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier
])
# One inner list per row, each holding a single text
sample_texts = [
    ["Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy."],
    ["Post-chemotherapy, the patient was under regular surveillance for osteosarcoma. Recent imaging showed no signs of local recurrence or distant metastasis. Whereas the recovery was challenging, current evaluation confirms patient is in remission."],
    ["The patient was diagnosed with stage II colon cancer and will be undergoing a treatment regimen that includes both chemotherapy and radiation therapy."],
    ["Thyroid nodules detected during routine examination; fine-needle aspiration was conducted. Cytology results indicated no malignancy, consistent with a benign thyroid adenoma. However, patient is advised for a follow-up ultrasound in 12 months to monitor nodule size."],
    ["The patient's persistent lymphadenopathy led to further tests, which confirmed a diagnosis of AIDS."],
    ["Female patient presented with pelvic discomfort. Ovarian cysts were found during ultrasound; however, CA-125 levels are within normal range, and repeat imaging has shown consistent cyst size. No features of ovarian cancer were present, and a follow-up is scheduled in six months."]
]
sample_data = spark.createDataFrame(sample_texts).toDF("text")
result = pipeline.fit(sample_data).transform(sample_data)
# Show each text next to its predicted label
result.select("text", "prediction.result").show(truncate=100)
Scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.classification.MedicalBertForSequenceClassification
import org.apache.spark.ml.Pipeline
// Assumes an active SparkSession named `spark`; its implicits provide toDF on local Seqs.
import spark.implicits._
// Wrap raw text into document annotations
val documenter = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")
// Load the pretrained frailty/vulnerability classifier
val sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_frailty_vulnerability", "en", "clinical/models")
  .setInputCols(Array("document", "token"))
  .setOutputCol("prediction")
val pipeline = new Pipeline().setStages(Array(documenter, tokenizer, sequenceClassifier))
// A Seq of Strings yields one row per text (a Seq wrapping one Array would yield a single array-typed row)
val data = Seq(
  "Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy.",
  "Post-chemotherapy, the patient was under regular surveillance for osteosarcoma. Recent imaging showed no signs of local recurrence or distant metastasis. Whereas the recovery was challenging, current evaluation confirms patient is in remission.",
  "The patient was diagnosed with stage II colon cancer and will be undergoing a treatment regimen that includes both chemotherapy and radiation therapy.",
  "Thyroid nodules detected during routine examination; fine-needle aspiration was conducted. Cytology results indicated no malignancy, consistent with a benign thyroid adenoma. However, patient is advised for a follow-up ultrasound in 12 months to monitor nodule size.",
  "The patient's persistent lymphadenopathy led to further tests, which confirmed a diagnosis of AIDS.",
  "Female patient presented with pelvic discomfort. Ovarian cysts were found during ultrasound; however, CA-125 levels are within normal range, and repeat imaging has shown consistent cyst size. No features of ovarian cancer were present, and a follow-up is scheduled in six months."
).toDF("text")
val result = pipeline.fit(data).transform(data)
// Show up to 20 rows, truncating each column to 100 characters
result.select("text", "prediction.result").show(20, 100)
Results
+----------------------------------------------------------------------------------------------------+-----------------------+
| text| result|
+----------------------------------------------------------------------------------------------------+-----------------------+
|Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatm...|[Frailty_Vulnerability]|
|Post-chemotherapy, the patient was under regular surveillance for osteosarcoma. Recent imaging sh...| [No_Or_Unknown]|
|The patient was diagnosed with stage II colon cancer and will be undergoing a treatment regimen t...|[Frailty_Vulnerability]|
|Thyroid nodules detected during routine examination; fine-needle aspiration was conducted. Cytolo...| [No_Or_Unknown]|
| The patient's persistent lymphadenopathy led to further tests, which confirmed a diagnosis of AIDS.|[Frailty_Vulnerability]|
|Female patient presented with pelvic discomfort. Ovarian cysts were found during ultrasound; howe...| [No_Or_Unknown]|
+----------------------------------------------------------------------------------------------------+-----------------------+
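The Results table shows only the predicted label. The confidence scores mentioned in the description are carried in each prediction annotation's metadata (the prediction.metadata column, which can be collected with result.select("prediction.metadata").collect()). The sketch below assumes that this metadata maps each label name to its score as a string, next to non-score keys such as "sentence", which is how Spark NLP sequence classifiers typically report scores; please verify this against your own output. The helper names top_label and is_low_confidence and the 0.8 threshold are hypothetical illustrations, not part of the library.

```python
from typing import Dict, Tuple

# Label names as listed under "Predicted Entities".
LABELS = ("Frailty_Vulnerability", "No_Or_Unknown")


def top_label(metadata: Dict[str, str]) -> Tuple[str, float]:
    """Return the highest-scoring label and its score.

    Assumes `metadata` maps each label name to a score string; any other
    keys (for example "sentence") are ignored.
    """
    scores = {name: float(metadata[name]) for name in LABELS if name in metadata}
    if not scores:
        raise ValueError("metadata contains no label scores")
    best = max(scores, key=scores.get)
    return best, scores[best]


def is_low_confidence(metadata: Dict[str, str], threshold: float = 0.8) -> bool:
    """Hypothetical review rule: flag predictions whose top score is below `threshold`."""
    _, score = top_label(metadata)
    return score < threshold


# Illustrative metadata, NOT real model output.
example = {"sentence": "0", "Frailty_Vulnerability": "0.93", "No_Or_Unknown": "0.07"}
print(top_label(example))          # ('Frailty_Vulnerability', 0.93)
print(is_low_confidence(example))  # False
```

Keeping the threshold as a parameter lets a reviewer route borderline texts to manual review without changing the model's own label assignment.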
Model Information
Model Name: | bert_sequence_classifier_sdoh_frailty_vulnerability |
Compatibility: | Healthcare NLP 5.1.4+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [document, token] |
Output Labels: | [prediction] |
Language: | en |
Size: | 406.4 MB |
Case sensitive: | false |
Max sentence length: | 512 |
References
Trained on an in-house dataset.
Benchmarking
label precision recall f1-score support
Frailty_Vulnerability 0.982014 0.971530 0.976744 281
No_Or_Unknown 0.960976 0.975248 0.968059 202
accuracy - - 0.973085 483
macro-avg 0.971495 0.973389 0.972402 483
weighted-avg 0.973216 0.973085 0.973112 483
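The per-class rows fully determine the evaluation counts: recall times support gives 273 of 281 Frailty_Vulnerability texts and 197 of 202 No_Or_Unknown texts classified correctly, so the remaining 8 and 5 texts were assigned to the other label. The sketch below rebuilds that 2×2 confusion matrix (our reconstruction from the table, not a published artifact) and recomputes every metric from it with the standard definitions. It shows how the macro-avg row (unweighted mean over labels) and the weighted-avg row (mean weighted by support) are obtained, and why weighted-avg recall equals accuracy.

```python
from typing import List, Tuple

LABELS = ["Frailty_Vulnerability", "No_Or_Unknown"]
# Rows = true label, columns = predicted label; reconstructed from the table above.
CONFUSION = [[273, 8],
             [5, 197]]

Row = Tuple[float, float, float, int]


def per_class(cm: List[List[int]]) -> List[Row]:
    """Precision, recall, F1 and support for each label."""
    n = len(cm)
    rows = []
    for i in range(n):
        tp = cm[i][i]
        support = sum(cm[i])                         # texts whose true label is i
        predicted = sum(cm[r][i] for r in range(n))  # texts predicted as i
        precision = tp / predicted
        recall = tp / support
        f1 = 2 * precision * recall / (precision + recall)
        rows.append((precision, recall, f1, support))
    return rows


def averages(rows: List[Row]) -> Tuple[List[float], List[float]]:
    """Macro (unweighted) and support-weighted means of precision, recall, F1."""
    total = sum(row[3] for row in rows)
    macro = [sum(row[k] for row in rows) / len(rows) for k in range(3)]
    weighted = [sum(row[k] * row[3] for row in rows) / total for k in range(3)]
    return macro, weighted


rows = per_class(CONFUSION)
macro, weighted = averages(rows)
accuracy = sum(CONFUSION[i][i] for i in range(len(CONFUSION))) / sum(map(sum, CONFUSION))

for label, (p, r, f1, s) in zip(LABELS, rows):
    print(f"{label:<22} {p:.6f} {r:.6f} {f1:.6f} {s}")
print(f"accuracy {accuracy:.6f}")
print("macro-avg    ", " ".join(f"{v:.6f}" for v in macro))
print("weighted-avg ", " ".join(f"{v:.6f}" for v in weighted))
```

Rounded to six decimals, the recomputed values match the Benchmarking rows, which is a quick consistency check that the reported figures come from a single two-class evaluation on 483 texts.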