Bert For Sequence Classification (Metastasis)

Description

This is a BioBERT-based classification model that determines whether a clinical sentence contains terms related to metastasis. Each sentence receives one of two labels:

  • 1: The sentence contains metastasis-related terms.
  • 0: The sentence does not contain metastasis-related terms.
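Downstream code usually needs these label codes as booleans. The sketch below is a hypothetical helper (it is not part of Healthcare NLP). It assumes that each row of the `prediction.result` column is a list of label strings such as `["1"]`, which is how the Results table displays them.

```python
from typing import List, Sequence

# Hypothetical mapping from the classifier's label strings to their meaning;
# the codes "1" and "0" follow the label list above.
LABEL_MEANINGS = {
    "1": "contains metastasis-related terms",
    "0": "does not contain metastasis-related terms",
}


def to_flags(labels: Sequence[str]) -> List[bool]:
    """Convert label strings such as ["1", "0"] into booleans (True = metastasis-related)."""
    flags = []
    for label in labels:
        if label not in LABEL_MEANINGS:
            raise ValueError(f"unexpected label: {label!r}")
        flags.append(label == "1")
    return flags


if __name__ == "__main__":
    for label in ("1", "0"):
        print(label, "->", LABEL_MEANINGS[label])
    print(to_flags(["1", "0"]))  # [True, False]
```

Rejecting unknown codes outright, rather than silently mapping them to False, makes a changed label scheme fail loudly.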

Predicted Entities

1, 0

How to use


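from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer
from sparknlp_jsl.annotator import MedicalBertForSequenceClassification

# `spark` must be an active Spark session started with a valid Healthcare NLP license

document_assembler = DocumentAssembler()\
    .setInputCol('text')\
    .setOutputCol('document')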

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(['sentence'])\
    .setOutputCol('token')

sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_metastasis","en","clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("prediction")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    sequenceClassifier
])

sample_texts = [
    ["Contrast MRI confirmed the findings of meningeal carcinomatosis."],
    ["A 62-year-old male presents with weight loss, persistent cough, and episodes of hemoptysis."],
    ["The primary tumor (T) is staged as T3 due to its size and local invasion, there is no nodal involvement (N0), and due to multiple bone and liver lesions, it is classified as M1, reflecting distant metastatic foci."],
    ["After all procedures done and reviewing the findings, biochemical results and screening, the TNM classification is determined."],
    ["The oncologist noted that the tumor had spread to the liver, indicating advanced stage cancer."],
    ["The patient's care plan is adjusted to focus on symptom management and slowing the progression of the disease."],
]

sample_data = spark.createDataFrame(sample_texts).toDF("text")

result = pipeline.fit(sample_data).transform(sample_data)

result.select("text", "prediction.result").show(truncate=False)
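The pipeline runs the sentence detector before the classifier, so `prediction.result` holds one label per detected sentence. A multi-sentence note therefore yields a list such as `["0", "1"]`. If you need a single document-level flag, one possible rule is to flag a document when any of its sentences is labelled `1`. That rule is an assumption, not a documented behaviour of the model. Below is a minimal pure-Python sketch; the helper names are hypothetical.

```python
from typing import List, Sequence


def document_has_metastasis(sentence_labels: Sequence[str]) -> bool:
    """Hypothetical rule: flag the document if any sentence is labelled "1"."""
    return any(label == "1" for label in sentence_labels)


def metastasis_sentences(sentences: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Return the sentences labelled "1"; expects exactly one label per sentence."""
    if len(sentences) != len(labels):
        raise ValueError("expected exactly one label per sentence")
    return [s for s, label in zip(sentences, labels) if label == "1"]


if __name__ == "__main__":
    sentences = [
        "A 62-year-old male presents with weight loss, persistent cough, and episodes of hemoptysis.",
        "The oncologist noted that the tumor had spread to the liver, indicating advanced stage cancer.",
    ]
    labels = ["0", "1"]  # illustrative labels, matching the Results table for these two sentences
    print(document_has_metastasis(labels))  # True
    print(metastasis_sentences(sentences, labels))
```

The `any` rule favours recall at the document level. A stricter rule, such as requiring two positive sentences, is just as easy to express with a count.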


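import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.classification.MedicalBertForSequenceClassification
import org.apache.spark.ml.Pipeline
// `spark` must be an active SparkSession started with a valid Healthcare NLP license
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")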

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

val sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_metastasis","en","clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("prediction")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  sequenceClassifier
))


// One row per sample sentence in a single "text" column
val data = Seq(
  "Contrast MRI confirmed the findings of meningeal carcinomatosis.",
  "A 62-year-old male presents with weight loss, persistent cough, and episodes of hemoptysis.",
  "The primary tumor (T) is staged as T3 due to its size and local invasion, there is no nodal involvement (N0), and due to multiple bone and liver lesions, it is classified as M1, reflecting distant metastatic foci.",
  "After all procedures done and reviewing the findings, biochemical results and screening, the TNM classification is determined.",
  "The oncologist noted that the tumor had spread to the liver, indicating advanced stage cancer.",
  "The patient's care plan is adjusted to focus on symptom management and slowing the progression of the disease."
).toDF("text")

val result = pipeline.fit(data).transform(data)

result.select("text", "prediction.result").show(truncate = false)

Results


+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+
|text                                                                                                                                                                                                                 |result|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+
|Contrast MRI confirmed the findings of meningeal carcinomatosis.                                                                                                                                                     |[1]   |
|A 62-year-old male presents with weight loss, persistent cough, and episodes of hemoptysis.                                                                                                                          |[0]   |
|The primary tumor (T) is staged as T3 due to its size and local invasion, there is no nodal involvement (N0), and due to multiple bone and liver lesions, it is classified as M1, reflecting distant metastatic foci.|[1]   |
|After all procedures done and reviewing the findings, biochemical results and screening, the TNM classification is determined.                                                                                       |[0]   |
|The oncologist noted that the tumor had spread to the liver, indicating advanced stage cancer.                                                                                                                       |[1]   |
|The patient's care plan is adjusted to focus on symptom management and slowing the progression of the disease.                                                                                                       |[0]   |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+

Model Information

Model Name: bert_sequence_classifier_metastasis
Compatibility: Healthcare NLP 5.4.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [prediction]
Language: en
Size: 406.4 MB
Case sensitive: false
Max sentence length: 512
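Here is a rough pre-check for over-long sentences. It assumes that the 512 limit is counted in WordPiece sub-tokens and includes BERT's [CLS] and [SEP] markers. The WordPiece tokenizer produces at least one sub-token for every whitespace-separated word, so the word count is a lower bound on the sub-token count. A sentence whose word count plus the two special tokens already exceeds 512 therefore cannot fit, and its tail would most likely be truncated. How the annotator actually handles over-long input is an assumption here. The function and constant names are hypothetical.

```python
# Assumptions: the 512 limit is measured in WordPiece sub-tokens and
# includes the [CLS] and [SEP] special tokens added by BERT.
MAX_SENTENCE_LENGTH = 512
SPECIAL_TOKENS = 2


def definitely_too_long(sentence: str, limit: int = MAX_SENTENCE_LENGTH) -> bool:
    """Return True if the sentence cannot fit within `limit` sub-tokens.

    Each whitespace-separated word yields at least one WordPiece sub-token,
    so the word count is a lower bound. False does NOT prove the sentence fits.
    """
    return len(sentence.split()) + SPECIAL_TOKENS > limit


if __name__ == "__main__":
    print(definitely_too_long("Contrast MRI confirmed the findings of meningeal carcinomatosis."))  # False
    print(definitely_too_long(" ".join(["lesion"] * 600)))  # True
```

Because this is only a lower bound, a sentence that passes the check may still exceed the limit once words are split into sub-tokens.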

Benchmarking

       label  precision    recall  f1-score   support
           0     0.9979    0.9986    0.9983      4357
           1     0.9944    0.9916    0.9930      1072
    accuracy        -         -      0.9972      5429
   macro-avg     0.9962    0.9951    0.9956      5429
weighted-avg     0.9972    0.9972    0.9972      5429
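The averaged rows follow from the per-class rows. The macro average is the unweighted mean over the two classes, and the weighted average weights each class by its support (4357 and 1072 sentences). The recomputation below starts from the rounded table values, so it agrees with the reported averages only to within rounding.

```python
from typing import Dict

# Per-class rows copied from the Benchmarking table (rounded to 4 decimals).
PER_CLASS: Dict[str, Dict[str, float]] = {
    "0": {"precision": 0.9979, "recall": 0.9986, "f1": 0.9983, "support": 4357},
    "1": {"precision": 0.9944, "recall": 0.9916, "f1": 0.9930, "support": 1072},
}


def macro_avg(metric: str, rows: Dict[str, Dict[str, float]] = PER_CLASS) -> float:
    """Unweighted mean of `metric` over all classes."""
    return sum(row[metric] for row in rows.values()) / len(rows)


def weighted_avg(metric: str, rows: Dict[str, Dict[str, float]] = PER_CLASS) -> float:
    """Mean of `metric` weighted by each class's support."""
    total = sum(row["support"] for row in rows.values())
    return sum(row[metric] * row["support"] for row in rows.values()) / total


if __name__ == "__main__":
    for metric in ("precision", "recall", "f1"):
        print(f"{metric:<9} macro={macro_avg(metric):.4f} weighted={weighted_avg(metric):.4f}")
```

Weighted-average recall always equals accuracy, because it sums the correctly classified sentences of every class and divides by the total support. In this table both are 0.9972. The macro average gives the smaller class 1 the same weight as class 0, so it is the more informative summary for the minority metastasis label.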