Classifier for Metastasis

Description

This model classifies clinical sentences according to whether they contain terms related to metastasis.

  • True: Contains metastasis-related terms.
  • False: Doesn’t contain metastasis-related terms.

Predicted Entities

True, False

How to use


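# Assumes an active Spark session named `spark` with the licensed Healthcare NLP library loaded.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import (
    Tokenizer,
    WordEmbeddingsModel,
    SentenceEmbeddings,
    ClassifierDLModel,
)

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")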

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("word_embeddings")

sentence_embeddings = SentenceEmbeddings()\
    .setInputCols(["document", "word_embeddings"])\
    .setOutputCol("sentence_embeddings")\
    .setPoolingStrategy("AVERAGE")

classifier_dl = ClassifierDLModel.pretrained("classifierdl_metastasis","en","clinical/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("prediction")

clf_Pipeline = Pipeline(
  stages=[
    document_assembler, 
    tokenizer,
    word_embeddings,
    sentence_embeddings,
    classifier_dl])

data = spark.createDataFrame([['A 62-year-old male presents with weight loss, persistent cough, and episodes of hemoptysis.'],
 ['The primary tumor (T) is staged as T3 due to its size and local invasion, there is no nodal involvement (N0), and due to multiple bone and liver lesions, it is classified as M1, reflecting distant metastatic foci.'],
 ['After all procedures done and reviewing the findings, biochemical results and screening, the TNM classification is determined.'],
['The oncologist noted that the tumor had spread to the liver, indicating advanced stage cancer.']]).toDF("text")

result = clf_Pipeline.fit(data).transform(data)


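import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// setInputCol takes a single column name, not an Array
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")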

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")
  .setInputCols(Array("document","token"))
  .setOutputCol("word_embeddings")

val sentence_embeddings = new SentenceEmbeddings()
  .setInputCols(Array("document", "word_embeddings"))
  .setOutputCol("sentence_embeddings")
  .setPoolingStrategy("AVERAGE")

val classifier_dl = ClassifierDLModel.pretrained("classifierdl_metastasis","en","clinical/models")
  .setInputCols(Array("sentence_embeddings"))
  .setOutputCol("prediction")

val clf_Pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  word_embeddings,
  sentence_embeddings,
  classifier_dl
))


import spark.implicits._  // enables .toDF on local collections

val data = Seq(
  "A 62-year-old male presents with weight loss, persistent cough, and episodes of hemoptysis.",
  "The primary tumor (T) is staged as T3 due to its size and local invasion, there is no nodal involvement (N0), and due to multiple bone and liver lesions, it is classified as M1, reflecting distant metastatic foci.",
  "After all procedures done and reviewing the findings, biochemical results and screening, the TNM classification is determined.",
  "The oncologist noted that the tumor had spread to the liver, indicating advanced stage cancer."
).toDF("text")

val result = clf_Pipeline.fit(data).transform(data)
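
Both pipelines set the pooling strategy of the SentenceEmbeddings stage to "AVERAGE", so the classifier receives one vector per document rather than one per token. The plain-Python sketch below illustrates the general idea of average pooling on toy data. It is not the library's implementation: the function name `average_pool` and the three-dimensional vectors are assumptions made up for illustration.

```python
from typing import List


def average_pool(token_vectors: List[List[float]]) -> List[float]:
    """Return the element-wise mean of a list of equal-length token vectors."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    count = len(token_vectors)
    # Average each dimension across all tokens.
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# Toy embeddings for a three-token sentence (hypothetical values).
toy_tokens = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
]

sentence_vector = average_pool(toy_tokens)
print(sentence_vector)  # [2.0, 2.0, 1.0]
```

The pooled vector has the same dimension as each token vector regardless of how many tokens the sentence has, which is what allows a fixed-size classifier input.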

Results


+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|text                                                                                                                                                                                                                 |result |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|A 62-year-old male presents with weight loss, persistent cough, and episodes of hemoptysis.                                                                                                                          | False |
|The primary tumor (T) is staged as T3 due to its size and local invasion, there is no nodal involvement (N0), and due to multiple bone and liver lesions, it is classified as M1, reflecting distant metastatic foci.| True  |
|After all procedures done and reviewing the findings, biochemical results and screening, the TNM classification is determined.                                                                                       | False |
|The oncologist noted that the tumor had spread to the liver, indicating advanced stage cancer.                                                                                                                       | True  |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+

Model Information

Model Name: classifierdl_metastasis
Compatibility: Healthcare NLP 5.4.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Size: 21.1 MB

Benchmarking

       label  precision    recall  f1-score   support
       False       0.99      0.99      0.99      4365
        True       0.95      0.94      0.95      1094
    accuracy          -         -      0.98      5459
   macro-avg       0.97      0.96      0.97      5459
weighted-avg       0.98      0.98      0.98      5459
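
The macro-avg and weighted-avg rows can be approximately re-derived from the two per-class rows. The sketch below assumes the usual definitions (macro: unweighted mean over classes; weighted: mean weighted by support) and uses the rounded values from the table, so the last digit may differ from the reported figures, which were presumably computed from unrounded scores. The names `rows`, `macro_avg`, and `weighted_avg` are illustrative.

```python
# Per-class figures copied from the benchmark table above.
rows = {
    "False": {"precision": 0.99, "recall": 0.99, "f1": 0.99, "support": 4365},
    "True": {"precision": 0.95, "recall": 0.94, "f1": 0.95, "support": 1094},
}

total_support = sum(r["support"] for r in rows.values())


def macro_avg(metric: str) -> float:
    """Unweighted mean of a metric across classes."""
    return sum(r[metric] for r in rows.values()) / len(rows)


def weighted_avg(metric: str) -> float:
    """Mean of a metric across classes, weighted by class support."""
    return sum(r[metric] * r["support"] for r in rows.values()) / total_support


print("total support:", total_support)
for metric in ("precision", "recall", "f1"):
    print(f"{metric}: macro={macro_avg(metric):.3f} weighted={weighted_avg(metric):.3f}")
```

Because the False class has roughly four times the support of the True class, the weighted averages sit closer to the False row than the macro averages do.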