Generic Classifier for for Metastasis (LogReg)

Description

This model is trained with the Generic Classifier annotator and the Logistic Regression algorithm and classifies text/sentence into two categories.

  • True: Contains metastasis related terms.
  • False: Doesn’t contain metastasis related terms.

Predicted Entities

True, False

Copy S3 URI

How to use


document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("word_embeddings")

sentence_embeddings = SentenceEmbeddings()\
    .setInputCols(["document", "word_embeddings"])\
    .setOutputCol("sentence_embeddings")\
    .setPoolingStrategy("AVERAGE")

features_asm = FeaturesAssembler()\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("features")

generic_classifier = GenericLogRegClassifierModel.pretrained("generic_logreg_classifier_metastasis","en","clinical/models")\
    .setInputCols(["features"])\
    .setOutputCol("prediction")

clf_Pipeline = Pipeline(
  stages=[
    document_assembler,
    tokenizer,
    word_embeddings,
    sentence_embeddings,
    features_asm,
    generic_classifier])

data = spark.createDataFrame([['A 62-year-old male presents with weight loss, persistent cough, and episodes of hemoptysis.'],
['The primary tumor (T) is staged as T3 due to its size and local invasion, there is no nodal involvement (N0), and due to multiple bone and liver lesions, it is classified as M1, reflecting distant metastatic foci.'],
 ['After all procedures done and reviewing the findings, biochemical results and screening, the TNM classification is determined.'],
['The oncologist noted that the tumor had spread to the liver, indicating advanced stage cancer.']]).toDF("text")

result = clf_Pipeline.fit(data).transform(data)


val documentAssembler = new DocumentAssembler()
  .setInputCol(Array("text"))
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")
  .setInputCols(Array("document","token"))
  .setOutputCol("word_embeddings")

val sentence_embeddings = new SentenceEmbeddings()
  .setInputCols(Array("document", "word_embeddings"))
  .setOutputCol("sentence_embeddings")
  .setPoolingStrategy("AVERAGE")

val features_asm = new FeaturesAssembler()
  .setInputCols(Array("sentence_embeddings"))
  .setOutputCol("features")

val generic_classifier = GenericLogRegClassifierModel.pretrained("generic_logreg_classifier_metastasis","en","clinical/models")
  .setInputCols(Array("features"))
  .setOutputCol("prediction")

val clf_Pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  word_embeddings,
  sentence_embeddings,
  features_asm,
  generic_classifier
))


val data = Seq([['A 62-year-old male presents with weight loss, persistent cough, and episodes of hemoptysis.'],
['The primary tumor (T) is staged as T3 due to its size and local invasion, there is no nodal involvement (N0), and due to multiple bone and liver lesions, it is classified as M1, reflecting distant metastatic foci.'],
 ['After all procedures done and reviewing the findings, biochemical results and screening, the TNM classification is determined.'],
 ['The oncologist noted that the tumor had spread to the liver, indicating advanced stage cancer.']]).toDF("text")

val result = clf_Pipeline.fit(data).transform(data)

Results


+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|text                                                                                                                                                                                                                 |result |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|A 62-year-old male presents with weight loss, persistent cough, and episodes of hemoptysis.                                                                                                                          | False |
|The primary tumor (T) is staged as T3 due to its size and local invasion, there is no nodal involvement (N0), and due to multiple bone and liver lesions, it is classified as M1, reflecting distant metastatic foci.| True  |
|After all procedures done and reviewing the findings, biochemical results and screening, the TNM classification is determined.                                                                                       | False |
|The oncologist noted that the tumor had spread to the liver, indicating advanced stage cancer.                                                                                                                       | True  |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+

Model Information

Model Name: generic_logreg_classifier_metastasis
Compatibility: Healthcare NLP 5.4.0+
License: Licensed
Edition: Official
Input Labels: [feature_vector]
Output Labels: [prediction]
Language: en
Size: 14.1 KB

Benchmarking

       label  precision    recall  f1-score   support
       False       0.97      0.98      0.98      4365
        True       0.91      0.88      0.90      1094
    accuracy          -         -      0.96      5459
   macro-avg       0.94      0.93      0.94      5459
weighted-avg       0.96      0.96      0.96      5459