Generic Classifier for Oncology

Description

This model is an oncology classifier that determines whether a clinical sentence contains oncology-related terms.

  • True: Contains oncology-related terms.
  • False: Does not contain oncology-related terms.

Predicted Entities

True, False


How to use


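# Python example. Assumes a licensed Healthcare NLP session is already running as `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.base import *
from sparknlp_jsl.annotator import *

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")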

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("word_embeddings")

sentence_embeddings = SentenceEmbeddings()\
    .setInputCols(["document", "word_embeddings"])\
    .setOutputCol("sentence_embeddings")\
    .setPoolingStrategy("AVERAGE")

features_asm = FeaturesAssembler()\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("features")

generic_classifier = GenericClassifierModel.pretrained("generic_classifier_oncology","en","clinical/models")\
    .setInputCols(["features"])\
    .setOutputCol("prediction")

clf_Pipeline = Pipeline(
  stages=[
    document_assembler,
    tokenizer,
    word_embeddings,
    sentence_embeddings,
    features_asm,
    generic_classifier])

data = spark.createDataFrame([
["The patient was diagnosed with a malignant tumor, and surgery was promptly scheduled to remove the mass."],
["Following this adjustment, the patient's ECG remained in sinus rhythm, with heart rates varying between 45 and 70 bpm and no significant QTc prolongation."],
["During the treatment review, the oncologist discussed the progression of metastases from the primary lesion to nearby lymph nodes."],
["Functional MRI (fMRI) showed increased activation in the motor cortex during the finger-tapping task."]
]).toDF("text")

result = clf_Pipeline.fit(data).transform(data)


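// Scala example. Assumes an active SparkSession named `spark` with Healthcare NLP on the classpath.
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
// FeaturesAssembler and GenericClassifierModel ship with the licensed Healthcare NLP library and must be imported from it as well.

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")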

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")
  .setInputCols(Array("document","token"))
  .setOutputCol("word_embeddings")

val sentence_embeddings = new SentenceEmbeddings()
  .setInputCols(Array("document", "word_embeddings"))
  .setOutputCol("sentence_embeddings")
  .setPoolingStrategy("AVERAGE")

val features_asm = new FeaturesAssembler()
  .setInputCols(Array("sentence_embeddings"))
  .setOutputCol("features")

val generic_classifier = GenericClassifierModel.pretrained("generic_classifier_oncology","en","clinical/models")
  .setInputCols(Array("features"))
  .setOutputCol("prediction")

val clf_Pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  word_embeddings,
  sentence_embeddings,
  features_asm,
  generic_classifier
))

// Needed for .toDF on a local Seq.
import spark.implicits._

val data = Seq(
  "The patient was diagnosed with a malignant tumor, and surgery was promptly scheduled to remove the mass.",
  "Following this adjustment, the patient's ECG remained in sinus rhythm, with heart rates varying between 45 and 70 bpm and no significant QTc prolongation.",
  "During the treatment review, the oncologist discussed the progression of metastases from the primary lesion to nearby lymph nodes.",
  "Functional MRI (fMRI) showed increased activation in the motor cortex during the finger-tapping task."
).toDF("text")

val result = clf_Pipeline.fit(data).transform(data)
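
To make the `AVERAGE` pooling step in the pipeline concrete, here is a minimal pure-Python sketch of mean pooling: the token vectors of a sentence are averaged component-wise into a single fixed-size vector, which is the kind of input the classifier receives through `features`. The function name `average_pool` and the toy 3-dimensional vectors are illustrative assumptions, not part of the library; real clinical embeddings have many more dimensions.

```python
from typing import List


def average_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average token vectors component-wise into one sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


# Toy embeddings for a three-token sentence (values are made up).
tokens = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
]
sentence_vector = average_pool(tokens)
print(sentence_vector)
```

Because the pooled vector has the same length regardless of how many tokens the sentence contains, short and long sentences can be fed to the same classifier.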

Results

+----------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|text                                                                                                                                                      |result |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|The patient was diagnosed with a malignant tumor, and surgery was promptly scheduled to remove the mass.                                                  | True  |
|Following this adjustment, the patient's ECG remained in sinus rhythm, with heart rates varying between 45 and 70 bpm and no significant QTc prolongation.| False |
|During the treatment review, the oncologist discussed the progression of metastases from the primary lesion to nearby lymph nodes.                        | True  |
|Functional MRI (fMRI) showed increased activation in the motor cortex during the finger-tapping task.                                                     | False |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+-------+

Model Information

Model Name: generic_classifier_oncology
Compatibility: Healthcare NLP 5.4.0+
License: Licensed
Edition: Official
Input Labels: [features]
Output Labels: [prediction]
Language: en
Size: 1.5 MB

Benchmarking

       label  precision    recall  f1-score   support
       False       0.90      0.86      0.88      2093
        True       0.89      0.93      0.91      2714
    accuracy          -         -      0.90      4807
   macro-avg       0.90      0.89      0.89      4807
weighted-avg       0.90      0.90      0.90      4807
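
As a consistency check on the table above, the F1 scores and the averaged F1 rows can be recomputed from the reported precision, recall, and support. The sketch below uses plain Python; because its inputs are the rounded two-decimal figures, agreement is only expected after rounding. The helper name `f1` is illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) as reported in the benchmark table.
report = {
    "False": (0.90, 0.86, 2093),
    "True": (0.89, 0.93, 2714),
}

f1_scores = {label: f1(p, r) for label, (p, r, _) in report.items()}
total_support = sum(support for _, _, support in report.values())

# Macro average: unweighted mean over the classes.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f1_scores[label] * report[label][2] for label in report) / total_support

for label, score in f1_scores.items():
    print(f"{label}: f1 = {score:.2f}")
print(f"macro-avg f1 = {macro_f1:.2f}, weighted-avg f1 = {weighted_f1:.2f}")
print(f"total support = {total_support}")
```

Rounded to two decimals, the recomputed values agree with the f1-score column (0.88, 0.91, 0.89, 0.90), and the per-class supports sum to the 4807 shown in the summary rows.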