Extract Mentions of Response to Cancer Treatment

Description

This model extracts entities related to the patient's response to oncology treatment, including clinical response and changes in tumor size.

Definitions of Predicted Entities:

  • Line_Of_Therapy: Explicit references to the line of therapy of an oncological treatment (e.g. “first-line treatment”).
  • Response_To_Treatment: Terms describing the patient's clinical progress in response to cancer treatment, such as “recurrence”, “bad response” or “improvement”.
  • Size_Trend: Terms describing changes in the size of the tumor (such as “growth” or “reduced in size”).

Predicted Entities

Line_Of_Therapy, Response_To_Treatment, Size_Trend

How to use

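# Python: assumes an active Spark NLP for Healthcare session available as `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")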

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

ner = MedicalNerModel.pretrained("ner_oncology_response_to_treatment_wip", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[document_assembler,
                            sentence_detector,
                            tokenizer,
                            word_embeddings,
                            ner,
                            ner_converter])

data = spark.createDataFrame([["She completed her first-line therapy, but some months later there was recurrence of the breast cancer. "]]).toDF("text")

result = pipeline.fit(data).transform(data)

// Scala
val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
    
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")
    
val tokenizer = new Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")
    
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")
    
val ner = MedicalNerModel.pretrained("ner_oncology_response_to_treatment_wip", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")
    
val ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(document_assembler,
                            sentence_detector,
                            tokenizer,
                            word_embeddings,
                            ner,
                            ner_converter))    

// `.toDS` requires `import spark.implicits._` to be in scope.
val data = Seq("She completed her first-line therapy, but some months later there was recurrence of the breast cancer. ").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)

# Python (NLU)
import nlu
nlu.load("en.med_ner.oncology_response_to_treatment_wip").predict("""She completed her first-line therapy, but some months later there was recurrence of the breast cancer. """)

Results

| chunk      | ner_label             |
|:-----------|:----------------------|
| first-line | Line_Of_Therapy       |
| recurrence | Response_To_Treatment |

Model Information

Model Name: ner_oncology_response_to_treatment_wip
Compatibility: Healthcare NLP 4.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 848.8 KB

References

In-house annotated oncology case reports.

Benchmarking

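| label                 |    tp |    fp |    fn | total | precision | recall |   f1 |
|:----------------------|------:|------:|------:|------:|----------:|-------:|-----:|
| Response_To_Treatment | 233.0 |  81.0 | 120.0 | 353.0 |      0.74 |   0.66 | 0.70 |
| Size_Trend            |  31.0 |  34.0 |  45.0 |  76.0 |      0.48 |   0.41 | 0.44 |
| Line_Of_Therapy       |  82.0 |  11.0 |   5.0 |  87.0 |      0.88 |   0.94 | 0.91 |
| macro_avg             | 346.0 | 126.0 | 170.0 | 516.0 |      0.70 |   0.67 | 0.68 |
| micro_avg             |   NaN |   NaN |   NaN |   NaN |      0.73 |   0.67 | 0.70 |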
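
The precision, recall and F1 columns can be recomputed from the tp, fp and fn counts. The sketch below is plain Python and is not the evaluation tooling used to produce the table; the `prf` helper and variable names are illustrative. It assumes the usual definitions: the macro average is the unweighted mean of the per-label scores, and the micro average is computed once from the counts pooled across labels.

```python
# Counts copied from the benchmarking table: label -> (tp, fp, fn).
COUNTS = {
    "Response_To_Treatment": (233, 81, 120),
    "Size_Trend": (31, 34, 45),
    "Line_Of_Therapy": (82, 11, 5),
}


def prf(tp, fp, fn):
    """Return (precision, recall, f1) for one set of counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


per_label = {label: prf(*counts) for label, counts in COUNTS.items()}

# Macro average: unweighted mean of the per-label scores.
macro = tuple(
    sum(scores[i] for scores in per_label.values()) / len(per_label)
    for i in range(3)
)

# Micro average: pool tp, fp and fn across labels, then score once.
micro = prf(*(sum(counts[i] for counts in COUNTS.values()) for i in range(3)))

for name, scores in {**per_label, "macro_avg": macro, "micro_avg": micro}.items():
    print(f"{name:>21}  " + "  ".join(f"{s:.2f}" for s in scores))
```

Rounded to two decimals, these values match the precision, recall and f1 columns reported above for every row.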