Description
This model extracts entities related to oncology therapies using granular labels, including mentions of treatments, posology information and line of therapy.
Definitions of Predicted Entities:
Cancer_Surgery
: Terms that indicate surgery as a form of cancer treatment.Chemotherapy
: Mentions of chemotherapy drugs, or unspecific words such as “chemotherapy”.Cycle_Count
: The total number of cycles being administered of an oncological therapy (e.g. “5 cycles”).Cycle_Day
: References to the day of the cycle of oncological therapy (e.g. “day 5”).Cycle_Number
: The number of the cycle of an oncological therapy that is being applied (e.g. “third cycle”).Dosage
: The quantity prescribed by the physician for an active ingredient.Duration
: Words indicating the duration of a treatment (e.g. “for 2 weeks”).Frequency
: Words indicating the frequency of treatment administration (e.g. “daily” or “bid”).Hormonal_Therapy
: Mentions of hormonal drugs used to treat cancer, or unspecific words such as “hormonal therapy”.Immunotherapy
: Mentions of immunotherapy drugs, or unspecific words such as “immunotherapy”.Line_Of_Therapy
: Explicit references to the line of therapy of an oncological therapy (e.g. “first-line treatment”).Radiotherapy
: Terms that indicate the use of Radiotherapy.Radiation_Dose
: Dose used in radiotherapy.Response_To_Treatment
: Terms related to clinical progress of the patient related to cancer treatment, including “recurrence”, “bad response” or “improvement”.Route
: Words indicating the type of administration route (such as “PO” or “transdermal”).Targeted_Therapy
: Mentions of targeted therapy drugs, or unspecific words such as “targeted therapy”.Unspecific_Therapy
: Terms that indicate a known cancer therapy but that is not specific to any other therapy entity (e.g. “chemoradiotherapy” or “adjuvant therapy”).
Predicted Entities
Cancer_Surgery
, Chemotherapy
, Cycle_Count
, Cycle_Day
, Cycle_Number
, Dosage
, Duration
, Frequency
, Hormonal_Therapy
, Immunotherapy
, Line_Of_Therapy
, Radiotherapy
, Radiation_Dose
, Response_To_Treatment
, Route
, Targeted_Therapy
, Unspecific_Therapy
Live Demo Open in Colab Copy S3 URI
How to use
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer() \
.setInputCols(["sentence"]) \
.setOutputCol("token")
word_embeddings = WordEmbeddingsModel().pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"]) \
.setOutputCol("embeddings")
ner = MedicalNerModel.pretrained("ner_oncology_therapy_wip", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = NerConverter() \
.setInputCols(["sentence", "token", "ner"]) \
.setOutputCol("ner_chunk")
pipeline = Pipeline(stages=[document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
ner,
ner_converter])
data = spark.createDataFrame([["The had previously undergone a left mastectomy and an axillary lymph node dissection for a left breast cancer twenty years ago.The tumor was positive for ER and PR. Postoperatively, radiotherapy was administered to her breast.The cancer recurred as a right lung metastasis 13 years later. The patient underwent a regimen consisting of adriamycin (60 mg/m2) and cyclophosphamide (600 mg/m2) over six courses, as first line therapy."]]).toDF("text")
result = pipeline.fit(data).transform(data)
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
.setInputCols(Array("document"))
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols(Array("sentence"))
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel().pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_oncology_therapy_wip", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
ner,
ner_converter))
val data = Seq("The had previously undergone a left mastectomy and an axillary lymph node dissection for a left breast cancer twenty years ago.
The tumor was positive for ER and PR. Postoperatively, radiotherapy was administered to her breast.
The cancer recurred as a right lung metastasis 13 years later. The patient underwent a regimen consisting of adriamycin (60 mg/m2) and cyclophosphamide (600 mg/m2) over six courses, as first line therapy.").toDS.toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.med_ner.oncology_therapy_wip").predict("""The had previously undergone a left mastectomy and an axillary lymph node dissection for a left breast cancer twenty years ago.The tumor was positive for ER and PR. Postoperatively, radiotherapy was administered to her breast.The cancer recurred as a right lung metastasis 13 years later. The patient underwent a regimen consisting of adriamycin (60 mg/m2) and cyclophosphamide (600 mg/m2) over six courses, as first line therapy.""")
Results
| chunk | ner_label |
|:-------------------------------|:----------------------|
| mastectomy | Cancer_Surgery |
| axillary lymph node dissection | Cancer_Surgery |
| radiotherapy | Radiotherapy |
| recurred | Response_To_Treatment |
| adriamycin | Chemotherapy |
| 60 mg/m2 | Dosage |
| cyclophosphamide | Chemotherapy |
| 600 mg/m2 | Dosage |
| six courses | Cycle_Count |
| first line | Line_Of_Therapy |
Model Information
Model Name: | ner_oncology_therapy_wip |
Compatibility: | Healthcare NLP 4.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 869.8 KB |
References
In-house annotated oncology case reports.
Benchmarking
label tp fp fn total precision recall f1
Cycle_Number 58.0 18.0 15.0 73.0 0.76 0.79 0.78
Response_To_Treatment 249.0 80.0 180.0 429.0 0.76 0.58 0.66
Cycle_Count 151.0 48.0 24.0 175.0 0.76 0.86 0.81
Unspecific_Therapy 167.0 88.0 67.0 234.0 0.65 0.71 0.68
Chemotherapy 535.0 30.0 83.0 618.0 0.95 0.87 0.90
Targeted_Therapy 144.0 9.0 35.0 179.0 0.94 0.80 0.87
Radiotherapy 188.0 17.0 34.0 222.0 0.92 0.85 0.88
Cancer_Surgery 526.0 60.0 119.0 645.0 0.90 0.82 0.85
Line_Of_Therapy 73.0 14.0 14.0 87.0 0.84 0.84 0.84
Hormonal_Therapy 95.0 1.0 21.0 116.0 0.99 0.82 0.90
Immunotherapy 90.0 58.0 21.0 111.0 0.61 0.81 0.69
Cycle_Day 149.0 33.0 34.0 183.0 0.82 0.81 0.82
Frequency 287.0 35.0 62.0 349.0 0.89 0.82 0.86
Route 82.0 17.0 15.0 97.0 0.83 0.85 0.84
Duration 399.0 95.0 148.0 547.0 0.81 0.73 0.77
Dosage 718.0 38.0 109.0 827.0 0.95 0.87 0.91
Radiation_Dose 84.0 15.0 12.0 96.0 0.85 0.88 0.86
macro_avg 3995.0 656.0 993.0 4988.0 0.84 0.81 0.82
micro_avg NaN NaN NaN NaN 0.86 0.80 0.83