Cancer Diagnosis Text Matcher

Description

This model extracts cancer diagnoses from clinical notes using a rule-based TextMatcherInternal annotator.
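
To make "rule-based" concrete, the following is a minimal pure-Python sketch of dictionary-style phrase matching. It is not the TextMatcherInternal implementation: the function `match_phrases`, the phrase list, and the merge rule are all assumptions made for illustration, loosely mirroring the case-insensitive setting and the `setMergeOverlapping` option used on this page.

```python
import re

def match_phrases(text, phrases, case_sensitive=False, merge_overlapping=True):
    """Return (chunk, begin, end) tuples; `end` is an inclusive character offset."""
    flags = 0 if case_sensitive else re.IGNORECASE
    spans = []
    for phrase in phrases:
        # Word boundaries stop a phrase from matching inside a longer word.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, flags):
            spans.append((m.start(), m.end() - 1))
    spans.sort()
    if merge_overlapping:
        merged = []
        for begin, end in spans:
            if merged and begin <= merged[-1][1]:
                # Overlaps the previous span: keep one span covering both.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((begin, end))
        spans = merged
    return [(text[b:e + 1], b, e) for b, e in spans]

# Hypothetical phrase dictionary; the real model ships its own list.
PHRASES = ["ovarian carcinoma", "carcinoma", "metaplastic carcinoma"]
note = "History of Ovarian Carcinoma; biopsy revealed metaplastic carcinoma."
matches = match_phrases(note, PHRASES)
print(matches)
```

In this sketch the shorter phrase "carcinoma" is absorbed into the longer overlapping matches, so each diagnosis is reported once.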

Predicted Entities

Cancer_dx

How to use

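# Assumes a licensed Healthcare NLP session is already running as `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer
from sparknlp_jsl.annotator import TextMatcherInternalModel

documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")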

sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

text_matcher = TextMatcherInternalModel.pretrained("cancer_diagnosis_matcher","en","clinical/models") \
    .setInputCols(["sentence", "token"])\
    .setOutputCol("cancer_dx")\
    .setMergeOverlapping(True)

matcher_pipeline = Pipeline().setStages([
    documentAssembler,
    sentenceDetector,
    tokenizer,
    text_matcher])

data = spark.createDataFrame([["""A 65-year-old woman had a history of debulking surgery, bilateral oophorectomy with omentectomy,
 total anterior hysterectomy with radical pelvic lymph nodes dissection due to ovarian carcinoma (mucinous-type carcinoma, stage Ic) 1 year ago.
 The patient's medical compliance was poor and failed to complete her chemotherapy (cyclophosphamide 750 mg/m2, carboplatin 300 mg/m2). Recently, she noted a palpable right breast mass, 15 cm in size which nearly occupied the whole right breast in 2 months. Core needle biopsy revealed metaplastic carcinoma. Neoadjuvant chemotherapy with the regimens of Taxotere (75 mg/m2), Epirubicin (75 mg/m2), and Cyclophosphamide (500 mg/m2) was given for 6 cycles with poor response, followed by a modified radical mastectomy (MRM) with dissection of axillary lymph nodes and skin grafting. Postoperatively, radiotherapy was done with 5000 cGy in 25 fractions. The histopathologic examination revealed a metaplastic carcinoma with squamous differentiation associated with adenomyoepithelioma. Immunohistochemistry study showed that the tumor cells are positive for epithelial markers-cytokeratin (AE1/AE3) stain, and myoepithelial markers, including cytokeratin 5/6 (CK 5/6), p63, and S100 stains.
 Expressions of hormone receptors, including ER, PR, and Her-2/Neu, were all negative."""]]).toDF("text")

result = matcher_pipeline.fit(data).transform(data)

// Scala version of the same pipeline
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")

val text_matcher = TextMatcherInternalModel.pretrained("cancer_diagnosis_matcher","en","clinical/models")
    .setInputCols(Array("sentence","token"))
    .setOutputCol("cancer_dx")
    .setMergeOverlapping(true)

val matcher_pipeline = new Pipeline()
    .setStages(Array(
    documentAssembler,
    sentenceDetector,
    tokenizer,
    text_matcher))

// Needed for Seq(...).toDF; assumes a SparkSession named `spark`.
import spark.implicits._

val data = Seq("""A 65-year-old woman had a history of debulking surgery, bilateral oophorectomy with omentectomy,
 total anterior hysterectomy with radical pelvic lymph nodes dissection due to ovarian carcinoma (mucinous-type carcinoma, stage Ic) 1 year ago.
 The patient's medical compliance was poor and failed to complete her chemotherapy (cyclophosphamide 750 mg/m2, carboplatin 300 mg/m2). Recently, she noted a palpable right breast mass, 15 cm in size which nearly occupied the whole right breast in 2 months. Core needle biopsy revealed metaplastic carcinoma. Neoadjuvant chemotherapy with the regimens of Taxotere (75 mg/m2), Epirubicin (75 mg/m2), and Cyclophosphamide (500 mg/m2) was given for 6 cycles with poor response, followed by a modified radical mastectomy (MRM) with dissection of axillary lymph nodes and skin grafting. Postoperatively, radiotherapy was done with 5000 cGy in 25 fractions. The histopathologic examination revealed a metaplastic carcinoma with squamous differentiation associated with adenomyoepithelioma. Immunohistochemistry study showed that the tumor cells are positive for epithelial markers-cytokeratin (AE1/AE3) stain, and myoepithelial markers, including cytokeratin 5/6 (CK 5/6), p63, and S100 stains.
 Expressions of hormone receptors, including ER, PR, and Her-2/Neu, were all negative.""").toDF("text")

val result = matcher_pipeline.fit(data).transform(data)

Results

+-----------------------+-----+----+---------+
|                  chunk|begin| end|    label|
+-----------------------+-----+----+---------+
|      ovarian carcinoma|  176| 192|Cancer_dx|
|mucinous-type carcinoma|  195| 217|Cancer_dx|
|  metaplastic carcinoma|  528| 548|Cancer_dx|
|  metaplastic carcinoma|  937| 957|Cancer_dx|
|    adenomyoepithelioma| 1005|1023|Cancer_dx|
+-----------------------+-----+----+---------+
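
The begin and end columns appear to be character offsets with an inclusive end: for every row, end - begin + 1 equals the chunk length. The short Python check below illustrates this using the rows from the table; the helper `chunk_at` and the stand-in sentence are hypothetical and only show how such offsets would be used to slice text.

```python
# Rows copied from the results table: (chunk, begin, end).
ROWS = [
    ("ovarian carcinoma", 176, 192),
    ("mucinous-type carcinoma", 195, 217),
    ("metaplastic carcinoma", 528, 548),
    ("metaplastic carcinoma", 937, 957),
    ("adenomyoepithelioma", 1005, 1023),
]

# With an inclusive end offset, the span length is end - begin + 1.
lengths_ok = all(end - begin + 1 == len(chunk) for chunk, begin, end in ROWS)

def chunk_at(text, begin, end):
    """Recover a chunk from its source text; `end` is inclusive, so add 1."""
    return text[begin:end + 1]

# Stand-in sentence, not the clinical note used on this page.
sample = "Core needle biopsy revealed metaplastic carcinoma."
begin = sample.index("metaplastic carcinoma")
end = begin + len("metaplastic carcinoma") - 1
recovered = chunk_at(sample, begin, end)
print(lengths_ok, recovered)
```

Slicing with `text[begin:end]` instead would drop the last character of each chunk.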

Model Information

Model Name: cancer_diagnosis_matcher
Compatibility: Healthcare NLP 5.3.3+
License: Licensed
Edition: Official
Input Labels: [sentence, token]
Output Labels: [cancer_name]
Language: en
Size: 43.8 KB
Case sensitive: false