Description
This pipeline extracts all entities that can be mapped to ICD-O codes.
It combines two NER models and a text matcher to do so.
How to use
# Python: assumes an active Spark session with the licensed Healthcare NLP library loaded
from sparknlp.pretrained import PretrainedPipeline
# Download and load the pretrained pipeline from the clinical models repository
ner_pipeline = PretrainedPipeline("ner_icdo_pipeline", "en", "clinical/models")
# Run the pipeline on a sample text
result = ner_pipeline.annotate("""
TRAF6 is a putative oncogene in a variety of cancers including
bladder cancer and skin cancer . WWP2 appears to regulate the expression of the well characterized
tumor suppressor phosphatase and tensin homolog (PTEN) in endometrial cancer and squamous cell carcinoma.
""")
// Scala: assumes an active Spark session with the licensed Healthcare NLP library loaded
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
// Download and load the pretrained pipeline from the clinical models repository
val ner_pipeline = PretrainedPipeline("ner_icdo_pipeline", "en", "clinical/models")
// Run the pipeline on a sample text
val result = ner_pipeline.annotate("""
TRAF6 is a putative oncogene in a variety of cancers including
bladder cancer and skin cancer . WWP2 appears to regulate the expression of the well characterized
tumor suppressor phosphatase and tensin homolog (PTEN) in endometrial cancer and squamous cell carcinoma.
""")
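To obtain character offsets and entity labels like those in the Results table, `fullAnnotate` is usually the better fit, since `annotate` returns only the matched strings. The following is a minimal sketch of flattening chunk annotations into rows. It assumes each chunk exposes `result`, `begin`, `end`, and a `metadata` dict with an `entity` key, as Spark NLP `Annotation` objects generally do. The `Chunk` stand-in and the `to_rows` helper are hypothetical and exist only so the example runs without a Spark session; the output column name `"ner_chunk"` in the closing comment is also an assumption, so inspect the keys of the returned dict to find the real one.

```python
from collections import namedtuple

# Hypothetical stand-in for a Spark NLP Annotation (only the fields used here).
Chunk = namedtuple("Chunk", ["result", "begin", "end", "metadata"])


def to_rows(chunks):
    """Flatten chunk annotations into (chunk, begin, end, entity) tuples."""
    return [(c.result, c.begin, c.end, c.metadata.get("entity")) for c in chunks]


# Two rows copied from the Results table, wrapped in the stand-in type.
sample = [
    Chunk("bladder cancer", 66, 79, {"entity": "Cancer_dx"}),
    Chunk("tumor", 165, 169, {"entity": "Oncological"}),
]

rows = to_rows(sample)
for row in rows:
    print(row)

# With a live pipeline this would look roughly like (column name assumed):
#   full = ner_pipeline.fullAnnotate(text)[0]
#   rows = to_rows(full["ner_chunk"])
```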
Results
| | chunks | begin | end | entities |
|---:|:------------------------|--------:|------:|:------------|
| 0 | cancers | 47 | 53 | Cancer_dx |
| 1 | bladder cancer | 66 | 79 | Cancer_dx |
| 2 | skin cancer | 85 | 95 | Cancer_dx |
| 3 | tumor | 165 | 169 | Oncological |
| 4 | endometrial cancer | 224 | 241 | Cancer_dx |
| 5 | squamous cell carcinoma | 247 | 269 | Cancer_dx |
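The `begin` and `end` columns appear to be inclusive character offsets: for every row in the table, `end - begin + 1` equals the length of the chunk text. The sketch below checks this over the table rows and shows the matching Python slice, `text[begin:end + 1]`. The `slice_chunk` helper is a hypothetical name used for illustration.

```python
# Rows copied from the Results table: (chunk, begin, end, entity)
rows = [
    ("cancers", 47, 53, "Cancer_dx"),
    ("bladder cancer", 66, 79, "Cancer_dx"),
    ("skin cancer", 85, 95, "Cancer_dx"),
    ("tumor", 165, 169, "Oncological"),
    ("endometrial cancer", 224, 241, "Cancer_dx"),
    ("squamous cell carcinoma", 247, 269, "Cancer_dx"),
]


def slice_chunk(text, begin, end):
    """Return the substring for inclusive [begin, end] character offsets."""
    return text[begin:end + 1]


# Every row satisfies the inclusive-offset relation.
all_inclusive = all(end - begin + 1 == len(chunk) for chunk, begin, end, _ in rows)
print(all_inclusive)  # True

# Toy example on a short string: offsets 5..10 cover "cancer".
print(slice_chunk("skin cancer", 5, 10))  # cancer
```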
Model Information
| Property | Value |
|:--------------|:----------------------|
| Model Name | ner_icdo_pipeline |
| Type | pipeline |
| Compatibility | Healthcare NLP 6.0.2+ |
| License | Licensed |
| Edition | Official |
| Language | en |
| Size | 1.7 GB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- TextMatcherInternalModel
- ChunkMergeModel
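The stage list suggests that chunks from the two NER converters and the text matcher are combined by the final ChunkMergeModel. The sketch below illustrates one plausible merge policy, keeping the longest chunk when spans overlap. It is an illustration under that assumption, not the library's implementation; `merge_chunks` is a hypothetical helper, and the overlapping `(74, 79)` span is invented to show a conflict.

```python
def merge_chunks(*sources):
    """Merge (begin, end, label) chunk lists; on overlap keep the longest span."""
    candidates = sorted(
        (chunk for source in sources for chunk in source),
        key=lambda c: (-(c[1] - c[0]), c[0]),  # longest first, then leftmost
    )
    kept = []
    for begin, end, label in candidates:
        # Offsets are inclusive: spans are disjoint only if one ends
        # strictly before the other begins.
        if all(end < b or begin > e for b, e, _ in kept):
            kept.append((begin, end, label))
    return sorted(kept)


# Hypothetical outputs of the three chunk-producing stages.
ner_a = [(66, 79, "Cancer_dx")]
ner_b = [(74, 79, "Oncological"), (165, 169, "Oncological")]  # (74, 79) is invented
matcher = [(85, 95, "Cancer_dx")]

merged = merge_chunks(ner_a, ner_b, matcher)
for chunk in merged:
    print(chunk)
```

Sorting candidates longest-first means a shorter overlapping span is dropped in favor of the longer one already kept, while non-overlapping spans from any source all survive.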