Description
Entity resolution model based on k-nearest neighbors (KNN), using word embeddings and Word Mover's Distance.
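To make the idea in the description concrete, here is a minimal sketch of nearest-neighbor lookup over a Word Mover's style distance. It is not the model's actual implementation. It uses the relaxed Word Mover's Distance (a cheaper lower bound on the exact distance) instead of solving the full transport problem, and the 2-D vectors in `TOY_VECTORS` are invented for illustration and are not `clinical_embeddings`. The names `relaxed_wmd` and `nearest_codes` are hypothetical helpers.

```python
import math

# Toy 2-D "embeddings", invented purely for illustration (NOT clinical_embeddings).
TOY_VECTORS = {
    "breast": (1.0, 0.0),
    "adenocarcinoma": (0.0, 1.0),
    "carcinoma": (0.0, 0.9),
    "intraductal": (0.8, 0.2),
    "kaposi": (-1.0, 0.0),
    "sarcoma": (-0.2, 0.8),
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(source, target):
    """Average distance from each source token to its nearest target token."""
    total = 0.0
    for s in source:
        total += min(euclidean(TOY_VECTORS[s], TOY_VECTORS[t]) for t in target)
    return total / len(source)


def relaxed_wmd(a, b):
    """Relaxed Word Mover's Distance: the larger of the two one-sided costs.

    This is a lower bound on the exact Word Mover's Distance, not the exact value.
    """
    return max(one_sided_cost(a, b), one_sided_cost(b, a))


def nearest_codes(chunk_tokens, candidates, k=1):
    """Return the k candidate codes whose descriptions are closest to the chunk."""
    scored = sorted(
        (relaxed_wmd(chunk_tokens, tokens), code)
        for code, tokens in candidates.items()
    )
    return [code for _, code in scored[:k]]


# Candidate descriptions, tokenized and lowercased by hand for this sketch.
CANDIDATES = {
    "8500/2": ["intraductal", "carcinoma"],
    "9140/3": ["kaposi", "sarcoma"],
}

best = nearest_codes(["breast", "adenocarcinoma"], CANDIDATES, k=1)
print(best)
```

With real embeddings the candidate set would be the full list of ICD-O descriptions, and `k` controls how many alternatives are kept per chunk.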
Predicted Entities
ICD-O codes and their normalized definitions with clinical_embeddings.
How to use
...
# Backslashes continue the chained setter calls across lines
model = ChunkEntityResolverModel.pretrained("chunkresolve_icdo_clinical", "en", "clinical/models") \
    .setInputCols("token", "chunk_embeddings") \
    .setOutputCol("entity")
pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings, clinical_ner_model, clinical_ner_chunker, chunk_embeddings, model])
data = ["""DIAGNOSIS: Left breast adenocarcinoma stage T3 N1b M0, stage IIIA.
She has been found more recently to have stage IV disease with metastatic deposits and recurrence involving the chest wall and lower left neck lymph nodes.
PHYSICAL EXAMINATION
NECK: On physical examination palpable lymphadenopathy is present in the left lower neck and supraclavicular area. No other cervical lymphadenopathy or supraclavicular lymphadenopathy is present.
RESPIRATORY: Good air entry bilaterally. Examination of the chest wall reveals a small lesion where the chest wall recurrence was resected. No lumps, bumps or evidence of disease involving the right breast is present.
ABDOMEN: Normal bowel sounds, no hepatomegaly. No tenderness on deep palpation. She has just started her last cycle of chemotherapy today, and she wishes to visit her daughter in Brooklyn, New York. After this she will return in approximately 3 to 4 weeks and begin her radiotherapy treatment at that time."""]
pipeline_model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
light_pipeline = LightPipeline(pipeline_model)
result = light_pipeline.annotate(data)
...
val model = ChunkEntityResolverModel.pretrained("chunkresolve_icdo_clinical","en","clinical/models")
.setInputCols("token","chunk_embeddings")
.setOutputCol("entity")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings, clinical_ner_model, clinical_ner_chunker, chunk_embeddings, model))
val data = Seq("DIAGNOSIS: Left breast adenocarcinoma stage T3 N1b M0, stage IIIA. She has been found more recently to have stage IV disease with metastatic deposits and recurrence involving the chest wall and lower left neck lymph nodes. PHYSICAL EXAMINATION NECK: On physical examination palpable lymphadenopathy is present in the left lower neck and supraclavicular area. No other cervical lymphadenopathy or supraclavicular lymphadenopathy is present. RESPIRATORY: Good air entry bilaterally. Examination of the chest wall reveals a small lesion where the chest wall recurrence was resected. No lumps, bumps or evidence of disease involving the right breast is present. ABDOMEN: Normal bowel sounds, no hepatomegaly. No tenderness on deep palpation. She has just started her last cycle of chemotherapy today, and she wishes to visit her daughter in Brooklyn, New York. After this she will return in approximately 3 to 4 weeks and begin her radiotherapy treatment at that time.").toDF("text")
val result = pipeline.fit(data).transform(data)
Results
| | chunk | begin | end | entity | icdo_description | icdo_code |
|---|----------------------------|-------|-----|--------|---------------------------------------------|-----------|
| 0 | Left breast adenocarcinoma | 11 | 36 | Cancer | Intraductal carcinoma, noninfiltrating, NOS | 8500/2 |
| 1 | T3 N1b M0 | 44 | 52 | Cancer | Kaposi sarcoma | 9140/3 |
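The values in the `icdo_code` column are ICD-O morphology codes: a four-digit histology code, a slash, and a one-digit behaviour code. The sketch below splits such a code and labels the behaviour digit. The function name `split_icdo_morphology` is a hypothetical helper, and the labels follow the ICD-O-3 behaviour codes as commonly published; check them against the edition in use.

```python
# Behaviour digit labels as commonly published for ICD-O-3 (verify against your edition).
BEHAVIOUR = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "carcinoma in situ",
    "3": "malignant, primary site",
    "6": "malignant, metastatic site",
    "9": "malignant, uncertain whether primary or metastatic site",
}


def split_icdo_morphology(code):
    """Split a code such as '8500/2' into (histology, behaviour, behaviour label)."""
    histology, sep, behaviour = code.strip().partition("/")
    if not sep or len(histology) != 4 or not histology.isdigit():
        raise ValueError(f"not an ICD-O morphology code: {code!r}")
    if behaviour not in BEHAVIOUR:
        raise ValueError(f"unknown behaviour digit in {code!r}")
    return histology, behaviour, BEHAVIOUR[behaviour]


# The two codes that appear in the results table
for code in ("8500/2", "9140/3"):
    print(code, split_icdo_morphology(code))
```

Reading the behaviour digit this way makes it easier to sanity-check a resolution, for example whether an in situ code was assigned to a chunk that describes invasive or metastatic disease.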
Model Information
| Property | Value |
|-----------------|----------------------------|
| Name: | chunkresolve_icdo_clinical |
| Type: | ChunkEntityResolverModel |
| Compatibility: | Spark NLP 2.4.2+ |
| License: | Licensed |
| Edition: | Official |
| Input labels: | token, chunk_embeddings |
| Output labels: | entity |
| Language: | en |
| Case sensitive: | True |
| Dependencies: | embeddings_clinical |
Data Source
Trained on the ICD-O Histology Behaviour dataset: https://apps.who.int/iris/bitstream/handle/10665/96612/9789241548496_eng.pdf