# ICD10CM Poison Entity Resolver

## Description

Entity resolution model based on k-nearest neighbors (KNN) using word embeddings and Word Mover's Distance (WMD).
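The following minimal sketch in plain Python illustrates the KNN-over-word-embeddings idea. It is not the `ChunkEntityResolverModel` implementation. The 2-D word vectors and the three-entry reference set are hypothetical toy data, and the sketch replaces exact WMD, which solves an optimal-transport problem, with the relaxed WMD lower bound from Kusner et al. (2015). In the relaxed version, each word's weight moves entirely to its nearest word in the other text.

```python
import math
from collections import Counter

# Hypothetical toy word vectors (2-D for readability); the real model
# uses the embeddings_clinical vectors instead.
EMBEDDINGS = {
    "cough": (1.0, 0.0),
    "dry": (0.9, 0.2),
    "nasal": (0.0, 1.0),
    "congestion": (0.1, 0.9),
    "congested": (0.15, 0.85),
    "shortness": (0.5, -1.0),
    "breath": (0.6, -0.9),
    "breathing": (0.6, -0.85),
    "difficulty": (0.4, -0.8),
}

# Hypothetical reference set: undotted ICD-10-CM code -> tokenized definition.
REFERENCE = {
    "R05": ["cough"],
    "R0981": ["nasal", "congestion"],
    "R0602": ["shortness", "breath"],
}


def nbow(tokens):
    """Normalized bag-of-words weights over in-vocabulary tokens."""
    counts = Counter(t for t in tokens if t in EMBEDDINGS)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def relaxed_wmd(a, b):
    """Relaxed Word Mover's Distance, a lower bound on exact WMD.

    Each word's weight is sent entirely to its nearest word on the other
    side; the larger (tighter) of the two one-sided bounds is returned.
    """
    wa, wb = nbow(a), nbow(b)
    if not wa or not wb:
        return math.inf  # no in-vocabulary words to compare

    def one_sided(src, dst):
        return sum(
            w * min(math.dist(EMBEDDINGS[s], EMBEDDINGS[d]) for d in dst)
            for s, w in src.items()
        )

    return max(one_sided(wa, wb), one_sided(wb, wa))


def knn_resolve(chunk_tokens, k=1):
    """Return the k reference codes closest to the chunk under relaxed WMD."""
    scored = sorted(
        (relaxed_wmd(chunk_tokens, desc), code) for code, desc in REFERENCE.items()
    )
    return [code for _, code in scored[:k]]


print(knn_resolve(["difficulty", "breathing"]))  # ['R0602']
print(knn_resolve(["her", "cough"]))  # ['R05']
```

Exact WMD gives tighter distances but costs more to compute. The relaxed bound keeps this example free of dependencies.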

## Predicted Entities

ICD-10-CM codes and their normalized definitions, computed with the `embeddings_clinical` word embeddings.

## How to use

```python
from pyspark.ml import Pipeline
from sparknlp.base import LightPipeline
from sparknlp_jsl.annotator import ChunkEntityResolverModel

# ... (spark session, document_assembler, sentence_detector, tokenizer, embeddings,
# ner_model, ner_chunker and chunk_embeddings are set up in earlier steps, elided here)

model = ChunkEntityResolverModel.pretrained("chunkresolve_icd10cm_poison_ext_clinical", "en", "clinical/models")\
    .setInputCols("token", "chunk_embeddings")\
    .setOutputCol("icd10_code")

pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings, ner_model, ner_chunker, chunk_embeddings, model])

# All stages are pretrained, so fitting on an empty DataFrame is enough to build the LightPipeline
light_pipeline = LightPipeline(pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))

result = light_pipeline.fullAnnotate("""The patient is a 5-month-old infant who presented initially on Monday with a cold, cough, and runny nose for 2 days. She had no difficulty breathing and her cough was described as dry and hacky. At that time, physical exam showed a right TM, which was red. Left TM was okay. She was fairly congested but looked happy and playful. She was started on Amoxil and Aldex and we told to recheck in 2 weeks to recheck her ear. Mom returned to clinic again today because she got much worse overnight. She was having difficulty breathing. She was much more congested and her appetite had decreased significantly today. She also spiked a temperature yesterday of 102.6 and always having trouble sleeping secondary to congestion.""")
```


```scala
import org.apache.spark.ml.Pipeline
import spark.implicits._

// ... (document_assembler, sentence_detector, tokenizer, embeddings, ner_model,
// ner_chunker and chunk_embeddings are set up in earlier steps, elided here)

val model = ChunkEntityResolverModel.pretrained("chunkresolve_icd10cm_poison_ext_clinical", "en", "clinical/models")
  .setInputCols("token", "chunk_embeddings")
  .setOutputCol("icd10_code")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings, ner_model, ner_chunker, chunk_embeddings, model))

// Wrap the sample note in a one-column DataFrame named "text"
val data = Seq("""The patient is a 5-month-old infant who presented initially on Monday with a cold, cough, and runny nose for 2 days. She had no difficulty breathing and her cough was described as dry and hacky. At that time, physical exam showed a right TM, which was red. Left TM was okay. She was fairly congested but looked happy and playful. She was started on Amoxil and Aldex and we told to recheck in 2 weeks to recheck her ear. Mom returned to clinic again today because she got much worse overnight. She was having difficulty breathing. She was much more congested and her appetite had decreased significantly today. She also spiked a temperature yesterday of 102.6 and always having trouble sleeping secondary to congestion.""").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```


## Result

| # |                chunk | begin | end |  entity |                                 icd10_description | icd10_code |
|--:|---------------------:|------:|----:|--------:|--------------------------------------------------:|------------|
| 0 |        a cold, cough |    75 |  87 | PROBLEM | Chronic obstructive pulmonary disease, unspeci... |       J449 |
| 1 |           runny nose |    94 | 103 | PROBLEM |                                  Nasal congestion |      R0981 |
| 2 | difficulty breathing |   210 | 229 | PROBLEM |                               Shortness of breath |      R0602 |
| 3 |            her cough |   235 | 243 | PROBLEM |                                             Cough |        R05 |
| 4 |     fairly congested |   365 | 380 | PROBLEM |                                Edema, unspecified |       R609 |
| 5 | difficulty breathing |   590 | 609 | PROBLEM |                               Shortness of breath |      R0602 |
| 6 |       more congested |   625 | 638 | PROBLEM |                                Edema, unspecified |       R609 |
| 7 |     trouble sleeping |   759 | 774 | PROBLEM |                                Activity, sleeping |      Y9384 |
| 8 |           congestion |   789 | 798 | PROBLEM |                                  Nasal congestion |      R0981 |
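The `icd10_code` column lists codes without the decimal point. In its printed form, ICD-10-CM places the point after the third character, so R0981 becomes R09.81. The following hypothetical helper, `dotted_icd10cm`, is not part of Spark NLP. It is a minimal sketch that restores the point for display, assuming each value is a valid undotted ICD-10-CM code.

```python
def dotted_icd10cm(code: str) -> str:
    """Insert the ICD-10-CM decimal point after the third character.

    Three-character category codes (e.g. R05) have no decimal point.
    """
    code = code.strip().upper()
    if len(code) <= 3:
        return code
    return f"{code[:3]}.{code[3:]}"


# Codes taken from the Result table above
for raw in ["J449", "R0981", "R0602", "R05", "R609", "Y9384"]:
    print(raw, "->", dotted_icd10cm(raw))
# J44.9, R09.81, R06.02, R05, R60.9, Y93.84
```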


## Model Information

| Field          | Value                                    |
|----------------|------------------------------------------|
| Name           | chunkresolve_icd10cm_poison_ext_clinical |
| Type           | ChunkEntityResolverModel                 |
| Compatibility  | Spark NLP 2.4.5+                         |
| License        | Licensed                                 |
| Edition        | Official                                 |
| Input labels   | [token, chunk_embeddings]                |
| Output labels  | [icd10_code]                             |
| Language       | en                                       |
| Case sensitive | True                                     |
| Dependencies   | embeddings_clinical                      |

## Data Source

Trained on the ICD-10-CM dataset, code range T1500XA-T879. Reference: https://www.icd10data.com/ICD10CM/Codes/S00-T88
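The training range above is written with undotted codes. The following hypothetical helper, `in_training_range`, is not part of Spark NLP. It sketches how to check whether a code falls inside that range. It assumes that plain string ordering of undotted, uppercased codes matches ICD-10-CM tabular order for this range, which holds for its fixed-position codes to the best of our knowledge.

```python
# Bounds of the training range stated above, in undotted form
RANGE_START, RANGE_END = "T1500XA", "T879"


def _key(code: str) -> str:
    """Normalize a code for comparison: uppercase, decimal point removed."""
    return code.strip().upper().replace(".", "")


def in_training_range(code: str) -> bool:
    """True if the code sorts between RANGE_START and RANGE_END inclusive.

    Assumes string order matches ICD-10-CM tabular order for these codes.
    """
    return _key(RANGE_START) <= _key(code) <= _key(RANGE_END)


print(in_training_range("T36.0X1A"))  # True
print(in_training_range("T881"))  # False
```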