ICD10CM Entity Resolver

Description

Entity resolution model based on KNN, using word embeddings and Word Mover's Distance.
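
To make the nearest-neighbour idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: it ranks a handful of candidate codes by cosine distance to a chunk vector and keeps the `k` closest. The helper names `cosine_distance` and `resolve` are hypothetical, and the 3-dimensional vectors are invented for illustration (real clinical embeddings have many more dimensions).

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def resolve(chunk_vector, candidates, k=5):
    """Return the k candidates closest to chunk_vector as (code, distance) pairs."""
    scored = [(code, cosine_distance(chunk_vector, vec))
              for code, vec in candidates.items()]
    # Smallest distance first, then keep the k nearest neighbours.
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Toy index: the vectors are made up and do not come from the real model.
toy_index = {
    "R631": [0.9, 0.1, 0.0],
    "R358": [0.7, 0.3, 0.1],
    "E781": [0.0, 0.2, 0.9],
}

neighbours = resolve([1.0, 0.0, 0.0], toy_index, k=2)
best_code = neighbours[0][0]
```

In this toy setup the first neighbour plays the role of the resolved code, and the remaining neighbours correspond to the alternative candidates.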

Predicted Entities

ICD10-CM codes and their normalized definitions with clinical_embeddings.


How to use

...
icd10cm_resolution = ChunkEntityResolverModel.pretrained("chunkresolve_icd10cm_clinical", "en", "clinical/models") \
  .setInputCols(["ner_token", "chunk_embeddings"]) \
  .setOutputCol("icd10cm_code") \
  .setDistanceFunction("COSINE")  \
  .setNeighbours(5)

pipeline_icd10cm = Pipeline(stages = [documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk_embeddings, chunk_tokenizer, icd10cm_resolution])

# Build the input DataFrame once so it can be reused for both fit and transform
data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG."""]]).toDF("text")

pipeline_model = pipeline_icd10cm.fit(data)

result = pipeline_model.transform(data)
...
val icd10cm_resolution = ChunkEntityResolverModel.pretrained("chunkresolve_icd10cm_clinical", "en", "clinical/models")
  .setInputCols("ner_token", "chunk_embeddings")
  .setOutputCol("icd10cm_code")
  .setDistanceFunction("COSINE")  
  .setNeighbours(5)

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk_embeddings, chunk_tokenizer, icd10cm_resolution))

val data = Seq("A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG.").toDF("text")
val result = pipeline.fit(data).transform(data)

Results

|   | chunk                       | entity    | resolved_text                                      | code   | cms                                               |
|---|-----------------------------|-----------|----------------------------------------------------|--------|---------------------------------------------------|
| 0 | T2DM),                      | PROBLEM   | Type 2 diabetes mellitus with diabetic nephrop...  | E1121  | Type 2 diabetes mellitus with diabetic nephrop... |
| 1 | T2DM                        | PROBLEM   | Type 2 diabetes mellitus with diabetic nephrop...  | E1121  | Type 2 diabetes mellitus with diabetic nephrop... |
| 2 | polydipsia                  | PROBLEM   | Polydipsia                                         | R631   | Polydipsia:::Anhedonia:::Galactorrhea             |
| 3 | interference from turbidity | PROBLEM   | Non-working side interference                      | M2656  | Non-working side interference:::Hemoglobinuria... |
| 4 | polyuria                    | PROBLEM   | Other polyuria                                     | R358   | Other polyuria:::Polydipsia:::Generalized edem... |
| 5 | lipemia                     | PROBLEM   | Glycosuria                                         | R81    | Glycosuria:::Pure hyperglyceridemia:::Hyperchy... |
| 6 | starvation ketosis          | PROBLEM   | Propionic acidemia                                 | E71121 | Propionic acidemia:::Bartter's syndrome:::Hypo... |
| 7 | HTG                         | PROBLEM   | Pure hyperglyceridemia                             | E781   | Pure hyperglyceridemia:::Familial hypercholest... |
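
The `cms` column in the table above appears to hold several candidate descriptions joined by `:::`. The source does not define the column, so treat that reading as an assumption. Under it, a small helper (the name `split_candidates` is hypothetical) can turn one cell into a list:

```python
def split_candidates(cms, separator=":::"):
    """Split a ':::'-joined cell into a list of non-empty candidate descriptions."""
    return [part.strip() for part in cms.split(separator) if part.strip()]

# One row copied from the results table, written as a plain dict for illustration.
row = {
    "chunk": "polydipsia",
    "code": "R631",
    "cms": "Polydipsia:::Anhedonia:::Galactorrhea",
}

candidates = split_candidates(row["cms"])
```

Cells that end in `...` in the table are truncated for display, so their last candidate would be incomplete if parsed this way.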

Model Information

Name: chunkresolve_icd10cm_clinical  
Type: ChunkEntityResolverModel  
Compatibility: Spark NLP 2.4.2+  
License: Licensed  
Edition: Official  
Input labels: token, chunk_embeddings  
Output labels: entity  
Language: en  
Case sensitive: True  
Dependencies: embeddings_clinical  

Data Source

Trained on the ICD10 Clinical Modification dataset with tens of variations per code. https://www.icd10data.com/ICD10CM/Codes/