ICD10CM ChunkResolver

Description

Entity Resolution model Based on KNN using Word Embeddings + Word Movers Distance.

Predicted Entities

ICD10-CM Codes and their normalized definition with clinical_embeddings.

Live Demo Open in Colab Copy S3 URI

How to use

...

icd10cmResolver = ChunkEntityResolverModel.pretrained('chunkresolve_icd10cm_diseases_clinical', 'en', "clinical/models")\
    .setEnableLevenshtein(True)\
    .setNeighbours(200).setAlternatives(5).setDistanceWeights([3,3,2,0,0,7])\
    .setInputCols('token', 'chunk_embs_jsl')\
    .setOutputCol('icd10cm_resolution')

pipeline_icd10 = Pipeline(stages = [documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, jslNer, drugNer, jslConverter, drugConverter, jslChunkEmbeddings, drugChunkEmbeddings, icd10cmResolver])

empty_df = spark.createDataFrame([[""]]).toDF("text")

data = ["""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret's Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU"""]

pipeline_model = pipeline_icd10.fit(empty_df)

light_pipeline = LightPipeline(pipeline_model)

result = light_pipeline.annotate(data)
...

val icd10cmResolver = ChunkEntityResolverModel.pretrained('chunkresolve_icd10cm_diseases_clinical', 'en', "clinical/models")
    .setEnableLevenshtein(True)
    .setNeighbours(200).setAlternatives(5).setDistanceWeights(Array(3,3,2,0,0,7))
    .setInputCols('token', 'chunk_embs_jsl')
    .setOutputCol('icd10cm_resolution')

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, jslNer, drugNer, jslConverter, drugConverter, jslChunkEmbeddings, drugChunkEmbeddings, icd10cmResolver))

val data = Seq("This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret's Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU").toDF("text")
val result = pipeline.fit(data).transform(data)

Results

|   | coords      | chunk                       | entity    | icd10cm_opts                                                                              |
|---|-------------|-----------------------------|-----------|-------------------------------------------------------------------------------------------|
| 0 | 2::499::506 | insomnia                    | Diagnosis | [(G4700, Insomnia, unspecified), (G4709, Other insomnia), (F5102, Adjustment insomnia)...]|
| 1 | 4::83::109  | chronic renal insufficiency | Diagnosis | [(N185, Chronic kidney disease, stage 5), (N181, Chronic kidney disease, stage 1), (N1...]|
| 2 | 4::120::128 | gastritis                   | Diagnosis | [(K2970, Gastritis, unspecified, without bleeding), (B9681, Helicobacter pylori [H. py...]|

Model Information

Name: chunkresolve_icd10cm_diseases_clinical  
Type: ChunkEntityResolverModel  
Compatibility: Spark NLP 2.4.5+  
License: Licensed  
Edition: Official  
Input labels: [token, chunk_embeddings]  
Output labels: [entity]  
Language: en  
Case sensitive: True  
Dependencies: embeddings_clinical  

Data Source

Trained on ICD10CM Dataset Range: A000-N989 Except Neoplasms and Musculoskeletal https://www.icd10data.com/ICD10CM/Codes/