Description
Entity resolution model based on KNN using word embeddings and Word Mover's Distance.
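To make the nearest-neighbour idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the function names, the three-dimensional vectors, and their pairing with codes are all invented for illustration. It uses cosine distance because the usage snippets in this card call setDistanceFunction("COSINE"), and it ignores Word Mover's Distance entirely.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def nearest_codes(query, index, k=5):
    """Rank (code, vector) pairs by distance to the query and keep the k closest."""
    ranked = sorted(index, key=lambda item: cosine_distance(query, item[1]))
    return [code for code, _ in ranked[:k]]

# Hypothetical toy embeddings; real clinical embeddings have far more dimensions.
toy_index = [
    ("E1121", [0.9, 0.1, 0.0]),
    ("R631", [0.1, 0.9, 0.1]),
    ("E781", [0.0, 0.2, 0.9]),
]
toy_query = [0.8, 0.2, 0.1]  # stands in for a chunk embedding

top = nearest_codes(toy_query, toy_index, k=2)
print(top)
```

The parameter k plays the role that setNeighbours(5) plays in the pipeline: it limits how many candidate codes are kept for each chunk.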
Predicted Entities
ICD10-CM codes and their normalized definitions, with clinical_embeddings.
How to use
...
icd10cm_resolution = ChunkEntityResolverModel.pretrained("chunkresolve_icd10cm_clinical", "en", "clinical/models") \
.setInputCols(["ner_token", "chunk_embeddings"]) \
.setOutputCol("icd10cm_code") \
.setDistanceFunction("COSINE") \
.setNeighbours(5)
pipeline_icd10cm = Pipeline(stages = [documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk_embeddings, chunk_tokenizer, icd10cm_resolution])
# Build the input DataFrame once so it can be used for both fit and transform
data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG."""]]).toDF("text")
pipeline_model = pipeline_icd10cm.fit(data)
result = pipeline_model.transform(data)
...
val icd10cm_resolution = ChunkEntityResolverModel.pretrained("chunkresolve_icd10cm_clinical", "en", "clinical/models")
.setInputCols("ner_token", "chunk_embeddings")
.setOutputCol("icd10cm_code")
.setDistanceFunction("COSINE")
.setNeighbours(5)
val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk_embeddings, chunk_tokenizer, icd10cm_resolution))
val data = Seq("A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG.").toDF("text")
val result = pipeline.fit(data).transform(data)
Results
| | chunk | entity | resolved_text | code | cms |
|---|-----------------------------|-----------|----------------------------------------------------|--------|---------------------------------------------------|
| 0 | T2DM), | PROBLEM | Type 2 diabetes mellitus with diabetic nephrop... | E1121 | Type 2 diabetes mellitus with diabetic nephrop... |
| 1 | T2DM | PROBLEM | Type 2 diabetes mellitus with diabetic nephrop... | E1121 | Type 2 diabetes mellitus with diabetic nephrop... |
| 2 | polydipsia | PROBLEM | Polydipsia | R631 | Polydipsia:::Anhedonia:::Galactorrhea |
| 3 | interference from turbidity | PROBLEM | Non-working side interference | M2656 | Non-working side interference:::Hemoglobinuria... |
| 4 | polyuria | PROBLEM | Other polyuria | R358 | Other polyuria:::Polydipsia:::Generalized edem... |
| 5 | lipemia | PROBLEM | Glycosuria | R81 | Glycosuria:::Pure hyperglyceridemia:::Hyperchy... |
| 6 | starvation ketosis | PROBLEM | Propionic acidemia | E71121 | Propionic acidemia:::Bartter's syndrome:::Hypo... |
| 7 | HTG | PROBLEM | Pure hyperglyceridemia | E781 | Pure hyperglyceridemia:::Familial hypercholest... |
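The rows above can be post-processed with a few lines of Python. The sketch below assumes that the cms column is a ":::"-delimited list of candidate descriptions, as the table suggests, and that the code column omits the decimal point that ICD-10-CM codes conventionally carry after the third character. The helper names are hypothetical and not part of Spark NLP.

```python
def format_icd10cm(code):
    """Insert the conventional decimal point after the third character."""
    return code if len(code) <= 3 else code[:3] + "." + code[3:]

def split_candidates(cms):
    """Split a ':::'-delimited candidate string into a list of descriptions."""
    return [part.strip() for part in cms.split(":::") if part.strip()]

# One row copied from the results table above.
row = {
    "chunk": "polydipsia",
    "code": "R631",
    "cms": "Polydipsia:::Anhedonia:::Galactorrhea",
}

candidates = split_candidates(row["cms"])
display_code = format_icd10cm(row["code"])
print(display_code, candidates)
```

Descriptions that the table truncates with "..." stay truncated after splitting, so the full candidate list has to come from the pipeline output itself rather than from this display.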
Model Information
| Field          | Value                         |
|----------------|-------------------------------|
| Name           | chunkresolve_icd10cm_clinical |
| Type           | ChunkEntityResolverModel      |
| Compatibility  | Spark NLP 2.4.2+              |
| License        | Licensed                      |
| Edition        | Official                      |
| Input labels   | token, chunk_embeddings       |
| Output labels  | entity                        |
| Language       | en                            |
| Case sensitive | True                          |
| Dependencies   | embeddings_clinical           |
Data Source
Trained on the ICD10 Clinical Modification dataset with tens of variations per code. https://www.icd10data.com/ICD10CM/Codes/