Description
This model maps extracted medical entities to ICD-10-CM codes using sbiobert_base_cased_mli Sentence BERT embeddings. It has also been augmented with synonyms to improve its accuracy.
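Conceptually, the resolver works as a nearest-neighbour lookup. Each extracted chunk is embedded, and the codes whose reference terms have the closest embeddings are returned. Synonym augmentation adds more reference terms per code, which gives a chunk more chances to land near the right one. The sketch below is a toy illustration only. It assumes a hand-made 2-D index (`TOY_INDEX` and `resolve` are hypothetical names) and uses Python's `math.dist` for Euclidean distance, mirroring the `EUCLIDEAN` distance setting used in the pipeline. It is not the model's actual implementation.

```python
import math

# Hypothetical toy index: (code, synonym, embedding). Several synonyms share one
# code, mimicking synonym augmentation. Vectors are made-up 2-D values; real
# sbiobert_base_cased_mli embeddings are far higher-dimensional.
TOY_INDEX = [
    ("E669", "obesity", (0.90, 0.10)),
    ("E669", "obese", (0.85, 0.20)),
    ("R35", "polyuria", (0.10, 0.90)),
    ("R35", "polyuric state", (0.20, 0.80)),
]

def resolve(query_vec, index=TOY_INDEX, k=3):
    """Return the k (code, synonym, distance) entries closest to query_vec."""
    scored = [(code, term, math.dist(query_vec, vec)) for code, term, vec in index]
    scored.sort(key=lambda item: item[2])  # smallest Euclidean distance first
    return scored[:k]

best = resolve((0.88, 0.15))
print(best[0][:2])  # ('E669', 'obesity')
```

Because each code can appear under several synonyms, the top-k list may contain the same code more than once. The resolver's real candidate handling may differ.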
Predicted Entities
How to use
# Assumes Spark NLP and the licensed Spark NLP for Healthcare library are installed,
# and that an active Spark session is available as `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
# Raw text -> document annotations
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
# Split documents into sentences with the healthcare sentence detector
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")
# Split sentences into tokens
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")
# Clinical word embeddings consumed by the NER model
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")
# Tag clinical entities
clinical_ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")
# Merge NER tags into chunks, keeping only PROBLEM entities
ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(["PROBLEM"])
# Turn each chunk into its own document so it can be embedded as a whole
chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")
# Sentence-level embeddings for each chunk
sbert_embedder = BertSentenceEmbeddings\
    .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")
# Resolve each chunk embedding to ICD-10-CM codes
icd10_resolver = SentenceEntityResolverModel\
    .pretrained("sbiobertresolve_icd10cm_augmented", "en", "clinical/models")\
    .setInputCols(["sbert_embeddings"])\
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")
nlpPipeline = Pipeline(stages=[
    document_assembler, sentence_detector, tokenizer, word_embeddings,
    clinical_ner, ner_converter, chunk2doc, sbert_embedder, icd10_resolver])
data_ner = spark.createDataFrame([["A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection."]]).toDF("text")
results = nlpPipeline.fit(data_ner).transform(data_ner)
...
val chunk2doc = new Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")
val sbert_embedder = BertSentenceEmbeddings
  .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
  .setInputCols(Array("ner_chunk_doc"))
  .setOutputCol("sbert_embeddings")
val icd10_resolver = SentenceEntityResolverModel
  .pretrained("sbiobertresolve_icd10cm_augmented", "en", "clinical/models")
  .setInputCols(Array("sbert_embeddings"))
  .setOutputCol("resolution")
  .setDistanceFunction("EUCLIDEAN")
val pipeline = new Pipeline().setStages(Array(
  document_assembler, sentence_detector, tokenizer, word_embeddings,
  clinical_ner, ner_converter, chunk2doc, sbert_embedder, icd10_resolver))
// Seq(...).toDF requires `import spark.implicits._` to be in scope
val data = Seq("A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection.").toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.resolve.icd10cm.augmented").predict("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection.""")
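The `setWhiteList(['PROBLEM'])` call on the NER converter limits which chunks ever reach the resolver. This is presumably why no code appears for amoxicillin in the Results table: the drug is not tagged as a PROBLEM. The plain-Python sketch below mimics that filtering step. The (chunk, label) pairs and the `apply_whitelist` helper are hypothetical stand-ins, not actual `ner_clinical` output or Spark NLP code.

```python
# Hypothetical (chunk, label) pairs standing in for NER output on the sample
# text; the TREATMENT label for "amoxicillin" is an assumption for illustration.
ner_chunks = [
    ("gestational diabetes mellitus", "PROBLEM"),
    ("amoxicillin", "TREATMENT"),
    ("polyuria", "PROBLEM"),
    ("a respiratory tract infection", "PROBLEM"),
]

def apply_whitelist(chunks, whitelist):
    """Keep only chunks whose label is in the whitelist (cf. setWhiteList)."""
    allowed = set(whitelist)
    return [(text, label) for text, label in chunks if label in allowed]

kept = apply_whitelist(ner_chunks, ["PROBLEM"])
print([text for text, _ in kept])
# ['gestational diabetes mellitus', 'polyuria', 'a respiratory tract infection']
```

Adding other labels to the whitelist would send those chunks to the ICD-10-CM resolver as well, which may or may not be desirable for labels that do not describe conditions.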
Results
+-------------------------------------+-------+------------+----------------------------------------------------------------------+----------------------------------------------------------------------+
| ner_chunk| entity|icd10cm_code| resolutions| all_codes|
+-------------------------------------+-------+------------+----------------------------------------------------------------------+----------------------------------------------------------------------+
| gestational diabetes mellitus|PROBLEM| O2441|gestational diabetes mellitus:::postpartum gestational diabetes mel...| O2441:::O2443:::Z8632:::Z875:::O2431:::O2411:::O244:::O241:::O2481|
|subsequent type two diabetes mellitus|PROBLEM| O2411|pre-existing type 2 diabetes mellitus:::disorder associated with ty...|O2411:::E118:::E11:::E139:::E119:::E113:::E1144:::Z863:::Z8639:::E1...|
| T2DM|PROBLEM| E11|type 2 diabetes mellitus:::disorder associated with type 2 diabetes...|E11:::E118:::E119:::O2411:::E109:::E139:::E113:::E8881:::Z833:::D64...|
| HTG-induced pancreatitis|PROBLEM| K8520|alcohol-induced pancreatitis:::drug-induced acute pancreatitis:::he...|K8520:::K853:::K8590:::F102:::K852:::K859:::K8580:::K8591:::K858:::...|
| acute hepatitis|PROBLEM| K720|acute hepatitis:::acute hepatitis a:::acute infectious hepatitis:::...|K720:::B15:::B179:::B172:::Z0389:::B159:::B150:::B16:::K752:::K712:...|
| obesity|PROBLEM| E669|obesity:::abdominal obesity:::obese:::central obesity:::overweight ...|E669:::E668:::Z6841:::Q130:::E66:::E6601:::Z8639:::E349:::H3550:::Z...|
| a body mass index|PROBLEM| Z6841|finding of body mass index:::observation of body mass index:::mass ...|Z6841:::E669:::R229:::Z681:::R223:::R221:::Z68:::R222:::R220:::R418...|
| polyuria|PROBLEM| R35|polyuria:::nocturnal polyuria:::polyuric state:::polyuric state (di...|R35:::R3581:::R358:::E232:::R31:::R350:::R8299:::N401:::E723:::O048...|
| polydipsia|PROBLEM| R631|polydipsia:::psychogenic polydipsia:::primary polydipsia:::psychoge...|R631:::F6389:::E232:::F639:::O40:::G475:::M7989:::R632:::R061:::H53...|
| poor appetite|PROBLEM| R630|poor appetite:::poor feeding:::bad taste in mouth:::unpleasant tast...|R630:::P929:::R438:::R432:::E86:::R196:::F520:::Z724:::R0689:::Z768...|
| vomiting|PROBLEM| R111|vomiting:::intermittent vomiting:::vomiting symptoms:::periodic vom...| R111:::R11:::R1110:::G43A1:::P921:::P9209:::G43A:::R1113:::R110|
| a respiratory tract infection|PROBLEM| J988|respiratory tract infection:::upper respiratory tract infection:::b...|J988:::J069:::A499:::J22:::J209:::Z593:::T17:::J0410:::Z1383:::J189...|
+-------------------------------------+-------+------------+----------------------------------------------------------------------+----------------------------------------------------------------------+
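The `resolutions` and `all_codes` columns pack several candidates into one string separated by `:::`. In every row shown, `icd10cm_code` equals the first entry of `all_codes`, so candidates appear to be ordered best first. This is an observation from this table, not documented behaviour. Codes are also printed without the decimal point that ICD-10-CM conventionally places after the third character. The helpers below (`split_candidates` and `to_dotted` are hypothetical names) sketch how such strings could be post-processed in plain Python.

```python
def split_candidates(field: str) -> list[str]:
    """Split a ':::'-delimited field (all_codes or resolutions) into items."""
    return [item.strip() for item in field.split(":::") if item.strip()]

def to_dotted(code: str) -> str:
    """Insert the ICD-10-CM decimal point after the third character."""
    return code if len(code) <= 3 else f"{code[:3]}.{code[3:]}"

# all_codes value copied from the "vomiting" row of the Results table
all_codes = "R111:::R11:::R1110:::G43A1:::P921:::P9209:::G43A:::R1113:::R110"
codes = split_candidates(all_codes)
print(codes[0], [to_dotted(c) for c in codes[:3]])
# R111 ['R11.1', 'R11', 'R11.10']
```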
Model Information
Model Name: sbiobertresolve_icd10cm_augmented
Compatibility: Healthcare NLP 3.3.1+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [icd10cm_code]
Language: en
Size: 1.4 GB
Case sensitive: false
Dependencies: embeddings_clinical
Data Source
Trained on the ICD-10-CM 2022 code set: https://www.cdc.gov/nchs/icd/icd10cm.htm
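Downstream systems may want a quick sanity check on the codes the model emits before storing them. The following is a minimal sketch that assumes dotless codes, as in the Results table, and the general ICD-10-CM shape: a letter, a digit, then one to five further alphanumeric characters (3 to 7 characters in total). The `looks_like_icd10cm` helper is hypothetical. It checks format only and cannot confirm that a code exists in the 2022 release.

```python
import re

# Dotless ICD-10-CM code shape: a letter, a digit, then 1-5 more
# alphanumeric characters (3-7 characters in total). Format check only.
ICD10CM_SHAPE = re.compile(r"[A-Z][0-9][0-9A-Z]{1,5}")

def looks_like_icd10cm(code: str) -> bool:
    """Return True if the (dotless) code has the general ICD-10-CM shape."""
    return ICD10CM_SHAPE.fullmatch(code.strip().upper()) is not None

print([looks_like_icd10cm(c) for c in ["O2441", "G43A1", "R35", "E11.9", "123"]])
# [True, True, True, False, False]
```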