Sentence Entity Resolver for ICD10-CM (sbiobert_base_cased_mli embeddings)

Description

This model maps extracted medical entities to ICD10-CM codes using sentence embeddings.

Predicted Entities

ICD10-CM codes and their normalized definitions, resolved using sbiobert_base_cased_mli sentence embeddings.


How to use

...
chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")

sbert_embedder = BertSentenceEmbeddings\
.pretrained("sbiobert_base_cased_mli","en","clinical/models")\
.setInputCols(["ner_chunk_doc"])\
.setOutputCol("sbert_embeddings")

icd10_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_icd10cm","en", "clinical/models") \
.setInputCols(["ner_chunk", "sbert_embeddings"]) \
.setOutputCol("resolution")\
.setDistanceFunction("EUCLIDEAN")

nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, icd10_resolver])

data = spark.createDataFrame([["This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU ."]]).toDF("text")

results = nlpPipeline.fit(data).transform(data)

// Scala version of the pipeline above
val chunk2doc = new Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")

val sbert_embedder = BertSentenceEmbeddings
.pretrained("sbiobert_base_cased_mli","en","clinical/models")
.setInputCols(Array("ner_chunk_doc"))
.setOutputCol("sbert_embeddings")

val icd10_resolver = SentenceEntityResolverModel
.pretrained("sbiobertresolve_icd10cm","en", "clinical/models")
.setInputCols(Array("ner_chunk", "sbert_embeddings"))
.setOutputCol("resolution")
.setDistanceFunction("EUCLIDEAN")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, icd10_resolver))

val data = Seq("This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .").toDF("text")

val result = pipeline.fit(data).transform(data)
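
The resolver above is configured with setDistanceFunction("EUCLIDEAN"). As a rough illustration of what distance-based resolution means, the following plain-Python sketch ranks candidate codes by Euclidean distance to a chunk vector. It is not the library's implementation: the functions euclidean and resolve are hypothetical helpers, and the 3-dimensional vectors are invented for illustration (real sentence embeddings have many more dimensions).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resolve(chunk_vec, code_vecs, k=2):
    """Return the k codes whose vectors lie closest to chunk_vec, nearest first."""
    ranked = sorted(code_vecs, key=lambda code: euclidean(chunk_vec, code_vecs[code]))
    return ranked[:k]


# Toy vectors, invented for this sketch; they are NOT real embeddings of these codes.
toy_code_vecs = {
    "G459": [0.9, 0.1, 0.0],
    "I214": [0.1, 0.8, 0.2],
    "K626": [0.0, 0.2, 0.9],
}
toy_chunk_vec = [0.8, 0.2, 0.1]

top_codes = resolve(toy_chunk_vec, toy_code_vecs, k=2)
print(top_codes)
```

In this toy setup the first code returned plays the role of the code column in the results, and the full ranked list plays the role of all_k_codes.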

Results

+--------------------+-----+---+---------+------+----------+--------------------+--------------------+
|               chunk|begin|end|   entity|  code|confidence|   all_k_resolutions|         all_k_codes|
+--------------------+-----+---+---------+------+----------+--------------------+--------------------+
|        hypertension|   68| 79|  PROBLEM|  I150|    0.2606|Renovascular hype...|I150:::K766:::I10...|
|chronic renal ins...|   83|109|  PROBLEM|  N186|    0.2059|End stage renal d...|N186:::D631:::P96...|
|                COPD|  113|116|  PROBLEM| I2781|    0.2132|Cor pulmonale (ch...|I2781:::J449:::J4...|
|           gastritis|  120|128|  PROBLEM| K5281|    0.1425|Eosinophilic gast...|K5281:::K140:::K9...|
|                 TIA|  136|138|  PROBLEM|  G459|    0.1152|Transient cerebra...|G459:::I639:::T79...|
|a non-ST elevatio...|  182|202|  PROBLEM|  I214|    0.0889|Non-ST elevation ...|I214:::I256:::M62...|
|Guaiac positive s...|  208|229|  PROBLEM|  K626|    0.0631|Ulcer of anus and...|K626:::K380:::R15...|
|cardiac catheteri...|  295|317|     TEST|  Z950|    0.2549|Presence of cardi...|Z950:::Z95811:::I...|
|                PTCA|  324|327|TREATMENT| Z9861|    0.1268|Coronary angiopla...|Z9861:::Z9862:::I...|
|      mid LAD lesion|  332|345|  PROBLEM|L02424|    0.1117|Furuncle of left ...|L02424:::Q202:::L...|
+--------------------+-----+---+---------+------+----------+--------------------+--------------------+
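
The all_k_codes column in the table appears to pack the ranked candidates into one string separated by ":::". Assuming that separator, a minimal plain-Python sketch for unpacking such a value could look like the following. The helper names split_codes and with_dot are hypothetical, and the sample string is a shortened value built from the first two codes visible in the hypertension row. The with_dot helper only adds the decimal point that ICD-10-CM notation places after the third character; the table itself prints codes without it.

```python
def split_codes(all_k_codes, sep=":::"):
    """Split a separator-delimited code string into a list of non-empty codes."""
    return [code.strip() for code in all_k_codes.split(sep) if code.strip()]


def with_dot(code):
    """Insert the ICD-10-CM decimal point after the third character, if any follow."""
    return code if len(code) <= 3 else code[:3] + "." + code[3:]


# Shortened sample (assumption): only the first two codes shown in the table row.
sample = "I150:::K766"

codes = split_codes(sample)
dotted = [with_dot(code) for code in codes]
print(codes)
print(dotted)
```

Because the candidates are ordered, the first element of the split list corresponds to the value in the code column.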

Model Information

Name: sbiobertresolve_icd10cm
Type: SentenceEntityResolverModel
Compatibility: Spark NLP 2.6.4+
License: Licensed
Edition: Official
Input labels: [ner_chunk, chunk_embeddings]
Output labels: [resolution]
Language: en
Dependencies: sbiobert_base_cased_mli

Data Source

Trained on the ICD10 Clinical Modification dataset with sbiobert_base_cased_mli sentence embeddings: https://www.icd10data.com/ICD10CM/Codes/