Sentence Entity Resolver for RxNorm (``sbiobert_base_cased_mli`` embeddings)

Description

This model maps extracted medical entities to RxNorm codes using chunk embeddings.

Predicted Entities

RxNorm Codes and their normalized definition with sbiobert_base_cased_mli embeddings.

Open in Colab Download

How to use

...
chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")
 
sbert_embedder = BertSentenceEmbeddings\
     .pretrained("sbiobert_base_cased_mli",'en','clinical/models')\
     .setInputCols(["ner_chunk_doc"])\
     .setOutputCol("sbert_embeddings")
 
rxnorm_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_rxnorm","en", "clinical/models") \
     .setInputCols(["ner_chunk", "sbert_embeddings"]) \
     .setOutputCol("resolution")\
     .setDistanceFunction("EUCLIDEAN")

nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, rxnorm_resolver])

model = nlpPipeline.fit(spark.createDataFrame([["This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU ."]]).toDF("text"))

results = model.transform(data)

...
val chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")
 
val sbert_embedder = BertSentenceEmbeddings
     .pretrained("sbiobert_base_cased_mli",'en','clinical/models')
     .setInputCols(Array("ner_chunk_doc"))
     .setOutputCol("sbert_embeddings")
 
val rxnorm_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_rxnorm","en", "clinical/models")
     .setInputCols(Array("ner_chunk", "sbert_embeddings"))
     .setOutputCol("resolution")
     .setDistanceFunction("EUCLIDEAN")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, rxnorm_resolver))

val result = pipeline.fit(Seq.empty["This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU ."].toDS.toDF("text")).transform(data)

Results

+--------------------+-----+---+---------+-------+----------+-----------------------------------------------+--------------------+
|               chunk|begin|end|   entity|   code|confidence|                                    resolutions|               codes|
+--------------------+-----+---+---------+-------+----------+-----------------------------------------------+--------------------+
|        hypertension|   68| 79|  PROBLEM| 386165|    0.1567|hypercal:::hypersed:::hypertears:::hyperstat...|386165:::217667::...|
|chronic renal ins...|   83|109|  PROBLEM| 218689|    0.1036|nephro calci:::dialysis solutions:::creatini...|218689:::3310:::2...|
|                COPD|  113|116|  PROBLEM|1539999|    0.1644|broncomar dm:::acne medication:::carbon mono...|1539999:::214981:...|
|           gastritis|  120|128|  PROBLEM| 225965|    0.1983|gastroflux:::gastroflux oral product:::uceri...|225965:::1176661:...|
|                 TIA|  136|138|  PROBLEM|1089812|    0.0625|thera tears:::thiotepa injection:::nature's ...|1089812:::1660003...|
|a non-ST elevatio...|  182|202|  PROBLEM| 218767|    0.1007|non-aspirin pm:::aspirin-free:::non aspirin ...|218767:::215440::...|
|Guaiac positive s...|  208|229|  PROBLEM|1294361|    0.0820|anusol rectal product:::anusol hc rectal pro...|1294361:::1166715...|
|cardiac catheteri...|  295|317|     TEST| 385247|    0.1566|cardiacap:::cardiology pack:::cardizem:::car...|385247:::545063::...|
|                PTCA|  324|327|TREATMENT|   8410|    0.0867|alteplase:::reteplase:::pancuronium:::tripe ...|8410:::76895:::78...|
|      mid LAD lesion|  332|345|  PROBLEM| 151672|    0.0549|dulcolax:::lazerformalyde:::linaclotide:::du...|151672:::217985::...|
+--------------------+-----+---+---------+-------+----------+-----------------------------------------------+--------------------+

Model Information

Name: sbiobertresolve_rxnorm
Type: SentenceEntityResolverModel
Compatibility: Spark NLP 2.6.5 +
License: Licensed
Edition: Official
Input labels: [ner_chunk, chunk_embeddings]
Output labels: [resolution]
Language: en
Dependencies: sbiobert_base_cased_mli

Data Source

Trained on November 2020 RxNorm Clinical Drugs ontology graph with sbiobert_base_cased_mli embeddings. https://www.nlm.nih.gov/pubs/techbull/nd20/brief/nd20_rx_norm_november_release.html