Sentence Entity Resolver for RxCUI (sbiobert_base_cased_mli embeddings)

Description

This model maps clinical entities and concepts (like drugs/ingredients) to RxCUI codes using sbiobert_base_cased_mli Sentence BERT embeddings. It loads faster than previous versions, with a speedup of about 6X. The load process is also more memory friendly: the maximum memory required at load time is smaller, which reduces the chance of out-of-memory (OOM) exceptions and relaxes hardware requirements.
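Conceptually, a sentence entity resolver embeds each chunk and returns the codes whose precomputed term embeddings lie closest to it. The sketch below illustrates that nearest-neighbour lookup in plain Python under stated assumptions: the three-term ontology, the codes CODE_A to CODE_C, the 3-dimensional vectors and the `resolve` helper are all hypothetical, and this is not the library's implementation.

```python
import math

# Hypothetical toy "ontology": (code, term, precomputed term embedding).
# Real sentence embeddings have hundreds of dimensions; these 3-d vectors are invented.
ONTOLOGY = [
    ("CODE_A", "term A", [0.9, 0.1, 0.0]),
    ("CODE_B", "term B", [0.1, 0.9, 0.0]),
    ("CODE_C", "term C", [0.0, 0.2, 0.9]),
]


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def resolve(chunk_vec, ontology, k=2):
    """Return the k (code, term, distance) entries closest to chunk_vec, nearest first."""
    scored = [(code, term, euclidean(chunk_vec, vec)) for code, term, vec in ontology]
    scored.sort(key=lambda item: item[2])
    return scored[:k]


for code, term, dist in resolve([0.8, 0.2, 0.1], ONTOLOGY):
    print(code, term, round(dist, 3))
```

Here the query vector sits nearest to the CODE_A term, so CODE_A is returned first and CODE_B second.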

Predicted Entities

Predicts RxCUI codes and their normalized definitions for each chunk.


How to use

The sbiobertresolve_rxcui resolver model must be used with sbiobert_base_cased_mli as embeddings and ner_posology as the NER model, with DRUG set in .setWhiteList().
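Whitelisting DRUG means only drug chunks reach the resolver, so dosages, frequencies and other posology entities are never forced onto the nearest RxCUI code. A minimal plain-Python sketch of that filtering idea follows; the `Chunk` type, the `apply_white_list` helper and the labels assigned to the sample chunks are hypothetical and only mirror the concept, not the converter's actual code.

```python
from typing import Iterable, List, NamedTuple


class Chunk(NamedTuple):
    """A hypothetical NER chunk: its text and the label assigned by the NER model."""
    text: str
    label: str


def apply_white_list(chunks: Iterable[Chunk], white_list: Iterable[str]) -> List[Chunk]:
    """Keep only the chunks whose label appears in white_list."""
    allowed = set(white_list)
    return [chunk for chunk in chunks if chunk.label in allowed]


# Illustrative chunks with hand-assigned labels (not real model output).
chunks = [
    Chunk("eltrombopag", "DRUG"),
    Chunk("at night", "FREQUENCY"),
    Chunk("metformin", "DRUG"),
    Chunk("two times a day", "FREQUENCY"),
]

drug_chunks = apply_white_list(chunks, ["DRUG"])
print([chunk.text for chunk in drug_chunks])  # ['eltrombopag', 'metformin']
```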

...
# Turn each NER chunk into its own document so it can be embedded on its own
chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")

# Sentence-level BERT embeddings for every chunk
sbert_embedder = BertSentenceEmbeddings\
     .pretrained("sbiobert_base_cased_mli","en","clinical/models")\
     .setInputCols(["ner_chunk_doc"])\
     .setOutputCol("sbert_embeddings")

# Map each chunk embedding to the closest RxCUI code (Euclidean distance)
rxcui_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_rxcui","en", "clinical/models") \
     .setInputCols(["ner_chunk", "sbert_embeddings"]) \
     .setOutputCol("resolution")\
     .setDistanceFunction("EUCLIDEAN")

nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, rxcui_resolver])

data = spark.createDataFrame([["He was seen by the endocrinology service and she was discharged on 50 mg of eltrombopag oral at night, 5 mg amlodipine with meals, and metformin 1000 mg two times a day"]]).toDF("text")

results = nlpPipeline.fit(data).transform(data)
...
// Turn each NER chunk into its own document so it can be embedded on its own
val chunk2doc = new Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")

// Sentence-level BERT embeddings for every chunk
val sbert_embedder = BertSentenceEmbeddings
     .pretrained("sbiobert_base_cased_mli","en","clinical/models")
     .setInputCols(Array("ner_chunk_doc"))
     .setOutputCol("sbert_embeddings")

// Map each chunk embedding to the closest RxCUI code (Euclidean distance)
val rxcui_resolver = SentenceEntityResolverModel
     .pretrained("sbiobertresolve_rxcui","en", "clinical/models")
     .setInputCols(Array("ner_chunk", "sbert_embeddings"))
     .setOutputCol("resolution")
     .setDistanceFunction("EUCLIDEAN")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, rxcui_resolver))

val data = Seq("He was seen by the endocrinology service and she was discharged on 50 mg of eltrombopag oral at night, 5 mg amlodipine with meals, and metformin 1000 mg two times a day").toDF("text")

val result = pipeline.fit(data).transform(data)
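Both pipelines set .setDistanceFunction("EUCLIDEAN"), so candidate terms are ranked by straight-line distance between embeddings. Cosine distance, which ignores vector length, is the usual alternative, and the two metrics can disagree. The toy example below, with invented 2-d vectors and hypothetical candidate names, shows a case where they pick different nearest neighbours; it illustrates the metrics only, not the resolver's internals.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; depends only on vector direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest(query, candidates, distance):
    """Return the name of the candidate vector closest to query under `distance`."""
    return min(candidates, key=lambda name: distance(query, candidates[name]))


# Invented vectors: one points the same way as the query but is much longer,
# the other is short but points in a different direction.
query = [1.0, 0.0]
candidates = {"long_same_direction": [3.0, 0.5], "short_other_direction": [0.6, 0.6]}

print(nearest(query, candidates, euclidean_distance))  # short_other_direction
print(nearest(query, candidates, cosine_distance))     # long_same_direction
```

Euclidean distance penalises the long vector for its magnitude, while cosine distance only compares direction; that is the whole difference the example exposes.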

Results

+---------------------------+--------+-----------------------------------------------------+
| chunk                     | code   | term                                                |
+---------------------------+--------+-----------------------------------------------------+
| 50 mg of eltrombopag oral | 825427 | eltrombopag 50 MG Oral Tablet                       |
| 5 mg amlodipine           | 197361 | amlodipine 5 MG Oral Tablet                         |
| metformin 1000 mg         | 861004 | metformin hydrochloride 2000 MG Oral Tablet         |
+---------------------------+--------+-----------------------------------------------------+

Model Information

Model Name: sbiobertresolve_rxcui
Compatibility: Spark NLP for Healthcare 3.0.4+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [rxcui_code]
Language: en
Case sensitive: false

Data Source

Trained on the November 2020 RxNorm Clinical Drugs ontology graph with sbiobert_base_cased_mli embeddings (release notes: https://www.nlm.nih.gov/pubs/techbull/nd20/brief/nd20_rx_norm_november_release.html).