Description
This model maps clinical entities and concepts (like drugs/ingredients) to RxNorm codes using mpnet_embeddings_biolord_2023
embeddings. It trained on the augmented version of the dataset used in previous RxNorm resolver models. Additionally, this model returns concept classes of the drugs in the all_k_aux_labels
column.
Predicted Entities
How to use
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentenceDetectorDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("word_embeddings")
ner = MedicalNerModel.pretrained("ner_posology_greedy", "en", "clinical/models")\
.setInputCols(["sentence", "token", "word_embeddings"])\
.setOutputCol("ner")
ner_converter = NerConverterInternal()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")\
.setWhiteList(["DRUG"])
c2doc = Chunk2Doc()\
.setInputCols("ner_chunk")\
.setOutputCol("ner_chunk_doc")
biolord_embedding = MPNetEmbeddings.pretrained("mpnet_embeddings_biolord_2023", "en")\
.setInputCols(["ner_chunk_doc"])\
.setOutputCol("embeddings")
rxnorm_resolver = SentenceEntityResolverModel.pretrained("biolordresolve_avg_rxnorm_augmented_v2", "en", "clinical/models")\
.setInputCols(["embeddings"])\
.setOutputCol("rxnorm_code")\
.setDistanceFunction("EUCLIDEAN")
resolver_pipeline = Pipeline(stages = [document_assembler,
sentenceDetectorDL,
tokenizer,
word_embeddings,
ner,
ner_converter,
c2doc,
biolord_embedding,
rxnorm_resolver])
data = spark.createDataFrame([["""The patient was prescribed aspirin and and Albuterol inhaler, two puffs every 4 hours as needed for asthma. She was seen by the endocrinology service and was discharged on avandia 4 mg at night , Coumadin 5 mg with meals , and metformin 1000 mg two times a day and Lisinopril 10 mg daily."""]]).toDF("text")
result = resolver_pipeline.fit(data).transform(data)
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentenceDetectorDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
.setInputCols(["document"])
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols(["sentence"])
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("word_embeddings")
val ner = MedicalNerModel.pretrained("ner_posology_greedy", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "word_embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverterInternal()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
.setWhiteList(["DRUG"])
val c2doc = new Chunk2Doc()
.setInputCols("ner_chunk")
.setOutputCol("ner_chunk_doc")
val biolord_embedding = MPNetEmbeddings.pretrained("mpnet_embeddings_biolord_2023", "en")
.setInputCols(["ner_chunk_doc"])
.setOutputCol("embeddings")
val rxnorm_resolver = SentenceEntityResolverModel.pretrained("biolordresolve_avg_rxnorm_augmented_v2", "en", "clinical/models")
.setInputCols(["embeddings"])
.setOutputCol("rxnorm_code")
.setDistanceFunction("EUCLIDEAN")
val resolver_pipeline = new PipelineModel().setStages(Array(
document_assembler,
sentenceDetectorDL,
tokenizer,
word_embeddings,
ner,
ner_converter,
c2doc,
biolord_embedding,
rxnorm_resolver))
val data = Seq([["""The patient was prescribed aspirin and and Albuterol inhaler, two puffs every 4 hours as needed for asthma. She was seen by the endocrinology service and was discharged on avandia 4 mg at night , Coumadin 5 mg with meals , and metformin 1000 mg two times a day and Lisinopril 10 mg daily."""]]).toDF("text")
val result = resolver_pipeline.fit(data).transform(data)
Results
+-----------------+-----------+------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+
| ner_chunk|rxnorm_code|entity| all_k_resolutions| all_k_results| all_k_distances| all_k_cosine_distances| all_k_aux_labels|
+-----------------+-----------+------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+
| aspirin| 1154070| DRUG|aspirin Pill[aspirin Pill]:::aspirin Oral Produ...|1154070:::1154069:::368473:::830532:::1299851::...|0.4035:::0.4122:::0.4906:::0.4908:::0.4995:::0....|0.0814:::0.0849:::0.1203:::0.1204:::0.1248:::0....|Clinical Dose Group:::Clinical Dose Group:::Bra...|
|Albuterol inhaler| 801094| DRUG|albuterol Metered Dose Inhaler [Ventolin][albut...|801094:::2108228:::746762:::745682:::2108257:::...|0.4221:::0.4684:::0.4786:::0.5358:::0.5424:::0....|0.0891:::0.1097:::0.1145:::0.1435:::0.1471:::0....|Branded Drug Form:::Branded Drug Form:::Branded...|
| Coumadin 5 mg| 855334| DRUG|Coumadin 5 MG Oral Tablet [warfarin sodium 5 MG...|855334:::855297:::432467:::451604:::333664:::85...|0.5419:::0.5466:::0.5577:::0.5817:::0.6057:::0....|0.1468:::0.1494:::0.1555:::0.1692:::0.1835:::0....|Branded Drug:::Branded Drug Comp:::Clinical Dru...|
|metformin 1000 mg| 860997| DRUG|metformin hydrochloride 1000 MG [Fortamet][metf...|860997:::861004:::861006:::316256:::583195:::86...|0.3527:::0.3679:::0.3897:::0.3914:::0.4724:::0....|0.0622:::0.0677:::0.0759:::0.0766:::0.1116:::0....|Branded Drug Comp:::Clinical Drug:::Branded Dru...|
| Lisinopril 10 mg| 311354| DRUG|lisinopril 5 MG Oral Tablet:::lisinopril 2.5 MG...|311354:::316152:::197885:::201381:::567581:::57...|0.4696:::0.5070:::0.5200:::0.5214:::0.5283:::0....|0.1103:::0.1285:::0.1352:::0.1359:::0.1395:::0....|Clinical Drug:::Clinical Drug Comp:::Clinical D...|
+-----------------+-----------+------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+
Model Information
Model Name: | biolordresolve_avg_rxnorm_augmented_v2 |
Compatibility: | Healthcare NLP 5.4.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [mpnet_embeddings] |
Output Labels: | [rxnorm_code] |
Language: | en |
Size: | 1.1 GB |
Case sensitive: | false |