Description
This model maps extracted medical entities to RxNorm codes using chunk embeddings.
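Conceptually, a resolver of this kind embeds each extracted chunk and returns the code whose reference embedding lies nearest to it. The sketch below illustrates that idea in plain Python with Euclidean distance. It is a toy under stated assumptions: the three-dimensional vectors, the `TOY_INDEX` table, and the `resolve` helper are invented for illustration, while the codes and descriptions are copied from the Results table. The real model works on sbiobert_base_cased_mli embeddings, and its internal index is not described here.

```python
import math

# Hypothetical toy index: RxNorm code -> (description, made-up embedding).
# Codes and descriptions come from the Results table; the vectors are invented.
TOY_INDEX = {
    "1223": ("atropine", [0.9, 0.1, 0.0]),
    "3628": ("dopamine", [0.1, 0.9, 0.0]),
    "346168": ("intravenous suspension", [0.0, 0.1, 0.9]),
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resolve(chunk_embedding, index=TOY_INDEX, top_k=2):
    """Return the top_k (code, description, distance) tuples, nearest first."""
    scored = [
        (code, desc, euclidean(chunk_embedding, vec))
        for code, (desc, vec) in index.items()
    ]
    return sorted(scored, key=lambda item: item[2])[:top_k]


# A made-up embedding that sits close to the toy "atropine" vector.
candidates = resolve([0.8, 0.2, 0.1])
for code, desc, dist in candidates:
    print(code, desc, round(dist, 3))
```

Returning several ranked candidates rather than a single code mirrors the `resolutions` column in the Results table, which lists alternatives after the top match.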
Predicted Entities
RxNorm codes and their normalized definitions, resolved with sbiobert_base_cased_mli embeddings.
How to use
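# Python
# Assumes Spark NLP and Spark NLP for Healthcare are installed and that
# `spark` is an active, licensed Spark session.
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

document_assembler = DocumentAssembler()\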
.setInputCol("text")\
.setOutputCol("document")
sentenceDetectorDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
ner = MedicalNerModel.pretrained("ner_posology_greedy", "en", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
ner_converter = NerConverterInternal()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")\
.setWhiteList(["DRUG"])
c2doc = Chunk2Doc()\
.setInputCols("ner_chunk")\
.setOutputCol("ner_chunk_doc")
sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
.setInputCols(["ner_chunk_doc"])\
.setOutputCol("sbert_embeddings")
resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_rxnorm","en", "clinical/models") \
.setInputCols(["sbert_embeddings"]) \
.setOutputCol("resolution")\
.setDistanceFunction("EUCLIDEAN")
resolver_pipeline = Pipeline(stages = [
document_assembler,
sentenceDetectorDL,
tokenizer,
word_embeddings,
ner,
ner_converter,
c2doc,
sbert_embedder,
resolver
])
data = spark.createDataFrame([["""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU."""]]).toDF("text")
result = resolver_pipeline.fit(data).transform(data)
// Scala
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentenceDetectorDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
.setInputCols(Array("document"))
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols(Array("sentence"))
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")
.setInputCols(Array("sentence","token"))
.setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_posology_greedy","en","clinical/models")
.setInputCols(Array("sentence","token","embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverterInternal()
.setInputCols(Array("sentence","token","ner"))
.setOutputCol("ner_chunk")
.setWhiteList(Array("DRUG"))
val c2doc = new Chunk2Doc()
.setInputCols("ner_chunk")
.setOutputCol("ner_chunk_doc")
val sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")
.setInputCols(Array("ner_chunk_doc"))
.setOutputCol("sbert_embeddings")
val resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_rxnorm","en","clinical/models")
.setInputCols(Array("sbert_embeddings"))
.setOutputCol("resolution")
.setDistanceFunction("EUCLIDEAN")
val resolver_pipeline = new Pipeline().setStages(Array(
document_assembler,
sentenceDetectorDL,
tokenizer,
word_embeddings,
ner,
ner_converter,
c2doc,
sbert_embedder,
resolver ))
val data = Seq("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret's Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU.""").toDF("text")
val result = resolver_pipeline.fit(data).transform(data)
Results
+---------+-----+---+---------+------+----------------------+--------------------------------------------------------------------------------+
|    chunk|begin|end|ner_label|  code|           description|                                                                     resolutions|
+---------+-----+---+---------+------+----------------------+--------------------------------------------------------------------------------+
| Atropine|  400|407|     DRUG|  1223|              atropine|atropine:::isopto atropine:::attane:::atropisol:::atropen:::atridine:::aramin...|
|IV fluids|  411|419|     DRUG|346168|intravenous suspension|intravenous suspension:::intravenous solution:::wal-four:::injectable suspens...|
| dopamine|  435|442|     DRUG|  3628|              dopamine|dopamine:::dopamine injection:::dopexamine:::dopa, dl:::dolophine:::distigmin...|
+---------+-----+---+---------+------+----------------------+--------------------------------------------------------------------------------+
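In the table, the resolutions column packs the candidate descriptions into a single string separated by `:::`. A minimal sketch for splitting such a value into an ordered list follows. The helper name `split_resolutions` is hypothetical, and the sample string contains only the candidates that are fully visible in the Atropine row, since the table truncates the rest.

```python
def split_resolutions(packed, sep=":::"):
    """Split a ':::'-delimited resolutions string into candidates, keeping order."""
    return [part.strip() for part in packed.split(sep) if part.strip()]


# Only the fully visible candidates from the Atropine row of the Results table.
sample = "atropine:::isopto atropine:::attane:::atropisol:::atropen:::atridine"

candidates = split_resolutions(sample)
print(candidates[0])
print(len(candidates))
```

The first element corresponds to the value shown in the description column; the remaining elements are the alternatives listed after it.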
Model Information
Name: | sbiobertresolve_rxnorm |
Type: | SentenceEntityResolverModel |
Compatibility: | Spark NLP 2.6.4+ |
License: | Licensed |
Edition: | Official |
Input labels: | [ner_chunk, chunk_embeddings] |
Output labels: | [resolution] |
Language: | en |
Dependencies: | sbiobert_base_cased_mli |
Data Source
Trained on the November 2020 RxNorm Clinical Drugs ontology graph using sbiobert_base_cased_mli embeddings.
https://www.nlm.nih.gov/pubs/techbull/nd20/brief/nd20_rx_norm_november_release.html