Description
This pretrained model maps UMLS codes to corresponding RxNorm codes.
Predicted Entities
How to use
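# Assumes `spark` is an active Spark session started with a valid Healthcare NLP license
# (for example via sparknlp_jsl.start(...)).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import BertSentenceEmbeddings
from sparknlp_jsl.annotator import SentenceEntityResolverModel, Resolution2Chunk, ChunkMapperModel

# Treat each input text as a single chunk to be resolved
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("ner_chunk")

# Sentence-level BioBERT embeddings for each chunk
sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
    .setInputCols(["ner_chunk"])\
    .setOutputCol("sbert_embeddings")\
    .setCaseSensitive(False)

# Resolve each embedding to the closest UMLS drug/substance code by Euclidean distance
umls_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_umls_drug_substance", "en", "clinical/models")\
    .setInputCols(["sbert_embeddings"])\
    .setOutputCol("umls_code")\
    .setDistanceFunction("EUCLIDEAN")

# Convert the resolver output into chunk annotations that the mapper can consume
resolver2chunk = Resolution2Chunk()\
    .setInputCols(["umls_code"])\
    .setOutputCol("umls2chunk")

# Map each UMLS code to its RxNorm code
chunkerMapper = ChunkMapperModel.pretrained("umls_rxnorm_mapper", "en", "clinical/models")\
    .setInputCols(["umls2chunk"])\
    .setOutputCol("mappings")

pipeline = Pipeline(stages=[
    documentAssembler,
    sbert_embedder,
    umls_resolver,
    resolver2chunk,
    chunkerMapper])

data = spark.createDataFrame([["Hydrogen peroxide 30 mg"], ["magnesium hydroxide 100 MG"], ["metformin 1000 MG"], ["dilaudid"]]).toDF("text")

# Every stage is pretrained or a plain transformer, so fit() trains nothing; it only builds the PipelineModel
mapper_model = pipeline.fit(data)
result = mapper_model.transform(data)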
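The `setDistanceFunction("EUCLIDEAN")` setting means the resolver picks the code whose stored reference embedding lies closest to the chunk's sentence embedding in Euclidean distance. The sketch below illustrates only that nearest-neighbour step, assuming a tiny in-memory index of made-up 3-dimensional vectors with hypothetical labels `CODE_A` to `CODE_C`; the real resolver works on high-dimensional sbiobert embeddings and does more than this toy version.

```python
import math

# Hypothetical toy index: code label -> reference embedding.
# Real resolvers use sbiobert sentence embeddings; these 3-d vectors are invented for illustration.
TOY_INDEX = {
    "CODE_A": [0.9, 0.1, 0.0],
    "CODE_B": [0.1, 0.8, 0.2],
    "CODE_C": [0.0, 0.2, 0.9],
}

def resolve(query, index):
    """Return (code, distance) for the index entry nearest to `query` by Euclidean distance."""
    # math.dist computes the Euclidean (L2) distance between two points (Python 3.8+)
    code, vec = min(index.items(), key=lambda item: math.dist(query, item[1]))
    return code, math.dist(query, vec)

code, dist = resolve([0.85, 0.15, 0.05], TOY_INDEX)
print(code, round(dist, 4))
```

Swapping in a different distance (for example cosine) would only change the `key` function; the "pick the minimum" logic stays the same.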
Results
+--------------------------+---------+-----------+
|chunk                     |umls_code|rxnorm_code|
+--------------------------+---------+-----------+
|Hydrogen peroxide 30 mg   |C1126248 |330565     |
|magnesium hydroxide 100 MG|C1134402 |337012     |
|metformin 1000 MG         |C0987664 |316255     |
|dilaudid                  |C0728755 |224913     |
+--------------------------+---------+-----------+
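Conceptually, the final mapping step behaves like a key-value lookup from a UMLS code to an RxNorm code. The minimal sketch below models it as a plain Python dictionary seeded only with the four pairs in the table above; `UMLS_TO_RXNORM`, `map_umls_to_rxnorm`, and the unmapped code `C0000000` are hypothetical and not part of the Healthcare NLP API. The actual `umls_rxnorm_mapper` model ships its own, much larger mapping, and how it reports codes without a mapping is not covered by this sketch, which simply returns `None`.

```python
# Hypothetical lookup table built only from the four example rows above
UMLS_TO_RXNORM = {
    "C1126248": "330565",  # Hydrogen peroxide 30 mg
    "C1134402": "337012",  # magnesium hydroxide 100 MG
    "C0987664": "316255",  # metformin 1000 MG
    "C0728755": "224913",  # dilaudid
}

def map_umls_to_rxnorm(cui, table=UMLS_TO_RXNORM):
    """Return the RxNorm code for a UMLS code, or None if this toy table has no entry."""
    return table.get(cui)

# "C0000000" is a hypothetical code used only to show the missing-entry case
for cui in ["C0987664", "C0000000"]:
    print(cui, "->", map_umls_to_rxnorm(cui))
```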
Model Information
Model Name: umls_rxnorm_mapper
Compatibility: Healthcare NLP 5.3.0+
License: Licensed
Edition: Official
Input Labels: [ner_chunk]
Output Labels: [mappings]
Language: en
Size: 3.0 MB