Description
Map clinical entities to UMLS CUI codes.
Predicted Entities
This model returns CUI (Concept Unique Identifier) codes for 200K concepts from clinical findings. For background on UMLS, see https://www.nlm.nih.gov/research/umls/index.html
How to use
...
chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")
sbert_embedder = BertSentenceEmbeddings\
    .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")
resolver = SentenceEntityResolverModel\
    .pretrained("sbiobertresolve_umls_findings", "en", "clinical/models")\
    .setInputCols(["ner_chunk", "sbert_embeddings"])\
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")
pipeline = Pipeline(stages = [documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, resolver])
data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""]]).toDF("text")
results = pipeline.fit(data).transform(data)
# Alternatively, using the NLU library
import nlu
nlu.load("en.resolve.umls.findings").predict("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting.""")
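The resolver above is configured with `setDistanceFunction("EUCLIDEAN")`. As a rough illustration of what distance-based resolution means, the following minimal sketch picks the candidate concept whose embedding is closest to a chunk embedding. It is a toy example and not the library's implementation: the three-dimensional vectors, the placeholder codes (`CODE_A`, `CODE_B`, `CODE_C`), and the helper names `euclidean` and `resolve` are all assumptions made for illustration.

```python
import math

# Hypothetical toy embeddings (real sentence embeddings have far more dimensions).
# The keys are placeholder codes, not real UMLS CUIs.
concept_embeddings = {
    "CODE_A": [0.9, 0.1, 0.0],
    "CODE_B": [0.1, 0.8, 0.2],
    "CODE_C": [0.0, 0.2, 0.9],
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def resolve(chunk_embedding, candidates):
    """Return the candidate code whose embedding is nearest to the chunk embedding."""
    return min(candidates, key=lambda code: euclidean(chunk_embedding, candidates[code]))


# Hypothetical embedding of an extracted chunk.
chunk_embedding = [0.2, 0.7, 0.1]
best_code = resolve(chunk_embedding, concept_embeddings)
print(best_code)
```

In this toy setup the chunk vector lies closest to the `CODE_B` vector, so that code is selected.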
Results
| | ner_chunk | cui_code |
|---:|:--------------------------------------|:-----------|
| 0 | gestational diabetes mellitus | C2183115 |
| 1 | subsequent type two diabetes mellitus | C3532488 |
| 2 | T2DM | C3280267 |
| 3 | HTG-induced pancreatitis | C4554179 |
| 4 | an acute hepatitis | C4750596 |
| 5 | obesity | C1963185 |
| 6 | a body mass index | C0578022 |
| 7 | polyuria | C3278312 |
| 8 | polydipsia | C3278316 |
| 9 | poor appetite | C0541799 |
| 10 | vomiting | C0042963 |
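UMLS CUIs consist of the letter "C" followed by seven digits, so resolver output can be sanity-checked before it is used downstream. The sketch below applies that check to the codes from the table above; the helper name `is_valid_cui` and the hard-coded `results` list are assumptions for illustration, and the check covers the format only, not whether a code exists in a given UMLS release.

```python
import re

# A CUI is the letter "C" followed by exactly seven digits.
CUI_PATTERN = re.compile(r"C\d{7}")

# (ner_chunk, cui_code) pairs copied from the results table above.
results = [
    ("gestational diabetes mellitus", "C2183115"),
    ("subsequent type two diabetes mellitus", "C3532488"),
    ("T2DM", "C3280267"),
    ("HTG-induced pancreatitis", "C4554179"),
    ("an acute hepatitis", "C4750596"),
    ("obesity", "C1963185"),
    ("a body mass index", "C0578022"),
    ("polyuria", "C3278312"),
    ("polydipsia", "C3278316"),
    ("poor appetite", "C0541799"),
    ("vomiting", "C0042963"),
]


def is_valid_cui(code):
    """Check only the format of a CUI string, not its existence in UMLS."""
    return CUI_PATTERN.fullmatch(code) is not None


# Collect any rows whose code does not match the expected format.
invalid = [(chunk, code) for chunk, code in results if not is_valid_cui(code)]
print(invalid)
```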
Model Information
| Model Name: | sbiobertresolve_umls_findings |
| Compatibility: | Healthcare NLP 3.0.2+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sentence_embeddings] |
| Output Labels: | [umls_code] |
| Language: | en |
Data Source
https://www.nlm.nih.gov/research/umls/index.html