Description
This model maps clinical entities and concepts to 4 major categories of UMLS CUI codes using sbiobert_base_cased_mli Sentence BERT embeddings. It loads faster than previous versions, with a speedup of about 6X.
Predicted Entities
This model returns CUI (concept unique identifier) codes for Clinical Findings, Medical Devices, Anatomical Structures, and Injuries & Poisoning terms.
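UMLS CUIs follow a fixed format: the letter C followed by seven digits. Here is a minimal sketch for sanity-checking resolver output in plain Python before passing it downstream. `is_valid_cui` is a hypothetical helper and is not part of Healthcare NLP.

```python
import re

# A UMLS CUI is the letter "C" followed by exactly seven digits, e.g. C0417023.
CUI_PATTERN = re.compile(r"C\d{7}")


def is_valid_cui(code: str) -> bool:
    """Return True if `code` has the shape of a UMLS concept unique identifier."""
    return CUI_PATTERN.fullmatch(code) is not None


# Codes taken from the Results table of this model card.
for code in ["C4047548", "C0417023", "C0179144", "C3266804"]:
    print(code, is_valid_cui(code))
```

This checks only the shape of a code. Whether the concept exists in a given UMLS release requires a lookup against the Metathesaurus itself.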
How to use
The sbiobertresolve_umls_major_concepts resolver model must be used with sbiobert_base_cased_mli as embeddings and ner_jsl as the NER model, with the following entity labels set in .setWhiteList(): Cerebrovascular_Disease, Communicable_Disease, Diabetes, Disease_Syndrome_Disorder, Heart_Disease, Hyperlipidemia, Hypertension, Injury_or_Poisoning, Kidney_Disease, Medical-Device, Obesity, Oncological, Overweight, Psychological_Condition, Symptom, VS_Finding, ImagingFindings, EKG_Findings.
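In plain Python, the whitelist's effect looks like this: only chunks whose NER label appears in the list move on to the resolver. This is a hypothetical illustration that assumes the whitelist is applied at the chunk-conversion stage (`ner_converter` in the pipeline). `filter_chunks` and the sample chunks and labels are invented and do not come from real ner_jsl output.

```python
# Entity labels listed in this model card for .setWhiteList().
WHITELIST = {
    "Cerebrovascular_Disease", "Communicable_Disease", "Diabetes",
    "Disease_Syndrome_Disorder", "Heart_Disease", "Hyperlipidemia",
    "Hypertension", "Injury_or_Poisoning", "Kidney_Disease",
    "Medical-Device", "Obesity", "Oncological", "Overweight",
    "Psychological_Condition", "Symptom", "VS_Finding",
    "ImagingFindings", "EKG_Findings",
}


def filter_chunks(chunks):
    """Keep only (text, label) pairs whose label is whitelisted.

    Hypothetical stand-in for the whitelist filtering: chunks with any
    other label are dropped before entity resolution.
    """
    return [(text, label) for text, label in chunks if label in WHITELIST]


# Hypothetical NER output; labels are illustrative only.
sample = [
    ("ankle pain", "Symptom"),
    ("She", "Gender"),
    ("pacemaker", "Medical-Device"),
    ("45-year-old", "Age"),
]
print(filter_chunks(sample))
```

Filtering before resolution keeps the resolver from assigning CUI codes to chunks outside the four target categories.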
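from pyspark.ml import Pipeline
from sparknlp.base import Chunk2Doc
from sparknlp.annotator import BertSentenceEmbeddings
from sparknlp_jsl.annotator import SentenceEntityResolverModel
...
# Turn each NER chunk into its own document so it can be embedded separately
chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")
# Sentence embeddings this resolver was trained with
sbert_embedder = BertSentenceEmbeddings\
    .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")
# Resolve each chunk embedding to a UMLS CUI code using Euclidean distance
resolver = SentenceEntityResolverModel\
    .pretrained("sbiobertresolve_umls_major_concepts", "en", "clinical/models")\
    .setInputCols(["ner_chunk_doc", "sbert_embeddings"])\
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")
# documentAssembler through ner_converter are defined in the elided part above
pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, resolver])
data = spark.createDataFrame([["The patient complains of ankle pain after falling from stairs. She has been advised Arthroscopy by her primary care pyhsician"]]).toDF("text")
results = pipeline.fit(data).transform(data)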
...
val chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")
val sbert_embedder = BertSentenceEmbeddings
.pretrained("sbiobert_base_cased_mli", "en","clinical/models")
.setInputCols(Array("ner_chunk_doc"))
.setOutputCol("sbert_embeddings")
val resolver = SentenceEntityResolverModel
.pretrained("sbiobertresolve_umls_major_concepts", "en", "clinical/models")
.setInputCols(Array("ner_chunk_doc", "sbert_embeddings"))
.setOutputCol("resolution")
.setDistanceFunction("EUCLIDEAN")
val p_model = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, resolver))
val data = Seq("The patient complains of ankle pain after falling from stairs. She has been advised Arthroscopy by her primary care pyhsician").toDF("text")
val res = p_model.fit(data).transform(data)
import nlu
nlu.load("en.resolve.umls").predict("""The patient complains of ankle pain after falling from stairs. She has been advised Arthroscopy by her primary care pyhsician""")
Results
| | ner_chunk | code | code_description |
|---:|:------------------------------|:-------------|:---------------------------------------------|
| 0 | ankle | C4047548 | bilateral ankle joint pain (finding) |
| 1 | falling from stairs | C0417023 | fall from stairs |
| 2 | Arthroscopy | C0179144 | arthroscope |
| 3 | primary care pyhsician | C3266804 | referred by primary care physician (finding) |
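The resolver is configured with `.setDistanceFunction("EUCLIDEAN")`, which suggests that candidate codes are compared by Euclidean distance between the chunk embedding and stored reference embeddings. The plain-Python sketch below illustrates that nearest-neighbour idea. It is a simplified, hypothetical model of the ranking step, not the SentenceEntityResolverModel implementation, and the toy vectors assigned to the codes are invented.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resolve(chunk_embedding, code_embeddings, k=1):
    """Return the k codes whose reference vectors are closest to the chunk."""
    ranked = sorted(
        code_embeddings.items(),
        key=lambda item: euclidean(chunk_embedding, item[1]),
    )
    return [code for code, _ in ranked[:k]]


# Toy 3-dimensional vectors (hypothetical); real sentence embeddings are much larger.
CODE_EMBEDDINGS = {
    "C0417023": [0.9, 0.1, 0.0],
    "C0179144": [0.0, 0.8, 0.2],
    "C3266804": [0.1, 0.1, 0.9],
}
print(resolve([0.85, 0.15, 0.05], CODE_EMBEDDINGS))
```

Raising `k` returns a ranked shortlist rather than a single best code, which helps when inspecting near-miss candidates.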
Model Information
Model Name: sbiobertresolve_umls_major_concepts
Compatibility: Healthcare NLP 3.2.3+
License: Licensed
Edition: Official
Input Labels: [sentence_chunk_embeddings]
Output Labels: [umls_code]
Language: en
Case sensitive: false
Data Source
Trained on data sampled from https://www.nlm.nih.gov/research/umls/index.html