Description
Maps clinical NER entities to LOINC codes using sbluebert_base_uncased_mli sentence embeddings.
Predicted Entities
A LOINC code for each input NER entity.
How to use
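from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
# documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings,
# clinical_ner and ner_converter are assumed to be defined earlier; they are not shown here.
# Wrap each NER chunk as its own document so it can be embedded independently.
chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")
# Compute a sentence embedding for each chunk.
sbert_embedder = BertSentenceEmbeddings\
    .pretrained("sbluebert_base_uncased_mli", "en", "clinical/models")\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")
# Resolve each chunk embedding to a LOINC code using Euclidean distance.
resolver = SentenceEntityResolverModel.pretrained("sbluebertresolve_loinc", "en", "clinical/models")\
    .setInputCols(["sbert_embeddings"])\
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")
pipeline_loinc = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, resolver])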
# Build the input DataFrame once so it can be used for both fit() and transform().
data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""]]).toDF("text")
model = pipeline_loinc.fit(data)
results = model.transform(data)
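
The resolver above is configured with setDistanceFunction("EUCLIDEAN"). As a rough illustration of what a Euclidean nearest-neighbour lookup means, here is a minimal pure-Python sketch. It does not reproduce the resolver's internals, which this card does not describe. The names `euclidean`, `resolve` and `toy_index`, the 3-dimensional vectors, and the `CODE-*` labels are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def resolve(query, index):
    """Return the (code, distance) pair whose reference vector is closest to query."""
    candidates = ((code, euclidean(query, vec)) for code, vec in index.items())
    return min(candidates, key=lambda pair: pair[1])

# Hypothetical toy "embeddings"; real sentence embeddings have far more dimensions,
# and the labels below are placeholders, not LOINC codes.
toy_index = {
    "CODE-A": [0.9, 0.1, 0.0],
    "CODE-B": [0.0, 0.8, 0.2],
    "CODE-C": [0.1, 0.1, 0.9],
}

code, distance = resolve([0.85, 0.15, 0.05], toy_index)
print(code, round(distance, 3))
```

The query vector lies closest to the reference vector stored under `CODE-A`, so that label is returned along with its distance.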
Results
| | chunk | loinc_code |
|---:|:--------------------------------------|:-------------|
| 0 | gestational diabetes mellitus | 45636-8 |
| 1 | subsequent type two diabetes mellitus | 44877-9 |
| 2 | T2DM | 45636-8 |
| 3 | HTG-induced pancreatitis | 79102-0 |
| 4 | an acute hepatitis | 28083-4 |
| 5 | obesity | 50227-8 |
| 6 | a body mass index | 59574-4 |
| 7 | BMI | 59574-4 |
| 8 | polyuria | 28239-2 |
| 9 | polydipsia | 90552-1 |
| 10 | poor appetite | 65961-5 |
| 11 | vomiting | 81224-8 |
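
A quick sanity check on resolver output is to confirm that each returned code is well formed. The sketch below assumes that the final digit of a LOINC code is a Mod 10 (Luhn) check digit computed over the numeric part before the hyphen. The helper names `loinc_check_digit` and `is_well_formed_loinc` are hypothetical and not part of any library. A passing check only shows that a code is syntactically valid, not that it is the right code for the chunk.

```python
def loinc_check_digit(number_part: str) -> int:
    """Mod 10 (Luhn) check digit for the numeric part of a code (assumed scheme)."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for position, char in enumerate(reversed(number_part)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the doubled value
        total += digit
    return (10 - total % 10) % 10

def is_well_formed_loinc(code: str) -> bool:
    """True if code looks like '<digits>-<check digit>' and the check digit matches."""
    number_part, sep, check = code.partition("-")
    if not (sep and number_part.isdigit() and len(check) == 1 and check.isdigit()):
        return False
    return loinc_check_digit(number_part) == int(check)

# Codes copied from the Results table above.
codes = ["45636-8", "44877-9", "79102-0", "28083-4", "50227-8",
         "59574-4", "28239-2", "90552-1", "65961-5", "81224-8"]

malformed = [c for c in codes if not is_well_formed_loinc(c)]
print(malformed)
```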
Model Information
| Model Name: | sbluebertresolve_loinc |
|:---------------|:-----------------------|
| Compatibility: | Healthcare NLP 3.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sentence_embeddings] |
| Output Labels: | [loinc_code] |
| Language: | en |
Data Source
Trained on the standard LOINC coding system.