Sentence Entity Resolver for Snomed Aux Concepts, INT version (``sbiobert_base_cased_mli`` embeddings)

Description

This model maps clinical entities and concepts to SNOMED codes (INT version) using sbiobert_base_cased_mli Sentence BERT embeddings. It covers the Morph Abnormality, Procedure, Substance, Physical Object, and Body Structure concept types of SNOMED.

It loads about 6x faster than previous versions. The load process is also more memory-friendly: the peak memory required during loading is lower, which reduces the chance of OOM exceptions and relaxes hardware requirements.

Predicted Entities

Predicts SNOMED codes and their normalized definitions for each detected chunk.

How to use

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
         
sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_clinical_large", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

chunk2doc = Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_chunk_doc")

sbert_embedder = BertSentenceEmbeddings\
    .pretrained("sbiobert_base_cased_mli","en","clinical/models")\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")

snomed_aux_int_resolver = SentenceEntityResolverModel\
    .pretrained("sbiobertresolve_snomed_auxConcepts_int","en", "clinical/models") \
    .setInputCols(["ner_chunk", "sbert_embeddings"]) \
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")

nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, snomed_aux_int_resolver])

data = spark.createDataFrame([["This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU ."]]).toDF("text")

results = nlpPipeline.fit(data).transform(data)
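
For quick tests on a handful of strings, the fitted pipeline can also be wrapped in a LightPipeline instead of transforming a DataFrame. The snippet below is a minimal sketch reusing the nlpPipeline and data objects defined above; the sample sentence is made up, and the resolved_text metadata key is assumed to be present in the resolver output.

from sparknlp.base import LightPipeline

# Wrap the fitted pipeline for fast in-memory annotation of raw strings.
light_model = LightPipeline(nlpPipeline.fit(data))

# Hypothetical short example text.
annotations = light_model.fullAnnotate("She has a history of hypertension, COPD and gastritis.")[0]

# Each NER chunk yields one resolution annotation: its result holds the top SNOMED code,
# and metadata keys (e.g. the assumed resolved_text) hold the matched description.
for chunk, res in zip(annotations["ner_chunk"], annotations["resolution"]):
    print(chunk.result, "->", res.result, "|", res.metadata.get("resolved_text"))

The equivalent Scala pipeline and the NLU one-liner are shown below.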
val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = new SentenceDetector()
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val clinical_ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val chunk2doc = new Chunk2Doc()
    .setInputCols("ner_chunk")
    .setOutputCol("ner_chunk_doc")

val sbert_embedder = BertSentenceEmbeddings
    .pretrained("sbiobert_base_cased_mli","en","clinical/models")
    .setInputCols("ner_chunk_doc")
    .setOutputCol("sbert_embeddings")

val snomed_aux_int_resolver = SentenceEntityResolverModel
    .pretrained("sbiobertresolve_snomed_auxConcepts_int","en", "clinical/models")
    .setInputCols(Array("ner_chunk", "sbert_embeddings"))
    .setOutputCol("resolution")
    .setDistanceFunction("EUCLIDEAN")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, snomed_aux_int_resolver))

val data = Seq("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret's Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""").toDS().toDF("text")

val result = pipeline.fit(data).transform(data)

import nlu
nlu.load("en.resolve.snomed.aux_concepts_int").predict("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""")

Results

+--------------------+-----+---+---------+---------------+----------+--------------------+--------------------+
|               chunk|begin|end|   entity|           code|confidence|         resolutions|               codes|
+--------------------+-----+---+---------+---------------+----------+--------------------+--------------------+
|        hypertension|   68| 79|  PROBLEM|      148439002|    0.2138|risk factors pres...|148439002:::42595...|
|chronic renal ins...|   83|109|  PROBLEM|      722403003|    0.8517|gastrointestinal ...|722403003:::13781...|
|                COPD|  113|116|  PROBLEM|845101000000100|    0.0962|management of chr...|845101000000100::...|
|           gastritis|  120|128|  PROBLEM|      711498001|    0.3398|magnetic resonanc...|711498001:::71771...|
|                 TIA|  136|138|  PROBLEM|      449758002|    0.1927|traumatic infarct...|449758002:::85844...|
|a non-ST elevatio...|  182|202|  PROBLEM|  1411000087101|    0.0823|ct of left knee::...|1411000087101:::3...|
|Guaiac positive s...|  208|229|  PROBLEM|      388507006|    0.0555|asparagus rast:::...|388507006:::71771...|
|cardiac catheteri...|  295|317|     TEST|       41976001|    0.9790|cardiac catheteri...|41976001:::705921...|
|                PTCA|  324|327|TREATMENT|      312644004|    0.0616|angioplasty of po...|312644004:::41507...|
|      mid LAD lesion|  332|345|  PROBLEM|       91749005|    0.1399|structure of firs...|91749005:::917470...|
+--------------------+-----+---+---------+---------------+----------+--------------------+--------------------+
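
A table like the one above can be produced from the results DataFrame by zipping the chunk and resolution annotation arrays and reading the codes and candidate resolutions out of the resolver metadata. The snippet below is a sketch of one way to do this; the metadata keys (entity, confidence, all_k_resolutions, all_k_results) are assumed to match the resolver output.

import pyspark.sql.functions as F

# Pair each NER chunk with its resolution annotation and flatten to one row per chunk.
results.select(
    F.explode(F.arrays_zip("ner_chunk", "resolution")).alias("cols")
).select(
    F.col("cols.ner_chunk.result").alias("chunk"),
    F.col("cols.ner_chunk.begin").alias("begin"),
    F.col("cols.ner_chunk.end").alias("end"),
    F.col("cols.ner_chunk.metadata").getItem("entity").alias("entity"),
    F.col("cols.resolution.result").alias("code"),
    F.col("cols.resolution.metadata").getItem("confidence").alias("confidence"),
    F.col("cols.resolution.metadata").getItem("all_k_resolutions").alias("resolutions"),
    F.col("cols.resolution.metadata").getItem("all_k_results").alias("codes")
).show(truncate=20)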

Model Information

Model Name: sbiobertresolve_snomed_auxConcepts_int
Compatibility: Healthcare NLP 3.0.4+
License: Licensed
Edition: Official
Input Labels: [ner_chunk, sbert_embeddings]
Output Labels: [snomed_code_int_aux_loaded]
Language: en
Case sensitive: false

Data Source

Trained on SNOMED (INT version) auxiliary concepts (Morph Abnormality, Procedure, Substance, Physical Object, and Body Structure) with sbiobert_base_cased_mli sentence embeddings. https://www.snomed.org/