ChunkResolver Loinc Clinical

Description

Entity resolution model based on KNN, using word embeddings and Word Mover's Distance.
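The nearest-neighbour idea behind the resolver can be illustrated with a toy sketch in plain Python. This is not the model's implementation: the candidate labels (CODE_A and so on), the three-dimensional vectors, and the helper names cosine_distance and nearest_codes are all hypothetical, and the sketch ranks by cosine distance only (the distance function set in the usage example), not by Word Mover's Distance.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_codes(query, candidates, k=5):
    """Return the labels of the k candidates closest to the query vector."""
    ranked = sorted(candidates, key=lambda item: cosine_distance(query, item[1]))
    return [code for code, _ in ranked[:k]]

# Hypothetical reference entries: (label, embedding). Real entries would be
# LOINC concepts embedded with the model's word embeddings.
candidates = [
    ("CODE_A", [1.0, 0.0, 0.0]),
    ("CODE_B", [0.9, 0.1, 0.0]),
    ("CODE_C", [0.0, 1.0, 0.0]),
    ("CODE_D", [0.0, 0.0, 1.0]),
]

# Hypothetical chunk embedding to resolve.
query = [1.0, 0.02, 0.0]
top2 = nearest_codes(query, candidates, k=2)
print(top2)
```

In this sketch k plays the role that setNeighbours plays in the usage example: it bounds how many candidate codes are considered for each chunk.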

Predicted Entities

LOINC codes, resolved using embeddings_clinical.


How to use

# Python
...
loinc_resolver = ChunkEntityResolverModel.pretrained("chunkresolve_loinc_clinical", "en", "clinical/models") \
  .setInputCols(["token", "chunk_embeddings"]) \
  .setOutputCol("loinc_code") \
  .setDistanceFunction("COSINE") \
  .setNeighbours(5)

pipeline_loinc = Pipeline(stages = [documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, clinical_ner, ner_converter, chunk_embeddings, loinc_resolver])

data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""]]).toDF("text")

model = pipeline_loinc.fit(data)

results = model.transform(data)

// Scala
...
val loinc_resolver = ChunkEntityResolverModel.pretrained("chunkresolve_loinc_clinical", "en", "clinical/models")
  .setInputCols(Array("token", "chunk_embeddings"))
  .setOutputCol("loinc_code")
  .setDistanceFunction("COSINE")
  .setNeighbours(5)

val pipeline_loinc = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, clinical_ner, ner_converter, chunk_embeddings, loinc_resolver))

val data = Seq("A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting.").toDF("text")

val result = pipeline_loinc.fit(data).transform(data)

Results

                                      Chunk  LOINC code
0             gestational diabetes mellitus  44877-9
1                type two diabetes mellitus  44877-9
2                                      T2DM  93692-2
3 prior episode of HTG-induced pancreatitis  85695-5
4        associated with an acute hepatitis  24363-4
5            obesity with a body mass index  47278-7
6                        BMI) of 33.5 kg/m2  47214-2
7                                  polyuria  35234-4
8                                polydipsia  25541-4
9                             poor appetite  50056-1
10                                 vomiting  34175-0
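
Codes returned by the resolver can be sanity-checked for format before use. The sketch below assumes the Mod 10 check digit that LOINC documents for its codes, which works like the Luhn algorithm; the function names loinc_check_digit and is_well_formed_loinc are hypothetical. Passing the check only shows that a code is well formed, not that it exists in LOINC or that it is the right code for the chunk.

```python
def loinc_check_digit(base: str) -> int:
    """Compute the Mod 10 (Luhn-style) check digit for the numeric part of a code."""
    total = 0
    # Walk the digits right to left; double every other digit, starting
    # with the rightmost one.
    for position, char in enumerate(reversed(base)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return (10 - total % 10) % 10

def is_well_formed_loinc(code: str) -> bool:
    """Check the 'digits-checkdigit' shape and verify the check digit."""
    base, sep, check = code.partition("-")
    if not (sep and base.isdecimal() and len(check) == 1 and check.isdecimal()):
        return False
    return loinc_check_digit(base) == int(check)

# Two codes taken from the results table above.
for code in ["44877-9", "93692-2"]:
    print(code, is_well_formed_loinc(code))
```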

Model Information

Name: chunkresolve_loinc_clinical  
Type: ChunkEntityResolverModel  
Compatibility: Spark NLP 2.5.0+  
License: Licensed  
Edition: Official  
Input labels: [token, chunk_embeddings]  
Output labels: [entity]  
Language: en  
Case sensitive: True  
Dependencies: embeddings_clinical  

Data Source

Trained on the LOINC dataset (https://loinc.org/) with embeddings_clinical.