ChunkResolver Loinc Clinical

Description

Entity resolution model that maps clinical text chunks to LOINC codes, based on k-nearest neighbours (KNN) over word embeddings and Word Mover's Distance.
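The nearest-neighbour idea can be illustrated with a minimal sketch in plain Python. It is not the model's actual implementation: the function names, the three-dimensional toy vectors, and the CODE-A/B/C labels are all hypothetical, and it uses cosine distance (the distance function selected in the usage example) rather than Word Mover's Distance.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def nearest_codes(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k (code, distance) pairs closest to the query embedding."""
    scored = [(code, cosine_distance(query, vec)) for code, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy "embeddings" keyed by placeholder codes (assumption: not real LOINC data).
toy_index = {
    "CODE-A": [1.0, 0.0, 0.0],
    "CODE-B": [0.0, 1.0, 0.0],
    "CODE-C": [0.7, 0.7, 0.0],
}

# A query chunk embedding that lies closest to CODE-A.
neighbours = nearest_codes([0.9, 0.1, 0.0], toy_index, k=2)
best_code = neighbours[0][0]
```

In this sketch the resolved code is simply the first entry of the ranked list; a real resolver searches a far larger index of code descriptions.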

Predicted Entities

LOINC codes, resolved with embeddings_clinical.


How to use

...    
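# Load the pretrained LOINC resolver. It reads tokens and chunk embeddings and
# writes the resolved codes to the "loinc_code" column.
loinc_resolver = ChunkEntityResolverModel.pretrained("chunkresolve_loinc_clinical", "en", "clinical/models") \
  .setInputCols(["token", "chunk_embeddings"]) \
  .setOutputCol("loinc_code") \
  .setDistanceFunction("COSINE") \
  .setNeighbours(5)

# The stages before loinc_resolver are the upstream annotators elided above ("...").
pipeline_loinc = Pipeline(stages = [documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, clinical_ner, ner_converter, chunk_embeddings, loinc_resolver])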

data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""]]).toDF("text")

model = pipeline_loinc.fit(data)

results = model.transform(data)

...
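// Load the pretrained LOINC resolver. It reads tokens and chunk embeddings and
// writes the resolved codes to the "loinc_code" column.
val loinc_resolver = ChunkEntityResolverModel.pretrained("chunkresolve_loinc_clinical", "en", "clinical/models")
  .setInputCols(Array("token", "chunk_embeddings"))
  .setOutputCol("loinc_code")
  .setDistanceFunction("COSINE")
  .setNeighbours(5)

// The stages before loinc_resolver are the upstream annotators elided above ("...").
val pipeline_loinc = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, clinical_ner, ner_converter, chunk_embeddings, loinc_resolver))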

val data = Seq("A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting.").toDF("text")

val result = pipeline_loinc.fit(data).transform(data)

Results

Chunk                                      LOINC code
gestational diabetes mellitus              44877-9
type two diabetes mellitus                 44877-9
T2DM                                       93692-2
prior episode of HTG-induced pancreatitis  85695-5
associated with an acute hepatitis         24363-4
obesity with a body mass index             47278-7
BMI) of 33.5 kg/m2                         47214-2
polyuria                                   35234-4
polydipsia                                 25541-4
poor appetite                              50056-1
vomiting                                   34175-0
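Resolver output can be sanity-checked against the structure of LOINC codes, whose final digit is a Mod 10 (Luhn) check digit computed from the digits before the hyphen. The sketch below is an optional post-processing step, not part of Spark NLP. The function names are hypothetical, and the codes are copied from the Results table.

```python
from typing import List


def loinc_check_digit(base: str) -> int:
    """Compute the Mod 10 (Luhn) check digit for the numeric part of a code."""
    total = 0
    # Walk the digits right to left, doubling every other digit
    # starting with the rightmost one.
    for position, char in enumerate(reversed(base)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return (10 - total % 10) % 10


def is_valid_loinc(code: str) -> bool:
    """Return True if code has the form '<digits>-<digit>' and the check digit matches."""
    base, sep, check = code.partition("-")
    if not sep or not base.isdigit() or len(check) != 1 or not check.isdigit():
        return False
    return loinc_check_digit(base) == int(check)


# Distinct codes copied from the Results table.
resolved_codes: List[str] = [
    "44877-9", "93692-2", "85695-5", "24363-4", "47278-7",
    "47214-2", "35234-4", "25541-4", "50056-1", "34175-0",
]

# Collect any code whose check digit does not match.
invalid = [code for code in resolved_codes if not is_valid_loinc(code)]
```

A passing check only shows that a code is well formed. It does not show that the code exists in LOINC or that it is the right match for the chunk.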

Model Information

Name:	chunkresolve_loinc_clinical
Type:	ChunkEntityResolverModel
Compatibility:	Spark NLP 2.5.0+
License:	Licensed
Edition:	Official
Input labels:	[token, chunk_embeddings]
Output labels:	[entity]
Language:	en
Case sensitive:	True
Dependencies:	embeddings_clinical

Data Source

Trained on the LOINC dataset (https://loinc.org/) with embeddings_clinical.
