Description
Entity resolution model that maps extracted chunks to LOINC codes, based on k-nearest neighbours (KNN) over word embeddings and Word Mover's Distance.
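To make the KNN idea concrete, here is a minimal pure-Python sketch of nearest-neighbour resolution over averaged word vectors. It is not the implementation inside ChunkEntityResolverModel: it uses cosine distance only (no Word Mover's Distance), and the word vectors, the codes CODE-A and CODE-B, and the helper names are all hypothetical placeholders.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Average a list of word vectors into a single chunk embedding."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def resolve(chunk_vec, reference, k=5):
    """Return up to k (code, distance) pairs, closest first."""
    scored = [(code, cosine_distance(chunk_vec, vec)) for code, vec in reference.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical 3-dimensional word vectors; real embeddings have many more dimensions.
word_vectors = {
    "blood": [0.9, 0.1, 0.0],
    "glucose": [0.8, 0.3, 0.1],
    "body": [0.1, 0.9, 0.2],
    "weight": [0.0, 0.8, 0.4],
}

# Hypothetical reference entries; CODE-A and CODE-B are placeholders, not LOINC codes.
reference = {
    "CODE-A": mean_vector([word_vectors["blood"], word_vectors["glucose"]]),
    "CODE-B": mean_vector([word_vectors["body"], word_vectors["weight"]]),
}

chunk = mean_vector([word_vectors["glucose"]])
neighbours = resolve(chunk, reference, k=1)
print(neighbours[0][0])  # CODE-A
```

The k argument plays the same role as the neighbour count set on the resolver in the usage section, and the distance function is the part a Word Mover's Distance variant would replace.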
Predicted Entities
LOINC codes, resolved using the embeddings_clinical word embeddings.
How to use
...
# Resolver: nearest-neighbour lookup over chunk embeddings using cosine distance
loinc_resolver = ChunkEntityResolverModel.pretrained("chunkresolve_loinc_clinical", "en", "clinical/models") \
    .setInputCols(["token", "chunk_embeddings"]) \
    .setOutputCol("loinc_code") \
    .setDistanceFunction("COSINE") \
    .setNeighbours(5)

# The upstream stages (documentAssembler through chunk_embeddings) are defined in the elided part above
pipeline_loinc = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, clinical_ner, ner_converter, chunk_embeddings, loinc_resolver])
data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""]]).toDF("text")
model = pipeline_loinc.fit(data)
results = model.transform(data)
...
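// Resolver: nearest-neighbour lookup over chunk embeddings using cosine distance
val loinc_resolver = ChunkEntityResolverModel.pretrained("chunkresolve_loinc_clinical", "en", "clinical/models")
  .setInputCols(Array("token", "chunk_embeddings"))
  .setOutputCol("loinc_code")
  .setDistanceFunction("COSINE")
  .setNeighbours(5)

// The upstream stages (documentAssembler through chunk_embeddings) are defined in the elided part above
val pipeline_loinc = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, stopwords, word_embeddings, clinical_ner, ner_converter, chunk_embeddings, loinc_resolver))

// Needed for .toDF on a Seq when running outside spark-shell (assumes a SparkSession named spark)
import spark.implicits._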
val data = Seq("A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting.").toDF("text")
val result = pipeline_loinc.fit(data).transform(data)
Results
| | Chunk | LOINC code |
|---|---|---|
| 0 | gestational diabetes mellitus | 44877-9 |
| 1 | type two diabetes mellitus | 44877-9 |
| 2 | T2DM | 93692-2 |
| 3 | prior episode of HTG-induced pancreatitis | 85695-5 |
| 4 | associated with an acute hepatitis | 24363-4 |
| 5 | obesity with a body mass index | 47278-7 |
| 6 | BMI) of 33.5 kg/m2 | 47214-2 |
| 7 | polyuria | 35234-4 |
| 8 | polydipsia | 25541-4 |
| 9 | poor appetite | 50056-1 |
| 10 | vomiting | 34175-0 |
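A quick sanity check on resolver output is to confirm that each returned code is well formed. LOINC codes end in a Mod 10 (Luhn) check digit after the hyphen, so a minimal sketch like the one below can flag truncated or mistyped codes. The function names are hypothetical helpers, not part of Spark NLP, and the check says nothing about whether a code exists in LOINC or fits the chunk it was assigned to.

```python
def loinc_check_digit(number_part):
    """Compute the Mod 10 (Luhn) check digit for the digits before the hyphen."""
    total = 0
    for position, char in enumerate(reversed(number_part)):
        digit = int(char)
        if position % 2 == 0:  # double every second digit, starting from the rightmost
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def is_well_formed_loinc(code):
    """Return True if code looks like '<digits>-<check digit>' with a valid check digit."""
    number_part, sep, check = code.partition("-")
    if not (sep and number_part.isdigit() and len(check) == 1 and check.isdigit()):
        return False
    return loinc_check_digit(number_part) == int(check)


# A few codes taken from the results table above.
codes = ["44877-9", "93692-2", "85695-5", "24363-4"]
print([is_well_formed_loinc(c) for c in codes])  # [True, True, True, True]
```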
Model Information
| Name: | chunkresolve_loinc_clinical |
| Type: | ChunkEntityResolverModel |
| Compatibility: | Spark NLP 2.5.0+ |
| License: | Licensed |
| Edition: | Official |
| Input labels: | [token, chunk_embeddings] |
| Output labels: | [entity] |
| Language: | en |
| Case sensitive: | True |
| Dependencies: | embeddings_clinical |
Data Source
Trained on the LOINC dataset (https://loinc.org/) with embeddings_clinical.