Description
This model maps extracted medical entities to Logical Observation Identifiers Names and Codes (LOINC) codes using sbluebert_base_uncased_mli Sentence BERT embeddings.
Each resolution also includes the official LOINC description of the code in square brackets.
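Conceptually, entity resolution of this kind is a nearest-neighbor search. The chunk is embedded with the same sentence encoder used for the LOINC descriptions, and candidate codes are ranked by their distance to it. The pipeline below selects Euclidean distance with setDistanceFunction("EUCLIDEAN"). The following minimal sketch shows only that ranking step in plain Python. The three-dimensional vectors are hypothetical toy values, not real sbluebert embeddings, and the code is an illustration, not the library's implementation.

```python
import math

# Hypothetical 3-dimensional "embeddings" for a few LOINC codes.
# The real model compares 768-dimensional sbluebert sentence embeddings.
CODE_EMBEDDINGS = {
    "29544-4": [0.9, 0.1, 0.0],
    "26436-6": [0.1, 0.9, 0.2],
    "10346-5": [0.0, 0.2, 0.9],
}


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resolve(chunk_embedding, code_embeddings, k=2):
    """Return the k codes closest to the chunk embedding, nearest first."""
    scored = sorted(
        (euclidean(chunk_embedding, vec), code)
        for code, vec in code_embeddings.items()
    )
    return [code for _, code in scored[:k]]


# A toy chunk embedding that lies close to the first code's vector.
print(resolve([0.8, 0.2, 0.1], CODE_EMBEDDINGS))  # → ['29544-4', '26436-6']
```

Ranking all codes and keeping the top k is why the output contains both a best code and a list of ranked alternatives.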
How to use
# Python, using the spark-nlp and spark-nlp-jsl (Healthcare NLP) packages.
# Assumes `spark` is an active SparkSession started with a valid Healthcare NLP license.
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Two NER models; each converter keeps only chunks labeled "Test".
ner_model = MedicalNerModel.pretrained("ner_radiology", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner_radiology")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner_radiology"])\
    .setOutputCol("ner_chunk_radiology")\
    .setWhiteList(["Test"])

ner_model_jsl = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner_jsl")

ner_converter_jsl = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner_jsl"])\
    .setOutputCol("ner_chunk_jsl")\
    .setWhiteList(["Test"])

# Merge both chunk columns, then turn each chunk into a document for sentence embedding.
chunk_merger = ChunkMergeApproach()\
    .setInputCols("ner_chunk_jsl", "ner_chunk_radiology")\
    .setOutputCol("merged_ner_chunk")

chunk2doc = Chunk2Doc()\
    .setInputCols("merged_ner_chunk")\
    .setOutputCol("ner_chunk_doc")

sbert_embedder = BertSentenceEmbeddings.pretrained("sbluebert_base_uncased_mli", "en", "clinical/models")\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbluebert_embeddings")\
    .setCaseSensitive(False)

# Rank LOINC codes by Euclidean distance to each chunk embedding.
resolver = SentenceEntityResolverModel.pretrained("sbluebertresolve_loinc_uncased", "en", "clinical/models")\
    .setInputCols(["sbluebert_embeddings"])\
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")

nlpPipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    word_embeddings,
    ner_model,
    ner_converter,
    ner_model_jsl,
    ner_converter_jsl,
    chunk_merger,
    chunk2doc,
    sbert_embedder,
    resolver])

data = spark.createDataFrame([["""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months.
She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension
for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that
includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination
is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8g/dL, Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3"""]]).toDF("text")

result = nlpPipeline.fit(data).transform(data)
# The same pipeline using the johnsnowlabs library's nlp and medical namespaces.
# Assumes `spark` is an active SparkSession started with a valid Healthcare NLP license.
from johnsnowlabs import nlp, medical

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Two NER models; each converter keeps only chunks labeled "Test".
ner_model = medical.NerModel.pretrained("ner_radiology", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner_radiology")

ner_converter = medical.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner_radiology"])\
    .setOutputCol("ner_chunk_radiology")\
    .setWhiteList(["Test"])

ner_model_jsl = medical.NerModel.pretrained("ner_jsl", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner_jsl")

ner_converter_jsl = medical.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner_jsl"])\
    .setOutputCol("ner_chunk_jsl")\
    .setWhiteList(["Test"])

# Merge both chunk columns, then turn each chunk into a document for sentence embedding.
chunk_merger = medical.ChunkMergeApproach()\
    .setInputCols("ner_chunk_jsl", "ner_chunk_radiology")\
    .setOutputCol("merged_ner_chunk")

chunk2doc = medical.Chunk2Doc()\
    .setInputCols("merged_ner_chunk")\
    .setOutputCol("ner_chunk_doc")

sbert_embedder = nlp.BertSentenceEmbeddings.pretrained("sbluebert_base_uncased_mli", "en", "clinical/models")\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbluebert_embeddings")\
    .setCaseSensitive(False)

# Rank LOINC codes by Euclidean distance to each chunk embedding.
resolver = medical.SentenceEntityResolverModel.pretrained("sbluebertresolve_loinc_uncased", "en", "clinical/models")\
    .setInputCols(["sbluebert_embeddings"])\
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    word_embeddings,
    ner_model,
    ner_converter,
    ner_model_jsl,
    ner_converter_jsl,
    chunk_merger,
    chunk2doc,
    sbert_embedder,
    resolver])

data = spark.createDataFrame([["""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months.
She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension
for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that
includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination
is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8g/dL, Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3"""]]).toDF("text")

result = nlpPipeline.fit(data).transform(data)
// Scala version of the same pipeline.
// Assumes a SparkSession named `spark` started with a valid Healthcare NLP license.
// The Healthcare NLP classes used below (MedicalNerModel, NerConverterInternal,
// ChunkMergeApproach, SentenceEntityResolverModel) must also be imported from the
// licensed spark-nlp-jsl package.
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import spark.implicits._ // enables Seq(...).toDF

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

// Two NER models; each converter keeps only chunks labeled "Test".
val ner_model = MedicalNerModel.pretrained("ner_radiology", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner_radiology")

val ner_converter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner_radiology"))
  .setOutputCol("ner_chunk_radiology")
  .setWhiteList(Array("Test"))

val ner_model_jsl = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner_jsl")

val ner_converter_jsl = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner_jsl"))
  .setOutputCol("ner_chunk_jsl")
  .setWhiteList(Array("Test"))

// Merge both chunk columns, then turn each chunk into a document for sentence embedding.
val chunk_merger = new ChunkMergeApproach()
  .setInputCols(Array("ner_chunk_jsl", "ner_chunk_radiology"))
  .setOutputCol("merged_ner_chunk")

val chunk2doc = new Chunk2Doc()
  .setInputCols("merged_ner_chunk")
  .setOutputCol("ner_chunk_doc")

val sbert_embedder = BertSentenceEmbeddings.pretrained("sbluebert_base_uncased_mli", "en", "clinical/models")
  .setInputCols("ner_chunk_doc")
  .setOutputCol("sbluebert_embeddings")
  .setCaseSensitive(false)

// Rank LOINC codes by Euclidean distance to each chunk embedding.
val resolver = SentenceEntityResolverModel.pretrained("sbluebertresolve_loinc_uncased", "en", "clinical/models")
  .setInputCols("sbluebert_embeddings")
  .setOutputCol("resolution")
  .setDistanceFunction("EUCLIDEAN")

val nlpPipeline = new Pipeline().setStages(Array(
  document_assembler,
  sentence_detector,
  tokenizer,
  word_embeddings,
  ner_model,
  ner_converter,
  ner_model_jsl,
  ner_converter_jsl,
  chunk_merger,
  chunk2doc,
  sbert_embedder,
  resolver))

val data = Seq("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months.
She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension
for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that
includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination
is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8g/dL, Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""").toDF("text")

val result = nlpPipeline.fit(data).transform(data)
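In the pipeline above, ner_radiology and ner_jsl each emit chunks, both converters keep only the Test label via setWhiteList, and ChunkMergeApproach combines the two chunk columns into a single merged_ner_chunk column. The plain-Python sketch below imitates that flow under a simplifying assumption: when two chunks overlap, the one from the first input column (ner_chunk_jsl) wins. It illustrates the idea only and does not reproduce ChunkMergeApproach's actual merging rules. The Chunk type and the helper functions are hypothetical, and the chunks are hand-written, not model output.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    text: str
    begin: int  # inclusive start offset
    end: int    # inclusive end offset
    label: str


def keep_labels(chunks, white_list):
    """Drop chunks whose label is not in the whitelist (like setWhiteList)."""
    return [c for c in chunks if c.label in white_list]


def overlaps(a, b):
    """True if two inclusive [begin, end] spans share at least one character."""
    return a.begin <= b.end and b.begin <= a.end


def merge(primary, secondary):
    """Keep every primary chunk; add a secondary chunk only if it overlaps none already kept."""
    merged = list(primary)
    for chunk in secondary:
        if not any(overlaps(chunk, kept) for kept in merged):
            merged.append(chunk)
    return sorted(merged, key=lambda c: c.begin)


# Hand-written chunks (hypothetical); offsets are taken from the sample text.
jsl = keep_labels([Chunk("Hemoglobin", 529, 538, "Test"),
                   Chunk("fatigue", 60, 66, "Symptom")], {"Test"})
radiology = keep_labels([Chunk("Hemoglobin", 529, 538, "Test"),
                         Chunk("Hematocrit", 550, 559, "Test")], {"Test"})

for c in merge(jsl, radiology):
    print(c.text, c.begin, c.end, c.label)
# Hemoglobin 529 538 Test
# Hematocrit 550 559 Test
```

Running two NER models widens recall for lab tests, and the merge step keeps a test found by both models from being resolved twice.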
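Results

Each row is a Test chunk found in the sample text. loinc_code and description give the top-ranked code; all_codes, resolutions, and aux_labels list all candidates in rank order, separated by :::, with aux_labels holding each candidate's LOINC status (ACTIVE, DISCOURAGED, TRIAL). Long values are truncated (...).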
| chunk | begin | end | ner_label | loinc_code | description | resolutions | all_codes | aux_labels |
|-------------------------|------:|----:|-----------|------------|-----------------------------------------------------------------|-----------------------------------------------------------------------------|---------------------------------------------------------------------------|---------------------------------------------------------------------------|
| physical examination | 450 | 469 | Test | 29544-4 | Physical findings [Physical findings] | Physical findings [Physical findings]:::Physical exam by ... | 29544-4:::11384-5:::100223-7:::29545-1:::100038-9:::89492... | ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT... |
| Laboratory studies | 490 | 507 | Test | 26436-6 | Laboratory studies (set) [Laboratory studies (set)] | Laboratory studies (set) [Laboratory studies (set)]:::Lab... | 26436-6:::11502-2:::52482-7:::34075-2:::100455-5:::56850-... | ACTIVE:::ACTIVE:::DISCOURAGED:::ACTIVE:::ACTIVE:::ACTIVE:... |
| Hemoglobin | 529 | 538 | Test | 10346-5 | Hemoglobin [Hemoglobin A [Units/volume] in Blood by Elect...] | Hemoglobin [Hemoglobin A [Units/volume] in Blood by Elect...] | 10346-5:::2030-5:::109592-6:::11559-2:::50191-6:::101634-... | ACTIVE:::ACTIVE:::TRIAL:::ACTIVE:::ACTIVE:::ACTIVE:::ACTI... |
| Hematocrit | 550 | 559 | Test | 101655-9 | Hct [Basic metabolic and hematocrit panel - Blood] | Hct [Basic metabolic and hematocrit panel - Blood]:::stea... | 101655-9:::103845-4:::10346-5:::16966-4:::12250-7:::24360... | ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::DISCOURAGED:::ACTIVE:... |
| Mean Corpuscular Volume | 567 | 589 | Test | 82627-1 | Mean phase [Mean phase [Angle] Left ventricle SPECT --W s...] | Mean phase [Mean phase [Angle] Left ventricle SPECT --W s...] | 82627-1:::33878-0:::59117-2:::60949-5:::8478-0:::103205-1... | ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT... |
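The resolver packs its candidates into single strings. A minimal plain-Python sketch for post-processing one row, assuming the formats visible in the table above: a description of the form "name [official description]", and all_codes and aux_labels aligned position by position and separated by :::. The helper names are hypothetical.

```python
def split_description(resolution):
    """Split 'name [official description]' at the first ' [' and drop the final ']'."""
    name, sep, rest = resolution.partition(" [")
    if not sep or not rest.endswith("]"):
        return resolution, None  # no bracketed description present
    return name, rest[:-1]


def candidates(all_codes, aux_labels, allowed=("ACTIVE",)):
    """Pair each candidate code with its status; keep allowed statuses in rank order."""
    codes = all_codes.split(":::")
    statuses = aux_labels.split(":::")
    return [code for code, status in zip(codes, statuses) if status in allowed]


print(split_description("Hct [Basic metabolic and hematocrit panel - Blood]"))
# → ('Hct', 'Basic metabolic and hematocrit panel - Blood')
print(candidates("26436-6:::11502-2:::52482-7", "ACTIVE:::ACTIVE:::DISCOURAGED"))
# → ['26436-6', '11502-2']
```

Splitting at the first " [" rather than the last keeps nested brackets inside the official description intact, as in the Hemoglobin row. Filtering on status is one way to skip DISCOURAGED codes; whether to do so depends on the use case.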
Model Information
| Model Name: | sbluebertresolve_loinc_uncased |
|---|---|
| Compatibility: | Healthcare NLP 6.3.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sbluebert_embeddings] |
| Output Labels: | [loinc_code] |
| Language: | en |
| Size: | 693.1 MB |
| Case sensitive: | false |