Sentence Entity Resolver for SNOMED Veterinary

Description

This model maps veterinary entities and concepts to SNOMED codes using sbiobert_base_cased_mli Sentence BERT embeddings. It is trained on an augmented version of the dataset used for the sbiobertresolve_snomed_veterinary_wip model.
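As a rough intuition for what resolving an embedding to a code means, the sketch below ranks a few candidate codes by Euclidean distance to a query vector, the distance function this card's pipeline sets on the resolver. It is a conceptual illustration only, not the library's implementation: the three-dimensional vectors, the `code_A`/`code_B`/`code_C` labels, and the helper names are all hypothetical, and real sentence embeddings are much higher-dimensional.

```python
import math

# Hypothetical toy "embeddings" keyed by placeholder code labels (not real SNOMED codes).
CANDIDATES = {
    "code_A": [0.9, 0.1, 0.0],
    "code_B": [0.0, 1.0, 0.2],
    "code_C": [0.5, 0.5, 0.5],
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_candidates(query, candidates):
    """Return (code, distance) pairs sorted from nearest to farthest."""
    scored = [(code, euclidean(query, vec)) for code, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical embedding of a recognized chunk.
ranked = rank_candidates([1.0, 0.0, 0.0], CANDIDATES)
best_code = ranked[0][0]  # nearest candidate
```

In this picture the nearest candidate plays the role of the single resolved code, and the full ranking corresponds to a list of alternative candidates.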

Predicted Entities

How to use

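# Assumes an active Spark session `spark` started with Spark NLP for Healthcare.
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

document_assembler = DocumentAssembler()\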
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetectorDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("word_embeddings")

ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "word_embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(['PROBLEM'])

c2doc = Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_chunk_doc")

sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sentence_embeddings")\
    .setCaseSensitive(False)

snomed_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_snomed_veterinary", "en", "clinical/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("snomed_code")\
    .setDistanceFunction("EUCLIDEAN")

resolver_pipeline = Pipeline(stages = [document_assembler,
                                       sentenceDetectorDL,
                                       tokenizer,
                                       word_embeddings,
                                       ner,
                                       ner_converter,
                                       c2doc,
                                       sbert_embedder,
                                       snomed_resolver])

text = [["The veterinary team, is closely monitoring the patient for signs of lymphoblastic lymphoma, a malignant neoplasm of lymphoid origin. They are also treating the patient's osteoarthritis, a degenerative joint disease. Additionally, the team is vigilantly observing the facility for potential outbreaks of mink distemper."]]

# Each inner list is one row, so no explicit schema is needed.
data = spark.createDataFrame(text).toDF("text")

# No stage needs training data, so fitting on an empty DataFrame is enough.
result = resolver_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")).transform(data)

import spark.implicits._
import org.apache.spark.ml.Pipeline

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDetectorDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("word_embeddings")

val ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "word_embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")
    .setWhiteList(Array("PROBLEM"))

val c2doc = new Chunk2Doc()
    .setInputCols("ner_chunk")
    .setOutputCol("ner_chunk_doc")

val sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
    .setInputCols("ner_chunk_doc")
    .setOutputCol("sentence_embeddings")
    .setCaseSensitive(false)

val snomed_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_snomed_veterinary", "en", "clinical/models")
    .setInputCols("sentence_embeddings") 
    .setOutputCol("snomed_code")
    .setDistanceFunction("EUCLIDEAN")

val resolver_pipeline = new Pipeline().setStages(Array(document_assembler,
                                       sentenceDetectorDL,
                                       tokenizer,
                                       word_embeddings,
                                       ner,
                                       ner_converter,
                                       c2doc,
                                       sbert_embedder,
                                       snomed_resolver))

val data = Seq("The veterinary team, is closely monitoring the patient for signs of lymphoblastic lymphoma, a malignant neoplasm of lymphoid origin. They are also treating the patient's osteoarthritis, a degenerative joint disease. Additionally, the team is vigilantly observing the facility for potential outbreaks of mink distemper.").toDF("text")

val result = resolver_pipeline.fit(data).transform(data)

Results

+---------------------------------------+-------+---------------+----------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+
|                              ner_chunk| entity|    snomed_code|                                         description|                                                   all_codes|                                                 resolutions|
+---------------------------------------+-------+---------------+----------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+
|                 lymphoblastic lymphoma|PROBLEM|312281000009102|                              lymphoblastic lymphoma|312281000009102:::188675007:::109979007:::1217301006:::11...|lymphoblastic lymphoma:::lymphoblastic lymphosarcoma:::b-...|
|a malignant neoplasm of lymphoid origin|PROBLEM|      443495005|               neoplasm of lymphoid system structure|443495005:::1156403002:::277604002:::127220001:::26947500...|neoplasm of lymphoid system structure:::composite lymphom...|
|           the patient's osteoarthritis|PROBLEM|      201826000|                              erosive osteoarthrosis|201826000:::43829003:::1162304008:::1679003:::201819000::...|erosive osteoarthrosis:::chronic osteoarthritis:::oligoar...|
|           a degenerative joint disease|PROBLEM|      201819000|degenerative joint disease involving multiple joints|201819000:::363056008:::399269003:::362975008:::1679003::...|degenerative joint disease involving multiple joints:::de...|
|                         mink distemper|PROBLEM|348361000009108|                                      mink distemper|348361000009108:::86031000009108:::207191000009103:::1901...|mink distemper:::dendropicos obsoletus:::xenops minutus o...|
+---------------------------------------+-------+---------------+----------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+
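
In the table above, the all_codes and resolutions columns hold the ranked candidates as `:::`-separated strings. A minimal sketch of pairing them up in plain Python follows; the helper name `pair_candidates` is hypothetical, and the sample strings contain only the first two candidates of the "lymphoblastic lymphoma" row, because the table truncates the rest.

```python
def pair_candidates(all_codes, resolutions, sep=":::"):
    """Split the two delimited strings and pair each code with its description.

    zip() stops at the shorter list, so a truncated string simply yields fewer pairs.
    """
    codes = all_codes.split(sep)
    names = resolutions.split(sep)
    return list(zip(codes, names))


# Only the candidates fully visible in the table; the remainder is elided there.
sample_codes = "312281000009102:::188675007"
sample_names = "lymphoblastic lymphoma:::lymphoblastic lymphosarcoma"

pairs = pair_candidates(sample_codes, sample_names)
top_code, top_name = pairs[0]  # first pair matches the row's snomed_code and description
```

Keeping the pairs in their original order preserves the ranking, so the first pair is the one reported in the snomed_code and description columns.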

Model Information

Model Name: sbiobertresolve_snomed_veterinary
Compatibility: Healthcare NLP 5.3.3+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [snomed_code]
Language: en
Size: 743.1 MB
Case sensitive: false

References

This model is trained with the April 1, 2024 release of the Veterinary Extension to SNOMED CT® (available at https://vtsl.vetmed.vt.edu/extension/) and the augmented version of the NIH September 2023 SNOMED CT United States (US) Edition.