Sentence Entity Resolver for SNOMED codes (procedures and measurements)

Description

This model maps medical entities to SNOMED codes using Sentence BERT embeddings. Its corpus covers the Procedure and Measurement domains.
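As a rough illustration of what embedding-based resolution means, the sketch below ranks candidate codes by how close their vectors are to a query vector. It is not the model's actual implementation: the use of cosine similarity, the 3-dimensional vectors, and the placeholder labels `CODE_A` to `CODE_C` are all assumptions made for the example, and the helper names `cosine_similarity` and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          index: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k index entries most similar to the query, best first."""
    scored = [(code, cosine_similarity(query, vec)) for code, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up toy vectors with placeholder labels (not real SNOMED codes or embeddings).
toy_index = {
    "CODE_A": [1.0, 0.0, 0.0],
    "CODE_B": [0.0, 1.0, 0.0],
    "CODE_C": [0.7, 0.7, 0.0],
}

# A made-up query vector standing in for the embedding of an input chunk.
ranked = top_k([0.9, 0.1, 0.0], toy_index, k=2)
for code, score in ranked:
    print(code, round(score, 3))
```

The first entry of the ranked list plays the role of the resolved code, and the rest correspond to the alternative candidates a resolver can report alongside it.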

Predicted Entities

SNOMED Codes


How to use

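# Python
from pyspark.ml import PipelineModel
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import BertSentenceEmbeddings
from sparknlp_jsl.annotator import SentenceEntityResolverModel

# Treat each input string as a single chunk to be resolved.
documentAssembler = DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("ner_chunk")

# Embed the chunk with the same sentence embeddings the resolver was trained on.
sbert_embedder = BertSentenceEmbeddings\
      .pretrained('sbiobert_base_cased_mli','en','clinical/models')\
      .setInputCols(["ner_chunk"])\
      .setOutputCol("sbert_embeddings")

# Map the embedded chunk to SNOMED codes.
resolver = SentenceEntityResolverModel\
      .pretrained("sbiobertresolve_snomed_procedures_measurements", "en", "clinical/models") \
      .setInputCols(["ner_chunk", "sbert_embeddings"]) \
      .setOutputCol("snomed_code")

pipelineModel = PipelineModel(
    stages = [
        documentAssembler,
        sbert_embedder,
        resolver])

# LightPipeline annotates plain Python strings without building a DataFrame.
l_model = LightPipeline(pipelineModel)
result = l_model.fullAnnotate(['coronary calcium score', 'heart surgery', 'ct scan', 'bp value'])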

// Scala
import org.apache.spark.ml.Pipeline
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

val document_assembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("ner_chunk")

val sbert_embedder = BertSentenceEmbeddings
      .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
      .setInputCols(Array("ner_chunk"))
      .setOutputCol("sbert_embeddings")

val resolver = SentenceEntityResolverModel
      .pretrained("sbiobertresolve_snomed_procedures_measurements", "en", "clinical/models")
      .setInputCols(Array("ner_chunk", "sbert_embeddings"))
      .setOutputCol("snomed_code")

// None of the stages is trainable, so fitting on an empty DataFrame only assembles the PipelineModel.
val pipelineModel = new Pipeline()
      .setStages(Array(document_assembler, sbert_embedder, resolver))
      .fit(Seq.empty[String].toDF("text"))

val l_model = new LightPipeline(pipelineModel)
val result = l_model.fullAnnotate(Array("coronary calcium score", "heart surgery", "ct scan", "bp value"))

# NLU
import nlu
nlu.load("en.resolve.snomed.procedures_measurements").predict("""coronary calcium score""")

Results

|    | chunk                  |      code | code_description              | all_k_code_desc                                                                 | all_k_codes                                                                                                                                                     |
|---:|:-----------------------|----------:|:------------------------------|:--------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------|
|  0 | coronary calcium score | 450360000 | Coronary artery calcium score | ['450360000', '450734004', '1086491000000104', '1086481000000101', '762241007'] | ['Coronary artery calcium score', 'Coronary artery calcium score', 'Dundee Coronary Risk Disk score', 'Dundee Coronary Risk rank', 'Dundee Coronary Risk Disk'] |
|  1 | heart surgery          |   2598006 | Open heart surgery            | ['2598006', '64915003', '119766003', '34068001', '233004008']                   | ['Open heart surgery', 'Operation on heart', 'Heart reconstruction', 'Heart valve replacement', 'Coronary sinus operation']                                     |
|  2 | ct scan                | 303653007 | CT of head                    | ['303653007', '431864000', '363023007', '418272005', '241577003']               | ['CT of head', 'CT guided injection', 'CT of site', 'CT angiography', 'CT of spine']                                                                            |
|  3 | bp value               |  75367002 | Blood pressure                | ['75367002', '6797001', '723232008', '46973005', '427732000']                   | ['Blood pressure', 'Mean blood pressure', 'Average blood pressure', 'Blood pressure taking', 'Speed of blood pressure response']                                |
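A table like the one above can be assembled from the `fullAnnotate` output. The following is a minimal sketch, assuming the resolver stores its best description and ranked candidates in the annotation metadata under the keys `resolved_text`, `all_k_results`, and `all_k_resolutions`, joined by `:::`; those key names and the delimiter are assumptions to check against your installed version. The helpers `split_candidates` and `resolver_rows` are hypothetical, and `fake_result` is a hand-built stand-in (copied from the "bp value" row) so the example runs without Spark.

```python
from types import SimpleNamespace
from typing import Any, Dict, List

SEP = ":::"  # assumption: delimiter between ranked candidates in metadata strings


def split_candidates(value: str) -> List[str]:
    """Split a delimited metadata string into a list; empty input gives []."""
    return value.split(SEP) if value else []


def resolver_rows(result: List[Dict[str, list]],
                  chunk_col: str = "ner_chunk",
                  code_col: str = "snomed_code") -> List[Dict[str, Any]]:
    """Flatten fullAnnotate-style output into one dict per resolved chunk."""
    rows = []
    for doc in result:
        # Pair each chunk annotation with the resolver annotation at the same position.
        for chunk, code in zip(doc[chunk_col], doc[code_col]):
            meta = code.metadata
            rows.append({
                "chunk": chunk.result,
                "code": code.result,
                "code_description": meta.get("resolved_text", ""),       # assumed key
                "all_k_codes": split_candidates(meta.get("all_k_results", "")),          # assumed key
                "all_k_code_desc": split_candidates(meta.get("all_k_resolutions", "")),  # assumed key
            })
    return rows


# Hand-built stand-in for one element of l_model.fullAnnotate(...).
fake_result = [{
    "ner_chunk": [SimpleNamespace(result="bp value", metadata={})],
    "snomed_code": [SimpleNamespace(
        result="75367002",
        metadata={
            "resolved_text": "Blood pressure",
            "all_k_results": "75367002:::6797001:::723232008:::46973005:::427732000",
            "all_k_resolutions": ("Blood pressure:::Mean blood pressure:::"
                                  "Average blood pressure:::Blood pressure taking:::"
                                  "Speed of blood pressure response"),
        },
    )],
}]

rows = resolver_rows(fake_result)
print(rows[0]["chunk"], rows[0]["code"], rows[0]["all_k_codes"])
```

With a real pipeline, passing the `result` returned by `l_model.fullAnnotate(...)` in place of `fake_result` should yield one row per input string, provided the metadata layout matches the assumptions above.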

Model Information

Model Name: sbiobertresolve_snomed_procedures_measurements
Compatibility: Healthcare NLP 3.3.0+
License: Licensed
Edition: Official
Input Labels: [sentence_chunk_embeddings]
Output Labels: [output]
Language: en
Case sensitive: false

Data Source

Trained on a SNOMED code dataset using sbiobert_base_cased_mli sentence embeddings.