Detect SNOMED Terms

Description

This Named Entity Recognition (NER) model extracts SNOMED terms from clinical text. It was trained using the embeddings_clinical word embeddings model.

Predicted Entities

snomed_term


How to use

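from pyspark.ml import Pipeline
from pyspark.sql.types import StringType
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

# Assumes an active SparkSession named `spark` with Healthcare NLP (sparknlp_jsl) loaded.

document_assembler = DocumentAssembler()\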
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence") 

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols("sentence", "token")\
    .setOutputCol("embeddings")

ner = MedicalNerModel.pretrained("ner_snomed_term", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[document_assembler,
                            sentence_detector,
                            tokenizer,
                            embeddings,
                            ner,
                            ner_converter])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

text_list = ["The patient was diagnosed with acute appendicitis and scheduled for immediate surgery.",
             "Due to experiencing chronic pain the patient was referred to a fibromyalgia specialist for further evaluation.",
             "His hypertension is currently managed with a combination of lifestyle modifications and medication.",
             "The child was brought in with symptoms of acute otitis including ear pain and fever.",
             "Laboratory tests indicate the individual has hyperthyroidism requiring further endocrinological assessment.",
             "The radiograph showed evidence of a distal radius fracture from a recent fall."]

data = spark.createDataFrame(text_list, StringType()).toDF("text")

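result = model.transform(data)

# Flatten the ner_chunk annotations into the chunk / ner_label columns shown under Results
from pyspark.sql import functions as F
result.select(F.explode("ner_chunk").alias("c")) \
    .select(F.col("c.result").alias("chunk"),
            F.col("c.metadata").getItem("entity").alias("ner_label")) \
    .show(truncate=False)

// Scala: the same pipeline (assumes an active SparkSession named spark with Healthcare NLP loaded)
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
import org.apache.spark.ml.Pipeline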

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val ner = MedicalNerModel.pretrained("ner_snomed_term", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")

val nerConverter = new NerConverter()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline()
  .setStages(Array(documentAssembler, sentenceDetector, tokenizer, embeddings, ner, nerConverter))

val textList = Seq(
  "The patient was diagnosed with acute appendicitis and scheduled for immediate surgery.",
  "Due to experiencing chronic pain the patient was referred to a fibromyalgia specialist for further evaluation.",
  "His hypertension is currently managed with a combination of lifestyle modifications and medication.",
  "The child was brought in with symptoms of acute otitis including ear pain and fever.",
  "Laboratory tests indicate the individual has hyperthyroidism requiring further endocrinological assessment.",
  "The radiograph showed evidence of a distal radius fracture from a recent fall."
)

// One row per sentence (Seq(textList) would create a single row holding an array)
val data = textList.toDF("text")

val result = pipeline.fit(data).transform(data)

Results

+------------------+-----------+
|chunk             |ner_label  |
+------------------+-----------+
|acute appendicitis|snomed_term|
|chronic pain      |snomed_term|
|fibromyalgia      |snomed_term|
|hypertension      |snomed_term|
|otitis            |snomed_term|
|ear pain          |snomed_term|
|hyperthyroidism   |snomed_term|
|radiograph        |snomed_term|
|radius fracture   |snomed_term|
+------------------+-----------+
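The ner stage tags individual tokens as B-snomed_term (first token of a term), I-snomed_term (continuation of a term) or O (outside any term), and the ner_converter stage groups those tags into the chunks listed above. The sketch below is a simplified, plain-Python illustration of that grouping, not NerConverter's actual implementation: the function bio_to_chunks is hypothetical, tokens are rejoined with single spaces rather than by character offsets into the sentence, and the token tags for the otitis sentence are assumed values chosen to reproduce the matching rows of the Results table.

```python
from typing import List, Optional, Tuple


def bio_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: group B-/I- tagged tokens into (chunk_text, label) pairs.

    Simplification: tokens are joined with single spaces, and an I- tag that
    does not continue an open chunk of the same label starts a new chunk.
    """
    chunks: List[Tuple[str, str]] = []
    current: List[str] = []
    current_label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")  # "B-snomed_term" -> ("B", "-", "snomed_term")
        if prefix == "I" and current and label == current_label:
            current.append(token)  # extend the open chunk
            continue
        if current:
            # any other tag closes the open chunk
            chunks.append((" ".join(current), current_label))
            current, current_label = [], None
        if prefix in ("B", "I"):
            current, current_label = [token], label  # start a new chunk
    if current:
        chunks.append((" ".join(current), current_label))
    return chunks


# Assumed token-level tags for one of the example sentences.
tokens = "The child was brought in with symptoms of acute otitis including ear pain and fever .".split()
tags = ["O"] * len(tokens)
tags[tokens.index("otitis")] = "B-snomed_term"
tags[tokens.index("ear")] = "B-snomed_term"
tags[tokens.index("pain")] = "I-snomed_term"

print(bio_to_chunks(tokens, tags))
```

With these assumed tags the sketch prints [('otitis', 'snomed_term'), ('ear pain', 'snomed_term')], mirroring the otitis and ear pain rows above.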

Model Information

Model Name: ner_snomed_term
Compatibility: Healthcare NLP 5.2.1+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 14.6 MB

References

In-house annotated dataset

Benchmarking

label          precision  recall  f1-score  support 
B-snomed_term  0.87       0.88    0.87      5210    
I-snomed_term  0.85       0.91    0.88      4922    
micro-avg      0.86       0.89    0.88      10132   
macro-avg      0.86       0.89    0.88      10132   
weighted-avg   0.86       0.89    0.88      10132
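Some of the summary figures can be cross-checked from the per-label rows. The sketch below assumes the usual definitions (per-label F1 as the harmonic mean of precision and recall, weighted average as the support-weighted mean over labels) and uses only the rounded values printed in the table, so it can agree with the reported figures to two decimals at best; the F1 cells of the average rows are not recomputed because rounding makes them unreliable here. The micro-average also cannot be recomputed from this table, since it needs the underlying true-positive, false-positive and false-negative counts.

```python
# Per-label rows copied from the Benchmarking table (already rounded to 2 decimals).
rows = {
    "B-snomed_term": {"precision": 0.87, "recall": 0.88, "support": 5210},
    "I-snomed_term": {"precision": 0.85, "recall": 0.91, "support": 4922},
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def weighted_avg(metric: str) -> float:
    """Support-weighted mean of a per-label metric."""
    total = sum(r["support"] for r in rows.values())
    return sum(r[metric] * r["support"] for r in rows.values()) / total


total_support = sum(r["support"] for r in rows.values())

for label, r in rows.items():
    print(label, "f1 =", round(f1(r["precision"], r["recall"]), 2))
print("total support =", total_support)
print("weighted precision =", round(weighted_avg("precision"), 2))
print("weighted recall =", round(weighted_avg("recall"), 2))
```

The recomputed per-label F1 scores (0.87 and 0.88), the total support (10132) and the weighted precision and recall (0.86 and 0.89) match the table.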