ICD10CM Text Matcher

Description

This model extracts icd10cm entities in clinical notes using rule-based TextMatcherInternal annotator.

Predicted Entities

ICD10_ENTITY

How to use

 documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

text_matcher = TextMatcherInternalModel.pretrained("icd10cm_matcher","en","clinical/models") \
    .setInputCols(["document", "token"])\
    .setOutputCol("icd10cm")\
    .setMergeOverlapping(True)

mathcer_pipeline = Pipeline().setStages([
    documentAssembler,
    tokenizer,
    text_matcher])

data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus, associated with obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection."""]]).toDF("text")

result = mathcer_pipeline.fit(data).transform(data)

documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

text_matcher = medical.TextMatcherInternalModel.pretrained("icd10cm_matcher","en","clinical/models") \
    .setInputCols(["document", "token"])\
    .setOutputCol("icd10cm")\
    .setMergeOverlapping(True)

mathcer_pipeline = nlp.Pipeline().setStages([
    documentAssembler,
    tokenizer,
    text_matcher])

data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus, associated with obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection."""]]).toDF("text")

result = mathcer_pipeline.fit(data).transform(data)

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val text_matcher = TextMatcherInternalModel.pretrained("icd10cm_matcher","en","clinical/models")
    .setInputCols(Array("document", "token"))
    .setOutputCol("icd10cm")
    .setMergeOverlapping(true)

val mathcer_pipeline = new Pipeline().setStages(Array(
    documentAssembler,
    tokenizer,
    text_matcher))

val data = Seq("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus, associated with obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection.""").toDF("text")

val result = mathcer_pipeline.fit(data).transform(data)

Results

+-----------------------------+-----+---+------------+
|chunk                        |begin|end|ner_label   |
+-----------------------------+-----+---+------------+
|gestational diabetes mellitus|39   |67 |ICD10_ENTITY|
|polyuria                     |261  |268|ICD10_ENTITY|
|polydipsia                   |271  |280|ICD10_ENTITY|
|vomiting                     |302  |309|ICD10_ENTITY|
+-----------------------------+-----+---+------------+

Model Information

Model Name:	icd10cm_matcher
Compatibility:	Healthcare NLP 6.1.1+
License:	Licensed
Edition:	Official
Input Labels:	[sentence, token]
Output Labels:	[entity_text]
Language:	en
Size:	9.8 MB
Case sensitive:	false

PREVIOUSDistilBERT IBD Embeddings (ONNX)

NEXTClinical Deidentification Pipeline Benchmark Large (Sentence Wise)