ICD10CM Text Matcher

Description

This model extracts ICD-10-CM entities from clinical notes using the rule-based TextMatcherInternal annotator.
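As a rough illustration of what rule-based phrase matching means, here is a minimal sketch in plain Python. It is not the annotator's implementation: `PHRASES` and `match_phrases` are hypothetical names, the phrase list is invented for the example, and the real model ships its own ICD-10-CM vocabulary. The sketch assumes case-insensitive matching and reports inclusive end offsets.

```python
import re

# Hypothetical phrase list, for illustration only.
PHRASES = ["gestational diabetes mellitus", "polyuria", "vomiting"]

def match_phrases(text, phrases, label="ICD10_ENTITY"):
    """Return (chunk, begin, end, label) tuples; end is an inclusive offset."""
    hits = []
    for phrase in phrases:
        # \b keeps matches on word boundaries; IGNORECASE makes matching case-insensitive.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            hits.append((m.group(0), m.start(), m.end() - 1, label))
    # Order hits by where they start in the text.
    return sorted(hits, key=lambda h: h[1])

note = "History of Gestational Diabetes Mellitus; now reports polyuria and vomiting."
matches = match_phrases(note, PHRASES)
for chunk, begin, end, label in matches:
    print(chunk, begin, end, label)
```

A dictionary-driven matcher like this only finds phrases that appear in its list, which is why the phrase inventory matters more than any model parameter.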

Predicted Entities

ICD10_ENTITY


How to use

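# Python (sparknlp / sparknlp_jsl); assumes an active licensed `spark` session
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from sparknlp_jsl.annotator import TextMatcherInternalModel

documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")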

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

text_matcher = TextMatcherInternalModel.pretrained("icd10cm_matcher","en","clinical/models") \
    .setInputCols(["document", "token"])\
    .setOutputCol("icd10cm")\
    .setMergeOverlapping(True)

matcher_pipeline = Pipeline().setStages([
    documentAssembler,
    tokenizer,
    text_matcher])

data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus, associated with obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection."""]]).toDF("text")

result = matcher_pipeline.fit(data).transform(data)
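
# Python (johnsnowlabs package); same pipeline, assumes an active licensed `spark` session
from johnsnowlabs import nlp, medical

documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")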

tokenizer = nlp.Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

text_matcher = medical.TextMatcherInternalModel.pretrained("icd10cm_matcher","en","clinical/models") \
    .setInputCols(["document", "token"])\
    .setOutputCol("icd10cm")\
    .setMergeOverlapping(True)

matcher_pipeline = nlp.Pipeline().setStages([
    documentAssembler,
    tokenizer,
    text_matcher])

data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus, associated with obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection."""]]).toDF("text")

result = matcher_pipeline.fit(data).transform(data)
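
// Scala; same pipeline, assumes an active licensed `spark` session
import spark.implicits._ // required for .toDF on a Seq

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")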

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val text_matcher = TextMatcherInternalModel.pretrained("icd10cm_matcher","en","clinical/models")
    .setInputCols(Array("document", "token"))
    .setOutputCol("icd10cm")
    .setMergeOverlapping(true)

val matcher_pipeline = new Pipeline().setStages(Array(
    documentAssembler,
    tokenizer,
    text_matcher))

val data = Seq("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus, associated with obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection.""").toDF("text")

val result = matcher_pipeline.fit(data).transform(data)

Results

+-----------------------------+-----+---+------------+
|chunk                        |begin|end|ner_label   |
+-----------------------------+-----+---+------------+
|gestational diabetes mellitus|39   |67 |ICD10_ENTITY|
|polyuria                     |261  |268|ICD10_ENTITY|
|polydipsia                   |271  |280|ICD10_ENTITY|
|vomiting                     |302  |309|ICD10_ENTITY|
+-----------------------------+-----+---+------------+
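
The begin and end columns in the table above appear to be zero-based character offsets into the input text, with end inclusive. The following plain-Python sketch checks that reading against the four rows; it needs no Spark session, and the names `text`, `rows`, and `recovered` are introduced here purely for illustration.

```python
# First sentence of the example note, split across lines for readability.
text = (
    "A 28-year-old female with a history of gestational diabetes mellitus "
    "diagnosed eight years prior to presentation and subsequent type two "
    "diabetes mellitus, associated with obesity with a body mass index (BMI) "
    "of 33.5 kg/m2, presented with a one-week history of polyuria, "
    "polydipsia, poor appetite, and vomiting."
)

# (chunk, begin, end) rows copied from the Results table.
rows = [
    ("gestational diabetes mellitus", 39, 67),
    ("polyuria", 261, 268),
    ("polydipsia", 271, 280),
    ("vomiting", 302, 309),
]

# Because end is inclusive, the Python slice must run to end + 1.
recovered = [text[begin:end + 1] for _, begin, end in rows]
for (chunk, begin, end), piece in zip(rows, recovered):
    print(f"{begin:>3}-{end:<3} {piece!r} matches table: {piece == chunk}")
```

When mapping these offsets back onto the source text in other tools, remember the inclusive end: a half-open slice `text[begin:end]` would drop the last character of each chunk.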

Model Information

Model Name: icd10cm_matcher
Compatibility: Healthcare NLP 6.1.1+
License: Licensed
Edition: Official
Input Labels: [sentence, token]
Output Labels: [entity_text]
Language: en
Size: 9.8 MB
Case sensitive: false