ICD10 Contextual Parser Model

Description

This model, extracts icd10 entities from clinical texts.

Copy S3 URI

How to use



document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

ıcd10_contextualparser = ContextualParserModel.pretrained("icd10_parser","en","clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("chunk_ıcd_10") 

chunk_converter = ChunkConverter()\
    .setInputCols(["chunk_ıcd_10"])\
    .setOutputCol("ner_chunk")

parserPipeline = Pipeline(stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        ıcd10_contextualparser,
        chunk_converter
        ])

model = parserPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

sample_text = """A . Record date: 2093-01-13, date: 2093-01-13, DATE: 2093-01-13, David Hale, M.D. IP: 203.120.223.13. ID: 1231511863, Des Moines AL 50129-4444, The driver's license no: A334455B. the SSN:324598674 and e-mail: hale@gmail.com. Name : Hendrickson, Ora MR. # 719435 Date : 01/13/93. PCP : Oliveira, 25 years-old. Record date : 2079-11-09, Cocke County Baptist Hospital. 0295 Keats Street. Phone (302) 786-5227.
Mine is SSN#332255677, The other is ssN: 333-44-6666. the rest ssn:  212-33-4444. his is sSN : 345-33-5666, HER is ssn 332233445.
me again ssn 223344556, Their SSN: 234-44-3333. MY dln# D08954796. your DLN : AB773955A. 
Social Security no: 445-66-4432. Patient's VIN : 1HGBH41JXMN109286. Molina Cortes, Alvaro, MD. MRN: 1482926 from GE Healthcare. Primary diagnosis: E11.9 for type 2 diabetes mellitus.
His personal IP: 2001:db8:85a3:8d3:1319:8a2e:370:7348. the patient's ssns: 333224444. my account number is 123456789. your routing 44334456. Secondary diagnosis code I10 indicates essential hypertension.
account#3344556677. his bank no: 334455667788. Patient's Vehicle Identifiers: 1HGBH41JXMN109286. my VIN# 1dddd41JXMN109286. our vehicle id no: 1HGBH41JXMN109286. The patient also has chronic kidney disease coded as N18.3.
his aba is 3445-6543-2. sample bankrouting: 23443245. Final billing diagnosis was J45.909 for asthma without complications."""

result = model.transform(spark.createDataFrame([[sample_text]]).toDF("text"))





document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

ıcd10_contextualparser = medical.ContextualParserModel.pretrained("icd10_parser","en","clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("chunk_ıcd_10") 

chunk_converter = medical.ChunkConverter()\
    .setInputCols(["chunk_ıcd_10"])\
    .setOutputCol("ner_chunk")

parserPipeline = nlp.Pipeline(stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        ıcd10_contextualparser,
        chunk_converter
        ])

model = parserPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

sample_text = """A . Record date: 2093-01-13, date: 2093-01-13, DATE: 2093-01-13, David Hale, M.D. IP: 203.120.223.13. ID: 1231511863, Des Moines AL 50129-4444, The driver's license no: A334455B. the SSN:324598674 and e-mail: hale@gmail.com. Name : Hendrickson, Ora MR. # 719435 Date : 01/13/93. PCP : Oliveira, 25 years-old. Record date : 2079-11-09, Cocke County Baptist Hospital. 0295 Keats Street. Phone (302) 786-5227.
Mine is SSN#332255677, The other is ssN: 333-44-6666. the rest ssn:  212-33-4444. his is sSN : 345-33-5666, HER is ssn 332233445.
me again ssn 223344556, Their SSN: 234-44-3333. MY dln# D08954796. your DLN : AB773955A. 
Social Security no: 445-66-4432. Patient's VIN : 1HGBH41JXMN109286. Molina Cortes, Alvaro, MD. MRN: 1482926 from GE Healthcare. Primary diagnosis: E11.9 for type 2 diabetes mellitus.
His personal IP: 2001:db8:85a3:8d3:1319:8a2e:370:7348. the patient's ssns: 333224444. my account number is 123456789. your routing 44334456. Secondary diagnosis code I10 indicates essential hypertension.
account#3344556677. his bank no: 334455667788. Patient's Vehicle Identifiers: 1HGBH41JXMN109286. my VIN# 1dddd41JXMN109286. our vehicle id no: 1HGBH41JXMN109286. The patient also has chronic kidney disease coded as N18.3.
his aba is 3445-6543-2. sample bankrouting: 23443245. Final billing diagnosis was J45.909 for asthma without complications."""

result = model.transform(spark.createDataFrame([[sample_text]]).toDF("text"))




val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val ıcd10_contextualparser = ContextualParserModel.pretrained("icd10_parser","en","clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("chunk_ssn") 

val chunk_converter = new ChunkConverter()
    .setInputCols("chunk_ıcd-10")
    .setOutputCol("ner_chunk")

val parserPipeline = new Pipeline().setStages(Array(
        document_assembler,
        sentence_detector,
        tokenizer,
        ıcd10_contextualparser,
        chunk_converter
))

val sample_text = """A . Record date: 2093-01-13, date: 2093-01-13, DATE: 2093-01-13, David Hale, M.D. IP: 203.120.223.13. ID: 1231511863, Des Moines AL 50129-4444, The driver's license no: A334455B. the SSN:324598674 and e-mail: hale@gmail.com. Name : Hendrickson, Ora MR. # 719435 Date : 01/13/93. PCP : Oliveira, 25 years-old. Record date : 2079-11-09, Cocke County Baptist Hospital. 0295 Keats Street. Phone (302) 786-5227.
Mine is SSN#332255677, The other is ssN: 333-44-6666. the rest ssn:  212-33-4444. his is sSN : 345-33-5666, HER is ssn 332233445.
me again ssn 223344556, Their SSN: 234-44-3333. MY dln# D08954796. your DLN : AB773955A. 
Social Security no: 445-66-4432. Patient's VIN : 1HGBH41JXMN109286. Molina Cortes, Alvaro, MD. MRN: 1482926 from GE Healthcare. Primary diagnosis: E11.9 for type 2 diabetes mellitus.
His personal IP: 2001:db8:85a3:8d3:1319:8a2e:370:7348. the patient's ssns: 333224444. my account number is 123456789. your routing 44334456. Secondary diagnosis code I10 indicates essential hypertension.
account#3344556677. his bank no: 334455667788. Patient's Vehicle Identifiers: 1HGBH41JXMN109286. my VIN# 1dddd41JXMN109286. our vehicle id no: 1HGBH41JXMN109286. The patient also has chronic kidney disease coded as N18.3.
his aba is 3445-6543-2. sample bankrouting: 23443245. Final billing diagnosis was J45.909 for asthma without complications."""


val data = Seq(sample_text).toDF("text")

val results = parserPipeline.fit(data).transform(data)


Results



| chunk    |   begin |   end | label   |
|:---------|--------:|------:|:--------|
| A334455B |     169 |   176 | ICD_10  |
| E11.9    |     774 |   778 | ICD_10  |
| I10      |     976 |   978 | ICD_10  |
| N18.3    |    1229 |  1233 | ICD_10  |
| J45.909  |    1318 |  1324 | ICD_10  |



Model Information

Model Name: icd10_parser
Compatibility: Healthcare NLP 6.2.2+
License: Licensed
Edition: Official
Input Labels: [document, token_doc]
Output Labels: [entity_icd10]
Language: en
Size: 4.2 KB
Case sensitive: true