Clinical Deidentification

Description

This pipeline can be used to de-identify protected health information (PHI) in medical texts. The PHI it detects is obfuscated in the resulting text.
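To give a rough idea of what obfuscation means, the toy sketch below uses regular expressions to swap a few PHI patterns for fixed fake values. This is not how the pipeline works, since it relies on trained clinical models. The patterns, surrogate values and the `obfuscate` helper are hypothetical and exist only for illustration.

```python
import re

# Hypothetical, illustrative patterns only; the actual pipeline uses trained
# clinical models, not these regular expressions.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

# Hypothetical fake values substituted for each detected entity type.
SURROGATES = {
    "SSN": "000-00-0000",
    "PHONE": "(555) 555-0100",
    "EMAIL": "patient@example.com",
}


def obfuscate(text: str) -> str:
    """Replace every match of each pattern with the fake value for its type."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(SURROGATES[label], text)
    return text


sample = "SSN #333-44-6666, Phone (302) 786-5227, E-MAIL: smith@gmail.com."
print(obfuscate(sample))
# SSN #000-00-0000, Phone (555) 555-0100, E-MAIL: patient@example.com.
```

A fixed pattern list like this misses names, addresses and free-text identifiers, which is why model-based detection is used in practice.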


How to use

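Python

# Assumes a Spark session with the licensed Spark NLP for Healthcare library is already started.
from sparknlp.pretrained import PretrainedPipeline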

sample = """Name : Hendrickson, Ora, Record date: 2093-01-13, # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no:A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""

deid_pipeline = PretrainedPipeline("clinical_deidentification", "en", "clinical/models")
# annotate() returns a dict mapping each output column name to a list of strings
result = deid_pipeline.annotate(sample)

Scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val sample = """Name : Hendrickson, Ora, Record date: 2093-01-13, # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no:A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""

val deidPipeline = new PretrainedPipeline("clinical_deidentification", "en", "clinical/models")
// annotate returns a Map from each output column name to a Seq of strings
val result = deidPipeline.annotate(sample)

NLU

import nlu
nlu.load("en.de_identify.clinical_pipeline").predict("""Name : Hendrickson, Ora, Record date: 2093-01-13, # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no:A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")
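One way to check which spans were changed is to diff the original text against the de-identified text word by word. The sketch below does this with Python's standard `difflib.SequenceMatcher`. The `replaced_spans` helper and the before/after strings are hypothetical; which output column of the annotate result holds the obfuscated text is not stated on this page, so extracting it is left to the reader.

```python
import difflib
from typing import List, Tuple


def replaced_spans(original: str, deidentified: str) -> List[Tuple[str, str]]:
    """Return (original, replacement) word spans that differ between two texts."""
    src, dst = original.split(), deidentified.split()
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            pairs.append((" ".join(src[i1:i2]), " ".join(dst[j1:j2])))
    return pairs


# Hypothetical before/after pair; real de-identified text would come from the pipeline.
before = "Dr. John Green, ID: 1231511863, IP 203.120.223.13."
after = "Dr. Jane Doe, ID: 9876543210, IP 10.0.0.1."
print(replaced_spans(before, after))
# [('John Green,', 'Jane Doe,'), ('1231511863,', '9876543210,'), ('203.120.223.13.', '10.0.0.1.')]
```

Splitting on whitespace keeps attached punctuation with each word; a tokenizer-based split would give cleaner spans at the cost of extra dependencies.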

Model Information

Name: clinical_deidentification
Type: PipelineModel
Compatibility: Spark NLP 2.4.0+
License: Licensed
Edition: Official
Language: en