Description
This pipeline de-identifies protected health information (PHI) in medical texts. PHI is masked and obfuscated in the resulting text. The pipeline can mask and obfuscate the following entities: LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, MEDICALRECORD, ORGANIZATION, HEALTHPLAN, DOCTOR, USERNAME, URL, DEVICE, CITY, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE.
The pipeline produces four versions of the text in a single run: masked with entity labels, masked with same-length characters, masked with fixed-length characters, and obfuscated.
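To make the difference between the masking modes concrete, here is a minimal Python sketch that applies three of them to hand-picked spans. It does not call the pipeline. The helper names (`mask_same_length`, `mask_fixed_length`, `apply_masks`, `find_span`), the sample sentence, and the spans are hypothetical, and both the bracket rule for same-length masking and the fixed length of 4 are assumptions inferred from the sample output in the Results section, not the pipeline's actual implementation. Obfuscation, which substitutes realistic fake values, is not covered here.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_label)


def mask_same_length(chunk: str) -> str:
    """Mask a chunk with asterisks while keeping its original length.

    Assumption: chunks shorter than 3 characters become plain asterisks;
    longer chunks are wrapped in square brackets.
    """
    if len(chunk) < 3:
        return "*" * len(chunk)
    return "[" + "*" * (len(chunk) - 2) + "]"


def mask_fixed_length(length: int = 4) -> str:
    """Mask with a constant number of asterisks (4 is an assumption)."""
    return "*" * length


def apply_masks(text: str, spans: List[Span], mode: str) -> str:
    """Replace each non-overlapping span according to the chosen mode."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the span
        chunk = text[start:end]
        if mode == "entity_labels":
            pieces.append(f"<{label}>")
        elif mode == "same_length_chars":
            pieces.append(mask_same_length(chunk))
        elif mode == "fixed_length_chars":
            pieces.append(mask_fixed_length())
        else:
            raise ValueError(f"unknown mode: {mode}")
        cursor = end
    pieces.append(text[cursor:])  # remaining text after the last span
    return "".join(pieces)


def find_span(text: str, chunk: str, label: str) -> Span:
    """Locate the first occurrence of chunk and tag it with a label."""
    start = text.index(chunk)
    return (start, start + len(chunk), label)


# Hypothetical sentence and hand-labelled spans, for illustration only.
text = "Dr. John Green, age 60."
spans = [find_span(text, "John Green", "DOCTOR"), find_span(text, "60", "AGE")]

for mode in ("entity_labels", "same_length_chars", "fixed_length_chars"):
    print(apply_masks(text, spans, mode))
```

In this sketch the same-length mode keeps the sentence length unchanged, while the fixed-length mode hides how long each original value was.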
Predicted Entities
LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, MEDICALRECORD, ORGANIZATION, HEALTHPLAN, DOCTOR, USERNAME, URL, DEVICE, CITY, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE
How to use
Python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained de-identification pipeline
deid_pipeline = PretrainedPipeline("clinical_deidentification_multi_mode_output", "en", "clinical/models")

# Sample clinical note containing several PHI entities
text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR # 719435.
Dr. John Green, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94131, E-MAIL: smith@gmail.com."""

# annotate() returns a dict keyed by output name (see Results)
result = deid_pipeline.annotate(text)

Scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Load the pretrained de-identification pipeline
val deid_pipeline = PretrainedPipeline("clinical_deidentification_multi_mode_output", "en", "clinical/models")

// Sample clinical note containing several PHI entities
val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR # 719435.
Dr. John Green, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, CA 94131, E-MAIL: smith@gmail.com."""

val result = deid_pipeline.annotate(text)
Results
print("\nMasked with entity labels")
print("-"*30)
print("\n".join(result['masked']))
print("\nMasked with chars")
print("-"*30)
print("\n".join(result['masked_with_chars']))
print("\nMasked with fixed length chars")
print("-"*30)
print("\n".join(result['masked_fixed_length_chars']))
print("\nObfuscated")
print("-"*30)
print("\n".join(result['obfuscated']))
Masked with entity labels
------------------------------
Name : <PATIENT>, Record date: <DATE>, MR # <MEDICALRECORD>.
Dr. <DOCTOR>, IP <IPADDR>.
He is a <AGE>-year-old male was admitted to the <HOSPITAL> for cystectomy on <DATE>.
Patient's VIN : <VIN>, SSN <SSN>, Driver's license no: <DLN>.
Phone <PHONE>, <STREET>, <CITY>, <STATE> <ZIP>, E-MAIL: <EMAIL>.
Masked with chars
------------------------------
Name : [**************], Record date: [********], MR # [****].
Dr. [********], IP [************].
He is a **-year-old male was admitted to the [**********] for cystectomy on [******].
Patient's VIN : [***************], SSN [**********], Driver's license no: [******].
Phone [************], [***************], [***********], ** [***], E-MAIL: [*************].
Masked with fixed length chars
------------------------------
Name : ****, Record date: ****, MR # ****.
Dr. ****, IP ****.
He is a ****-year-old male was admitted to the **** for cystectomy on ****.
Patient's VIN : ****, SSN ****, Driver's license no: ****.
Phone ****, ****, ****, **** ****, E-MAIL: ****.
Obfuscated
------------------------------
Name : Dallis Dues, Record date: 2093-03-10, MR # 071219.
Dr. Emilie Harden, IP 001.001.001.001.
He is a 73-year-old male was admitted to the SOMERSET HOSPITAL for cystectomy on 03/10/93.
Patient's VIN : 7JOIT25QDIY641583, SSN #094-07-6808, Driver's license no: U110315X.
Phone (458) 592-9244, 100 Hospital Drive, NUNGATTA, Louisiana 62863, E-MAIL: Adelais@google.com.
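When each entity is replaced by a mask of equal length, as in the "Masked with chars" output above, character offsets computed on the original text still line up with the masked text. The following minimal sketch checks that property on one line of the sample. `same_length_preserved` is a hypothetical helper written for this illustration, not part of the library.

```python
def same_length_preserved(original: str, masked: str) -> bool:
    """Return True when every line has the same length before and after masking."""
    orig_lines = original.splitlines()
    masked_lines = masked.splitlines()
    if len(orig_lines) != len(masked_lines):
        return False
    return all(len(o) == len(m) for o, m in zip(orig_lines, masked_lines))


# One line of the sample input and its two character-masked versions,
# copied from the outputs shown above.
original = "Dr. John Green, IP 203.120.223.13."
masked_with_chars = "Dr. [********], IP [************]."
masked_fixed = "Dr. ****, IP ****."

print(same_length_preserved(original, masked_with_chars))
print(same_length_preserved(original, masked_fixed))
```

The fixed-length version fails the check by design, since it deliberately hides the length of each original value.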
Model Information
Model Name: clinical_deidentification_multi_mode_output
Type: pipeline
Compatibility: Healthcare NLP 5.3.3+
License: Licensed
Edition: Official
Language: en
Size: 1.7 GB
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ChunkMergeModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- TextMatcherInternalModel
- TextMatcherInternalModel
- ContextualParserModel
- RegexMatcherModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ChunkMergeModel
- ChunkMergeModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- Finisher