Description
This pipeline can be used to de-identify PHI (protected health information) in medical texts. PHI is masked and obfuscated in the resulting text. The pipeline can mask and obfuscate LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, MEDICALRECORD, ORGANIZATION, HEALTHPLAN, DOCTOR, USERNAME, URL, DEVICE, CITY, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, SSN, ACCOUNT, DLN, PLATE, VIN and LICENSE entities.
This pipeline simultaneously produces four versions of the text: masked with entity labels, masked with same-length characters, masked with fixed-length characters, and obfuscated.
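To make the three masking modes concrete, here is a minimal Python sketch that applies each one to character spans assumed to be already detected. The `mask` function, the `(start, end, label)` span format and the mode names are hypothetical; this is not the pipeline's DeIdentificationModel implementation. The same-length mode wraps asterisks in brackets and, as the `**` for the two-character age in the sample output suggests, falls back to bare asterisks for very short spans; the fixed-length mode uses `****` as in the sample.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, exclusive end offset, entity label)
Span = Tuple[int, int, str]


def mask(text: str, spans: List[Span], mode: str) -> str:
    """Replace each (non-overlapping) span in `text` according to one masking mode."""
    pieces = []
    prev = 0
    for start, end, label in sorted(spans):
        pieces.append(text[prev:start])  # untouched text before the span
        if mode == "entity_labels":
            pieces.append(f"<{label}>")
        elif mode == "same_length_chars":
            width = end - start
            # Brackets plus asterisks, same total width as the original span
            pieces.append("[" + "*" * (width - 2) + "]" if width > 2 else "*" * width)
        elif mode == "fixed_length_chars":
            pieces.append("****")  # assumption: fixed width of 4, as in the sample
        else:
            raise ValueError(f"unknown mode: {mode}")
        prev = end
    pieces.append(text[prev:])  # remaining text after the last span
    return "".join(pieces)


text = "Name : Hendrickson, Ora, Record date: 2093-01-13, MR # 719435."


def span(surface: str, label: str) -> Span:
    """Build a span from the first occurrence of `surface` in `text`."""
    start = text.index(surface)
    return (start, start + len(surface), label)


spans = [
    span("Hendrickson, Ora", "PATIENT"),
    span("2093-01-13", "DATE"),
    span("719435", "MEDICALRECORD"),
]

for mode in ("entity_labels", "same_length_chars", "fixed_length_chars"):
    print(mode, "->", mask(text, spans, mode))
```

On this first sample sentence the three modes reproduce the first line of each masked output shown under Results.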
Predicted Entities
LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, MEDICALRECORD, ORGANIZATION, HEALTHPLAN, DOCTOR, USERNAME, URL, DEVICE, CITY, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE
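When post-processing the entity-label output, it can be useful to collect the `<LABEL>` placeholders and compare them with the list above. The sketch below assumes placeholders always have the form `<LABEL>` with upper-case letters or underscores, as in the sample output under Results; `labels_in` and `undocumented` are hypothetical helpers, not part of Healthcare NLP.

```python
import re
from typing import Iterable, List

# Entity labels listed under "Predicted Entities"
DOCUMENTED_LABELS = {
    "LOCATION", "CONTACT", "PROFESSION", "NAME", "DATE", "ID", "AGE", "MEDICALRECORD",
    "ORGANIZATION", "HEALTHPLAN", "DOCTOR", "USERNAME", "URL", "DEVICE", "CITY", "ZIP",
    "STATE", "PATIENT", "COUNTRY", "STREET", "PHONE", "HOSPITAL", "EMAIL", "IDNUM",
    "BIOID", "FAX", "SSN", "ACCOUNT", "DLN", "PLATE", "VIN", "LICENSE",
}

# Assumption: placeholders in the entity-label output look like <LABEL>
PLACEHOLDER = re.compile(r"<([A-Z_]+)>")


def labels_in(masked_lines: Iterable[str]) -> List[str]:
    """Return the placeholder labels found, in order of first appearance."""
    seen: List[str] = []
    for line in masked_lines:
        for label in PLACEHOLDER.findall(line):
            if label not in seen:
                seen.append(label)
    return seen


def undocumented(masked_lines: Iterable[str]) -> List[str]:
    """Return placeholder labels that are not in DOCUMENTED_LABELS."""
    return [label for label in labels_in(masked_lines) if label not in DOCUMENTED_LABELS]


masked = [
    "Name : <PATIENT>, Record date: <DATE>, MR # <MEDICALRECORD>.",
    "Dr. <DOCTOR>, ID: <DEVICE>, IP <IPADDR>.",
]
print(labels_in(masked))
print(undocumented(masked))
```

Applied to the first two lines of the sample entity-label output, `undocumented` returns `['IPADDR']`, since the `<IPADDR>` placeholder in the sample does not appear in the list above.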
How to use
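```python
# Assumes a Spark session with the licensed Healthcare NLP library is already running
# (for example, one started with sparknlp_jsl.start(...)).
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline from the licensed clinical/models repository
deid_pipeline = PretrainedPipeline("clinical_deidentification_multi_mode_output", "en", "clinical/models")

text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""

# annotate() returns a dict keyed by output column name; each value is a list of strings
result = deid_pipeline.annotate(text)
```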
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val deid_pipeline = PretrainedPipeline("clinical_deidentification_multi_mode_output", "en", "clinical/models")

val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""

val result = deid_pipeline.annotate(text)
```
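In the Python example, `result` maps each output column to a list of strings, and the Results code prints one sentence per line for each mode. Assuming the four lists stay aligned sentence by sentence, as they are in the sample output, they can be regrouped into one record per sentence. `by_sentence` is a hypothetical helper, and the stand-in dict below only copies one sentence from the sample output; the column names are the ones used in the Results code.

```python
from typing import Dict, List, Sequence

# Output column names used in the Results code
MODES = ("masked", "masked_with_chars", "masked_fixed_length_chars", "obfuscated")


def by_sentence(result: Dict[str, List[str]], modes: Sequence[str] = MODES) -> List[Dict[str, str]]:
    """Regroup per-mode sentence lists into one dict per sentence."""
    lengths = {len(result[mode]) for mode in modes}
    if len(lengths) != 1:
        raise ValueError(f"outputs are not aligned: sentence counts {sorted(lengths)}")
    return [dict(zip(modes, row)) for row in zip(*(result[mode] for mode in modes))]


# Stand-in for the dict returned by annotate(), using one sentence from the sample output
result = {
    "masked": ["Dr. <DOCTOR>, ID: <DEVICE>, IP <IPADDR>."],
    "masked_with_chars": ["Dr. [********], ID: [********], IP [************]."],
    "masked_fixed_length_chars": ["Dr. ****, ID: ****, IP ****."],
    "obfuscated": ["Dr. Vic Blackbird, ID: X2814358, IP 001.001.001.001."],
}

for row in by_sentence(result):
    for mode, sentence in row.items():
        print(f"{mode:>26}: {sentence}")
```

Raising on mismatched lengths, instead of letting `zip` silently truncate, makes a misaligned output visible rather than pairing the wrong sentences.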
Results
```python
print("\nMasked with entity labels")
print("-"*30)
print("\n".join(result['masked']))
print("\nMasked with chars")
print("-"*30)
print("\n".join(result['masked_with_chars']))
print("\nMasked with fixed length chars")
print("-"*30)
print("\n".join(result['masked_fixed_length_chars']))
print("\nObfuscated")
print("-"*30)
print("\n".join(result['obfuscated']))
```
```
Masked with entity labels
------------------------------
Name : <PATIENT>, Record date: <DATE>, MR # <MEDICALRECORD>.
Dr. <DOCTOR>, ID: <DEVICE>, IP <IPADDR>.
He is a <AGE>-year-old male was admitted to the <HOSPITAL> for cystectomy on <DATE>.
Patient's VIN : <VIN>, SSN <SSN>, Driver's license no: <DLN>.
Phone <PHONE>, <STREET>, <CITY>, E-MAIL: <EMAIL>.

Masked with chars
------------------------------
Name : [**************], Record date: [********], MR # [****].
Dr. [********], ID: [********], IP [************].
He is a **-year-old male was admitted to the [**********] for cystectomy on [******].
Patient's VIN : [***************], SSN [**********], Driver's license no: [******].
Phone [************], [***************], [***********], E-MAIL: [*************].

Masked with fixed length chars
------------------------------
Name : ****, Record date: ****, MR # ****.
Dr. ****, ID: ****, IP ****.
He is a ****-year-old male was admitted to the **** for cystectomy on ****.
Patient's VIN : ****, SSN ****, Driver's license no: ****.
Phone ****, ****, ****, E-MAIL: ****.

Obfuscated
------------------------------
Name : Marlana Salvage, Record date: 2093-02-23, MR # 824235.
Dr. Vic Blackbird, ID: X2814358, IP 001.001.001.001.
He is a 68-year-old male was admitted to the PRAIRIE SAINT JOHN'S for cystectomy on 02/23/93.
Patient's VIN : 3IRWE31VQMG867619, SSN #509-32-6712, Driver's license no: W580998P.
Phone (382) 505-3976, 521 Adams St, Port Shannon, E-MAIL: Dasha@yahoo.com.
```
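In the obfuscated sample, both dates move by the same amount (January 13 becomes February 23 in each), which keeps the interval between events intact. A minimal sketch of such a consistent shift is below, assuming a single per-document offset and known date formats. The offset is computed from the sample rather than taken from the pipeline's configuration, and `offset_days` and `shift_date` are hypothetical helpers, not part of Healthcare NLP.

```python
from datetime import datetime, timedelta


def offset_days(original: str, shifted: str, fmt: str) -> int:
    """Number of days between an original date and its shifted counterpart."""
    return (datetime.strptime(shifted, fmt) - datetime.strptime(original, fmt)).days


def shift_date(value: str, fmt: str, days: int) -> str:
    """Shift a date string by `days`, keeping its original format."""
    return (datetime.strptime(value, fmt) + timedelta(days=days)).strftime(fmt)


# Offset read off the sample: Record date 2093-01-13 -> 2093-02-23
offset = offset_days("2093-01-13", "2093-02-23", "%Y-%m-%d")
print(offset)  # 41

# Applying the same offset to the other date reproduces the sample's 02/23/93
print(shift_date("01/13/93", "%m/%d/%y", offset))  # 02/23/93
```

Reusing the format string for both parsing and formatting keeps each date in the style it appeared in, which matches how the sample preserves `YYYY-MM-DD` and `MM/DD/YY`.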
Model Information
| Model Name: | clinical_deidentification_multi_mode_output |
|---|---|
| Type: | pipeline |
| Compatibility: | Healthcare NLP 5.3.1+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 1.7 GB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverter
- MedicalNerModel
- NerConverter
- ChunkMergeModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- TextMatcherModel
- ContextualParserModel
- RegexMatcherModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ChunkMergeModel
- ChunkMergeModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- Finisher
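Several NER models and rule-based annotators (contextual parsers, a text matcher, a regex matcher) feed the ChunkMergeModel stages, which have to reconcile overlapping detections before de-identification. The sketch below shows one simple overlap-resolution policy under stated assumptions: sources are given in priority order, longer chunks win within a source, and any chunk overlapping an already-kept chunk is dropped. The `Chunk` type, `merge_chunks`, `make_chunk` and the labels in the example are hypothetical; ChunkMergeModel's actual behavior is configurable and is not reproduced here.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str
    source: str  # which annotator produced the chunk


def overlaps(a: Chunk, b: Chunk) -> bool:
    """True if the two chunks share at least one character."""
    return a.start < b.end and b.start < a.end


def merge_chunks(sources: List[List[Chunk]]) -> List[Chunk]:
    """Merge chunk lists given in priority order, dropping chunks that overlap one already kept."""
    kept: List[Chunk] = []
    for chunks in sources:
        # Within one source, consider longer chunks first (start - end is minus the length)
        for candidate in sorted(chunks, key=lambda c: (c.start - c.end, c.start)):
            if not any(overlaps(candidate, k) for k in kept):
                kept.append(candidate)
    return sorted(kept, key=lambda c: c.start)


line = "Dr. John Green, ID: 1231511863, IP 203.120.223.13."


def make_chunk(surface: str, label: str, source: str) -> Chunk:
    """Build a chunk from the first occurrence of `surface` in `line`."""
    start = line.index(surface)
    return Chunk(start, start + len(surface), label, source)


ner_chunks = [make_chunk("John Green", "DOCTOR", "ner")]
rule_chunks = [make_chunk("John", "NAME", "rules"), make_chunk("1231511863", "ID", "rules")]

for c in merge_chunks([ner_chunks, rule_chunks]):
    print(c.label, line[c.start:c.end], c.source)
```

Here the rule-based `NAME` chunk is discarded because it overlaps the higher-priority `DOCTOR` chunk, while the non-overlapping `ID` chunk is kept.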