Description
This pipeline detects protected health information (PHI) in medical text and obfuscates it: each detected entity is replaced with a fake value of the same type in the resulting text.
Obfuscated entities: MEDICALRECORD, ORGANIZATION, PROFESSION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, DATE, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, AGE, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, IPADDR
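To make the idea of obfuscation concrete, here is a minimal pure-Python sketch of span replacement. It is not the pipeline's implementation: the `FAKE_VALUES` pools, the `span` helper, and the `obfuscate` function are hypothetical names introduced only for illustration, and the entity spans are supplied by hand rather than predicted by a model.

```python
import random

# Hypothetical surrogate pools, keyed by entity label (illustration only;
# the real pipeline ships its own fake-value data).
FAKE_VALUES = {
    "PATIENT": ["Alex Doe", "Sam Roe"],
    "PHONE": ["555-010-0000", "555-010-0001"],
}


def span(text, value, label):
    """Return (start, end, label) for the first occurrence of value in text."""
    start = text.index(value)
    return (start, start + len(value), label)


def obfuscate(text, chunks, seed=0):
    """Replace each (start, end, label) span with a fake value of that label."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    pieces = []
    cursor = 0
    for start, end, label in sorted(chunks):
        pieces.append(text[cursor:start])  # keep the non-PHI text unchanged
        pool = FAKE_VALUES.get(label)
        # Fall back to a plain label mask when no surrogate pool exists.
        pieces.append(rng.choice(pool) if pool else f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Name : Hendrickson, Ora, Phone 302-786-5227."
chunks = [
    span(text, "Hendrickson, Ora", "PATIENT"),
    span(text, "302-786-5227", "PHONE"),
]
result = obfuscate(text, chunks)
print(result)
```

The surrounding text is preserved and only the labeled spans change, which is the behavior described above; detection of the spans themselves is what the pipeline's NER and matcher stages provide.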
Predicted Entities
MEDICALRECORD, ORGANIZATION, PROFESSION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, DATE, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, AGE, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, IPADDR
How to use
Python:
from sparknlp.pretrained import PretrainedPipeline
deid_pipeline = PretrainedPipeline("clinical_deidentification_obfuscation_small", "en", "clinical/models")
result = deid_pipeline.annotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, MR 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 12/17/1993.
SSN #333-44-6666, Driver's license no: A334455B.
Phone 302-786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")
Scala:
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val deid_pipeline = PretrainedPipeline("clinical_deidentification_obfuscation_small", "en", "clinical/models")
val result = deid_pipeline.annotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, MR 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 12/17/1993.
SSN #333-44-6666, Driver's license no: A334455B.
Phone 302-786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")
Results
Obfuscated
------------------------------
Name : Maralyn Sago, Record date: 2093-01-31, MR 161096.
Dr. Darin Engels, ID: 9811914782, IP 003.003.003.003.
He is a 74-year-old male was admitted to the SCOTT COUNTY MEMORIAL HOSPITAL AKA SCOTT MEMORIAL for cystectomy on 01/04/1994.
SSN #956-21-3086, Driver's license no: V784696E.
Phone 952-841-3244, 11130 Parkview Circle Dr, Chesapeake, E-MAIL: Davie@yahoo.com.
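A quick sanity check on output like this can be sketched in plain Python, without the pipeline itself. The snippet below copies the sample output from this section, tests whether any of the original PHI strings survive in it, and measures how far each date moved. The names `PHI_VALUES`, `leaked`, and `day_shift` are introduced here for illustration; the list of PHI strings was picked by hand from the sample input.

```python
from datetime import date

# Sample output copied from the Results section above.
obfuscated = """Name : Maralyn Sago, Record date: 2093-01-31, MR 161096.
Dr. Darin Engels, ID: 9811914782, IP 003.003.003.003.
He is a 74-year-old male was admitted to the SCOTT COUNTY MEMORIAL HOSPITAL AKA SCOTT MEMORIAL for cystectomy on 01/04/1994.
SSN #956-21-3086, Driver's license no: V784696E.
Phone 952-841-3244, 11130 Parkview Circle Dr, Chesapeake, E-MAIL: Davie@yahoo.com."""

# PHI strings taken by hand from the sample input text.
PHI_VALUES = [
    "Hendrickson, Ora", "2093-01-13", "719435", "John Green",
    "1231511863", "203.120.223.13", "60-year-old", "Day Hospital",
    "12/17/1993", "333-44-6666", "A334455B", "302-786-5227",
    "0295 Keats Street", "San Francisco", "smith@gmail.com",
]

# Any original value still present in the output would be a leak.
leaked = [value for value in PHI_VALUES if value in obfuscated]


def day_shift(before, after):
    """Number of days between an original date and its obfuscated form."""
    return (after - before).days


record_shift = day_shift(date(2093, 1, 13), date(2093, 1, 31))
admission_shift = day_shift(date(1993, 12, 17), date(1994, 1, 4))

print(leaked)                         # []
print(record_shift, admission_shift)  # 18 18
```

In this sample none of the original values remain, and both dates moved forward by the same 18 days, so the interval between them is preserved. Whether the shift is constant across documents or runs is not stated here and should not be assumed from this one example.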
Model Information
| Field | Value |
|---|---|
| Model Name | clinical_deidentification_obfuscation_small |
| Type | pipeline |
| Compatibility | Healthcare NLP 5.2.1+ |
| License | Licensed |
| Edition | Official |
| Language | en |
| Size | 1.7 GB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverter
- TextMatcherModel
- RegexMatcherModel
- ChunkMergeModel
- DeIdentificationModel
- Finisher