Description
This pipeline detects protected health information (PHI) in medical text and obfuscates it, replacing each detected entity with a fake value in the resulting text.
Obfuscated entities: LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, MEDICALRECORD, ORGANIZATION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, IPADDR
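To make the idea of obfuscation concrete, here is a minimal, self-contained sketch that replaces two kinds of identifiers with random values of the same shape. It is not how the pipeline works internally: the `PATTERNS` table and the `fake_like` and `obfuscate` helpers are hypothetical names introduced for illustration, and the real pipeline relies on NER models and matchers rather than two regular expressions.

```python
import random
import re

# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def fake_like(value, rng):
    """Return a string with the same layout as value but random digits."""
    return "".join(
        str(rng.randint(0, 9)) if ch.isdigit() else ch for ch in value
    )


def obfuscate(text, seed=0):
    """Replace every pattern match with a same-shaped fake value."""
    rng = random.Random(seed)  # seeded so repeated runs give the same output
    for pattern in PATTERNS.values():
        text = pattern.sub(lambda m: fake_like(m.group(0), rng), text)
    return text


sample = "SSN #333-44-6666, Phone 302-786-5227."
masked = obfuscate(sample)
print(masked)
```

The surrounding text is left untouched and only the matched spans change, which is the same contract the pipeline output in the Results section follows.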
Predicted Entities
LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, MEDICALRECORD, ORGANIZATION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, IPADDR
How to use
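# Python
from sparknlp.pretrained import PretrainedPipeline

# Load the licensed pretrained pipeline from the clinical/models repository.
deid_pipeline = PretrainedPipeline("clinical_deidentification_obfuscation_medium", "en", "clinical/models")

# Annotate a sample clinical note containing several PHI entities.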
result = deid_pipeline.annotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, MR #719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 04/08/1993.
SSN #333-44-6666, Driver's license no: A334455B.
Phone 302-786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")
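// Scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Load the licensed pretrained pipeline from the clinical/models repository.
val deid_pipeline = PretrainedPipeline("clinical_deidentification_obfuscation_medium", "en", "clinical/models")

// Annotate a sample clinical note containing several PHI entities.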
val result = deid_pipeline.annotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, MR #719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 04/08/1993.
SSN #333-44-6666, Driver's license no: A334455B.
Phone 302-786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")
Results
Obfuscated
------------------------------
Name : Kara Dies, Record date: 2093-03-11, MR #528413.
Dr. Leandrew Koyanagi, ID: 0272536644, IP 333.333.333.333.
He is a 78-year-old male was admitted to the VA MEDICAL CENTER - JOHN COCHRAN DIVISION for cystectomy on 30/09/1993.
SSN #308-23-1994, Driver's license no: I347425Z.
Phone 563-875-6433, 230 West Miller Street, Danielskuil, E-MAIL: Harolda@yahoo.com.
Model Information
| Model Name: | clinical_deidentification_obfuscation_medium |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 5.2.1+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 1.7 GB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverter
- MedicalNerModel
- NerConverter
- TextMatcherModel
- RegexMatcherModel
- ChunkMergeModel
- DeIdentificationModel
- Finisher
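
The stage list shows several detectors (two NER models, a text matcher, and a regex matcher) feeding a single chunk-merging stage before de-identification. The sketch below illustrates why such a stage is needed: overlapping detections must be reduced to one chunk per span. The "longest span wins" rule, the `Chunk` type, and `merge_chunks` are assumptions made for this example and are not taken from the library's implementation of ChunkMergeModel.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def merge_chunks(chunks: List[Chunk]) -> List[Chunk]:
    """Keep non-overlapping chunks, preferring longer spans on conflict.

    Assumed rule for illustration; the library may resolve overlaps differently.
    """
    # Longest first; ties are broken by the earliest start offset.
    ordered = sorted(chunks, key=lambda c: (-(c.end - c.start), c.start))
    kept: List[Chunk] = []
    for cand in ordered:
        # Accept the candidate only if it overlaps nothing already kept.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda c: c.start)


text = "Dr. John Green, ID: 1231511863"
ner_chunks = [Chunk(4, 14, "DOCTOR")]                          # "John Green"
matcher_chunks = [Chunk(9, 14, "NAME"), Chunk(20, 30, "ID")]   # "Green", the ID
merged = merge_chunks(ner_chunks + matcher_chunks)
print(merged)
```

Here the shorter `NAME` chunk is dropped because it lies inside the longer `DOCTOR` chunk, so a downstream obfuscation step would replace "John Green" once rather than twice.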