Description
This pipeline detects PHI (protected health information) in medical text and obfuscates it, replacing each detected entity with a fake value in the resulting text.
Obfuscated entities: LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, MEDICALRECORD, ORGANIZATION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, IPADDR
Predicted Entities
LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, MEDICALRECORD, ORGANIZATION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, IPADDR
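When post-processing de-identification output, it can help to keep the label set above in code, for example to flag labels that a downstream step does not expect. The sketch below is illustrative only: `PREDICTED_ENTITIES` is copied from the list above, while `unknown_labels` is a hypothetical helper and not part of the pipeline's API.

```python
# Entity labels copied from the "Predicted Entities" list above.
PREDICTED_ENTITIES = frozenset({
    "LOCATION", "CONTACT", "PROFESSION", "NAME", "DATE", "ID", "AGE",
    "MEDICALRECORD", "ORGANIZATION", "HEALTHPLAN", "DOCTOR", "USERNAME",
    "LOCATION-OTHER", "URL", "DEVICE", "CITY", "ZIP", "STATE", "PATIENT",
    "COUNTRY", "STREET", "PHONE", "HOSPITAL", "EMAIL", "IDNUM", "BIOID",
    "FAX", "SSN", "ACCOUNT", "DLN", "PLATE", "VIN", "LICENSE", "IPADDR",
})


def unknown_labels(labels):
    """Return, sorted, the labels not in PREDICTED_ENTITIES (hypothetical helper)."""
    return sorted(set(labels) - PREDICTED_ENTITIES)


# "GENDER" is a made-up label used only to show the check.
print(unknown_labels(["DATE", "SSN", "GENDER"]))
```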
How to use
# Python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained de-identification pipeline from "clinical/models"
deid_pipeline = PretrainedPipeline("clinical_deidentification_obfuscation_medium", "en", "clinical/models")

# Run the pipeline on a sample clinical note
result = deid_pipeline.annotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, MR #719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 04/08/1993.
SSN #333-44-6666, Driver's license no: A334455B.
Phone 302-786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")

// Scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Load the pretrained de-identification pipeline from "clinical/models"
val deid_pipeline = PretrainedPipeline("clinical_deidentification_obfuscation_medium", "en", "clinical/models")

// Run the pipeline on a sample clinical note
val result = deid_pipeline.annotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, MR #719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 04/08/1993.
SSN #333-44-6666, Driver's license no: A334455B.
Phone 302-786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")
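In Python, `annotate` on a pretrained pipeline returns a dictionary that maps output column names to lists of strings. The output column names of this pipeline are not stated here, so the sketch below inspects whatever keys are present rather than assuming one. `describe_result` is a hypothetical helper, and the `sample` dictionary with its `obfuscated` key is made-up stand-in data, not real pipeline output.

```python
def describe_result(result):
    """Return one summary line per output column of an annotate()-style dict."""
    lines = []
    for column, values in sorted(result.items()):
        # Show the first value as a preview; empty columns get an empty string.
        preview = values[0] if values else ""
        lines.append(f"{column}: {len(values)} item(s), first={preview!r}")
    return lines


# Stand-in data shaped like a dict of column name -> list of strings.
# The key name "obfuscated" is an assumption for illustration only.
sample = {"obfuscated": ["Name : Kara Dies, Record date: 2093-03-11, MR #528413."]}
for line in describe_result(sample):
    print(line)
```

With a real pipeline, passing `result` from the Python call above in place of `sample` would list the actual output columns.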
Results
Obfuscated
------------------------------
Name : Kara Dies, Record date: 2093-03-11, MR #528413.
Dr. Leandrew Koyanagi, ID: 0272536644, IP 333.333.333.333.
He is a 78-year-old male was admitted to the VA MEDICAL CENTER - JOHN COCHRAN DIVISION for cystectomy on 30/09/1993.
SSN #308-23-1994, Driver's license no: I347425Z.
Phone 563-875-6433, 230 West Miller Street, Danielskuil, E-MAIL: Harolda@yahoo.com.
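One simple sanity check on obfuscated output is to confirm that none of the original PHI values survive verbatim. The sketch below does this for the sample above using plain string matching. `find_leaks` is a hypothetical helper, not part of the pipeline, and the list of original values was picked by hand from the sample input. Fake values may differ between runs, so the obfuscated text is copied from the results shown here.

```python
# Original PHI values, picked by hand from the sample input in "How to use".
ORIGINAL_VALUES = [
    "Hendrickson, Ora", "2093-01-13", "719435", "John Green", "1231511863",
    "203.120.223.13", "04/08/1993", "333-44-6666", "A334455B",
    "302-786-5227", "0295 Keats Street", "San Francisco", "smith@gmail.com",
]

# Obfuscated text copied from the Results above.
OBFUSCATED = """Name : Kara Dies, Record date: 2093-03-11, MR #528413.
Dr. Leandrew Koyanagi, ID: 0272536644, IP 333.333.333.333.
He is a 78-year-old male was admitted to the VA MEDICAL CENTER - JOHN COCHRAN DIVISION for cystectomy on 30/09/1993.
SSN #308-23-1994, Driver's license no: I347425Z.
Phone 563-875-6433, 230 West Miller Street, Danielskuil, E-MAIL: Harolda@yahoo.com."""


def find_leaks(original_values, text):
    """Return the original values that still occur in text (case-insensitive)."""
    lowered = text.lower()
    return [value for value in original_values if value.lower() in lowered]


leaks = find_leaks(ORIGINAL_VALUES, OBFUSCATED)
print(leaks)  # an empty list means no original value was found verbatim
```

A check like this only catches exact substring matches; it does not detect partial or reformatted leaks.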
Model Information
| Model Name: | clinical_deidentification_obfuscation_medium |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 5.2.1+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 1.7 GB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverter
- MedicalNerModel
- NerConverter
- TextMatcherModel
- RegexMatcherModel
- ChunkMergeModel
- DeIdentificationModel
- Finisher