Description
This pipeline de-identifies protected health information (PHI) in medical texts. It produces both masked and obfuscated versions of the input text. The pipeline can mask and obfuscate MEDICALRECORD, ORGANIZATION, PROFESSION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, DATE, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, AGE, SSN, ACCOUNT, DLN, PLATE, VIN, and LICENSE entities. It is built using the ner_deid_subentity_augmented model together with ContextualParser, RegexMatcher, and TextMatcher.
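To illustrate how rule-based matchers can complement an NER model, the sketch below finds a few strongly patterned identifiers with plain Python regular expressions. The patterns and the `find_rule_matches` helper are illustrative assumptions, not the rules shipped with this pipeline.

```python
import re

# Illustrative patterns only; the pipeline's own rules are not shown on this page.
RULES = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def find_rule_matches(text, rules=RULES):
    """Return (start, end, label, matched_text) tuples sorted by position."""
    matches = []
    for label, pattern in rules.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), label, m.group()))
    return sorted(matches)

sample = "SSN #333-44-6666, Phone (302) 786-5227, E-MAIL: smith@gmail.com."
for start, end, label, value in find_rule_matches(sample):
    print(label, value)
```

Patterns like these are reliable for rigidly formatted identifiers, while names, hospitals, and professions depend on context, which is where an NER model is the better fit.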
Predicted Entities
MEDICALRECORD, ORGANIZATION, PROFESSION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, DATE, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, AGE, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE
How to use
# Python
from sparknlp.pretrained import PretrainedPipeline
deid_pipeline = PretrainedPipeline("clinical_deidentification_subentity", "en", "clinical/models")
text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 719435.
Dr. John Green, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""
result = deid_pipeline.annotate(text)
// Scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val deid_pipeline = PretrainedPipeline("clinical_deidentification_subentity", "en", "clinical/models")
val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 719435.
Dr. John Green, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""
val result = deid_pipeline.annotate(text)
Results
print("\nMasked with entity labels")
print("-"*30)
print("\n".join(result['masked']))
print("\nMasked with chars")
print("-"*30)
print("\n".join(result['masked_with_chars']))
print("\nMasked with fixed length chars")
print("-"*30)
print("\n".join(result['masked_fixed_length_chars']))
print("\nObfuscated")
print("-"*30)
print("\n".join(result['obfuscated']))
Masked with entity labels
------------------------------
Name : <PATIENT>, Record date: <DATE>, MR <MEDICALRECORD>.
Dr. <DOCTOR>, IP <IPADDR>.
He is a <AGE>-year-old male was admitted to the <HOSPITAL> for cystectomy on <DATE>.
Patient's VIN : <VIN>, SSN <SSN>, Driver's license no: <DLN>.
Phone <PHONE>, <STREET>, <CITY>, E-MAIL: <EMAIL>.
Masked with chars
------------------------------
Name : [**************], Record date: [********], MR [****].
Dr. [********], IP [************].
He is a **-year-old male was admitted to the [**********] for cystectomy on [******].
Patient's VIN : [***************], SSN [**********], Driver's license no: [******].
Phone [************], [***************], [***********], E-MAIL: [*************].
Masked with fixed length chars
------------------------------
Name : ****, Record date: ****, MR ****.
Dr. ****, IP ****.
He is a ****-year-old male was admitted to the **** for cystectomy on ****.
Patient's VIN : ****, SSN ****, Driver's license no: ****.
Phone ****, ****, ****, E-MAIL: ****.
Obfuscated
------------------------------
Name : Neta Ehlers, Record date: 2093-01-18, MR 175102.
Dr. Tomi Bamberger, IP 444.444.444.444.
He is a 68-year-old male was admitted to the ROYAL OAKS HOSPITAL for cystectomy on 01/18/93.
Patient's VIN : 5ENID78EUMP536144, SSN #315-40-0867, Driver's license no: Y195093O.
Phone (671) 245-8099, 401 E Vaughn Ave, Brawley, E-MAIL: Dene@google.com.
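The three masked outputs above differ only in what replaces each detected span. The sketch below imitates those replacement styles in plain Python to make the difference concrete; it is not the library's implementation. The `mask_spans` and `span_of` helpers and the `policy` names are hypothetical, the spans are supplied by hand rather than detected, and the bracket rule (the brackets count toward the original length, and very short values are masked with asterisks alone) is inferred from the example output.

```python
def mask_spans(text, spans, policy):
    """Replace (start, end, label) spans in text according to a masking policy.

    policy: "entity_labels", "same_length_chars", or "fixed_length_chars"
    (hypothetical names for the three masked outputs shown above).
    """
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        length = end - start
        if policy == "entity_labels":
            replacement = f"<{label}>"
        elif policy == "same_length_chars":
            # Keep the original length; brackets count toward it when they fit.
            replacement = "[" + "*" * (length - 2) + "]" if length > 2 else "*" * length
        elif policy == "fixed_length_chars":
            replacement = "****"
        else:
            raise ValueError(f"unknown policy: {policy}")
        out.append(text[cursor:start])  # untouched text before the span
        out.append(replacement)
        cursor = end
    out.append(text[cursor:])  # untouched text after the last span
    return "".join(out)

def span_of(text, value, label):
    """Build a (start, end, label) span for the first occurrence of value."""
    start = text.index(value)
    return (start, start + len(value), label)

sentence = "Dr. John Green, IP 203.120.223.13."
# Hand-written spans for illustration; the pipeline detects these itself.
spans = [
    span_of(sentence, "John Green", "DOCTOR"),
    span_of(sentence, "203.120.223.13", "IPADDR"),
]
for policy in ("entity_labels", "same_length_chars", "fixed_length_chars"):
    print(mask_spans(sentence, spans, policy))
```

For this sentence, the three printed lines match the second line of each masked block in the results above. Same-length masking preserves the layout of the text but reveals the length of each value, whereas fixed-length masking hides it.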
Model Information
| Model Name: | clinical_deidentification_subentity |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 5.2.1+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 1.7 GB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverter
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- TextMatcherModel
- ContextualParserModel
- RegexMatcherModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ChunkMergeModel
- ChunkMergeModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- Finisher