Clinical Deidentification Pipeline - Obfuscation (Small)

Description

This pipeline can be used to detect the PHI information from medical texts and obfuscate (replace them with fake ones) in the resulting text. Obfuscated entities: AGE, CONTACT, DATE, ID, LOCATION, NAME, PROFESSION, CITY, COUNTRY, DOCTOR, HOSPITAL, IDNUM, MEDICALRECORD, ORGANIZATION, PATIENT, PHONE, PROFESSION, STREET, USERNAME, ZIP, ACCOUNT, LICENSE, VIN, SSN, DLN, PLATE, IPADDR

Copy S3 URI

How to use


from sparknlp.pretrained import PretrainedPipeline

deid_pipeline = PretrainedPipeline("clinical_deidentification_obfuscation_small", "en", "clinical/models")

result = deid_pipeline.annotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, MR 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 12/17/1993.
SSN #333-44-6666, Driver's license no: A334455B.
Phone 302-786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")




import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val deid_pipeline = PretrainedPipeline("clinical_deidentification_obfuscation_small", "en", "clinical/models")

val result = deid_pipeline.annotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, MR 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 12/17/1993.
SSN #333-44-6666, Driver's license no: A334455B.
Phone 302-786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")


Results

Obfuscated
------------------------------
Name : Maralyn Sago, Record date: 2093-01-31, MR 161096.
Dr. Darin Engels, ID: 9811914782, IP 003.003.003.003.
He is a 74-year-old male was admitted to the SCOTT COUNTY MEMORIAL HOSPITAL AKA SCOTT MEMORIAL for cystectomy on 01/04/1994.
SSN #956-21-3086, Driver's license no: V784696E.
Phone 952-841-3244, 11130 Parkview Circle Dr, Chesapeake, E-MAIL: Davie@yahoo.com.

Model Information

Model Name: clinical_deidentification_obfuscation_small
Type: pipeline
Compatibility: Healthcare NLP 5.2.1+
License: Licensed
Edition: Official
Language: en
Size: 1.7 GB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • WordEmbeddingsModel
  • MedicalNerModel
  • NerConverter
  • TextMatcherModel
  • RegexMatcherModel
  • ChunkMergeModel
  • DeIdentificationModel
  • Finisher