Clinical Deidentification Pipeline - Obfuscation (Medium)

Description

This pipeline detects protected health information (PHI) in medical text and obfuscates it, replacing each detected entity with a fake value in the resulting text. Obfuscated entities: AGE, CONTACT, DATE, ID, LOCATION, NAME, PROFESSION, CITY, COUNTRY, DOCTOR, HOSPITAL, IDNUM, MEDICALRECORD, ORGANIZATION, PATIENT, PHONE, STREET, USERNAME, ZIP, ACCOUNT, LICENSE, VIN, SSN, DLN, PLATE, IPADDR

How to use

Python
from sparknlp.pretrained import PretrainedPipeline

deid_pipeline = PretrainedPipeline("clinical_deidentification_obfuscation_medium", "en", "clinical/models")

result = deid_pipeline.annotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, MR #719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 04/08/1993.
SSN #333-44-6666, Driver's license no: A334455B.
Phone 302-786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")

Scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val deid_pipeline = PretrainedPipeline("clinical_deidentification_obfuscation_medium", "en", "clinical/models")

val result = deid_pipeline.annotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, MR #719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 04/08/1993.
SSN #333-44-6666, Driver's license no: A334455B.
Phone 302-786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")
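
In Python, `annotate` on a single string returns a dictionary that maps output column names to lists of strings. This page does not list the column names this pipeline produces, so a reasonable first step is to print what came back. The sketch below is a small helper for that; `summarize_annotations` and the column name `example_column` are hypothetical names used only for illustration, and the stand-in dictionary takes the place of the real `result` so the snippet runs without a Healthcare NLP license.

```python
from typing import Dict, List


def summarize_annotations(result: Dict[str, List[str]]) -> List[str]:
    """Return one 'column (n items): first item' line per output column."""
    lines = []
    for column, values in result.items():
        # Show only the first value so long outputs stay readable.
        preview = values[0] if values else ""
        lines.append(f"{column} ({len(values)} items): {preview}")
    return lines


# Stand-in for the dictionary returned by deid_pipeline.annotate(...).
# "example_column" is a hypothetical name, not a documented output column.
example_result = {
    "example_column": ["Name : Kara Dies, Record date: 2093-03-11, MR #528413."],
}

for line in summarize_annotations(example_result):
    print(line)
```

With the real pipeline, passing `result` from the Python snippet above to `summarize_annotations` shows which column holds the obfuscated text.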


Results

Obfuscated
------------------------------
Name : Kara Dies, Record date: 2093-03-11, MR #528413.
Dr. Leandrew Koyanagi, ID: 0272536644, IP 333.333.333.333.
He is a 78-year-old male was admitted to the VA MEDICAL CENTER - JOHN COCHRAN DIVISION for cystectomy on 30/09/1993.
SSN #308-23-1994, Driver's license no: I347425Z.
Phone 563-875-6433, 230 West Miller Street, Danielskuil, E-MAIL: Harolda@yahoo.com.
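
One quick sanity check on output like this is to confirm that none of the original PHI values survive in the obfuscated text. The sketch below does that with a plain case-insensitive substring search over the example input and output shown on this page. The helper `find_leaks` is a hypothetical name, and substring matching is a simplification: it is a spot check for this example, not a validation method for de-identification in general.

```python
from typing import List

# PHI values copied from the example input on this page.
original_values = [
    "Hendrickson, Ora", "2093-01-13", "719435", "John Green", "1231511863",
    "203.120.223.13", "60-year-old", "Day Hospital", "04/08/1993",
    "333-44-6666", "A334455B", "302-786-5227", "0295 Keats Street",
    "San Francisco", "smith@gmail.com",
]

# Obfuscated output copied from the Results section on this page.
obfuscated_text = """Name : Kara Dies, Record date: 2093-03-11, MR #528413.
Dr. Leandrew Koyanagi, ID: 0272536644, IP 333.333.333.333.
He is a 78-year-old male was admitted to the VA MEDICAL CENTER - JOHN COCHRAN DIVISION for cystectomy on 30/09/1993.
SSN #308-23-1994, Driver's license no: I347425Z.
Phone 563-875-6433, 230 West Miller Street, Danielskuil, E-MAIL: Harolda@yahoo.com."""


def find_leaks(text: str, values: List[str]) -> List[str]:
    """Return the values that still appear in text (case-insensitive)."""
    lowered = text.lower()
    return [value for value in values if value.lower() in lowered]


leaks = find_leaks(obfuscated_text, original_values)
print("Leaked values:", leaks)
```

Non-PHI words such as "male" and "cystectomy" are expected to pass through unchanged, so they are deliberately left out of `original_values`.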

Model Information

Model Name: clinical_deidentification_obfuscation_medium
Type: pipeline
Compatibility: Healthcare NLP 5.2.1+
License: Licensed
Edition: Official
Language: en
Size: 1.7 GB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • WordEmbeddingsModel
  • MedicalNerModel
  • NerConverter
  • MedicalNerModel
  • NerConverter
  • TextMatcherModel
  • RegexMatcherModel
  • ChunkMergeModel
  • DeIdentificationModel
  • Finisher