Clinical Deidentification Pipeline (English)

Description

This pipeline can be used to deidentify PHI information from medical texts. The PHI information will be masked and obfuscated in the resulting text. The pipeline can mask and obfuscate ACCOUNT, AGE, BIOID, CITY, CONTACT, COUNTRY, DATE, DEVICE, DLN, DOCTOR, EMAIL, FAX, HEALTHPLAN, HOSPITAL, ID, IPADDR, LICENSE, LOCATION, MEDICALRECORD, NAME, ORGANIZATION, PATIENT, PHONE, PLATE, PROFESSION, SREET, SSN, STATE, STREET, URL, USERNAME, VIN, ZIP entities.

Copy S3 URI

How to use

from sparknlp.pretrained import PretrainedPipeline

deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")

text = """Dr. John Taylor, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old male patient."""

deid_result = deid_pipeline.fullAnnotate(text)

print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']]))
print(''.join([i.result for i in deid_result[0]['obfuscated']]))
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")

val text = """Dr. John Taylor, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old male patient."""

val deid_result = deid_pipeline.fullAnnotate(text)

println(deid_result(0)("obfuscated").map(_("metadata")("masked").toString).mkString(""))
println(deid_result(0)("obfuscated").map(_("result").toString).mkString(""))

Results

Masked with entity labels
------------------------------
Dr. <NAME>, a <PROFESSION> at <HOSPITAL> in <CITY>, was contacted on <DATE> regarding a <AGE>-year-old male patient.

Obfuscated
------------------------------
Dr. Rolande Cleverly, a Fish farm manager at NORTH COUNTRY HOSPITAL & HEALTH CENTER in BARMOLLOCH, was contacted on 16/10/2023 regarding a 48-year-old male patient.

Model Information

Model Name: clinical_deidentification_nameAugmented_v2
Type: pipeline
Compatibility: Healthcare NLP 5.5.0+
License: Licensed
Edition: Official
Language: en
Size: 1.9 GB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • WordEmbeddingsModel
  • NerDLModel
  • NerConverterInternalModel
  • WordEmbeddingsModel
  • MedicalNerModel
  • NerConverterInternalModel
  • MedicalNerModel
  • NerConverterInternalModel
  • MedicalNerModel
  • NerConverterInternalModel
  • ChunkMergeModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • RegexMatcherInternalModel
  • ContextualParserModel
  • ContextualParserModel
  • TextMatcherInternalModel
  • TextMatcherInternalModel
  • TextMatcherInternalModel
  • ContextualParserModel
  • RegexMatcherInternalModel
  • RegexMatcherInternalModel
  • RegexMatcherInternalModel
  • ChunkMergeModel
  • ChunkMergeModel
  • DeIdentificationModel
  • Finisher