Deidentify Large

Description

Deidentify (Large) is an anonymization and deidentification model based on the outputs of DeId NER models and replacement dictionaries. It identifies instances of protected health information (PHI) in text documents and can either obfuscate them (e.g., replacing names with different, fake names) or mask them (e.g., replacing “2020,06,04” with “<DATE>”). This model is useful for maintaining HIPAA compliance when working with text documents that contain protected health information.
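
The difference between masking and obfuscation can be sketched in plain Python. This is only an illustration of the idea and not the model's actual implementation: the `deidentify` function, the `FAKE_VALUES` dictionary, and the hand-written chunk offsets are all hypothetical stand-ins for what the NER stage and the replacement dictionaries would provide.

```python
import random

# Hypothetical replacement dictionary (assumption); the real model ships its own.
FAKE_VALUES = {
    "NAME": ["Alex Morgan", "Jamie Lee"],
    "LOCATION": ["Springfield", "Riverton"],
}

def deidentify(text, chunks, mode="mask", seed=0):
    """Replace every (start, end, label) span in `text`.

    mode="mask"      -> the span becomes "<LABEL>"
    mode="obfuscate" -> the span becomes a fake value drawn from FAKE_VALUES
                        (falls back to "<LABEL>" if no fake values exist)
    """
    if mode not in ("mask", "obfuscate"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    pieces = []
    cursor = 0
    for start, end, label in sorted(chunks):
        pieces.append(text[cursor:start])  # keep the text before the span
        candidates = FAKE_VALUES.get(label)
        if mode == "obfuscate" and candidates:
            pieces.append(rng.choice(candidates))
        else:
            pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])  # keep the text after the last span
    return "".join(pieces)

text = "John Smith was admitted in Boston."
# Offsets an upstream NER step might produce (end index is exclusive).
chunks = [(0, 10, "NAME"), (27, 33, "LOCATION")]

masked = deidentify(text, chunks, mode="mask")
obfuscated = deidentify(text, chunks, mode="obfuscate")
print(masked)
print(obfuscated)
```

Masking keeps the entity type visible while removing the value, whereas obfuscation keeps the text natural-looking by substituting a plausible value of the same type.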

Predicted Entities

Contact, Location, Name, Profession


How to use

# Python
from sparknlp_jsl.annotator import DeIdentificationModel
model = DeIdentificationModel.pretrained("deidentify_large","en","clinical/models")\
	.setInputCols("document","token","chunk")\
	.setOutputCol("document")

// Scala
import com.johnsnowlabs.nlp.annotators.deid.DeIdentificationModel
val model = DeIdentificationModel.pretrained("deidentify_large","en","clinical/models")
	.setInputCols("document","token","chunk")
	.setOutputCol("document")

Model Information

Name: deidentify_large
Type: DeIdentificationModel
Compatibility: Spark NLP 2.5.1+
License: Licensed
Edition: Official
Input labels: document, token, chunk
Output labels: document
Language: en
Dependencies: ner_deid_large

Data Source

Trained on 10,000 random replacements of Contact, Location, Name, and Profession entities. Source: https://portal.dbmi.hms.harvard.edu/projects/n2c2-2014/