Description
This pipeline is designed for deidentification in clinical texts, leveraging a range of pretrained NER models tailored for extracting and anonymizing sensitive information. By integrating these models, the pipeline provides a comprehensive solution for protecting patient privacy and complying with data protection regulations.
The pipeline employs embeddings_clinical for contextual understanding and includes the following specialized NER models for deidentification: ner_deid_augmented, ner_deid_enriched, ner_deid_generic_augmented, ner_deid_name_multilingual_clinical, ner_deid_sd, ner_deid_subentity_augmented, ner_deid_subentity_augmented_i2b2, ner_deid_synthetic, ner_jsl, ner_jsl_enriched.
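The models differ mainly in label granularity: some emit generic labels such as NAME, LOCATION, and CONTACT, while others emit finer-grained labels such as DOCTOR, PATIENT, HOSPITAL, STREET, and PHONE. Running them side by side on the same text makes it easier to compare their output and choose the model that best fits a given deidentification task.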
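To make the "extract, then anonymize" idea concrete, here is a minimal sketch of how (chunk, label) pairs like those reported in the Results section could be turned into masked text. It is an illustration only, not the pipeline's own behavior: the function name mask_entities and the <LABEL> placeholder format are assumptions introduced here, and the sample entities are typed in by hand rather than produced by running the pipeline.

```python
from typing import List, Tuple


def mask_entities(text: str, entities: List[Tuple[str, str]]) -> str:
    """Replace every detected chunk with a <LABEL> placeholder.

    Hypothetical helper for illustration; it is not part of Spark NLP.
    """
    # Replace longer chunks first so that a short chunk can never
    # overwrite part of a longer one (ties broken alphabetically).
    ordered = sorted(set(entities), key=lambda e: (-len(e[0]), e[0]))
    for chunk, label in ordered:
        text = text.replace(chunk, f"<{label}>")
    return text


# Hand-written sample: a fragment of the example note with generic labels.
text = "Name : Hendrickson Ora , MR # 7194334 Date : 01/13/93 ."
entities = [("Hendrickson Ora", "NAME"), ("7194334", "ID"), ("01/13/93", "DATE")]

masked = mask_entities(text, entities)
print(masked)
```

Plain string replacement masks every occurrence of a chunk, including accidental matches of short chunks such as an age inside a longer number. A more careful version would likely replace by character offsets instead of by surface string.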
How to use
# Python
from sparknlp.pretrained import PretrainedPipeline
ner_profiling_pipeline = PretrainedPipeline("ner_profiling_deidentification", 'en', 'clinical/models')
result = ner_profiling_pipeline.annotate("""Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson Ora , MR # 7194334 Date : 01/13/93 . PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital , 0295 Keats Street , Phone 55-555-5555 .""")
// Scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val ner_profiling_pipeline = PretrainedPipeline("ner_profiling_deidentification", "en", "clinical/models")
val result = ner_profiling_pipeline.annotate("""Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson Ora , MR # 7194334 Date : 01/13/93 . PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital , 0295 Keats Street , Phone 55-555-5555 .""")
Results
******************** ner_deid_name_multilingual_clinical Model Results ********************
('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('Oliveira', 'NAME')
******************** ner_deid_subentity_augmented_i2b2 Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'PATIENT') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'PATIENT') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')
******************** ner_deid_large Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_jsl_enriched Model Results ********************
('01/13/93', 'Date') ('25 years-old', 'Age') ('2079-11-09', 'Date')
******************** ner_deid_sd_large Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_deid_generic_augmented Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_deid_name_multilingual_clinical_langtest Model Results ********************
('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('Oliveira', 'NAME')
******************** ner_deid_generic_augmented_langtest Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_deid_sd Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION')
******************** ner_deid_subentity_augmented Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'PATIENT') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')
******************** ner_deid_large_langtest Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_jsl Model Results ********************
('01/13/93', 'DATE') ('25 years-old', 'AGE')
******************** ner_deid_synthetic Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_deid_augmented Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('Keats Street', 'LOCATION')
******************** ner_deid_generic_augmented_allUpperCased_langtest Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_deid_enriched_langtest Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'DOCTOR') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')
******************** ner_deid_subentity_augmented_langtest Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'DOCTOR') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')
******************** ner_deid_enriched Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'DOCTOR') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')
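The results above show the models disagreeing on some chunks, for example on whether Oliveira is a DOCTOR or a PATIENT. One way to summarize that is to tally, per chunk, how many models assigned each label. The sketch below does this for a few rows copied by hand from the output above. The helper label_votes and the TO_GENERIC mapping (which folds DOCTOR and PATIENT into NAME) are assumptions made for illustration; neither is part of the pipeline.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Name entities copied from four of the model outputs listed above.
results: Dict[str, List[Tuple[str, str]]] = {
    "ner_deid_generic_augmented": [
        ("David Hale", "NAME"), ("Hendrickson Ora", "NAME"), ("Oliveira", "NAME")],
    "ner_deid_subentity_augmented": [
        ("David Hale", "DOCTOR"), ("Hendrickson Ora", "PATIENT"), ("Oliveira", "DOCTOR")],
    "ner_deid_subentity_augmented_i2b2": [
        ("David Hale", "DOCTOR"), ("Hendrickson Ora", "PATIENT"), ("Oliveira", "PATIENT")],
    "ner_deid_enriched": [
        ("David Hale", "DOCTOR"), ("Hendrickson Ora", "DOCTOR"), ("Oliveira", "DOCTOR")],
}

# Hypothetical mapping from fine-grained labels to generic ones.
TO_GENERIC = {"DOCTOR": "NAME", "PATIENT": "NAME"}


def label_votes(
    model_results: Dict[str, List[Tuple[str, str]]],
    normalize: Optional[Dict[str, str]] = None,
) -> Dict[str, Counter]:
    """Count, for each chunk, how many models assigned each label."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for entities in model_results.values():
        for chunk, label in entities:
            if normalize is not None:
                # Fold the label into its generic form when a mapping is given.
                label = normalize.get(label, label)
            votes[chunk][label] += 1
    return votes


raw = label_votes(results)
generic = label_votes(results, TO_GENERIC)

for chunk in raw:
    print(chunk, dict(raw[chunk]), dict(generic[chunk]))
```

With the raw labels the four models split on Hendrickson Ora and Oliveira, but once the labels are folded into the generic scheme they all agree that each chunk is a NAME. This suggests the disagreement here is about the subtype of the entity, not about whether it should be deidentified.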
Model Information
| Model Name: | ner_profiling_deidentification |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 5.3.1+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 2.0 GB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel x 14
- NerConverterInternalModel x 14