Description
This pipeline is designed for deidentification in clinical texts, leveraging a range of pretrained NER models tailored for extracting and anonymizing sensitive information. By integrating these models, the pipeline provides a comprehensive solution for protecting patient privacy and complying with data protection regulations.
The pipeline employs embeddings_clinical for contextual understanding and includes the following specialized NER models for deidentification:
ner_deid_augmented, ner_deid_enriched, ner_deid_generic_augmented, ner_deid_name_multilingual_clinical, ner_deid_sd, ner_deid_subentity_augmented, ner_deid_subentity_augmented_i2b2, ner_deid_synthetic, ner_jsl, ner_jsl_enriched
Each model addresses a unique aspect of deidentification, making this pipeline an all-encompassing tool for securing clinical narratives.
Predicted Entities
AGE, Age, BIOID, CITY, CONTACT, COUNTRY, DATE, DEVICE, DOCTOR, Date, Dosage, EMAIL, FAX, HEALTHPLAN, HOSPITAL, ID, IDNUM, LOCATION, LOCATION-OTHER, MEDICALRECORD, NAME, NN, ORGANIZATION, PATIENT, PHONE, PROFESSION, RelativeDate, STATE, STREET, URL, USERNAME, ZIP
How to use
from sparknlp.pretrained import PretrainedPipeline
ner_profiling_pipeline = PretrainedPipeline("ner_profiling_deidentification", 'en', 'clinical/models')
result = ner_profiling_pipeline.annotate("""Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson Ora , MR # 7194334 Date : 01/13/93 . PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital , 0295 Keats Street , Phone 55-555-5555 .""")
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val ner_profiling_pipeline = PretrainedPipeline("ner_profiling_deidentification", "en", "clinical/models")
val result = ner_profiling_pipeline.annotate("""Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson Ora , MR # 7194334 Date : 01/13/93 . PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital , 0295 Keats Street , Phone 55-555-5555 .""")
Results
******************** ner_deid_name_multilingual_clinical Model Results ********************
('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('Oliveira', 'NAME')
******************** ner_deid_subentity_augmented_i2b2 Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'PATIENT') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'PATIENT') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')
******************** ner_deid_large Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_jsl_enriched Model Results ********************
('01/13/93', 'Date') ('25 years-old', 'Age') ('2079-11-09', 'Date')
******************** ner_deid_sd_large Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_deid_generic_augmented Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_deid_name_multilingual_clinical_langtest Model Results ********************
('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('Oliveira', 'NAME')
******************** ner_deid_generic_augmented_langtest Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_deid_sd Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION')
******************** ner_deid_subentity_augmented Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'PATIENT') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')
******************** ner_deid_large_langtest Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_jsl Model Results ********************
('01/13/93', 'DATE') ('25 years-old', 'AGE')
******************** ner_deid_synthetic Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_deid_augmented Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('Keats Street', 'LOCATION')
******************** ner_deid_generic_augmented_allUpperCased_langtest Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')
******************** ner_deid_enriched_langtest Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'DOCTOR') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')
******************** ner_deid_subentity_augmented_langtest Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'DOCTOR') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')
******************** ner_deid_enriched Model Results ********************
('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'DOCTOR') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')
Model Information
| Model Name: | ner_profiling_deidentification |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 5.3.1+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 2.0 GB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel x 14
- NerConverterInternalModel x 14