Description
This pipeline deidentifies protected health information (PHI) in medical texts. It returns the text in two forms: masked, where each detected entity is replaced by its label, and obfuscated, where each entity is replaced by a surrogate value. The pipeline can mask and obfuscate the following entities: ACCOUNT, AGE, BIOID, CITY, CONTACT, COUNTRY, DATE, DEVICE, DLN, DOCTOR, EMAIL, FAX, HEALTHPLAN, HOSPITAL, ID, IPADDR, LICENSE, LOCATION, MEDICALRECORD, NAME, ORGANIZATION, PATIENT, PHONE, PLATE, PROFESSION, SSN, STATE, STREET, URL, USERNAME, VIN, ZIP.
Predicted Entities
ACCOUNT, AGE, BIOID, CITY, CONTACT, COUNTRY, City, Country, DATE, DEVICE, DLN, EMAIL, FAX, HEALTHPLAN, HOSPITAL, ID, IDNUM, IP, LICENSE, LOCATION, LOCATION-OTHER, LOCATION_OTHER, MEDICALRECORD, NAME, ORGANIZATION, PHONE, PLATE, PROFESSION, SSN, STATE, STREET, URL, VIN, ZIP
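The predicted labels include case and separator variants (CITY and City, LOCATION-OTHER and LOCATION_OTHER). When counting or filtering entities downstream, it may help to fold these into one canonical form. The sketch below is a plain-Python illustration, not part of the pipeline: `normalize_label` is a hypothetical helper, and treating the variants as equivalent is an assumption.

```python
def normalize_label(label: str) -> str:
    """Fold case and hyphen/underscore variants into one canonical label.

    Hypothetical helper: assumes that variants such as 'City' and 'CITY'
    refer to the same entity type.
    """
    return label.strip().upper().replace("-", "_")


# A few of the variant labels taken from the Predicted Entities list.
predicted = ["CITY", "City", "COUNTRY", "Country", "LOCATION-OTHER", "LOCATION_OTHER"]

canonical = sorted({normalize_label(label) for label in predicted})
print(canonical)  # ['CITY', 'COUNTRY', 'LOCATION_OTHER']
```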
How to use
# Python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline from the clinical/models repository
deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")
text = """Dr. John Taylor, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old male patient."""
deid_result = deid_pipeline.fullAnnotate(text)
# Masked text: each entity replaced by its label
print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']]))
# Obfuscated text: each entity replaced by a surrogate value
print(''.join([i.result for i in deid_result[0]['obfuscated']]))
// Scala
import com.johnsnowlabs.nlp.Annotation
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")
val text = """Dr. John Taylor, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old male patient."""
// For a single String input, fullAnnotate returns a Map keyed by output column
val deid_result = deid_pipeline.fullAnnotate(text)
val obfuscated = deid_result("obfuscated").map(_.asInstanceOf[Annotation])
// Masked text first, then obfuscated text
println(obfuscated.map(_.metadata("masked")).mkString(""))
println(obfuscated.map(_.result).mkString(""))
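The two print statements read the same `obfuscated` annotations twice: once for the `masked` metadata entry and once for the `result`. A small wrapper can return both strings together. The following is a minimal sketch that runs without Spark: `StubAnnotation` and `split_outputs` are hypothetical names, the stub only mimics the `result` and `metadata` attributes used above, and the sample strings are shortened from the Results section.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StubAnnotation:
    """Hypothetical stand-in exposing only the attributes read above."""
    result: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_outputs(document: Dict[str, List[StubAnnotation]]) -> Dict[str, str]:
    """Return the masked and obfuscated text for one annotated document."""
    chunks = document["obfuscated"]
    return {
        "masked": "".join(chunk.metadata["masked"] for chunk in chunks),
        "obfuscated": "".join(chunk.result for chunk in chunks),
    }


# Stub shaped like deid_result[0]; the values are illustrative only.
stub_document = {
    "obfuscated": [
        StubAnnotation(
            result="Dr. Rolande Cleverly was contacted on 16/10/2023.",
            metadata={"masked": "Dr. <NAME> was contacted on <DATE>."},
        )
    ]
}

outputs = split_outputs(stub_document)
print(outputs["masked"])
print(outputs["obfuscated"])
```

With the real pipeline loaded, the same function would presumably be called as `split_outputs(deid_result[0])`, assuming the annotations expose `result` and `metadata` as in the Python snippet above.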
Results
Masked with entity labels
------------------------------
Dr. <NAME>, a <PROFESSION> at <HOSPITAL> in <CITY>, was contacted on <DATE> regarding a <AGE>-year-old male patient.
Obfuscated
------------------------------
Dr. Rolande Cleverly, a Fish farm manager at NORTH COUNTRY HOSPITAL & HEALTH CENTER in BARMOLLOCH, was contacted on 16/10/2023 regarding a 48-year-old male patient.
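A quick way to audit the masked output is to count which entity labels were substituted. The sketch below assumes placeholders always take the `<LABEL>` form seen in the masked sample above; `count_placeholders` is a hypothetical helper, and the pattern would also match any other angle-bracketed word in the text.

```python
import re
from collections import Counter

# Assumed placeholder shape, inferred from the masked sample: <LABEL>
PLACEHOLDER = re.compile(r"<([A-Za-z][A-Za-z_-]*)>")


def count_placeholders(masked_text: str) -> Counter:
    """Count how often each entity label appears as a placeholder."""
    return Counter(PLACEHOLDER.findall(masked_text))


masked = (
    "Dr. <NAME>, a <PROFESSION> at <HOSPITAL> in <CITY>, was contacted on "
    "<DATE> regarding a <AGE>-year-old male patient."
)

counts = count_placeholders(masked)
print(sorted(counts))  # ['AGE', 'CITY', 'DATE', 'HOSPITAL', 'NAME', 'PROFESSION']
```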
Model Information
Model Name: | clinical_deidentification_nameAugmented_v2 |
Type: | pipeline |
Compatibility: | Healthcare NLP 5.5.0+ |
License: | Licensed |
Edition: | Official |
Language: | en |
Size: | 1.9 GB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- NerDLModel
- NerConverterInternalModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ChunkMergeModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- RegexMatcherInternalModel
- ContextualParserModel
- ContextualParserModel
- TextMatcherInternalModel
- TextMatcherInternalModel
- TextMatcherInternalModel
- ContextualParserModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- ChunkMergeModel
- ChunkMergeModel
- DeIdentificationModel
- Finisher