Description
This pipeline masks protected health information (PHI) in PDF documents. Masked entities include ‘AGE’, ‘CITY’, ‘COUNTRY’, ‘DATE’, ‘DOCTOR’, ‘EMAIL’, ‘HOSPITAL’, ‘IDNUM’, ‘ORGANIZATION’, ‘PATIENT’, ‘PHONE’, ‘PROFESSION’, ‘STATE’, ‘STREET’, ‘USERNAME’, and ‘ZIP’. The output is a PDF document that resembles the input, with fake (obfuscated) text drawn over the targeted entities.
How to use
from sparknlp.pretrained import PretrainedPipeline
deid_pipeline = PretrainedPipeline("pdf_obfuscation_multi_model_context_pipeline", "en", "clinical/ocr")
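The snippet above only loads the pipeline. As a rough sketch of how it might then be applied, the code below reads PDFs as binary files, runs the pipeline, and writes each result to disk. Several things here are assumptions rather than facts from this card: that `spark` is a session already started with valid Healthcare NLP and Visual NLP licenses, that the final `ImageToPdf` stage writes its bytes to a column named `pdf`, that the `path` column from the `binaryFile` reader survives to the output, and that there is one output row per document. The helper names `output_path_for` and `obfuscate_pdfs` are hypothetical.

```python
import os


def output_path_for(input_path, output_dir, suffix="_obfuscated"):
    """Build the destination path for the obfuscated copy of input_path."""
    base = os.path.basename(input_path)
    stem, ext = os.path.splitext(base)
    # Fall back to ".pdf" when the source path has no extension.
    return os.path.join(output_dir, f"{stem}{suffix}{ext or '.pdf'}")


def obfuscate_pdfs(spark, input_glob, output_dir, pdf_col="pdf"):
    """Run the pretrained pipeline over PDFs and save the obfuscated copies.

    Assumes `spark` is an already-started, licensed session and that the
    pipeline output keeps a `path` column and a binary column named `pdf_col`
    (both are assumptions, not documented on this card).
    """
    # Imported lazily so the helper above stays usable without the library.
    from sparknlp.pretrained import PretrainedPipeline

    deid_pipeline = PretrainedPipeline(
        "pdf_obfuscation_multi_model_context_pipeline", "en", "clinical/ocr"
    )

    # Spark's built-in binaryFile source yields path, modificationTime,
    # length, and content columns.
    pdf_df = spark.read.format("binaryFile").load(input_glob)
    result = deid_pipeline.transform(pdf_df)

    os.makedirs(output_dir, exist_ok=True)
    written = []
    for row in result.select("path", pdf_col).toLocalIterator():
        target = output_path_for(row["path"], output_dir)
        with open(target, "wb") as handle:
            handle.write(row[pdf_col])
        written.append(target)
    return written
```

A call might look like `obfuscate_pdfs(spark, "data/pdfs/*.pdf", "data/obfuscated")`. If the pipeline emits one row per page rather than per document, the write loop would need to group rows by `path` first.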
Model Information
| Model Name: | pdf_obfuscation_multi_model_context_pipeline |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 6.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 4.4 GB |
Included Models
- PdfToImage
- ImageToText
- DocumentAssembler
- SentenceDetectorDLModel
- Regex
- WordEmbeddingsModel
- MedicalNerModel
- NerConverter
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- EntityExtractor
- ContextualParserModel
- RegexMatcher
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- RegexMatcher
- ChunkMergeModel
- ChunkMergeModel
- XLMRobertaEmbeddings
- MedicalNerModel
- NerConverter
- PretrainedZeroShotNER
- NerConverter
- PretrainedZeroShotNER
- NerConverter
- ChunkMergeModel
- PositionFinder
- ImageDrawRegions
- ImageToPdf
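
The list above is long and repeats several annotators, so a quick tally can make the pipeline's shape easier to see. The sketch below counts the stage names exactly as they appear on this card; `INCLUDED_MODELS` and `summarize_stages` are illustrative names, not part of any library.

```python
from collections import Counter

# Stage names copied, in order, from the "Included Models" list on this card.
INCLUDED_MODELS = (
    ["PdfToImage", "ImageToText", "DocumentAssembler",
     "SentenceDetectorDLModel", "Regex", "WordEmbeddingsModel",
     "MedicalNerModel", "NerConverter"]
    + ["ContextualParserModel"] * 6
    + ["EntityExtractor", "ContextualParserModel", "RegexMatcher"]
    + ["ContextualParserModel"] * 4
    + ["RegexMatcher", "ChunkMergeModel", "ChunkMergeModel",
       "XLMRobertaEmbeddings", "MedicalNerModel", "NerConverter",
       "PretrainedZeroShotNER", "NerConverter",
       "PretrainedZeroShotNER", "NerConverter",
       "ChunkMergeModel", "PositionFinder", "ImageDrawRegions", "ImageToPdf"]
)


def summarize_stages(stage_names):
    """Return a Counter mapping each stage name to how often it occurs."""
    return Counter(stage_names)


summary = summarize_stages(INCLUDED_MODELS)
print(f"total stages: {len(INCLUDED_MODELS)}")
for name, count in summary.most_common():
    print(f"{count:2d}  {name}")
```

With a loaded pipeline, the same helper could probably be fed `[type(s).__name__ for s in deid_pipeline.model.stages]`, although the Python class names reported there may not match the labels on this card one to one.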