Description
This pipeline masks protected health information (PHI) in PDF documents. The output is a PDF that matches the input, with black bounding boxes drawn over the targeted entities.
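To make the masking step concrete, here is a minimal sketch of what drawing a black box over a region means at the pixel level. It is not the pipeline's implementation: the `mask_regions` helper, the `(x, y, width, height)` box layout, and the nested-list grayscale image are all assumptions chosen for illustration, using only the Python standard library.

```python
from typing import List, Tuple

# Hypothetical box layout: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def mask_regions(image: List[List[int]], boxes: List[Box]) -> List[List[int]]:
    """Return a copy of a grayscale image with every box filled with black (0)."""
    masked = [row[:] for row in image]  # copy rows so the input is left untouched
    height = len(masked)
    width = len(masked[0]) if masked else 0
    for x, y, w, h in boxes:
        # Clamp each box to the image bounds so partially visible boxes are still masked.
        for r in range(max(0, y), min(height, y + h)):
            for c in range(max(0, x), min(width, x + w)):
                masked[r][c] = 0
    return masked


# A 4-row by 6-column all-white "page" with one hypothetical entity box.
page = [[255] * 6 for _ in range(4)]
redacted = mask_regions(page, [(1, 1, 3, 2)])
```

In the real pipeline the box coordinates would come from the detected entities rather than being written by hand; the sketch only shows that the pixels under each box are overwritten, so the masked text cannot be recovered from the output image.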
Predicted Entities
AGE, CITY, COUNTRY, DATE, DOCTOR, EMAIL, HOSPITAL, IDNUM, ORGANIZATION, PATIENT, PHONE, PROFESSION, STATE, STREET, USERNAME, ZIP.
How to use
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained PDF de-identification pipeline (licensed; Healthcare NLP 6.0.0+).
deid_pipeline = PretrainedPipeline("pdf_deid_multi_model_context_pipeline", "en", "clinical/ocr")
Example
Input:
Output:
Model Information
| Field | Value |
|---|---|
| Model Name: | pdf_deid_multi_model_context_pipeline |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 6.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 4.4 GB |
Included Models
- PdfToImage
- ImageToText
- DocumentAssembler
- SentenceDetectorDLModel
- Regex
- WordEmbeddingsModel
- MedicalNerModel
- NerConverter
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- EntityExtractor
- ContextualParserModel
- RegexMatcher
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- RegexMatcher
- ChunkMergeModel
- ChunkMergeModel
- XLMRobertaEmbeddings
- MedicalNerModel
- NerConverter
- PretrainedZeroShotNER
- NerConverter
- PretrainedZeroShotNER
- NerConverter
- ChunkMergeModel
- PositionFinder
- ImageDrawRegions
- ImageToPdf
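ChunkMergeModel appears three times in the list above, which suggests that entity spans from several detectors are combined before masking. The sketch below shows one way overlapping spans could be reconciled. It assumes a longest-span-wins rule, which is an illustrative choice and may not match the library's actual precedence; the `Chunk` type and `merge_chunks` function are hypothetical names, not part of any library API.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    begin: int  # inclusive start character offset
    end: int    # inclusive end character offset
    label: str  # entity label, e.g. "DATE"


def merge_chunks(*sources: List[Chunk]) -> List[Chunk]:
    """Combine chunks from several detectors, keeping the longest of any overlapping group.

    Assumption for illustration only: longer spans take priority over shorter ones.
    """
    candidates = [chunk for source in sources for chunk in source]
    # Consider longer spans first; among equal lengths, prefer the earlier start.
    candidates.sort(key=lambda c: (-(c.end - c.begin), c.begin))
    kept: List[Chunk] = []
    for cand in candidates:
        overlaps = any(cand.begin <= k.end and k.begin <= cand.end for k in kept)
        if not overlaps:
            kept.append(cand)
    # Return the surviving chunks in document order.
    return sorted(kept, key=lambda c: c.begin)


# Hypothetical outputs from a model-based detector and a rule-based detector.
ner_chunks = [Chunk(0, 9, "PATIENT"), Chunk(20, 29, "DATE")]
rule_chunks = [Chunk(5, 9, "DOCTOR"), Chunk(40, 51, "PHONE")]
merged = merge_chunks(ner_chunks, rule_chunks)
```

Here the shorter DOCTOR span overlaps the PATIENT span and is dropped, while the non-overlapping DATE and PHONE spans are kept. For de-identification, any policy that keeps at least one span covering each sensitive region is enough to get that region masked, which is why a simple rule like this one is a reasonable starting point for a sketch.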