PDF Deidentification Multi Model Context Signature Aware

Description

This pipeline masks protected health information (PHI) in PDF documents. The output is a PDF similar to the input, with black bounding boxes drawn over the targeted entities and with signatures removed.
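To make the masking idea concrete, here is a conceptual sketch in plain Python. It is not the pipeline's implementation: the page is modeled as a hypothetical grid of grayscale pixel values, and `mask_regions` and the `(x, y, width, height)` box format are names and conventions assumed only for this illustration.

```python
def mask_regions(image, boxes, fill=0):
    """Return a copy of `image` with every box painted in the `fill` value.

    `image` is a list of rows of grayscale pixels (0 = black, 255 = white).
    `boxes` holds (x, y, width, height) tuples; this format is an assumption
    made for the sketch, not the pipeline's internal representation.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    masked = [row[:] for row in image]  # leave the input page untouched
    for x, y, w, h in boxes:
        # Clip each box to the page so out-of-range boxes cannot raise errors.
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                masked[row][col] = fill
    return masked


# A blank 6x4 "page" with one detected entity covering columns 1-3, rows 1-2.
page = [[255] * 6 for _ in range(4)]
masked = mask_regions(page, [(1, 1, 3, 2)])
```

In the real pipeline, the regions come from the detection stages and the drawing is done on rendered page images; the sketch only shows why the text under a box is no longer recoverable from the output.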

Predicted Entities

AGE, CITY, COUNTRY, DATE, DOCTOR, EMAIL, HOSPITAL, IDNUM, ORGANIZATION, PATIENT, PHONE, PROFESSION, STATE, STREET, USERNAME, ZIP, SIGNATURE.


How to use

from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline by name, language, and remote location
deid_pipeline = PretrainedPipeline("pdf_deid_multi_model_context_signature_aware_pipeline", "en", "clinical/ocr")
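Once loaded, the pipeline still has to be applied to PDF content. The following is a minimal sketch of one way to do that, assuming a licensed Spark session has already been started and that the PDFs can be read with Spark's `binaryFile` data source. The function name `deidentify_pdfs` and the `input_path` argument are hypothetical. The names of the output columns are not stated on this page, so inspect the result schema rather than relying on a specific column.

```python
PIPELINE_NAME = "pdf_deid_multi_model_context_signature_aware_pipeline"


def deidentify_pdfs(spark, input_path):
    """Run the pretrained pipeline over the PDFs found at `input_path`.

    `spark` is assumed to be an already-started, licensed Spark session.
    Returns the transformed DataFrame; its columns are not documented here.
    """
    # Imported lazily so this module can be loaded without Spark NLP installed.
    from sparknlp.pretrained import PretrainedPipeline

    # Each PDF becomes one row with `path` and `content` (raw bytes) columns.
    pdf_df = spark.read.format("binaryFile").load(input_path)

    deid_pipeline = PretrainedPipeline(PIPELINE_NAME, "en", "clinical/ocr")
    return deid_pipeline.transform(pdf_df)
```

A call such as `deidentify_pdfs(spark, "path/to/pdfs/*.pdf").printSchema()` would then show which column carries the masked PDF bytes.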

Example

Input:

Screenshot

Output:

Screenshot

Model Information

Model Name:	pdf_deid_multi_model_context_signature_aware_pipeline
Type:	pipeline
Compatibility:	Healthcare NLP 6.0.0+
License:	Licensed
Edition:	Official
Language:	en
Size:	4.7 GB

Included Models

PdfToImage
ImageToText
DocumentAssembler
SentenceDetectorDLModel
Regex
WordEmbeddingsModel
MedicalNerModel
NerConverter
ContextualParserModel
ContextualParserModel
ContextualParserModel
ContextualParserModel
ContextualParserModel
ContextualParserModel
EntityExtractor
ContextualParserModel
RegexMatcher
ContextualParserModel
ContextualParserModel
ContextualParserModel
ContextualParserModel
RegexMatcher
ChunkMergeModel
ChunkMergeModel
XLMRobertaEmbeddings
MedicalNerModel
NerConverter
PretrainedZeroShotNER
NerConverter
PretrainedZeroShotNER
NerConverter
ChunkMergeModel
PositionFinder
ImageDrawRegions
HW_Signature_Detector
ImageDrawRegions
ImageToPdf
