Detect PHI for Deidentification (Generic - Context Augmented)

Description

This pipeline detects protected health information (PHI) for deidentification, extracting LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, COUNTRY, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, PHONE, ZIP, MEDICALRECORD, EMAIL, and IPADDR entities.

Predicted Entities

LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, COUNTRY, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, PHONE, ZIP, MEDICALRECORD, EMAIL, IPADDR


How to use

```python
from sparknlp.pretrained import PretrainedPipeline

# Requires an active Spark session with the licensed Healthcare NLP library loaded.
deid_pipeline = PretrainedPipeline("ner_deid_generic_context_augmented_pipeline", "en", "clinical/models")

text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 719435.
Dr. John Green, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's ID: 764543, VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""

result = deid_pipeline.fullAnnotate(text)
```

```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Requires an active Spark session with the licensed Healthcare NLP library loaded.
val deid_pipeline = PretrainedPipeline("ner_deid_generic_context_augmented_pipeline", "en", "clinical/models")

val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 719435.
Dr. John Green, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's ID: 764543, VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""

val result = deid_pipeline.fullAnnotate(text)
```
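The Results table is built from the chunk annotations that `fullAnnotate` returns. In Python this is typically a list with one dictionary per input text, mapping output column names to annotations. Here is a rough sketch of pulling out table rows in plain Python. It assumes each chunk annotation exposes `result`, `begin`, `end`, and a `metadata` dict with an `"entity"` key. `FakeAnnotation` and `chunks_to_rows` are hypothetical names used only for illustration. This card also does not name the output column that holds the final merged chunks, so inspect `result[0].keys()` to find it.

```python
from collections import namedtuple
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a Spark NLP chunk annotation (only the fields used below).
FakeAnnotation = namedtuple("FakeAnnotation", ["begin", "end", "result", "metadata"])

Row = Tuple[str, int, int, str]  # (chunk, begin, end, entity)


def chunks_to_rows(annotations: Iterable) -> List[Row]:
    """Turn chunk annotations into (chunk, begin, end, entity) rows.

    Assumes each annotation has .result, .begin, .end and a metadata
    dict whose "entity" key holds the label; a missing label becomes "".
    """
    return [
        (a.result, a.begin, a.end, a.metadata.get("entity", ""))
        for a in annotations
    ]


if __name__ == "__main__":
    # Two rows copied from the Results table, wrapped in the stand-in type.
    sample = [
        FakeAnnotation(7, 22, "Hendrickson, Ora", {"entity": "NAME"}),
        FakeAnnotation(38, 47, "2093-01-13", {"entity": "DATE"}),
    ]
    for row in chunks_to_rows(sample):
        print(row)
```

With the real pipeline, the call would look like `chunks_to_rows(result[0][chunk_col])`, where `chunk_col` is a placeholder for the merged-chunk column name.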

Results

|    | chunk             |   begin |   end | entity        |
|---:|:------------------|--------:|------:|:--------------|
|  0 | Hendrickson, Ora  |       7 |    22 | NAME          |
|  1 | 2093-01-13        |      38 |    47 | DATE          |
|  2 | 719435            |      54 |    59 | MEDICALRECORD |
|  3 | John Green        |      66 |    75 | NAME          |
|  4 | 203.120.223.13    |      81 |    94 | IPADDR        |
|  5 | 60                |     105 |   106 | AGE           |
|  6 | Day Hospital      |     142 |   153 | LOCATION      |
|  7 | 01/13/93          |     173 |   180 | DATE          |
|  8 | 764543            |     197 |   202 | ID            |
|  9 | 1HGBH41JXMN109286 |     211 |   227 | VIN           |
| 10 | #333-44-6666      |     234 |   245 | SSN           |
| 11 | A334455B          |     269 |   276 | DLN           |
| 12 | (302) 786-5227    |     285 |   298 | PHONE         |
| 13 | 0295 Keats Street |     301 |   317 | LOCATION      |
| 14 | San Francisco     |     320 |   332 | LOCATION      |
| 15 | smith@gmail.com   |     343 |   357 | EMAIL         |
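The `begin` and `end` columns are character offsets into the input text, and `end` is inclusive: "Hendrickson, Ora" is 16 characters and spans 7 to 22. The sketch below shows how those offsets line up with the text by replacing each detected chunk with an `<ENTITY>` tag. It is a plain-Python illustration, not the library's own deidentification step. The `mask_text` function name is hypothetical, and it assumes the chunks do not overlap.

```python
from typing import List, Tuple

Chunk = Tuple[int, int, str]  # (begin, end_inclusive, entity)


def mask_text(text: str, chunks: List[Chunk]) -> str:
    """Replace each chunk span in text with <ENTITY>.

    Offsets are character positions with an inclusive end, matching the
    begin/end columns in the Results table. Chunks must not overlap.
    """
    pieces = []
    cursor = 0
    for begin, end, entity in sorted(chunks):
        pieces.append(text[cursor:begin])  # untouched text before the chunk
        pieces.append(f"<{entity}>")
        cursor = end + 1  # end is inclusive, so resume right after it
    pieces.append(text[cursor:])  # trailing text after the last chunk
    return "".join(pieces)


if __name__ == "__main__":
    first_line = "Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 719435."
    # Rows 0-2 of the Results table.
    chunks = [(7, 22, "NAME"), (38, 47, "DATE"), (54, 59, "MEDICALRECORD")]
    print(mask_text(first_line, chunks))
    # Name : <NAME>, Record date: <DATE>, MR: <MEDICALRECORD>.
```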

Model Information

Model Name: ner_deid_generic_context_augmented_pipeline
Type: pipeline
Compatibility: Healthcare NLP 5.3.2+
License: Licensed
Edition: Official
Language: en
Size: 1.7 GB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • WordEmbeddingsModel
  • MedicalNerModel
  • NerConverter
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • TextMatcherInternalModel
  • ContextualParserModel
  • RegexMatcherModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • RegexMatcherInternalModel
  • ChunkMergeModel
  • ChunkMergeModel
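The Included Models list shows MedicalNerModel output feeding, alongside several ContextualParserModel, TextMatcherInternalModel, RegexMatcherModel and RegexMatcherInternalModel stages, into two ChunkMergeModel stages. The merge rules used inside this pipeline are not documented on this card. The sketch below is therefore a simplified, swapped-in strategy: when chunks from different sources overlap, keep the longer one, and break ties in favor of the earlier source. `merge_chunks` is a hypothetical helper, and the input chunks in the example are made up for illustration.

```python
from typing import List, Tuple

Chunk = Tuple[int, int, str]  # (begin, end_inclusive, entity)


def overlaps(a: Chunk, b: Chunk) -> bool:
    """True if two inclusive [begin, end] spans share at least one character."""
    return a[0] <= b[1] and b[0] <= a[1]


def merge_chunks(*sources: List[Chunk]) -> List[Chunk]:
    """Merge chunk lists, keeping the longer chunk when two overlap.

    Simplified longest-wins strategy (not ChunkMergeModel's documented
    behavior); equal lengths go to the source passed first.
    """
    candidates = []
    for priority, source in enumerate(sources):
        for chunk in source:
            length = chunk[1] - chunk[0] + 1
            # Sort key: longest first, then earlier source first.
            candidates.append((-length, priority, chunk))
    kept: List[Chunk] = []
    for _, _, chunk in sorted(candidates):
        if not any(overlaps(chunk, k) for k in kept):
            kept.append(chunk)
    return sorted(kept)  # back into text order


if __name__ == "__main__":
    # Hypothetical inputs: NER tags part of the SSN as ID; a rule-based matcher covers it fully.
    ner = [(7, 22, "NAME"), (235, 245, "ID")]
    rules = [(234, 245, "SSN")]
    print(merge_chunks(ner, rules))
    # [(7, 22, 'NAME'), (234, 245, 'SSN')]
```

A longest-wins rule is easy to reason about, but real pipelines often rank sources or entity labels explicitly instead, so treat this only as an illustration of why a merge stage is needed.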