Detect PHI for Deidentification (Name Augmented)

Description

This pipeline extracts protected health information (PHI) entities: LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, MEDICALRECORD, ORGANIZATION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, and IPADDR. It combines three NER models (ner_deid_generic_augmented, ner_deid_subentity_augmented, and ner_deid_name_multilingual_clinical) with several ContextualParser, RegexMatcher, and TextMatcher models.

Predicted Entities

LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE, MEDICALRECORD, ORGANIZATION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, IPADDR


How to use


# Python
from sparknlp.pretrained import PretrainedPipeline

deid_pipeline = PretrainedPipeline("ner_deid_context_nameAugmented_pipeline", "en", "clinical/models")

text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 719435.
Dr. John Green, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's ID: 3454362, VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""

result = deid_pipeline.fullAnnotate(text)



// Scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val deid_pipeline = PretrainedPipeline("ner_deid_context_nameAugmented_pipeline", "en", "clinical/models")

val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 719435.
Dr. John Green, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's ID: 3454362, VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""

val result = deid_pipeline.fullAnnotate(text)

Results


|    | chunk             |   begin |   end | entity        |
|---:|:------------------|--------:|------:|:--------------|
|  0 | Hendrickson, Ora  |       7 |    22 | PATIENT       |
|  1 | 2093-01-13        |      38 |    47 | DATE          |
|  2 | 719435            |      54 |    59 | MEDICALRECORD |
|  3 | John Green        |      66 |    75 | DOCTOR        |
|  4 | 203.120.223.13    |      81 |    94 | IPADDR        |
|  5 | 60                |     105 |   106 | AGE           |
|  6 | Day Hospital      |     142 |   153 | HOSPITAL      |
|  7 | 01/13/93          |     173 |   180 | DATE          |
|  8 | 3454362           |     197 |   203 | IDNUM         |
|  9 | 1HGBH41JXMN109286 |     212 |   228 | VIN           |
| 10 | #333-44-6666      |     235 |   246 | SSN           |
| 11 | A334455B          |     270 |   277 | DLN           |
| 12 | (302) 786-5227    |     286 |   299 | PHONE         |
| 13 | 0295 Keats Street |     302 |   318 | STREET        |
| 14 | San Francisco     |     321 |   333 | CITY          |
| 15 | smith@gmail.com   |     344 |   358 | EMAIL         |
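
The begin and end columns in the table above are character offsets into the input text, and end is inclusive (for example, "Hendrickson, Ora" spans 7 through 22). The following is a minimal sketch of how such rows might be collected and used to mask the text. It assumes the chunks are exposed under an output column named "ner_chunk" and that each annotation carries result, begin, end, and a metadata entry "entity"; the column name is an assumption, and the helper names chunks_to_rows and mask_text are hypothetical. To keep the sketch runnable without a Spark session, stand-in objects replace the real annotations, and the masking shown is plain string replacement, not the library's own deidentification step.

```python
from types import SimpleNamespace


def chunks_to_rows(annotations):
    """Turn chunk annotations into (chunk, begin, end, entity) tuples."""
    return [(a.result, a.begin, a.end, a.metadata.get("entity")) for a in annotations]


def mask_text(text, rows):
    """Replace each chunk with <ENTITY>, treating `end` as an inclusive offset."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for _chunk, begin, end, entity in sorted(rows, key=lambda r: r[1], reverse=True):
        out = out[:begin] + f"<{entity}>" + out[end + 1:]
    return out


# Stand-ins mirroring the first three table rows. With a live pipeline these
# would come from something like result[0]["ner_chunk"] (assumed column name).
sample_text = "Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 719435."
sample_chunks = [
    SimpleNamespace(result="Hendrickson, Ora", begin=7, end=22, metadata={"entity": "PATIENT"}),
    SimpleNamespace(result="2093-01-13", begin=38, end=47, metadata={"entity": "DATE"}),
    SimpleNamespace(result="719435", begin=54, end=59, metadata={"entity": "MEDICALRECORD"}),
]

rows = chunks_to_rows(sample_chunks)
masked = mask_text(sample_text, rows)
print(masked)
```

Replacing from the last chunk to the first avoids recomputing offsets after each substitution, which matters because the placeholder is rarely the same length as the chunk it replaces.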

Model Information

Model Name: ner_deid_context_nameAugmented_pipeline
Type: pipeline
Compatibility: Healthcare NLP 5.3.2+
License: Licensed
Edition: Official
Language: en
Size: 1.7 GB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • WordEmbeddingsModel
  • MedicalNerModel
  • NerConverter
  • MedicalNerModel
  • NerConverter
  • MedicalNerModel
  • NerConverter
  • ChunkMergeModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • TextMatcherInternalModel
  • ContextualParserModel
  • RegexMatcherModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • RegexMatcherInternalModel
  • ChunkMergeModel
  • ChunkMergeModel
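
The stage list above can be tallied by type to see how the pipeline is composed. The sketch below hard-codes the names exactly as listed. With a loaded pipeline, the same names could probably be obtained with [type(s).__name__ for s in deid_pipeline.model.stages], but that call is an assumption and was not run here.

```python
from collections import Counter

# Stage names copied in order from the Included Models list.
stage_names = (
    ["DocumentAssembler", "SentenceDetectorDLModel", "TokenizerModel", "WordEmbeddingsModel"]
    + ["MedicalNerModel", "NerConverter"] * 3
    + ["ChunkMergeModel"]
    + ["ContextualParserModel"] * 6
    + ["TextMatcherInternalModel", "ContextualParserModel", "RegexMatcherModel"]
    + ["ContextualParserModel"] * 4
    + ["RegexMatcherInternalModel", "ChunkMergeModel", "ChunkMergeModel"]
)

counts = Counter(stage_names)

# Print the most frequent stage types first.
for name, n in counts.most_common():
    print(f"{n:2d}  {name}")
```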