Detect PHI for Deidentification (Subentity, Context Augmented)

Description

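This pretrained pipeline detects protected health information (PHI) for deidentification. It combines a clinical NER model with rule-based contextual parsers and matchers (see Included Models) and extracts the following entities: MEDICALRECORD, ORGANIZATION, PROFESSION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, DATE, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, AGE, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, IPADDR.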

Predicted Entities

MEDICALRECORD, ORGANIZATION, PROFESSION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, DATE, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, AGE, SSN, ACCOUNT, DLN, PLATE, VIN, LICENSE, IPADDR
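The thirty labels above can be used to restrict output to a subset of PHI types. The following is a minimal sketch, assuming results have already been flattened into `(chunk, begin, end, entity)` tuples. `PREDICTED_ENTITIES` and `keep_entities` are hypothetical names for illustration, not part of Healthcare NLP. Unknown labels raise an error so that a typo is not silently ignored.

```python
# Labels copied from the Predicted Entities list of this pipeline.
PREDICTED_ENTITIES = frozenset({
    "MEDICALRECORD", "ORGANIZATION", "PROFESSION", "HEALTHPLAN", "DOCTOR",
    "USERNAME", "LOCATION-OTHER", "URL", "DEVICE", "CITY", "DATE", "ZIP",
    "STATE", "PATIENT", "COUNTRY", "STREET", "PHONE", "HOSPITAL", "EMAIL",
    "IDNUM", "BIOID", "FAX", "AGE", "SSN", "ACCOUNT", "DLN", "PLATE", "VIN",
    "LICENSE", "IPADDR",
})


def keep_entities(rows, labels):
    """Return only the (chunk, begin, end, entity) rows whose label is in `labels`.

    Hypothetical helper: raises ValueError for labels this pipeline never
    predicts (e.g. "IP_ADDR" instead of "IPADDR").
    """
    unknown = set(labels) - PREDICTED_ENTITIES
    if unknown:
        raise ValueError(f"labels not predicted by this pipeline: {sorted(unknown)}")
    return [row for row in rows if row[3] in labels]


rows = [("John Green", 66, 75, "DOCTOR"), ("203.120.223.13", 81, 94, "IPADDR")]
print(keep_entities(rows, {"IPADDR"}))  # [('203.120.223.13', 81, 94, 'IPADDR')]
```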


How to use

```python
from sparknlp.pretrained import PretrainedPipeline

deid_pipeline = PretrainedPipeline("ner_deid_subentity_context_augmented_pipeline", "en", "clinical/models")

text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 719435.
Dr. John Green, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""

result = deid_pipeline.fullAnnotate(text)
```

```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val deid_pipeline = PretrainedPipeline("ner_deid_subentity_context_augmented_pipeline", "en", "clinical/models")

val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 719435.
Dr. John Green, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""

val result = deid_pipeline.fullAnnotate(text)
```
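In Python, `fullAnnotate` on a single string typically returns a list holding one dictionary that maps each output column name to a list of annotations. The following sketch flattens one such dictionary into the `chunk | begin | end | entity` rows shown under Results. The name of the final merged-chunk column is not stated on this page, so it is passed in as a parameter. Printing `result[0].keys()` is one way to find it. The sketch assumes each chunk annotation exposes `result`, `begin`, `end` and a `metadata` dict with an `"entity"` key, as Spark NLP chunk annotations usually do. `chunks_to_rows` and `FakeAnnotation` are hypothetical and exist only so the example runs without Spark NLP installed.

```python
from collections import namedtuple
from typing import Any, Dict, List, Tuple


def chunks_to_rows(annotated: Dict[str, List[Any]], chunk_column: str) -> List[Tuple[str, int, int, str]]:
    """Flatten one fullAnnotate() dictionary into (chunk, begin, end, entity) rows."""
    return [
        (ann.result, ann.begin, ann.end, ann.metadata.get("entity", ""))
        for ann in annotated[chunk_column]
    ]


# Stand-in with the same attribute names as a Spark NLP annotation (assumption).
FakeAnnotation = namedtuple("FakeAnnotation", "result begin end metadata")
demo = {"chunk": [FakeAnnotation("John Green", 66, 75, {"entity": "DOCTOR"})]}
print(chunks_to_rows(demo, "chunk"))  # [('John Green', 66, 75, 'DOCTOR')]

# With the real pipeline (column name is an assumption; check result[0].keys()):
# rows = chunks_to_rows(result[0], "<merged chunk column>")
```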

Results

|    | chunk             |   begin |   end | entity        |
|---:|:------------------|--------:|------:|:--------------|
|  0 | Hendrickson, Ora  |       7 |    22 | PATIENT       |
|  1 | 2093-01-13        |      38 |    47 | DATE          |
|  2 | 719435            |      54 |    59 | MEDICALRECORD |
|  3 | John Green        |      66 |    75 | DOCTOR        |
|  4 | 203.120.223.13    |      81 |    94 | IPADDR        |
|  5 | 60                |     105 |   106 | AGE           |
|  6 | Day Hospital      |     142 |   153 | HOSPITAL      |
|  7 | 01/13/93          |     173 |   180 | DATE          |
|  8 | 1HGBH41JXMN109286 |     199 |   215 | VIN           |
|  9 | #333-44-6666      |     222 |   233 | SSN           |
| 10 | A334455B          |     257 |   264 | DLN           |
| 11 | (302) 786-5227    |     273 |   286 | PHONE         |
| 12 | 0295 Keats Street |     289 |   305 | STREET        |
| 13 | San Francisco     |     308 |   320 | CITY          |
| 14 | smith@gmail.com   |     331 |   345 | EMAIL         |
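The offsets in the table are zero-based and `end` is inclusive. As a result, `text[begin:end + 1]` reproduces each chunk exactly. The following sketch checks every row against the sample text. It then reuses the same offsets for a simple placeholder-style mask. `mask_phi` is a hypothetical plain-Python helper for illustration, not a Healthcare NLP API. It assumes the spans do not overlap.

```python
# Sample text and rows copied from the example and the results table.
text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR: 719435.
Dr. John Green, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""

rows = [
    ("Hendrickson, Ora", 7, 22, "PATIENT"),
    ("2093-01-13", 38, 47, "DATE"),
    ("719435", 54, 59, "MEDICALRECORD"),
    ("John Green", 66, 75, "DOCTOR"),
    ("203.120.223.13", 81, 94, "IPADDR"),
    ("60", 105, 106, "AGE"),
    ("Day Hospital", 142, 153, "HOSPITAL"),
    ("01/13/93", 173, 180, "DATE"),
    ("1HGBH41JXMN109286", 199, 215, "VIN"),
    ("#333-44-6666", 222, 233, "SSN"),
    ("A334455B", 257, 264, "DLN"),
    ("(302) 786-5227", 273, 286, "PHONE"),
    ("0295 Keats Street", 289, 305, "STREET"),
    ("San Francisco", 308, 320, "CITY"),
    ("smith@gmail.com", 331, 345, "EMAIL"),
]

# `end` is inclusive, so the matching Python slice is [begin, end + 1).
for chunk, begin, end, _entity in rows:
    assert text[begin:end + 1] == chunk, (chunk, begin, end)


def mask_phi(source, spans):
    """Replace each (chunk, begin, end, entity) span with an <ENTITY> placeholder.

    Spans are replaced right to left so the offsets of earlier spans stay valid.
    """
    for _chunk, begin, end, entity in sorted(spans, key=lambda r: r[1], reverse=True):
        source = source[:begin] + f"<{entity}>" + source[end + 1:]
    return source


print(mask_phi(text, rows))
```

The first printed line becomes `Name : <PATIENT>, Record date: <DATE>, MR: <MEDICALRECORD>.`. Replacing from right to left avoids recomputing offsets after each substitution changes the string length.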

Model Information

Model Name: ner_deid_subentity_context_augmented_pipeline
Type: pipeline
Compatibility: Healthcare NLP 5.3.2+
License: Licensed
Edition: Official
Language: en
Size: 1.7 GB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • WordEmbeddingsModel
  • MedicalNerModel
  • NerConverter
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • TextMatcherModel
  • ContextualParserModel
  • RegexMatcherModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • RegexMatcherInternalModel
  • ChunkMergeModel
  • ChunkMergeModel
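The stage list pairs one MedicalNerModel with several rule-based stages (ContextualParserModel, TextMatcherModel, RegexMatcherModel, RegexMatcherInternalModel), followed by two ChunkMergeModel stages. This suggests that chunks from these sources are combined into a single output. This page does not describe how ChunkMergeModel resolves conflicts, and its behavior is configurable. The following sketch therefore illustrates only one simple, hypothetical rule: when two spans overlap, keep the longer one, and break ties in favor of the first list. It is not ChunkMergeModel's actual implementation. All chunks in the demo are made up.

```python
from typing import List, Tuple

# (begin, end inclusive, entity)
Chunk = Tuple[int, int, str]


def overlaps(a: Chunk, b: Chunk) -> bool:
    """True if two inclusive spans share at least one character."""
    return not (a[1] < b[0] or a[0] > b[1])


def merge_chunks(primary: List[Chunk], secondary: List[Chunk]) -> List[Chunk]:
    """Combine two chunk lists, keeping the longer span when two overlap.

    Illustrative rule only: ties in length go to `primary`.
    """
    tagged = [(c, 0) for c in primary] + [(c, 1) for c in secondary]
    # Longest spans first; equal lengths keep primary before secondary.
    tagged.sort(key=lambda item: (-(item[0][1] - item[0][0]), item[1], item[0][0]))
    kept: List[Chunk] = []
    for chunk, _source in tagged:
        if not any(overlaps(chunk, other) for other in kept):
            kept.append(chunk)
    return sorted(kept)


ner = [(66, 75, "DOCTOR"), (223, 233, "PHONE")]   # hypothetical NER output
rules = [(222, 233, "SSN"), (81, 94, "IPADDR")]   # hypothetical rule-based output
print(merge_chunks(ner, rules))
# [(66, 75, 'DOCTOR'), (81, 94, 'IPADDR'), (222, 233, 'SSN')]
```

Here the rule-based SSN span is one character longer than the overlapping NER span, so it wins. Non-overlapping chunks from both sources pass through unchanged.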