Pipeline to Detect PHI for Deidentification

Description

This pretrained pipeline is built on top of the ner_deidentify_dl model and detects protected health information (PHI) entities in clinical text.

Predicted Entities

DATE, PATIENT, MEDICALRECORD, DOCTOR, AGE, HOSPITAL, STATE, CITY, PROFESSION, STREET, ZIP, PHONE, COUNTRY, ORGANIZATION, FAX, IDNUM, HEALTHPLAN, USERNAME, EMAIL, BIOID, LOCATION-OTHER, DEVICE, URL, ID

How to use

Python:

from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline from the clinical/models location.
pipeline = PretrainedPipeline("ner_deidentify_dl_pipeline", "en", "clinical/models")

result = pipeline.annotate("A . Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson , Ora MR . # 7194334 Date : 01/13/93 PCP : Oliveira , 25 month years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street")

Scala:

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("ner_deidentify_dl_pipeline", "en", "clinical/models")

val result = pipeline.annotate("A . Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson , Ora MR . # 7194334 Date : 01/13/93 PCP : Oliveira , 25 month years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street")

NLU:

import nlu

nlu.load("en.med_ner.deidentify.pipeline").predict("""A . Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson , Ora MR . # 7194334 Date : 01/13/93 PCP : Oliveira , 25 month years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street""")
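
As a rough sketch of how the token-level tags in an annotate() result could be tallied: in Spark NLP, annotate() on a single string returns a dictionary that maps output column names to lists of strings. The key names "token" and "ner" below are assumptions (this page does not list the pipeline's output columns), the helper names are hypothetical, and the `result` dictionary is a hand-written stand-in so the snippet runs without a Spark session.

```python
from collections import Counter

# Hand-written stand-in for the dictionary returned by pipeline.annotate(text).
# The keys "token" and "ner" are assumed column names, not confirmed by this page.
result = {
    "token": ["Record", "date", ":", "2093-01-13", ",", "David", "Hale"],
    "ner": ["O", "O", "O", "B-DATE", "O", "B-DOCTOR", "I-DOCTOR"],
}


def count_labels(annotated, ner_key="ner"):
    """Count how often each token-level tag occurs, most frequent first."""
    return Counter(annotated[ner_key]).most_common()


def tagged_tokens(annotated, token_key="token", ner_key="ner"):
    """Pair each token with its tag, dropping tokens tagged "O"."""
    pairs = zip(annotated[token_key], annotated[ner_key])
    return [(tok, tag) for tok, tag in pairs if tag != "O"]


label_counts = count_labels(result)
phi_tokens = tagged_tokens(result)

for label, count in label_counts:
    print(f"{label:<12}{count}")
```

With real pipeline output, the first step would be to print result.keys() and check which columns are actually present before relying on these names.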

Results

+---------------+-----+
|ner_label      |count|
+---------------+-----+
|O              |28   |
|I-HOSPITAL     |4    |
|B-DATE         |3    |
|I-STREET       |3    |
|I-PATIENT      |2    |
|B-DOCTOR       |2    |
|B-AGE          |1    |
|B-PATIENT      |1    |
|I-DOCTOR       |1    |
|B-MEDICALRECORD|1    |
+---------------+-----+

+-----------------------------+-------------+
|chunk                        |ner_label    |
+-----------------------------+-------------+
|2093-01-13                   |DATE         |
|David Hale                   |DOCTOR       |
|Hendrickson , Ora            |PATIENT      |
|7194334                      |MEDICALRECORD|
|01/13/93                     |DATE         |
|Oliveira                     |DOCTOR       |
|25                           |AGE          |
|2079-11-09                   |DATE         |
|Cocke County Baptist Hospital|HOSPITAL     |
|0295 Keats Street            |STREET       |
+-----------------------------+-------------+
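
The two tables are related: the first counts token-level tags, and the second lists the chunks obtained once adjacent tokens with the same entity label are merged (the role of the NerConverter stage). HOSPITAL and STREET appear only with I- tags in the first table, so the merge has to accept a chunk that begins with an I- tag. The pure-Python sketch below illustrates such a merge; it is not Spark NLP's NerConverter implementation, `group_iob` is a hypothetical helper, and the token/tag alignment is assumed from the tables rather than copied from actual pipeline output.

```python
def group_iob(tokens, tags):
    """Merge token-level IOB tags into (chunk_text, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        # "B-DATE" -> ("B", "DATE"); "O" -> ("O", "")
        prefix, _, label = tag.partition("-")
        # Close the open chunk on "O", on an explicit "B-", or when the label changes.
        if tag == "O" or prefix == "B" or label != current_label:
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
        if tag != "O":
            current_tokens.append(token)
            current_label = label
    # Flush a chunk that runs to the end of the input.
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Tail of the sample text; the tags are assumed so that they match both tables.
tokens = ["Record", "date", ":", "2079-11-09", ".",
          "Cocke", "County", "Baptist", "Hospital", ".",
          "0295", "Keats", "Street"]
tags = ["O", "O", "O", "B-DATE", "O",
        "I-HOSPITAL", "I-HOSPITAL", "I-HOSPITAL", "I-HOSPITAL", "O",
        "I-STREET", "I-STREET", "I-STREET"]

chunks = group_iob(tokens, tags)
for text, label in chunks:
    print(f"{text} -> {label}")
```

Splitting each tag at the first hyphen only keeps labels such as LOCATION-OTHER intact.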

Model Information

Model Name: ner_deidentify_dl_pipeline
Type: pipeline
Compatibility: Healthcare NLP 3.4.1+
License: Licensed
Edition: Official
Language: en
Size: 1.7 GB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • WordEmbeddingsModel
  • MedicalNerModel
  • NerConverter