Pipeline to Detect PHI in Text

Description

This pretrained pipeline is built on top of the ner_deid_sd_large model and detects protected health information (PHI) in text.

How to use

Python

from sparknlp.pretrained import PretrainedPipeline

# Download (or load from cache) the licensed pretrained pipeline
pipeline = PretrainedPipeline("ner_deid_sd_large_pipeline", "en", "clinical/models")

# Annotate a single string
pipeline.annotate("""A. Record date : 2093-01-13, David Hale, M.D., Name : Hendrickson, Ora MR. # 7194334 Date : 01/13/93 PCP : Oliveira, 25-year-old, Record date : 1-11-2000. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine.""")

Scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Download (or load from cache) the licensed pretrained pipeline
val pipeline = new PretrainedPipeline("ner_deid_sd_large_pipeline", "en", "clinical/models")

// Annotate a single string
pipeline.annotate("A. Record date : 2093-01-13, David Hale, M.D., Name : Hendrickson, Ora MR. # 7194334 Date : 01/13/93 PCP : Oliveira, 25-year-old, Record date : 1-11-2000. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine.")

NLU

import nlu

# Load the same pipeline through its NLU reference and run it on a single string
nlu.load("en.deid.med_ner_large.pipeline").predict("""A. Record date : 2093-01-13, David Hale, M.D., Name : Hendrickson, Ora MR. # 7194334 Date : 01/13/93 PCP : Oliveira, 25-year-old, Record date : 1-11-2000. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine.""")

Results

+-----------------------------+--------+
|chunks                       |entities|
+-----------------------------+--------+
|2093-01-13                   |DATE    |
|David Hale                   |NAME    |
|Hendrickson, Ora             |NAME    |
|7194334                      |ID      |
|01/13/93                     |DATE    |
|Oliveira                     |NAME    |
|1-11-2000                    |DATE    |
|Cocke County Baptist Hospital|LOCATION|
|0295 Keats Street            |LOCATION|
|786-5227                     |CONTACT |
|Brothers Coal-Mine           |LOCATION|
+-----------------------------+--------+
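
A table like the one above can be assembled from per-chunk annotations. The following is a minimal sketch, not the pipeline's own code. It assumes the chunk annotations are stored under an output column named "ner_chunk" and that each annotation exposes a .result string and a .metadata dictionary with an "entity" key, as the objects returned by fullAnnotate() typically do. The helper names chunk_rows and render_table are hypothetical, and SimpleNamespace objects stand in for real annotations so the example runs without a license.

```python
from types import SimpleNamespace


def chunk_rows(annotated, chunk_col="ner_chunk"):
    """Return (chunk text, entity label) pairs from one annotated document.

    `annotated` is a dict mapping output column names to lists of
    annotation-like objects. The column name "ner_chunk" is an assumption.
    """
    rows = []
    for ann in annotated.get(chunk_col, []):
        # Fall back to "UNKNOWN" if the annotation carries no entity label.
        rows.append((ann.result, ann.metadata.get("entity", "UNKNOWN")))
    return rows


def render_table(rows, headers=("chunks", "entities")):
    """Render the pairs as a fixed-width text table."""
    widths = [
        max([len(h)] + [len(r[i]) for r in rows])
        for i, h in enumerate(headers)
    ]
    rule = "+" + "+".join("-" * w for w in widths) + "+"

    def line(cells):
        return "|" + "|".join(c.ljust(w) for c, w in zip(cells, widths)) + "|"

    return "\n".join([rule, line(headers), rule, *(line(r) for r in rows), rule])


# Stand-in data shaped like one annotated document (hypothetical, for illustration).
demo = {
    "ner_chunk": [
        SimpleNamespace(result="2093-01-13", metadata={"entity": "DATE"}),
        SimpleNamespace(result="David Hale", metadata={"entity": "NAME"}),
    ]
}

rows = chunk_rows(demo)
print(render_table(rows))
```

With a real pipeline, the dictionary passed to chunk_rows would presumably be one element of the list returned by pipeline.fullAnnotate(text); check the pipeline's actual output column names before relying on "ner_chunk".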

Model Information

Model Name: ner_deid_sd_large_pipeline
Type: pipeline
Compatibility: Healthcare NLP 3.4.1+
License: Licensed
Edition: Official
Language: en
Size: 1.7 GB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • WordEmbeddingsModel
  • MedicalNerModel
  • NerConverter