Pipeline to Detect PHI for Deidentification (BertForTokenClassifier)

Description

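This pretrained pipeline is built on top of the bert_token_classifier_ner_deid model. It detects protected health information (PHI) entities in clinical text so that they can be de-identified.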

Predicted Entities

AGE, BIOID, CITY, COUNTRY, DATE, DEVICE, DOCTOR, EMAIL, FAX, HEALTHPLAN, HOSPITAL, IDNUM, LOCATION-OTHER, MEDICALRECORD, ORGANIZATION, PATIENT, PHONE, PROFESSION, STATE, STREET, URL, USERNAME, ZIP

How to use

Python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_token_classifier_ner_deid_pipeline", "en", "clinical/models")

text = '''A. Record date : 2093-01-13, David Hale, M.D. Name : Hendrickson, Ora MR. # 7194334. PCP : Oliveira, non-smoking. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine.'''

result = pipeline.fullAnnotate(text)

Scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_token_classifier_ner_deid_pipeline", "en", "clinical/models")

val text = "A. Record date : 2093-01-13, David Hale, M.D. Name : Hendrickson, Ora MR. # 7194334. PCP : Oliveira, non-smoking. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine."

val result = pipeline.fullAnnotate(text)

NLU

import nlu
nlu.load("en.classify.token_bert.ner_deid.pipeline").predict("""A. Record date : 2093-01-13, David Hale, M.D. Name : Hendrickson, Ora MR. # 7194334. PCP : Oliveira, non-smoking. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine.""")
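
The Python call to fullAnnotate returns a list with one dictionary per input text, mapping output column names to lists of annotation objects. A minimal sketch of flattening that output into rows like those in the Results table might look as follows. The helper name chunks_to_rows is hypothetical, and the output column name "ner_chunk" and the metadata keys "entity" and "confidence" are assumptions inferred from the Results table, not confirmed by this page; check result[0].keys() and one annotation's metadata before relying on them.

```python
from typing import Any, Dict, List


def chunks_to_rows(result: List[Dict[str, Any]], column: str = "ner_chunk") -> List[Dict[str, Any]]:
    """Flatten fullAnnotate output into one plain dict per detected chunk.

    Hypothetical helper. Assumes each annotation exposes .result, .begin,
    .end and .metadata, and that the chunk column is named "ner_chunk"
    with "entity" and "confidence" metadata keys (assumptions).
    """
    rows = []
    for doc in result:  # one entry per input text
        for ann in doc.get(column, []):
            meta = ann.metadata or {}
            conf = meta.get("confidence")
            rows.append({
                "ner_chunk": ann.result,
                "begin": ann.begin,
                "end": ann.end,
                "ner_label": meta.get("entity"),
                # metadata values are typically strings, so convert when present
                "confidence": float(conf) if conf is not None else None,
            })
    return rows
```

With the Python example above, chunks_to_rows(result) would be the call to try; the rows can then be passed to a DataFrame constructor or printed directly.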

Results

|    | ner_chunk                     |   begin |   end | ner_label     |   confidence |
|---:|:------------------------------|--------:|------:|:--------------|-------------:|
|  0 | 2093-01-13                    |      17 |    26 | DATE          |     0.957256 |
|  1 | David Hale                    |      29 |    38 | DOCTOR        |     0.983641 |
|  2 | Hendrickson, Ora              |      53 |    68 | PATIENT       |     0.992943 |
|  3 | 7194334                       |      76 |    82 | MEDICALRECORD |     0.999349 |
|  4 | Oliveira                      |      91 |    98 | DOCTOR        |     0.763455 |
|  5 | Cocke County Baptist Hospital |     114 |   142 | HOSPITAL      |     0.999558 |
|  6 | 0295 Keats Street             |     145 |   161 | STREET        |     0.997889 |
|  7 | 302) 786-5227                 |     174 |   186 | PHONE         |     0.970114 |
|  8 | Brothers Coal-Mine            |     253 |   270 | ORGANIZATION  |     0.998911 |
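
The begin and end columns are character offsets into the input text, and end is inclusive. As an illustration of how these offsets could drive a simple masking step, the sketch below replaces each span with its label in angle brackets. It uses only the text and offsets shown above; the mask_spans helper and the <LABEL> placeholder format are hypothetical choices for this example, not part of the pipeline.

```python
from typing import List, Tuple

TEXT = (
    "A. Record date : 2093-01-13, David Hale, M.D. Name : Hendrickson, Ora MR. # 7194334. "
    "PCP : Oliveira, non-smoking. Cocke County Baptist Hospital. 0295 Keats Street. "
    "Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started "
    "working for Brothers Coal-Mine."
)

# (begin, end, ner_label) copied from the Results table; end is inclusive.
CHUNKS: List[Tuple[int, int, str]] = [
    (17, 26, "DATE"),
    (29, 38, "DOCTOR"),
    (53, 68, "PATIENT"),
    (76, 82, "MEDICALRECORD"),
    (91, 98, "DOCTOR"),
    (114, 142, "HOSPITAL"),
    (145, 161, "STREET"),
    (174, 186, "PHONE"),
    (253, 270, "ORGANIZATION"),
]


def mask_spans(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each inclusive [begin, end] span with a <LABEL> placeholder.

    Hypothetical helper. Spans are processed from right to left so that
    earlier offsets stay valid while the string changes length.
    """
    masked = text
    for begin, end, label in sorted(spans, reverse=True):
        masked = masked[:begin] + f"<{label}>" + masked[end + 1:]
    return masked


masked = mask_spans(TEXT, CHUNKS)
print(masked)
```

Because the PHONE chunk in the table starts at "302)", the leading "+1 (" stays in the masked text, so a production de-identification step would likely need extra handling for partially detected spans.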

Model Information

Model Name: bert_token_classifier_ner_deid_pipeline
Type: pipeline
Compatibility: Healthcare NLP 4.3.0+
License: Licensed
Edition: Official
Language: en
Size: 405.0 MB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • MedicalBertForTokenClassifier
  • NerConverterInternalModel