Pipeline to Detect PHI in text (enriched-biobert)

Description

This pretrained pipeline is built on top of the ner_deid_enriched_biobert model.

Predicted Entities

AGE, BIOID, CITY, COUNTRY, DATE, DEVICE, DOCTOR, EMAIL, FAX, HEALTHPLAN, HOSPITAL, IDNUM, LOCATION-OTHER, MEDICALRECORD, ORGANIZATION, PATIENT, PHONE, PROFESSION, STATE, STREET, URL, USERNAME, ZIP


How to use

Python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ner_deid_enriched_biobert_pipeline", "en", "clinical/models")

text = '''A. Record date : 2093-01-13, David Hale, M.D. Name : Hendrickson, Ora MR. # 7194334. PCP : Oliveira, non-smoking. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine.'''

result = pipeline.fullAnnotate(text)

Scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("ner_deid_enriched_biobert_pipeline", "en", "clinical/models")

val text = "A. Record date : 2093-01-13, David Hale, M.D. Name : Hendrickson, Ora MR. # 7194334. PCP : Oliveira, non-smoking. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine."

val result = pipeline.fullAnnotate(text)

NLU

import nlu
nlu.load("en.deid.ner_enriched_biobert.pipeline").predict("""A. Record date : 2093-01-13, David Hale, M.D. Name : Hendrickson, Ora MR. # 7194334. PCP : Oliveira, non-smoking. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine.""")
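The fullAnnotate call returns one dictionary per input text, mapping output column names to lists of annotation objects. The sketch below shows one way to flatten chunk annotations into rows of text, offsets, label, and confidence. It rests on assumptions this page does not confirm: that the chunks are stored under the key "ner_chunk", and that each annotation's metadata carries "entity" and "confidence" entries. The helper chunks_to_rows is hypothetical, and the stand-in object exists only so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def chunks_to_rows(annotations):
    """Turn chunk annotations into (text, begin, end, label, confidence) tuples."""
    rows = []
    for ann in annotations:
        meta = ann.metadata
        rows.append((
            ann.result,              # the chunk text
            ann.begin,               # start offset in the input text
            ann.end,                 # end offset in the input text
            meta.get("entity"),      # assumed metadata key for the label
            meta.get("confidence"),  # assumed metadata key; None if absent
        ))
    return rows


# Stand-in for a single chunk annotation, for illustration only.
fake_chunks = [
    SimpleNamespace(
        result="2093-01-13",
        begin=17,
        end=26,
        metadata={"entity": "DATE", "confidence": "0.9267"},
    )
]

rows = chunks_to_rows(fake_chunks)
print(rows)

# With a real pipeline result (the output key is an assumption):
# rows = chunks_to_rows(result[0]["ner_chunk"])
```

Using meta.get rather than direct indexing keeps the helper from failing if a metadata key is missing in a given library version.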

Results

|    | ner_chunk                     |   begin |   end | ner_label    |   confidence |
|---:|:------------------------------|--------:|------:|:-------------|-------------:|
|  0 | 2093-01-13                    |      17 |    26 | DATE         |     0.9267   |
|  1 | David Hale                    |      29 |    38 | DOCTOR       |     0.7949   |
|  2 | Hendrickson, Ora              |      53 |    68 | PATIENT      |     0.637733 |
|  3 | 7194334                       |      76 |    82 | PHONE        |     0.4939   |
|  4 | Cocke County Baptist Hospital |     114 |   142 | HOSPITAL     |     0.6199   |
|  5 | 0295 Keats Street             |     145 |   161 | STREET       |     0.592433 |
|  6 | 302) 786-5227                 |     174 |   186 | PHONE        |     0.846833 |
|  7 | Brothers Coal-Mine            |     253 |   270 | ORGANIZATION |     0.45085  |
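The begin and end columns are character offsets into the input text, with end inclusive. As a minimal sketch of how these offsets could be used downstream, the snippet below replaces each detected span with its label. It is plain Python over the rows in the table above, not the library's own de-identification component, and mask_entities is a hypothetical helper.

```python
# The sample text from the usage section, split across lines for readability.
text = (
    "A. Record date : 2093-01-13, David Hale, M.D. Name : Hendrickson, Ora MR. # 7194334. "
    "PCP : Oliveira, non-smoking. Cocke County Baptist Hospital. 0295 Keats Street. "
    "Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started "
    "working for Brothers Coal-Mine."
)

# (chunk, begin, end, label) copied from the results table; end is inclusive.
chunks = [
    ("2093-01-13", 17, 26, "DATE"),
    ("David Hale", 29, 38, "DOCTOR"),
    ("Hendrickson, Ora", 53, 68, "PATIENT"),
    ("7194334", 76, 82, "PHONE"),
    ("Cocke County Baptist Hospital", 114, 142, "HOSPITAL"),
    ("0295 Keats Street", 145, 161, "STREET"),
    ("302) 786-5227", 174, 186, "PHONE"),
    ("Brothers Coal-Mine", 253, 270, "ORGANIZATION"),
]


def mask_entities(text, chunks):
    """Replace each span with <LABEL>, checking that the offsets match the chunk text."""
    out = text
    # Work right to left so that earlier offsets stay valid after each replacement.
    for chunk, begin, end, label in sorted(chunks, key=lambda c: c[1], reverse=True):
        if out[begin:end + 1] != chunk:
            raise ValueError(f"offsets {begin}-{end} do not match {chunk!r}")
        out = out[:begin] + f"<{label}>" + out[end + 1:]
    return out


masked = mask_entities(text, chunks)
print(masked)
```

Only spans listed in the table are masked. "Oliveira", for example, is not among the detected chunks, so it remains in the output.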

Model Information

Model Name: ner_deid_enriched_biobert_pipeline
Type: pipeline
Compatibility: Healthcare NLP 4.3.0+
License: Licensed
Edition: Official
Language: en
Size: 422.2 MB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • BertEmbeddings
  • MedicalNerModel
  • NerConverterInternalModel