Pipeline to Detect PHI in medical text (biobert)

Description

This pretrained pipeline is built on top of the ner_deid_biobert model.

Predicted Entities

LOCATION, CONTACT, PROFESSION, NAME, DATE, ID, AGE

How to use

Python:

from sparknlp.pretrained import PretrainedPipeline

# Load the licensed pipeline from the clinical models repository
pipeline = PretrainedPipeline("ner_deid_biobert_pipeline", "en", "clinical/models")

text = '''A. Record date : 2093-01-13, David Hale, M.D. Name : Hendrickson, Ora MR. # 7194334. PCP : Oliveira, non-smoking. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine.'''

# fullAnnotate returns annotations with character offsets and metadata
result = pipeline.fullAnnotate(text)

Scala:

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Load the licensed pipeline from the clinical models repository
val pipeline = new PretrainedPipeline("ner_deid_biobert_pipeline", "en", "clinical/models")

val text = "A. Record date : 2093-01-13, David Hale, M.D. Name : Hendrickson, Ora MR. # 7194334. PCP : Oliveira, non-smoking. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine."

// fullAnnotate returns annotations with character offsets and metadata
val result = pipeline.fullAnnotate(text)

NLU:

import nlu

# Load the same pipeline through its NLU reference and run it on the sample text
nlu.load("en.deid.ner_biobert.pipeline").predict("""A. Record date : 2093-01-13, David Hale, M.D. Name : Hendrickson, Ora MR. # 7194334. PCP : Oliveira, non-smoking. Cocke County Baptist Hospital. 0295 Keats Street. Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine.""")
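The Python call to pipeline.fullAnnotate(text) returns a list with one dictionary per input text, mapping output column names to lists of annotation objects. Below is a minimal sketch of flattening that structure into rows. It assumes the chunk column is named "ner_chunk" and that each chunk's metadata carries "entity" and "confidence" keys; neither is stated on this page, so check the keys of your own result first. The helper chunks_to_rows and the stand-in FakeAnnotation type are hypothetical names used only for illustration.

```python
from collections import namedtuple


def chunks_to_rows(result, column="ner_chunk"):
    """Flatten fullAnnotate-style output into one dict per detected chunk.

    `column` and the metadata keys below are assumptions, not confirmed
    by the model card.
    """
    rows = []
    for doc in result:  # one dictionary per input text
        for ann in doc.get(column, []):
            rows.append({
                "ner_chunk": ann.result,
                "begin": ann.begin,
                "end": ann.end,
                "ner_label": ann.metadata.get("entity"),
                "confidence": ann.metadata.get("confidence"),
            })
    return rows


# Stand-in for an annotation object so the sketch runs without a Spark session.
# It mimics only the attributes read above: result, begin, end, metadata.
FakeAnnotation = namedtuple("FakeAnnotation", "result begin end metadata")

demo = [{
    "ner_chunk": [
        FakeAnnotation("2093-01-13", 17, 26,
                       {"entity": "DATE", "confidence": "0.981"}),
    ]
}]

for row in chunks_to_rows(demo):
    print(row)
```

With a real pipeline, chunks_to_rows(result) would be called on the value returned by fullAnnotate instead of the demo list. Metadata values are kept as strings here; convert the confidence to float before comparing it numerically.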

Results

|    | ner_chunk                     |   begin |   end | ner_label   |   confidence |
|---:|:------------------------------|--------:|------:|:------------|-------------:|
|  0 | 2093-01-13                    |      17 |    26 | DATE        |      0.981   |
|  1 | David Hale                    |      29 |    38 | NAME        |      0.77585 |
|  2 | Hendrickson                   |      53 |    63 | NAME        |      0.9666  |
|  3 | Ora                           |      66 |    68 | LOCATION    |      0.8723  |
|  4 | Oliveira                      |      91 |    98 | LOCATION    |      0.7785  |
|  5 | Cocke County Baptist Hospital |     114 |   142 | LOCATION    |      0.792   |
|  6 | Keats Street                  |     150 |   161 | LOCATION    |      0.77305 |
|  7 | Phone                         |     164 |   168 | LOCATION    |      0.7083  |
|  8 | Brothers                      |     253 |   260 | LOCATION    |      0.9447  |
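In the table above, begin and end are zero-based character offsets into the input text, and end is inclusive: "2093-01-13" is 10 characters long and spans 17 to 26. The following sketch shows one way such offsets could be used to replace detected spans with their labels. It is plain Python for illustration, not part of the pipeline; mask_spans is a hypothetical helper, and the example uses only the opening of the sample text with the first two rows of the table.

```python
def mask_spans(text, spans):
    """Replace each (begin, end, label) span with <label>.

    `end` is treated as inclusive, matching the Results table.
    Spans are applied right to left so earlier offsets stay valid.
    """
    masked = text
    for begin, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        masked = masked[:begin] + f"<{label}>" + masked[end + 1:]
    return masked


# Opening of the sample text and the first two rows of the Results table.
sample = "A. Record date : 2093-01-13, David Hale, M.D."
spans = [(17, 26, "DATE"), (29, 38, "NAME")]

# Inclusive end offsets mean the slice needs end + 1.
assert sample[17:26 + 1] == "2093-01-13"

print(mask_spans(sample, spans))
# prints: A. Record date : <DATE>, <NAME>, M.D.
```

Applying the replacements from the last span to the first avoids recomputing offsets after each substitution changes the string length.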

Model Information

Model Name: ner_deid_biobert_pipeline
Type: pipeline
Compatibility: Healthcare NLP 4.4.4+
License: Licensed
Edition: Official
Language: en
Size: 422.0 MB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • BertEmbeddings
  • MedicalNerModel
  • NerConverter