Pipeline to Detect PHI for Deidentification in Romanian (BERT)

Description

This pretrained pipeline is built on top of the ner_deid_subentity_bert model and detects protected health information (PHI) in Romanian text for deidentification.

Predicted Entities

How to use

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ner_deid_subentity_bert_pipeline", "ro", "clinical/models")

text = '''Spitalul Pentru Ochi de Deal, Drumul Oprea Nr. 972 Vaslui, 737405 România
Tel: +40(235)413773
Data setului de analize: 25 May 2022 15:36:00
Nume si Prenume : BUREAN MARIA, Varsta: 77
Medic : Agota Evelyn Tımar
C.N.P : 2450502264401'''

result = pipeline.fullAnnotate(text)

// Scala equivalent
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("ner_deid_subentity_bert_pipeline", "ro", "clinical/models")

// Triple quotes are required for a multi-line string literal in Scala
val text = """Spitalul Pentru Ochi de Deal, Drumul Oprea Nr. 972 Vaslui, 737405 România
Tel: +40(235)413773
Data setului de analize: 25 May 2022 15:36:00
Nume si Prenume : BUREAN MARIA, Varsta: 77
Medic : Agota Evelyn Tımar
C.N.P : 2450502264401"""

val result = pipeline.fullAnnotate(text)
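
The value returned by fullAnnotate in Python is a list with one dictionary per input text, mapping output column names to annotation objects. The sketch below shows one way to flatten that structure into rows shaped like the Results table. It rests on assumptions: the chunk column is taken to be "ner_chunk" (check result[0].keys() for the actual name), and the label and confidence are taken to live under the "entity" and "confidence" metadata keys. chunks_to_rows is a hypothetical helper, not part of the library, and the stand-in object exists only so the snippet runs without a licensed session.

```python
from types import SimpleNamespace


def chunks_to_rows(result, chunk_col="ner_chunk"):
    """Flatten fullAnnotate-style output into one dict per detected chunk.

    `chunk_col` and the metadata keys below are assumptions; verify them
    against the keys of your own result before relying on this.
    """
    rows = []
    for doc in result:  # one dict per input text
        for ann in doc.get(chunk_col, []):
            rows.append({
                "ner_chunks": ann.result,
                "begin": ann.begin,
                "end": ann.end,
                "ner_label": ann.metadata.get("entity"),
                "confidence": ann.metadata.get("confidence"),
            })
    return rows


# Stand-in for a real annotation, so the sketch runs without Spark.
fake_result = [{
    "ner_chunk": [
        SimpleNamespace(result="Vaslui", begin=51, end=56,
                        metadata={"entity": "CITY", "confidence": "0.9896"}),
    ]
}]

rows = chunks_to_rows(fake_result)
print(rows)
```

With a real session, the same call would be chunks_to_rows(result), using the result variable from the Python example above.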

Results

|    | ner_chunks                   |   begin |   end | ner_label   |   confidence |
|---:|:-----------------------------|--------:|------:|:------------|-------------:|
|  0 | Spitalul Pentru Ochi de Deal |       0 |    27 | HOSPITAL    |     0.84306  |
|  1 | Drumul Oprea Nr. 972         |      30 |    49 | STREET      |     0.99784  |
|  2 | Vaslui                       |      51 |    56 | CITY        |     0.9896   |
|  3 | 737405                       |      59 |    64 | ZIP         |     1        |
|  4 | +40(235)413773               |      79 |    92 | PHONE       |     1        |
|  5 | 25 May 2022                  |     119 |   129 | DATE        |     1        |
|  6 | BUREAN MARIA                 |     158 |   169 | PATIENT     |     0.7259   |
|  7 | 77                           |     180 |   181 | AGE         |     1        |
|  8 | Agota Evelyn Tımar           |     191 |   208 | DOCTOR      |     0.803667 |
|  9 | 2450502264401                |     218 |   230 | IDNUM       |     0.9995   |
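
The begin and end columns are character offsets into the input text, and end is inclusive (the chunk "Vaslui" spans 51 through 56). As a minimal sketch of how these offsets could be used downstream, the snippet below replaces each detected span with its label. mask_entities is a hypothetical helper written for illustration and is not the library's own deidentification step. The entity tuples are copied from the table above.

```python
import unicodedata

# Sample text from the example, normalized so each accented letter is a
# single code point and the table offsets line up.
text = unicodedata.normalize("NFC", """Spitalul Pentru Ochi de Deal, Drumul Oprea Nr. 972 Vaslui, 737405 România
Tel: +40(235)413773
Data setului de analize: 25 May 2022 15:36:00
Nume si Prenume : BUREAN MARIA, Varsta: 77
Medic : Agota Evelyn Tımar
C.N.P : 2450502264401""")

# (begin, end, label) copied from the Results table; `end` is inclusive.
entities = [
    (0, 27, "HOSPITAL"),
    (30, 49, "STREET"),
    (51, 56, "CITY"),
    (59, 64, "ZIP"),
    (79, 92, "PHONE"),
    (119, 129, "DATE"),
    (158, 169, "PATIENT"),
    (180, 181, "AGE"),
    (191, 208, "DOCTOR"),
    (218, 230, "IDNUM"),
]


def mask_entities(text, entities):
    """Replace each span with <LABEL>, working right to left so that
    offsets of spans earlier in the text stay valid."""
    masked = text
    for begin, end, label in sorted(entities, reverse=True):
        masked = masked[:begin] + f"<{label}>" + masked[end + 1:]
    return masked


masked = mask_entities(text, entities)
print(masked)
```

Processing spans from the end of the text backwards avoids recomputing offsets after each replacement, since a substitution only shifts the characters that follow it.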

Model Information

Model Name: ner_deid_subentity_bert_pipeline
Type: pipeline
Compatibility: Healthcare NLP 4.4.4+
License: Licensed
Edition: Official
Language: ro
Size: 484.0 MB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • BertEmbeddings
  • MedicalNerModel
  • NerConverterInternalModel