Pipeline to Detect PHI for Generic Deidentification in Romanian (BERT)

Description

This pretrained pipeline is built on top of the ner_deid_generic_bert model.

Predicted Entities

How to use

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ner_deid_generic_bert_pipeline", "ro", "clinical/models")

text = '''Spitalul Pentru Ochi de Deal, Drumul Oprea Nr. 972 Vaslui, 737405 România
Tel: +40(235)413773
Data setului de analize: 25 May 2022 15:36:00
Nume si Prenume : BUREAN MARIA, Varsta: 77
Medic : Agota Evelyn Tımar
C.N.P : 2450502264401'''

result = pipeline.fullAnnotate(text)

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("ner_deid_generic_bert_pipeline", "ro", "clinical/models")

val text = """Spitalul Pentru Ochi de Deal, Drumul Oprea Nr. 972 Vaslui, 737405 România
Tel: +40(235)413773
Data setului de analize: 25 May 2022 15:36:00
Nume si Prenume : BUREAN MARIA, Varsta: 77
Medic : Agota Evelyn Tımar
C.N.P : 2450502264401"""

val result = pipeline.fullAnnotate(text)
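
In Python, fullAnnotate returns a list with one dictionary per input text, mapping output column names to lists of annotations. The sketch below shows one way to flatten chunk annotations into rows like those in the Results table. It rests on assumptions: the output column name "ner_chunk" and the metadata keys "entity" and "confidence" are not stated on this page, and FakeAnnotation is a hypothetical stand-in so the example runs without a Spark session.

```python
from collections import namedtuple

# Hypothetical stand-in for a Spark NLP annotation object; it only mirrors
# the attributes this sketch reads (result, begin, end, metadata).
FakeAnnotation = namedtuple("FakeAnnotation", ["result", "begin", "end", "metadata"])


def chunks_to_rows(annotations):
    """Flatten chunk annotations into plain dicts, one per detected entity."""
    rows = []
    for ann in annotations:
        meta = ann.metadata or {}
        rows.append({
            "ner_chunk": ann.result,
            "begin": ann.begin,
            "end": ann.end,
            # Assumed metadata keys; check them against your pipeline's output.
            "ner_label": meta.get("entity"),
            "confidence": meta.get("confidence"),
        })
    return rows


# With the real pipeline this would look like (column name is an assumption):
#   rows = chunks_to_rows(result[0]["ner_chunk"])
demo = [FakeAnnotation("Vaslui", 51, 56, {"entity": "LOCATION", "confidence": "1.0"})]
rows = chunks_to_rows(demo)
print(rows)
```

If the column or metadata names differ in your installation, inspecting result[0].keys() is a quick way to find the right ones.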

Results

|    | ner_chunks                   |   begin |   end | ner_label   |   confidence |
|---:|:-----------------------------|--------:|------:|:------------|-------------:|
|  0 | Spitalul Pentru Ochi de Deal |       0 |    27 | LOCATION    |     0.99352  |
|  1 | Drumul Oprea Nr. 972         |      30 |    49 | LOCATION    |     0.99994  |
|  2 | Vaslui                       |      51 |    56 | LOCATION    |     1        |
|  3 | 737405                       |      59 |    64 | LOCATION    |     1        |
|  4 | +40(235)413773               |      79 |    92 | CONTACT     |     1        |
|  5 | 25 May 2022                  |     119 |   129 | DATE        |     1        |
|  6 | si                           |     145 |   146 | NAME        |     0.9998   |
|  7 | BUREAN MARIA                 |     158 |   169 | NAME        |     0.9993   |
|  8 | 77                           |     180 |   181 | AGE         |     1        |
|  9 | Agota Evelyn Tımar\nC        |     191 |   210 | NAME        |     0.859975 |
| 10 | 2450502264401                |     218 |   230 | ID          |     1        |
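
The begin and end columns can be read as character offsets into the input text, with end inclusive. The sketch below checks that reading against the table values and then uses the spans to mask the detected entities. The offsets and labels are copied from the table above. The mask_spans helper and the <LABEL> placeholder format are illustrative assumptions, not the output of the library's deidentification stage.

```python
text = '''Spitalul Pentru Ochi de Deal, Drumul Oprea Nr. 972 Vaslui, 737405 România
Tel: +40(235)413773
Data setului de analize: 25 May 2022 15:36:00
Nume si Prenume : BUREAN MARIA, Varsta: 77
Medic : Agota Evelyn Tımar
C.N.P : 2450502264401'''

# (chunk, begin, end, label) as listed in the Results table.
spans = [
    ("Spitalul Pentru Ochi de Deal", 0, 27, "LOCATION"),
    ("Drumul Oprea Nr. 972", 30, 49, "LOCATION"),
    ("Vaslui", 51, 56, "LOCATION"),
    ("737405", 59, 64, "LOCATION"),
    ("+40(235)413773", 79, 92, "CONTACT"),
    ("25 May 2022", 119, 129, "DATE"),
    ("si", 145, 146, "NAME"),
    ("BUREAN MARIA", 158, 169, "NAME"),
    ("77", 180, 181, "AGE"),
    ("Agota Evelyn Tımar\nC", 191, 210, "NAME"),  # chunk spans a line break
    ("2450502264401", 218, 230, "ID"),
]

# The end offset is inclusive, so the slice needs end + 1.
for chunk, begin, end, _ in spans:
    assert text[begin:end + 1] == chunk


def mask_spans(source, entity_spans):
    """Replace each span with <LABEL>, working right to left so that
    earlier offsets stay valid while the string changes length."""
    masked = source
    for _, begin, end, label in sorted(entity_spans, key=lambda s: s[1], reverse=True):
        masked = masked[:begin] + f"<{label}>" + masked[end + 1:]
    return masked


masked_text = mask_spans(text, spans)
print(masked_text)
```

Masking from the last span to the first avoids having to recompute offsets after each replacement.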

Model Information

Model Name: ner_deid_generic_bert_pipeline
Type: pipeline
Compatibility: Healthcare NLP 4.4.4+
License: Licensed
Edition: Official
Language: ro
Size: 483.8 MB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • BertEmbeddings
  • MedicalNerModel
  • NerConverterInternalModel