Description
This pipeline is trained with w2v_cc_300d Romanian embeddings and can be used to de-identify PHI (protected health information) in Romanian medical texts. PHI in the resulting text is masked and obfuscated. The pipeline can mask, fake, or obfuscate the following entities: AGE, CITY, COUNTRY, DATE, DOCTOR, EMAIL, FAX, HOSPITAL, IDNUM, LOCATION-OTHER, MEDICALRECORD, ORGANIZATION, PATIENT, PHONE, PROFESSION, STREET, ZIP, ACCOUNT, LICENSE, PLATE.
How to use
Python
# Assumes a Spark session with a valid Healthcare NLP license is already running
from sparknlp.pretrained import PretrainedPipeline

# Download the Romanian clinical de-identification pipeline
deid_pipeline = PretrainedPipeline("clinical_deidentification", "ro", "clinical/models")

sample = """Medic : Dr. Agota EVELYN, C.N.P : 2450502264401, Data setului de analize: 25 May 2022
Varsta : 77, Nume si Prenume : BUREAN MARIA
Tel: +40(235)413773, E-mail : hale@gmail.com,
Licență : B004256985M, Înmatriculare : CD205113, Cont : FXHZ7170951927104999,
Spitalul Pentru Ochi de Deal Drumul Oprea Nr. 972 Vaslui, 737405 """

# annotate() runs the pipeline on one string and returns a dict of output columns
result = deid_pipeline.annotate(sample)
Scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Download the Romanian clinical de-identification pipeline
val deid_pipeline = new PretrainedPipeline("clinical_deidentification", "ro", "clinical/models")

val sample = """Medic : Dr. Agota EVELYN, C.N.P : 2450502264401, Data setului de analize: 25 May 2022
Varsta : 77, Nume si Prenume : BUREAN MARIA
Tel: +40(235)413773, E-mail : hale@gmail.com,
Licență : B004256985M, Înmatriculare : CD205113, Cont : FXHZ7170951927104999,
Spitalul Pentru Ochi de Deal Drumul Oprea Nr. 972 Vaslui, 737405 """

// annotate() returns a Map from output column name to its results
val result = deid_pipeline.annotate(sample)
NLU
import nlu

# predict() returns a pandas DataFrame with the pipeline outputs
nlu.load("ro.deid.clinical").predict("""Medic : Dr. Agota EVELYN, C.N.P : 2450502264401, Data setului de analize: 25 May 2022
Varsta : 77, Nume si Prenume : BUREAN MARIA
Tel: +40(235)413773, E-mail : hale@gmail.com,
Licență : B004256985M, Înmatriculare : CD205113, Cont : FXHZ7170951927104999,
Spitalul Pentru Ochi de Deal Drumul Oprea Nr. 972 Vaslui, 737405 """)
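The Python `annotate()` call returns a dictionary keyed by the pipeline's output column names. A minimal sketch for printing each de-identified version is shown below; the column names in `DEFAULT_COLUMNS` and the helper `print_deid_outputs` are hypothetical, so check `result.keys()` on your installation before relying on them.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical output column names; inspect result.keys() for the real ones.
DEFAULT_COLUMNS = ("masked", "masked_with_chars", "masked_fixed_length_chars", "obfuscated")


def print_deid_outputs(result: Dict[str, Union[str, List[str]]],
                       columns: Iterable[str] = DEFAULT_COLUMNS) -> List[str]:
    """Print each requested column of an annotate() result and return the columns printed."""
    printed = []
    for column in columns:
        if column not in result:
            continue  # skip names this pipeline version does not produce
        value = result[column]
        # annotate() values are usually lists of strings, one per sentence or chunk
        text = "\n".join(value) if isinstance(value, list) else str(value)
        print(column)
        print("-" * 30)
        print(text)
        printed.append(column)
    return printed


if __name__ == "__main__":
    # Toy dictionary standing in for a real annotate() result
    demo = {"masked": ["Varsta : <AGE>, Nume si Prenume : <PATIENT>"], "sentence": ["..."]}
    print_deid_outputs(demo)
```

With the Python example above, `print_deid_outputs(result)` prints whichever of the assumed columns are present and silently skips the rest.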
Results
Masked with entity labels
------------------------------
Medic : Dr. <DOCTOR>, C.N.P : <IDNUM>, Data setului de analize: <DATE>
Varsta : <AGE>, Nume si Prenume : <PATIENT>
Tel: <PHONE>, E-mail : <EMAIL>,
Licență : <LICENSE>, Înmatriculare : <PLATE>, Cont : <ACCOUNT>,
<HOSPITAL> <STREET> <CITY>, <ZIP>
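Label masking amounts to replacing each detected chunk with its entity tag. A minimal sketch, assuming chunks arrive as hypothetical `(start, end, label)` tuples with end-exclusive offsets (Spark NLP annotations report an inclusive `end`, so add 1 when converting); `mask_with_labels` is an illustrative name, not part of the library.

```python
from typing import List, Tuple

# (start, end, label) with `end` exclusive, as in Python slicing
Span = Tuple[int, int, str]


def mask_with_labels(text: str, spans: List[Span]) -> str:
    """Replace each span with <LABEL>, working right to left so earlier offsets stay valid."""
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + f"<{label}>" + text[end:]
    return text


if __name__ == "__main__":
    line = "Varsta : 77, Nume si Prenume : BUREAN MARIA"
    spans = [(9, 11, "AGE"), (31, 43, "PATIENT")]
    print(mask_with_labels(line, spans))
    # Varsta : <AGE>, Nume si Prenume : <PATIENT>
```

Processing spans from the end of the string backwards avoids recomputing offsets after each replacement changes the text length.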
Masked with chars
------------------------------
Medic : Dr. [**********], C.N.P : [***********], Data setului de analize: [*********]
Varsta : **, Nume si Prenume : [**********]
Tel: [************], E-mail : [************],
Licență : [*********], Înmatriculare : [******], Cont : [******************],
[**************************] [******************] [****], [****]
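The character masks above preserve each chunk's length. Comparing them with the sample input suggests a simple rule: a chunk of n characters becomes `[`, n - 2 asterisks, and `]`, while the two-character age `77` becomes `**`. The sketch below reproduces that rule; it is inferred from this single sample rather than taken from the library, and the cutoff for short chunks (two characters or fewer) is an assumption.

```python
def mask_with_chars(chunk: str) -> str:
    """Same-length mask inferred from the sample output: brackets around asterisks."""
    if len(chunk) <= 2:
        # Assumed cutoff: very short chunks (e.g. the age "77") get no brackets
        return "*" * len(chunk)
    return "[" + "*" * (len(chunk) - 2) + "]"


if __name__ == "__main__":
    for chunk in ["77", "BUREAN MARIA", "Vaslui", "FXHZ7170951927104999"]:
        print(chunk, "->", mask_with_chars(chunk))
```

Because the mask keeps the original length, it can still reveal details such as the length of a name; the fixed-length variant replaces every entity with the same `****` instead.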
Masked with fixed length chars
------------------------------
Medic : Dr. ****, C.N.P : ****, Data setului de analize: ****
Varsta : ****, Nume si Prenume : ****
Tel: ****, E-mail : ****,
Licență : ****, Înmatriculare : ****, Cont : ****,
**** **** ****, ****
Obfuscated
------------------------------
Medic : Dr. Doina Gheorghiu, C.N.P : 6794561192919, Data setului de analize: 01-04-2001
Varsta : 91, Nume si Prenume : Dragomir Emilia
Tel: 0248 551 376, E-mail : tudorsmaranda@kappa.ro,
Licență : T003485962M, Înmatriculare : AR-65-UPQ, Cont : KHHO5029180812813651,
Centrul Medical de Evaluare si Recuperare pentru Copii si Tineri Cristian Serban Buzias Aleea Voinea Curcani, 328479
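Obfuscation replaces each PHI chunk with a realistic surrogate instead of a placeholder. One common way to do this, shown here as a minimal sketch and not as this pipeline's implementation, draws fakes from per-entity pools and caches them so that a repeated value always maps to the same surrogate within a document. `FAKE_VALUES` and `obfuscate` are hypothetical names, and spans use end-exclusive offsets.

```python
import random
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)

# Hypothetical surrogate pools; a real system would use much larger,
# locale-appropriate lists for every entity type.
FAKE_VALUES: Dict[str, List[str]] = {
    "PATIENT": ["Dragomir Emilia", "Popescu Ioana"],
    "AGE": ["45", "91"],
}


def obfuscate(text: str, spans: List[Span], seed: int = 0) -> str:
    """Replace each span with a fake value, reusing the same fake for repeated originals."""
    rng = random.Random(seed)  # seeded so results are reproducible
    cache: Dict[Tuple[str, str], str] = {}
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        original = text[start:end]
        key = (label, original)
        if key not in cache:
            # Never pick a "fake" that equals the original value
            pool = [v for v in FAKE_VALUES.get(label, []) if v != original]
            cache[key] = rng.choice(pool) if pool else f"<{label}>"
        text = text[:start] + cache[key] + text[end:]
    return text


if __name__ == "__main__":
    line = "Varsta : 77, Nume si Prenume : BUREAN MARIA"
    print(obfuscate(line, [(9, 11, "AGE"), (31, 43, "PATIENT")]))
```

Caching by `(label, original)` keeps references consistent (the same patient name is replaced identically everywhere), and labels without a pool fall back to a tag.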
Model Information
Model Name: clinical_deidentification
Type: pipeline
Compatibility: Healthcare NLP 4.0.0+
License: Licensed
Edition: Official
Language: ro
Size: 1.2 GB
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverter
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ChunkMergeModel
- ChunkMergeModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- Finisher
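Judging from the stage list, chunks from the NER branch (MedicalNerModel, NerConverter) and from the ContextualParserModel stages are combined by ChunkMergeModel before de-identification. The sketch below illustrates one possible merge policy: keep the longer chunk when two overlap, and break ties in favor of the earlier source. The real ChunkMergeModel precedence is configurable and may differ; `merge_chunks` and the tuple format are hypothetical.

```python
from typing import List, Tuple

Chunk = Tuple[int, int, str]  # (start, end_exclusive, label)


def merge_chunks(*sources: List[Chunk]) -> List[Chunk]:
    """Merge chunk lists, keeping the longer chunk on overlap (ties: earlier source wins)."""
    candidates = []
    for priority, source in enumerate(sources):
        for chunk in source:
            candidates.append((chunk, priority))
    # Longest chunks first, then by source priority, then by start offset
    candidates.sort(key=lambda c: (-(c[0][1] - c[0][0]), c[1], c[0][0]))
    kept: List[Chunk] = []
    for (start, end, label), _ in candidates:
        # Accept the chunk only if it does not overlap anything already kept
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


if __name__ == "__main__":
    ner = [(0, 5, "DOCTOR"), (10, 20, "HOSPITAL")]
    parser = [(12, 15, "CITY"), (30, 36, "ZIP")]
    print(merge_chunks(ner, parser))
```

Here the parser's CITY chunk overlaps the longer HOSPITAL chunk from NER and is dropped, while the non-overlapping ZIP chunk is kept.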