Deidentify RB

Description

Anonymization and de-identification model that works on the outputs of the de-identification (DeId) NER models and on replacement dictionaries.

Predicted Entities

Personal information that needs to be de-identified.

How to use

...
# The stages listed here are defined in the part elided above.
nlpPipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter])
# Fit on an empty DataFrame to obtain a PipelineModel.
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

# De-identifier in "mask" mode; it reads the sentence, token and chunk columns.
masker = DeIdentificationModel.pretrained("deidentify_rb", "en", "clinical/models") \
    .setInputCols("sentence", "token", "chunk") \
    .setOutputCol("deidentified") \
    .setMode("mask")

text = '''A . Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson , Ora MR . # 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street'''
# Run the NER pipeline first, then mask the detected chunks.
result = model.transform(spark.createDataFrame([[text]]).toDF("text"))
deid_text = masker.transform(result)
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner, ner_converter))

// .toDS needs spark.implicits._ in scope.
val data = Seq("""A . Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson , Ora MR . # 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street""").toDS.toDF("text")
// Run the NER pipeline first, then mask the detected chunks.
val result = pipeline.fit(data).transform(data)

val masker = DeIdentificationModel.pretrained("deidentify_rb","en","clinical/models")
        .setInputCols(Array("sentence", "token", "chunk"))
        .setOutputCol("deidentified")
        .setMode("mask")
val deid_text = masker.transform(result)

Results

|   | sentence                                                                              | deidentified                                                                |
|---|---------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| 0 | A .                                                                                   | A .                                                                         |
| 1 | Record date : 2093-01-13 , David Hale , M.D .                                         | Record date : <DATE> , David Hale , M.D .                                   |
| 2 | , Name : Hendrickson , Ora MR .                                                       | , Name : Hendrickson , Ora MR .                                             |
| 3 | # 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 .  | # <ID> Date : <DATE> PCP : Oliveira , 25 years-old , Record date : <DATE> . |
| 4 | Cocke County Baptist Hospital .                                                       | Cocke County Baptist Hospital .                                             |
| 5 | 0295 Keats Street                                                                     | <ID> Keats Street                                                           |
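
The effect of "mask" mode in the table above can be illustrated without Spark. The following is only a minimal sketch in plain Python, not the DeIdentificationModel implementation: the function name `mask_entities` and the `(start, end, label)` span format are assumptions made for illustration, standing in for the chunks an upstream NER stage would provide.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, label).
Span = Tuple[int, int, str]


def mask_entities(text: str, spans: List[Span]) -> str:
    """Replace every labelled span in `text` with a <LABEL> placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one that was already masked.
            continue
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(f"<{label}>")        # placeholder instead of the entity
        cursor = end
    pieces.append(text[cursor:])           # untouched tail
    return "".join(pieces)


sentence = "Record date : 2093-01-13 , David Hale , M.D ."
date = "2093-01-13"
start = sentence.index(date)
masked = mask_entities(sentence, [(start, start + len(date), "DATE")])
print(masked)  # Record date : <DATE> , David Hale , M.D .
```

Given only the DATE span, the sketch reproduces row 1 of the table; text that is not covered by a span is left unchanged.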

Model Information

Name: deidentify_rb
Type: DeIdentificationModel
Compatibility: Spark NLP 2.0.2+
License: Licensed
Edition: Official
Input labels: [document, token, chunk]
Output labels: [document]
Language: en
Dependencies: ner_deid

Data Source

Rule-based de-identifier built on top of ner_deid.
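
To give an idea of what a rule-based step can look like, here is a hedged sketch using regular expressions. The two patterns are hypothetical and were written only to match the DATE and ID examples in the Results table; they are not the rules shipped with deidentify_rb, and `mask_with_rules` is an illustrative name.

```python
import re

# Hypothetical rules, applied in order. DATE must run before ID, otherwise
# the year in "2079-11-09" would be picked up as a 4-digit ID.
RULES = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{2}/\d{2}/\d{2}\b")),
    ("ID", re.compile(r"\b\d{4,}\b")),
]


def mask_with_rules(text: str) -> str:
    """Replace every match of each rule with a <LABEL> placeholder."""
    for label, pattern in RULES:
        text = pattern.sub(f"<{label}>", text)
    return text


row3 = "# 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 ."
print(mask_with_rules(row3))
print(mask_with_rules("0295 Keats Street"))
```

Short numbers such as the age "25" are left alone because the ID pattern requires at least four digits, which is a choice made for this sketch only.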