Deidentify RB

Description

Anonymization and de-identification model that works on the outputs of the DeId NER models and on replacement dictionaries.

Predicted Entities

Personal information entities, detected so that they can be de-identified.


How to use

...
# Base pipeline: its stages (elided above) produce the "sentence", "token" and "chunk" columns
nlpPipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter])
# Fit on an empty DataFrame to obtain a PipelineModel
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

# Pretrained de-identifier, run in "mask" mode
masker = DeIdentificationModel.pretrained("deidentify_rb", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "chunk"]) \
    .setOutputCol("deidentified") \
    .setMode("mask")

text = '''A . Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson , Ora MR . # 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street'''
result = model.transform(spark.createDataFrame([[text]]).toDF("text"))
deid_text = masker.transform(result)
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner, ner_converter))
val data = Seq("A . Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson , Ora MR . # 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street").toDF("text")
val result = pipeline.fit(data).transform(data)

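// Pretrained de-identifier, run in "mask" mode
val masker = DeIdentificationModel.pretrained("deidentify_rb", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "chunk"))
  .setOutputCol("deidentified")
  .setMode("mask")
// masker is already an instance, so transform is called on it directly (no `new`)
val deid_text = masker.transform(result)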

import nlu
nlu.load("en.de_identify").predict("""A . Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson , Ora MR . # 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street""")

Results

|   | sentence                                                                              | deidentified                                                                |
|---|---------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| 0 | A .                                                                                   | A .                                                                         |
| 1 | Record date : 2093-01-13 , David Hale , M.D .                                         | Record date : <DATE> , David Hale , M.D .                                   |
| 2 | , Name : Hendrickson , Ora MR .                                                       | , Name : Hendrickson , Ora MR .                                             |
| 3 | # 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 .  | # <ID> Date : <DATE> PCP : Oliveira , 25 years-old , Record date : <DATE> . |
| 4 | Cocke County Baptist Hospital .                                                       | Cocke County Baptist Hospital .                                             |
| 5 | 0295 Keats Street                                                                     | <ID> Keats Street                                                           |
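
In the table above, mask mode replaces each detected chunk with its entity label in angle brackets. The following is a minimal plain-Python sketch of that substitution, not the Spark NLP implementation. The helper `mask_entities` and the character-offset span format are assumptions made for illustration; a real pipeline would take the spans from the NER chunks.

```python
def mask_entities(text, spans):
    """Replace each (start, end, label) span in text with '<LABEL>'.

    start is inclusive, end is exclusive; spans are assumed not to overlap.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(f"<{label}>")        # placeholder that replaces the entity
        cursor = end
    pieces.append(text[cursor:])           # remainder after the last entity
    return "".join(pieces)


sentence = "Record date : 2093-01-13 , David Hale , M.D ."
# Hypothetical span for the date chunk, located by string search for the demo.
date_start = sentence.index("2093-01-13")
spans = [(date_start, date_start + len("2093-01-13"), "DATE")]
masked = mask_entities(sentence, spans)
print(masked)  # Record date : <DATE> , David Hale , M.D .
```

Only the spans passed in are replaced, so any entity the upstream NER misses stays in the text unchanged.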

Model Information

Name: deidentify_rb
Type: DeIdentificationModel
Compatibility: Spark NLP 2.0.2+
License: Licensed
Edition: Official
Input labels: [document, token, chunk]
Output labels: [document]
Language: en
Dependencies: ner_deid

Data Source

Rule-based de-identifier built on ner_deid.