Description
Rule-based anonymization and de-identification model that works on the outputs of de-identification NER models and replacement dictionaries.
Predicted Entities
Personal information to be de-identified; in the example results on this page, dates and identifiers are masked as <DATE> and <ID>.
How to use
...
nlpPipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter])
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
masker = DeIdentificationModel.pretrained("deidentify_rb","en","clinical/models")\
.setInputCols("sentence","token","chunk")\
.setOutputCol("deidentified")\
.setMode("mask")
text = '''A . Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson , Ora MR . # 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street'''
result = model.transform(spark.createDataFrame([[text]]).toDF("text"))
deid_text = masker.transform(result)
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner, ner_converter))
val data = Seq("A . Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson , Ora MR . # 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street").toDF("text")
val result = pipeline.fit(data).transform(data)
val masker = DeIdentificationModel.pretrained("deidentify_rb","en","clinical/models")
.setInputCols(Array("sentence", "token", "chunk"))
.setOutputCol("deidentified")
.setMode("mask")
val deid_text = masker.transform(result)
import nlu
nlu.load("en.de_identify").predict("""A . Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson , Ora MR . # 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street""")
Results
| | sentence | deidentified |
|---|---------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| 0 | A . | A . |
| 1 | Record date : 2093-01-13 , David Hale , M.D . | Record date : <DATE> , David Hale , M.D . |
| 2 | , Name : Hendrickson , Ora MR . | , Name : Hendrickson , Ora MR . |
| 3 | # 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . | # <ID> Date : <DATE> PCP : Oliveira , 25 years-old , Record date : <DATE> . |
| 4 | Cocke County Baptist Hospital . | Cocke County Baptist Hospital . |
| 5 | 0295 Keats Street | <ID> Keats Street |
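To make the "mask" mode in the table above concrete, here is a minimal sketch in plain Python. It is not the Spark NLP implementation: the `Chunk` tuple layout and the `mask_chunks` helper are hypothetical names introduced for illustration, and the chunk offsets are computed by hand instead of coming from an NER converter. It only shows the general idea of replacing each detected span with its entity label.

```python
from typing import List, Tuple

# Hypothetical chunk layout for this sketch: (start, end_exclusive, label).
Chunk = Tuple[int, int, str]


def mask_chunks(text: str, chunks: List[Chunk]) -> str:
    """Replace every chunk span with <LABEL>.

    Spans are processed right to left so that replacing one span
    does not shift the offsets of the spans still to be processed.
    """
    masked = text
    for start, end, label in sorted(chunks, key=lambda c: c[0], reverse=True):
        masked = masked[:start] + f"<{label}>" + masked[end:]
    return masked


# Sentence 1 from the results table, with one hand-built DATE chunk.
sentence = "Record date : 2093-01-13 , David Hale , M.D ."
date = "2093-01-13"
start = sentence.index(date)
chunks = [(start, start + len(date), "DATE")]

print(mask_chunks(sentence, chunks))
# Record date : <DATE> , David Hale , M.D .
```

In the real pipeline the spans and labels would come from the `chunk` column produced upstream; the sketch assumes non-overlapping spans.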
Model Information
| Name: | deidentify_rb |
| Type: | DeIdentificationModel |
| Compatibility: | Spark NLP 2.0.2+ |
| License: | Licensed |
| Edition: | Official |
| Input labels: | [document, token, chunk] |
| Output labels: | [document] |
| Language: | en |
| Dependencies: | ner_deid |
Data Source
Rule-based de-identifier built on ner_deid.
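As a rough illustration of what "rule-based" masking can look like, the sketch below applies two regular expressions to the sample text. The patterns, the `RULES` list, and the `mask_with_rules` function are assumptions made for this example; they are not the rules shipped in deidentify_rb, which are not listed on this page. The patterns were chosen only so that the sketch masks the same dates and numeric identifiers shown in the results table.

```python
import re

# Hypothetical rules, applied in order. Dates are masked first so that
# their digits cannot be picked up again by the ID pattern.
RULES = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{2}/\d{2}/\d{2}\b")),
    ("ID", re.compile(r"\b\d{4,}\b")),  # any standalone run of 4+ digits
]


def mask_with_rules(text: str) -> str:
    """Replace every match of each rule with its <LABEL> placeholder."""
    for label, pattern in RULES:
        text = pattern.sub(f"<{label}>", text)
    return text


print(mask_with_rules("0295 Keats Street"))
# <ID> Keats Street
```

Short numbers such as the age in "25 years-old" are left untouched because the ID pattern requires at least four digits; a production rule set would need far more careful patterns than these.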