Deidentification NER (Enriched)

Description

Deidentification NER (Enriched) is a Named Entity Recognition model that annotates text to find protected health information (PHI) that may need to be deidentified. The model is trained with the ‘embeddings_clinical’ word embeddings model, so be sure to use the same embeddings in the pipeline.

Predicted Labels

Age, City, Country, Date, Doctor, Hospital, Idnum, Medicalrecord, Organization, Patient, Phone, Profession, State, Street, Username, and Zip.
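As a rough illustration of how these labels might be consumed downstream, the sketch below masks tagged tokens with their label. It assumes the model emits token-level IOB tags such as B-DOCTOR and I-DOCTOR; the tag spelling, the sample sentence, and the helper name mask_entities are all hypothetical and not part of the model's API.

```python
from typing import List


def mask_entities(tokens: List[str], tags: List[str]) -> str:
    """Replace each IOB-tagged entity span with a single <LABEL> placeholder."""
    masked = []
    open_label = None  # label of the entity span currently being consumed
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "O" or not label:
            # Token is outside any entity: keep it unchanged.
            masked.append(token)
            open_label = None
        elif prefix == "B" or label != open_label:
            # Start of a new entity span: emit one placeholder for the span.
            masked.append(f"<{label.upper()}>")
            open_label = label
        # Otherwise this is an I- tag continuing the open span, already masked.
    return " ".join(masked)


# Hypothetical tokens and tags, for illustration only.
tokens = ["Dr", "John", "Smith", "saw", "the", "patient", "on", "2014-03-02", "."]
tags = ["O", "B-DOCTOR", "I-DOCTOR", "O", "O", "O", "O", "B-DATE", "O"]
print(mask_entities(tokens, tags))
```

Masking with the label name keeps the text readable while removing the identifying value. Other strategies, such as replacing values with surrogates, would need additional logic.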


How to use


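# Python
from sparknlp.annotator import NerDLModel

ner = NerDLModel.pretrained("ner_deid_enriched", "en", "clinical/models") \
        .setInputCols(["sentence", "token", "embeddings"]) \
        .setOutputCol("ner")

// Scala
import com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel

val ner = NerDLModel.pretrained("ner_deid_enriched", "en", "clinical/models")
        .setInputCols(Array("sentence", "token", "embeddings"))
        .setOutputCol("ner")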
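The NER stage needs sentence, token, and embeddings columns produced by upstream stages. A minimal sketch of one way to wire them together is shown here. It assumes a licensed Spark NLP for Healthcare session is already running and that both models are fetched from the "clinical/models" repository. The function name build_deid_ner_pipeline, the "text" input column, and the "ner_chunk" output column are illustrative choices. Column names follow the Input Labels listed under Model Information.

```python
NER_MODEL = "ner_deid_enriched"
EMBEDDINGS = "embeddings_clinical"  # must match the embeddings used in training
REMOTE_LOC = "clinical/models"      # assumed repository for licensed models
NER_INPUT_COLS = ["sentence", "token", "embeddings"]


def build_deid_ner_pipeline():
    """Assemble (but do not fit) the stages that feed ner_deid_enriched."""
    # Imported here because they require a Spark NLP for Healthcare installation.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import (
        NerConverter,
        NerDLModel,
        SentenceDetector,
        Tokenizer,
        WordEmbeddingsModel,
    )

    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    sentence = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
    token = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
    embeddings = (
        WordEmbeddingsModel.pretrained(EMBEDDINGS, "en", REMOTE_LOC)
        .setInputCols(["sentence", "token"])
        .setOutputCol("embeddings")
    )
    ner = (
        NerDLModel.pretrained(NER_MODEL, "en", REMOTE_LOC)
        .setInputCols(NER_INPUT_COLS)
        .setOutputCol("ner")
    )
    # Merges token-level tags into entity chunks for easier inspection.
    chunks = (
        NerConverter()
        .setInputCols(["sentence", "token", "ner"])
        .setOutputCol("ner_chunk")
    )
    return Pipeline(stages=[document, sentence, token, embeddings, ner, chunks])
```

With a DataFrame df that has a "text" column, build_deid_ner_pipeline().fit(df).transform(df) should yield the "ner" and "ner_chunk" columns. The pretrained() calls download the models when the function runs, so it needs network access and valid license credentials.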

Model Information

Model Name: ner_deid_enriched
Type: ner
Compatibility: Spark NLP for Healthcare 2.4.2+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Case sensitive: false

Data Source

The model was trained on data from https://portal.dbmi.hms.harvard.edu/projects/n2c2-2014/