Deidentification NER (Large)

Description

Deidentification NER (Large) is a Named Entity Recognition model that annotates text to find protected health information (PHI) that may need to be deidentified. It annotates seven entity types: Age, Contact, Date, Id, Location, Name, and Profession. The model was trained with the "embeddings_clinical" word embeddings model, so use the same embeddings in the pipeline.

Predicted Labels

Age, Contact, Date, Id, Location, Name, and Profession
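As an illustration of how these labels might be used downstream, the following is a minimal sketch in plain Python that replaces labeled spans with placeholders. It does not call Spark NLP. The function name mask_entities and the (start, end, label) span format are hypothetical and chosen only for this example; the real annotation output of the model has its own structure.

```python
from typing import List, Tuple

# Labels listed under "Predicted Labels" for this model.
PHI_LABELS = {"Age", "Contact", "Date", "Id", "Location", "Name", "Profession"}

# Hypothetical span format (an assumption for this sketch):
# (start offset, end offset exclusive, label).
Span = Tuple[int, int, str]


def mask_entities(text: str, spans: List[Span]) -> str:
    """Replace each labeled span with a placeholder such as <NAME>."""
    masked = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        if label not in PHI_LABELS:
            continue  # leave anything outside the known label set untouched
        masked = masked[:start] + "<" + label.upper() + ">" + masked[end:]
    return masked


# Invented example text and hand-written spans, not real model output.
note = "John Smith, 45, was seen on 2014-03-02."
spans = [(0, 10, "Name"), (12, 14, "Age"), (28, 38, "Date")]
print(mask_entities(note, spans))
```

The sketch assumes the spans do not overlap. Overlapping spans would need to be merged or resolved before masking.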

How to use


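# Python: licensed models are loaded from the "clinical/models" location.
# Input column names must match the outputs of the upstream annotators.
from sparknlp.annotator import NerDLModel
ner = NerDLModel.pretrained("ner_deid_large", "en", "clinical/models") \
        .setInputCols(["document", "token", "embeddings"]) \
        .setOutputCol("ner")

// Scala: licensed models are loaded from the "clinical/models" location.
// Input column names must match the outputs of the upstream annotators.
import com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel
val ner = NerDLModel.pretrained("ner_deid_large", "en", "clinical/models")
        .setInputCols(Array("document", "token", "embeddings"))
        .setOutputCol("ner")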

Model Information

Model Name: ner_deid_large
Type: ner
Compatibility: Spark NLP for Healthcare 2.4.2+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Case sensitive: false

Data Source

The model was trained on data from https://portal.dbmi.hms.harvard.edu/projects/n2c2-2014/