Deidentification NER (Large)

Description

The Named Entity Recognition (NER) annotator trains a generic model with a deep learning architecture (character CNNs, BiLSTM, CRF, and word embeddings) inspired by a former state-of-the-art NER model: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNNs. Deidentification NER (Large) is an NER model that annotates text to find protected health information (PHI) that may need to be deidentified. The entities it annotates are Age, Contact, Date, Id, Location, Name, and Profession. The model is trained with the ‘embeddings_clinical’ word embeddings model, so be sure to use the same embeddings in the pipeline.

Predicted Entities

Age, Contact, Date, Id, Location, Name, Profession
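
To illustrate how predictions over these entities can drive deidentification, here is a minimal sketch in plain Python. It assumes the model's token-level output has already been collected as IOB-style tags (for example B-NAME, I-NAME, O); that tag format, the uppercase label spelling, and the helper name mask_entities are assumptions made for illustration, not part of the model's documented API.

```python
from typing import List

# The seven entity labels listed above (uppercase spelling is an assumption).
LABELS = {"AGE", "CONTACT", "DATE", "ID", "LOCATION", "NAME", "PROFESSION"}


def mask_entities(tokens: List[str], tags: List[str]) -> str:
    """Replace each tagged entity span with a single <LABEL> placeholder."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    out: List[str] = []
    previous = "O"  # label of the previous token, or "O" outside any entity
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if tag == "O" or label not in LABELS:
            # Not a recognized entity token: keep the original text.
            out.append(token)
            previous = "O"
            continue
        # Emit a new placeholder on B- tags or whenever the label changes;
        # I- tags that continue the same entity add nothing.
        if prefix == "B" or label != previous:
            out.append(f"<{label}>")
        previous = label
    return " ".join(out)
```

Calling mask_entities(["John", "Smith", "was", "seen", "on", "2014-03-01"], ["B-NAME", "I-NAME", "O", "O", "O", "B-DATE"]) returns "<NAME> was seen on <DATE>". Collapsing a multi-token span into one placeholder is a design choice of this sketch; a real deidentification step might instead obfuscate or replace values with surrogates.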


How to use

Python:
from sparknlp.annotator import NerDLModel
model = NerDLModel.pretrained("ner_deid_large", "en", "clinical/models") \
	.setInputCols("sentence", "token", "word_embeddings") \
	.setOutputCol("ner")

Scala:
import com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel
val model = NerDLModel.pretrained("ner_deid_large", "en", "clinical/models")
	.setInputCols("sentence", "token", "word_embeddings")
	.setOutputCol("ner")
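
The snippet above loads only the NER stage, which expects sentence, token, and word_embeddings columns to be produced upstream. A possible way to wire a full pipeline is sketched below, using the standard Spark NLP annotators. It assumes a Spark session with access to clinical/models has already been started; the function name build_deid_ner_pipeline, the "text" input column, and the "ner_chunk" output column are illustrative choices, not taken from this page.

```python
EMBEDDINGS_NAME = "embeddings_clinical"  # must match the embeddings used in training
NER_NAME = "ner_deid_large"


def build_deid_ner_pipeline():
    """Assemble a pipeline that feeds ner_deid_large the columns it expects.

    Imports are local so this module loads without Spark NLP installed;
    calling the function requires Spark NLP and access to clinical/models.
    """
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import (
        NerConverter, NerDLModel, SentenceDetector, Tokenizer, WordEmbeddingsModel,
    )

    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    sentence = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
    token = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
    # Same embeddings the model was trained with, written to the column
    # name that the NER stage reads.
    embeddings = (
        WordEmbeddingsModel.pretrained(EMBEDDINGS_NAME, "en", "clinical/models")
        .setInputCols(["sentence", "token"])
        .setOutputCol("word_embeddings")
    )
    ner = (
        NerDLModel.pretrained(NER_NAME, "en", "clinical/models")
        .setInputCols(["sentence", "token", "word_embeddings"])
        .setOutputCol("ner")
    )
    # Groups token-level tags into entity chunks (illustrative extra stage).
    chunks = (
        NerConverter()
        .setInputCols(["sentence", "token", "ner"])
        .setOutputCol("ner_chunk")
    )
    return Pipeline(stages=[document, sentence, token, embeddings, ner, chunks])
```

Fitting the returned pipeline on a DataFrame with a "text" column and calling transform should then yield the "ner" annotations alongside the grouped "ner_chunk" column.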

Model Information

Name: ner_deid_large  
Type: NerDLModel  
Compatibility: Spark NLP 2.4.2+  
License: Licensed  
Edition: Official  
Input labels: [sentence, token, word_embeddings]  
Output labels: [ner]  
Language: en  
Case sensitive: False  
Dependencies: embeddings_clinical  

Data Source

Trained on the plain n2c2 2014 De-identification and Heart Disease Risk Factors Challenge datasets, using the embeddings_clinical word embeddings: https://portal.dbmi.hms.harvard.edu/projects/n2c2-2014/