Tensorflow Graph

NER DL uses Char CNNs - BiLSTM - CRF Neural Network architecture. Spark NLP defines this architecture through a Tensorflow graph, which requires the following parameters:

Tags
Embeddings Dimension
Number of Chars

Spark NLP infers these values from the training dataset used in NerDLApproach annotator and tries to load the graph embedded on spark-nlp package. Currently, Spark NLP has graphs for the most common combination of tags, embeddings, and number of chars values:

Tags	Embeddings Dimension
10	100
10	200
10	300
10	768
10	1024
25	300

All of these graphs use an LSTM of size 128 and number of chars 100

In case, your train dataset has a different number of tags, embeddings dimension, number of chars and LSTM size combinations shown in the table above, NerDLApproach will raise an IllegalArgumentException exception during runtime with the message below:

Graph [parameter] should be [value]: Could not find a suitable tensorflow graph for embeddings dim: [value] tags: [value] nChars: [value]. Check https://nlp.johnsnowlabs.com/docs/en/graph for instructions to generate the required graph.

To overcome this exception message we have to follow these steps:

Clone spark-nlp github repo

Run python file create_models with number of tags, embeddings dimension and number of char values mentioned on your exception message error.

 cd spark-nlp/python/tensorflow
 export PYTHONPATH=lib/ner
 python ner/create_models.py [number_of_tags] [embeddings_dimension] [number_of_chars] [output_path]

This will generate a graph on the directory defined on `output_path argument.
Retry training with NerDLApproach annotator but this time use the parameter setGraphFolder with the path of your graph.

Note: Make sure that you have Python 3 and Tensorflow 1.15.0 installed on your system since create_models requires those versions to generate the graph successfully.

Note: We also have a notebook in the same directory if you prefer Jupyter notebook to cerate your custom graph (create_models.ipynb).