NER DL uses a Char CNNs - BiLSTM - CRF neural network architecture. Spark NLP defines this architecture through a TensorFlow graph, which requires the following parameters:
- Tags
- Embeddings Dimension
- Number of Chars
Spark NLP infers these values from the training dataset used in the `NerDLApproach` annotator and tries to load a matching graph bundled in the spark-nlp package. Currently, Spark NLP has graphs for the most common combinations of tags, embeddings dimension, and number of chars values:
All of these graphs use an LSTM of size 128 and a number of chars of 100.
If your training dataset has a combination of number of tags, embeddings dimension, number of chars, and LSTM size that differs from those shown in the table above, `NerDLApproach` will raise an `IllegalArgumentException` at runtime with the message below:
Graph [parameter] should be [value]: Could not find a suitable tensorflow graph for embeddings dim: [value] tags: [value] nChars: [value]. Check https://nlp.johnsnowlabs.com/docs/en/graph for instructions to generate the required graph.
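The values needed to generate a graph can be read straight from that message. The following is a minimal sketch, assuming the bracketed placeholders are rendered as plain integers at runtime and that the wording matches the template above. The helper name `parse_graph_error` is hypothetical and not part of Spark NLP.

```python
import re

# Pattern built from the message template shown above; assumes the three
# bracketed values are rendered as plain integers at runtime.
_GRAPH_ERROR = re.compile(
    r"embeddings dim:\s*(?P<embeddings_dim>\d+)\s+"
    r"tags:\s*(?P<tags>\d+)\s+"
    r"nChars:\s*(?P<n_chars>\d+)"
)


def parse_graph_error(message):
    """Return the graph parameters named in the exception message, or None."""
    match = _GRAPH_ERROR.search(message)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# Illustrative message with made-up values, not taken from a real run.
example = (
    "Could not find a suitable tensorflow graph for "
    "embeddings dim: 200 tags: 12 nChars: 85. Check the docs."
)
print(parse_graph_error(example))
```

Returning `None` on a non-matching message keeps the helper safe to call on any exception text.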
To resolve this exception, follow these steps:
- Clone the spark-nlp GitHub repo.
- Go to the `python/tensorflow/ner/` path.
- Run the Python file `create_models` with the number of tags, embeddings dimension, and number of chars values mentioned in your exception message:
python create_models.py [number_of_tags] [embeddings_dimension] [number_of_chars] [output_path]
This will generate a graph in the directory defined by the `output_path` argument.
- Retry training with the `NerDLApproach` annotator, but this time use the `setGraphFolder` parameter with the path of your graph.
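The steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names are hypothetical, the column names passed to `NerDLApproach` are placeholders for whatever your pipeline produces, and `python_executable` must point to an interpreter that has the required TensorFlow version installed. Creating the annotator also assumes Spark NLP is installed and a Spark session is running.

```python
import subprocess


def build_create_models_command(tags, embeddings_dim, n_chars, output_path,
                                python_executable="python"):
    """Assemble the create_models.py call in the argument order shown above."""
    return [
        python_executable,
        "create_models.py",
        str(tags),
        str(embeddings_dim),
        str(n_chars),
        str(output_path),
    ]


def generate_graph(command, ner_dir):
    """Run the command from python/tensorflow/ner/ inside the cloned repo."""
    subprocess.run(command, cwd=ner_dir, check=True)


def ner_approach_with_graph(graph_folder):
    """Return a NerDLApproach that looks for graphs in graph_folder."""
    # Imported lazily so the helpers above work without Spark NLP installed.
    from sparknlp.annotator import NerDLApproach

    return (
        NerDLApproach()
        .setInputCols(["sentence", "token", "embeddings"])  # assumed column names
        .setLabelColumn("label")                            # assumed column name
        .setOutputCol("ner")
        .setGraphFolder(graph_folder)
    )


# Made-up values for illustration; use the ones from your own exception.
command = build_create_models_command(12, 200, 85, "/tmp/ner_graphs")
print(" ".join(command))
```

Passing the same directory to `generate_graph` (as the output path in the command) and to `ner_approach_with_graph` keeps the generated graph and the training configuration in sync.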
Note: Make sure that you have Python 3 and TensorFlow 1.12.0 installed on your system, since `create_models` requires those versions to generate the graph successfully.
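A quick pre-flight check can catch a version mismatch before running the script. The sketch below is illustrative only; both helper names are hypothetical, and the check simply compares against the versions named in the note above.

```python
def installed_tensorflow_version():
    """Return the installed TensorFlow version string, or None if missing."""
    try:
        import tensorflow
    except ImportError:
        return None
    return tensorflow.__version__


def check_graph_requirements(python_version, tensorflow_version):
    """Return a list of problems; an empty list means the versions match."""
    problems = []
    if python_version[0] != 3:
        problems.append("Python 3 is required")
    if tensorflow_version is None:
        problems.append("TensorFlow is not installed")
    elif tensorflow_version != "1.12.0":
        problems.append(
            "TensorFlow 1.12.0 is required, found %s" % tensorflow_version
        )
    return problems


# Explicit values for illustration; a matching environment yields no problems.
print(check_graph_requirements((3, 6, 8), "1.12.0"))
```

In a real environment, the check would be called as `check_graph_requirements(sys.version_info, installed_tensorflow_version())` after importing `sys`.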