Onto is a Named Entity Recognition (NER) model: it annotates text to find entities such as the names of people, places, and organizations. Onto was trained on the OntoNotes text corpus. The model does not read words directly; it reads word embeddings, which represent words as points in a vector space such that semantically similar words lie closer together. Onto 300 was trained with GloVe 840B 300 word embeddings, so be sure to use the same embeddings in the pipeline.
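To make the idea of "closer together" concrete, here is a minimal sketch that compares toy word vectors. The three-dimensional vectors are invented for illustration (real GloVe 840B 300 vectors have 300 dimensions), and cosine similarity is assumed here as the closeness measure since it is a common choice; the model card itself does not name one.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical vectors, made up for this example only.
toy_embeddings = {
    "paris": [0.9, 0.8, 0.1],
    "london": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

paris_london = cosine_similarity(toy_embeddings["paris"], toy_embeddings["london"])
paris_banana = cosine_similarity(toy_embeddings["paris"], toy_embeddings["banana"])

# The two city names point in nearly the same direction; the fruit does not.
print(paris_london > paris_banana)  # prints True
```

An NER model that consumes such vectors can generalize from entity names seen in training to similar words it has never seen, which is also why swapping in a different embedding set at inference time is likely to degrade its predictions.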
How to use
ner = NerDLModel.pretrained("onto_300", "en") \
    .setInputCols(["document", "token", "embeddings"]) \
    .setOutputCol("ner")

val ner = NerDLModel.pretrained("onto_300", "en")
    .setInputCols(Array("document", "token", "embeddings"))
    .setOutputCol("ner")
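The snippets above only load the NER stage. A fuller pipeline might look like the following sketch, which assumes the matching embeddings are published as `glove_840B_300` under the `xx` language code and that `pyspark` and `spark-nlp` are installed. The function names `build_onto_pipeline` and `annotate`, the `text` input column, and the `ner_chunk` output column are illustrative choices, not part of the model card.

```python
def build_onto_pipeline():
    """Assemble an unfitted Spark NLP pipeline around onto_300.

    Call sparknlp.start() before this function so a Spark session exists.
    """
    # Imported lazily so this file can be loaded without Spark installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import (
        NerConverter,
        NerDLModel,
        Tokenizer,
        WordEmbeddingsModel,
    )

    # Wrap raw text in the "document" annotation the later stages expect.
    document = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

    token = Tokenizer() \
        .setInputCols(["document"]) \
        .setOutputCol("token")

    # Assumption: these are the embeddings onto_300 was trained with.
    embeddings = WordEmbeddingsModel.pretrained("glove_840B_300", "xx") \
        .setInputCols(["document", "token"]) \
        .setOutputCol("embeddings")

    ner = NerDLModel.pretrained("onto_300", "en") \
        .setInputCols(["document", "token", "embeddings"]) \
        .setOutputCol("ner")

    # Merge per-token tags into whole entity chunks.
    chunks = NerConverter() \
        .setInputCols(["document", "token", "ner"]) \
        .setOutputCol("ner_chunk")

    return Pipeline(stages=[document, token, embeddings, ner, chunks])


def annotate(text):
    """Run the pipeline on one string and return the collected entity chunks."""
    import sparknlp

    spark = sparknlp.start()
    data = spark.createDataFrame([[text]]).toDF("text")
    model = build_onto_pipeline().fit(data)
    return model.transform(data).select("ner_chunk.result").collect()
```

Calling `annotate("...")` with a sentence of your own downloads both pretrained models on first use, so expect the first run to take a while.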
|Compatibility:|Spark NLP 2.4.0+|
|Input Labels:|[sentence, token, embeddings]|
The model was trained on data from https://catalog.ldc.upenn.edu/LDC2013T19