Spanish BERT Base Uncased Embedding


BETO is a BERT model trained on a large Spanish corpus. It is similar in size to BERT-Base and was trained with the Whole Word Masking technique. The upstream BETO project publishes TensorFlow and PyTorch checkpoints for the uncased and cased versions, along with results on Spanish benchmarks comparing BETO with Multilingual BERT and with other (non-BERT-based) models.
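
To illustrate what Whole Word Masking means, here is a minimal Python sketch. It is not BETO's training code: the function name `whole_word_mask`, the 15% default probability, and the example WordPiece split are assumptions for illustration, and the sketch omits details of real BERT pre-training such as the cap on masked tokens and the occasional replacement with random or unchanged tokens. The idea it shows is that all sub-word pieces of a word are masked together, never just one piece.

```python
import random


def whole_word_mask(tokens, mask_prob=0.15, seed=None, mask_token="[MASK]"):
    """Return a copy of `tokens` in which whole words are masked.

    `tokens` are WordPiece tokens; a piece starting with "##" continues
    the previous word. Each word is selected independently with
    probability `mask_prob` (a simplification of the real procedure).
    """
    rng = random.Random(seed)

    # Group token indices into words: a "##" piece joins the previous word.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    # Mask every piece of each selected word.
    masked = list(tokens)
    for word in words:
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked


# Hypothetical WordPiece split, not necessarily what BETO's vocabulary produces.
example = ["los", "estudiante", "##s", "leen", "tranquila", "##mente"]
print(whole_word_mask(example, mask_prob=0.5, seed=0))
```

With plain token-level masking, `estudiante` could be masked while `##s` stays visible, which makes the prediction task easier; masking by word removes that shortcut.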

How to use

# Python
embeddings = BertEmbeddings.pretrained("bert_base_uncased", "es") \
      .setInputCols("sentence", "token") \
      .setOutputCol("embeddings")

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings])

// Scala
val embeddings = BertEmbeddings.pretrained("bert_base_uncased", "es")
      .setInputCols("sentence", "token")
      .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings))
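
The snippets above reference `document_assembler`, `sentence_detector`, and `tokenizer` without defining them. The following Python sketch shows one way to define those stages and run the pipeline end to end. The function names `build_pipeline` and `embed_tokens`, the input column name `text`, and the output column name `embeddings` are assumptions chosen for illustration; the Spark NLP imports are placed inside the functions so the module can be loaded before Spark is available.

```python
def build_pipeline():
    """Assemble document -> sentence -> token -> BERT embeddings stages."""
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import SentenceDetector, Tokenizer, BertEmbeddings

    # Wrap the raw "text" column into Spark NLP's document annotation.
    document_assembler = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

    # Split documents into sentences, then sentences into tokens.
    sentence_detector = SentenceDetector() \
        .setInputCols(["document"]) \
        .setOutputCol("sentence")
    tokenizer = Tokenizer() \
        .setInputCols(["sentence"]) \
        .setOutputCol("token")

    # Downloads the pretrained Spanish model on first use.
    embeddings = BertEmbeddings.pretrained("bert_base_uncased", "es") \
        .setInputCols(["sentence", "token"]) \
        .setOutputCol("embeddings")

    return Pipeline(stages=[document_assembler, sentence_detector,
                            tokenizer, embeddings])


def embed_tokens(spark, texts):
    """Return a DataFrame with one row per token and its embedding vector."""
    data = spark.createDataFrame([[t] for t in texts]).toDF("text")
    result = build_pipeline().fit(data).transform(data)
    # Each annotation carries the token text in `result` and the vector in `embeddings`.
    return result.selectExpr("explode(embeddings) as e") \
        .selectExpr("e.result as token", "e.embeddings as vector")
```

A typical call, assuming Spark NLP is installed, would start a session with `spark = sparknlp.start()` and then run `embed_tokens(spark, ["Me encanta Spark NLP."]).show()`.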

Model Information

Model Name: bert_base_uncased
Compatibility: Spark NLP 3.2.2+
License: Open Source
Edition: Official
Input Labels: [sentence, token]
Output Labels: [bert]
Language: es
Case sensitive: false

Data Source

The model is imported from: