This model uses a BERT base architecture pretrained from scratch on Wikipedia and BooksCorpus. The architecture is unchanged, but the original training and export scheme was modified based on more recent findings, which improves accuracy over the original BERT base checkpoint.
This model is intended for a variety of English NLP tasks. The pre-training data consists mostly of formal text, so the model may not generalize well to more colloquial text such as social media posts or messages.
How to use
# Python: document_assembler and sentence_detector are assumed to be defined earlier
sent_embeddings = BertSentenceEmbeddings.pretrained("sent_bert_wiki_books", "en") \
    .setInputCols("sentence") \
    .setOutputCol("bert_sentence")

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, sent_embeddings])

// Scala: document_assembler and sentence_detector are assumed to be defined earlier
val sent_embeddings = BertSentenceEmbeddings.pretrained("sent_bert_wiki_books", "en")
  .setInputCols("sentence")
  .setOutputCol("bert_sentence")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, sent_embeddings))

# Python, using the NLU library
import nlu

text = ["I love NLP"]
sent_embeddings_df = nlu.load('en.embed_sentence.bert.wiki_books').predict(text, output_level='sentence')
sent_embeddings_df
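A common next step with sentence embeddings is to compare two sentences by the cosine similarity of their vectors. The sketch below is illustrative only: it assumes the embeddings have already been pulled out of the output column as plain lists of floats, and the helper `cosine_similarity` and the toy 4-dimensional vectors are hypothetical placeholders, not output from this model (real BERT base vectors have 768 dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy placeholder vectors standing in for sentence embeddings.
# These values are made up for illustration, NOT produced by the model.
emb_a = [0.2, -0.1, 0.4, 0.3]
emb_b = [0.1, -0.2, 0.5, 0.2]
emb_c = [-0.3, 0.4, -0.1, 0.0]

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)

print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

Values closer to 1 indicate vectors pointing in a similar direction. How well that tracks semantic similarity for a given task depends on the model and the data, so it is worth checking on examples from the target domain.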
Compatibility: Spark NLP 3.2.0+
This model was imported from: https://tfhub.dev/google/experts/bert/wiki_books/2