BERT Embeddings trained on MEDLINE/PubMed


This model uses a BERT base architecture pretrained from scratch on MEDLINE/PubMed.

This is a BERT base architecture, but some changes have been made to the original training and export scheme, based on more recent learnings, that improve its accuracy over the original BERT base checkpoint.


How to use

Python:

embeddings = BertEmbeddings.pretrained("bert_pubmed", "en") \
      .setInputCols("sentence", "token") \
      .setOutputCol("embeddings")

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings])
Scala:

val embeddings = BertEmbeddings.pretrained("bert_pubmed", "en")
      .setInputCols("sentence", "token")
      .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings))
NLU:

import nlu

text = ["I love NLP"]
embeddings_df = nlu.load('en.embed.bert.pubmed').predict(text, output_level='token')
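With output_level='token', the predict call returns a DataFrame with one embedding vector per token; as with BERT base, each vector has 768 dimensions. A minimal sketch of comparing two such vectors by cosine similarity, using random stand-in vectors instead of real model output:

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product of the two vectors, scaled by the product of their norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Stand-ins for two token embeddings; real vectors would come from the
# embedding column of the predict() output above.
vec_a = rng.standard_normal(768)
vec_b = rng.standard_normal(768)

sim = cosine_similarity(vec_a, vec_b)
print(f"cosine similarity: {sim:.4f}")
```

On real output, nearby values indicate tokens the model treats as semantically similar in context.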

Model Information

Model Name: bert_pubmed
Compatibility: Spark NLP 3.2.0+
License: Open Source
Edition: Official
Input Labels: [sentence, token]
Output Labels: [bert]
Language: en
Case sensitive: false

Data Source

This model has been imported from: