BERT Sentence Embeddings trained on MEDLINE/PubMed

Description

This model uses a BERT base architecture pretrained from scratch on MEDLINE/PubMed. Some changes have been made to the original training and export scheme, based on more recent findings, which improve its accuracy over the original BERT base checkpoint.

This model is intended for a variety of English NLP tasks in the medical domain. Because the pre-training data consists largely of medical text, the model may not generalize well to text outside that domain.


How to use

from sparknlp.annotator import BertSentenceEmbeddings
from pyspark.ml import Pipeline

sent_embeddings = BertSentenceEmbeddings.pretrained("sent_bert_pubmed", "en") \
    .setInputCols("sentence") \
    .setOutputCol("bert_sentence")

# document_assembler and sentence_detector are the usual upstream stages (see the sketch below).
nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, sent_embeddings])
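The document_assembler and sentence_detector stages referenced above are not defined on this page; a minimal sketch of those upstream stages, using the standard Spark NLP DocumentAssembler and SentenceDetector annotators (the column names are illustrative), could look like this:

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector

# Turns the raw "text" column into document annotations.
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Splits each document into sentences, which feed the embeddings stage.
sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")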
import com.johnsnowlabs.nlp.embeddings.BertSentenceEmbeddings
import org.apache.spark.ml.Pipeline

val sent_embeddings = BertSentenceEmbeddings.pretrained("sent_bert_pubmed", "en")
  .setInputCols("sentence")
  .setOutputCol("bert_sentence")

// document_assembler and sentence_detector are the usual upstream stages.
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, sent_embeddings))
import nlu

text = ["I love NLP"]
# predict() returns a pandas DataFrame with one row per sentence and its embedding.
sent_embeddings_df = nlu.load('en.embed_sentence.bert.pubmed').predict(text, output_level='sentence')
sent_embeddings_df
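To run the Spark NLP pipeline end to end, the stages can be fitted and applied to a DataFrame. A minimal sketch, assuming Spark NLP is installed and using an illustrative example sentence (nlp_pipeline is the object built in the Python snippet above):

import sparknlp

spark = sparknlp.start()

# Illustrative input; any DataFrame with a "text" column works.
data = spark.createDataFrame([["Metformin is commonly prescribed for type 2 diabetes."]]).toDF("text")

result = nlp_pipeline.fit(data).transform(data)

# Each entry in "bert_sentence" is an annotation whose embeddings field holds the sentence vector.
result.selectExpr("explode(bert_sentence.embeddings) as sentence_embedding").show(truncate=False)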

Model Information

Model Name: sent_bert_pubmed
Compatibility: Spark NLP 3.2.0+
License: Open Source
Edition: Official
Input Labels: [sentence]
Output Labels: [bert_sentence]
Language: en
Case sensitive: false

Data Source

This model has been imported from: https://tfhub.dev/google/experts/bert/pubmed/2