Description
This model generates contextual sentence embeddings for input sentences. It has been fine-tuned on the MedNLI dataset to provide state-of-the-art performance on the STS and SentEval benchmarks.
How to use
from johnsnowlabs import nlp

# Assumes an active Spark session named `spark` with the licensed Healthcare NLP
# library available, for example one returned by nlp.start().

# Wrap the raw text column in a document annotation.
documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into sentences.
sentence = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

# Load the pretrained sentence embedding model.
embeddings = nlp.BertSentenceEmbeddings.pretrained("sbluebert_base_uncased_mli_openvino", "en", "clinical/models") \
    .setInputCols(["sentence"]) \
    .setOutputCol("sentence_bert_embeddings")

# Convert the embedding annotations into Spark ML vectors.
embeddingsFinisher = nlp.EmbeddingsFinisher() \
    .setInputCols(["sentence_bert_embeddings"]) \
    .setOutputCols("finished_embeddings") \
    .setOutputAsVector(True)

pipeline = nlp.Pipeline().setStages([
    documentAssembler,
    sentence,
    embeddings,
    embeddingsFinisher
])

data = spark.createDataFrame([
    ['William Henry Gates III (born October 28, 1955) is an American business magnate, software developer, investor, and philanthropist.']
]).toDF("text")

model = pipeline.fit(data)
result = model.transform(data)

# One row per sentence vector, truncated to 80 characters for display.
result.selectExpr("explode(finished_embeddings) as result").show(5, 80)
// Scala version of the same pipeline. Assumes an active SparkSession named `spark`.
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.SentenceDetector
import com.johnsnowlabs.nlp.embeddings.BertSentenceEmbeddings
import com.johnsnowlabs.nlp.EmbeddingsFinisher
import org.apache.spark.ml.Pipeline

// Wrap the raw text column in a document annotation.
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into sentences.
val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sbluebert_base_uncased_mli_openvino", "en", "clinical/models")
  .setInputCols("sentence")
  .setOutputCol("sentence_bert_embeddings")

// Convert the embedding annotations into Spark ML vectors.
val embeddingsFinisher = new EmbeddingsFinisher()
  .setInputCols("sentence_bert_embeddings")
  .setOutputCols("finished_embeddings")
  .setOutputAsVector(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  embeddings,
  embeddingsFinisher
))

val data = Seq(
  "William Henry Gates III (born October 28, 1955) is an American business magnate, software developer, investor, and philanthropist."
).toDF("text")

val model = pipeline.fit(data)
val result = model.transform(data)

// One row per sentence vector, truncated to 80 characters for display.
result.selectExpr("explode(finished_embeddings) as result").show(5, 80)
Results
+--------------------------------------------------------------------------------+
|                                                                          result|
+--------------------------------------------------------------------------------+
|[-0.4706500768661499,-0.605597734451294,0.6036401987075806,-0.590848147869110...|
+--------------------------------------------------------------------------------+
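Because the model targets sentence similarity tasks such as STS, a common next step is to compare two sentence vectors with cosine similarity. The following is a minimal sketch under stated assumptions, not part of the Spark NLP API: `cosine_similarity` is a hypothetical helper, the vectors are assumed to have already been collected to the driver as plain lists of floats, and the toy 4-dimensional values are illustrative only, not output of this model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real sentence embeddings (made-up values).
premise = [0.2, -0.1, 0.4, 0.3]
hypothesis = [0.1, -0.2, 0.5, 0.2]
unrelated = [-0.3, 0.4, -0.1, -0.2]

# A higher score means the two vectors point in more similar directions.
print(cosine_similarity(premise, hypothesis))
print(cosine_similarity(premise, unrelated))
```

With real pipeline output, each entry of `finished_embeddings` is a Spark ML vector; calling `.toArray()` on a collected vector is one way to obtain values this helper can consume.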
Model Information
| Model Name: | sbluebert_base_uncased_mli_openvino |
| Compatibility: | Spark NLP 6.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document] |
| Output Labels: | [bert] |
| Language: | en |
| Size: | 407.1 MB |