German Financial Bert Word Embeddings (Annual Financial Statements)

Description

Pretrained financial BERT word embeddings model for German, trained on financial statements. The model was uploaded to Hugging Face, then adapted and imported into Spark NLP. german-financial-statements-bert is a German financial-domain model originally trained on 100,000 natural-language annual financial statements.


How to use

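Python:

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap raw text into Spark NLP "document" annotations
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

# Load the pretrained German financial BERT embeddings (one vector per token)
embeddings = BertEmbeddings.pretrained("bert_embeddings_german_financial_statements_bert", "de") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings])

data = spark.createDataFrame([["Ich liebe Spark NLP"]]).toDF("text")

result = pipeline.fit(data).transform(data)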

Scala:

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// Needed for Seq(...).toDF; assumes an active SparkSession named `spark`
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_embeddings_german_financial_statements_bert", "de")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))

val data = Seq("Ich liebe Spark NLP").toDF("text")

val result = pipeline.fit(data).transform(data)

NLU:

import nlu
nlu.load("de.embed.german_financial_statements_bert").predict("""Ich liebe Spark NLP""")
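
In Spark NLP's annotation schema, each row of the embeddings output column holds one annotation per token, carrying the token text (result) and its vector (embeddings). The following is a minimal, hypothetical helper and is not part of Spark NLP. It pairs tokens with their vectors and averages them into one document vector using simple mean pooling. That is one common choice, but this model card does not prescribe it. The Annotation namedtuple is a stand-in for the annotation rows Spark returns, and the toy 3-dimensional vectors are illustrative only.

```python
from collections import namedtuple
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a Spark NLP annotation row; only the two
# fields used below (token text and vector) are modeled.
Annotation = namedtuple("Annotation", ["result", "embeddings"])


def token_vectors(annotations: Sequence) -> List[Tuple[str, List[float]]]:
    """Pair each token string with its embedding vector."""
    return [(a.result, list(a.embeddings)) for a in annotations]


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average token vectors element-wise into a single document vector."""
    if not vectors:
        raise ValueError("no vectors to pool")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; the real model emits much longer ones.
    anns = [Annotation("Ich", [1.0, 0.0, 2.0]), Annotation("liebe", [3.0, 2.0, 0.0])]
    pairs = token_vectors(anns)
    print([tok for tok, _ in pairs])              # ['Ich', 'liebe']
    print(mean_pool([vec for _, vec in pairs]))   # [2.0, 1.0, 1.0]
```

Assuming the Python pipeline above has run, rows = result.select("embeddings").collect() followed by token_vectors(rows[0]["embeddings"]) should pass real annotations to the helper, because collected Spark rows expose result and embeddings as attributes.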

Model Information

Model Name: bert_embeddings_german_financial_statements_bert
Compatibility: Spark NLP 3.4.2+
License: Open Source
Edition: Official
Input Labels: [sentence, token]
Output Labels: [bert]
Language: de
Size: 409.8 MB
Case sensitive: true

References

  • https://huggingface.co/fabianrausch/german-financial-statements-bert