Legal BERT Sentence Base Uncased Embedding


LEGAL-BERT is a family of BERT models for the legal domain, intended to assist legal NLP research, computational law, and legal technology applications. To pre-train the different variations of LEGAL-BERT, we collected 12 GB of diverse English legal text from several fields (e.g., legislation, court cases, contracts) scraped from publicly available resources. Sub-domain variants (CONTRACTS-, EURLEX-, ECHR-) and/or the general LEGAL-BERT perform better than BERT used out of the box on domain-specific tasks. A lightweight model (33% of the size of BERT-BASE), pre-trained from scratch on legal data and with competitive performance, is also available.

How to use

# Python
sent_embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_legal", "en") \
      .setInputCols("sentence") \
      .setOutputCol("bert_sentence")

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, sent_embeddings])

// Scala
val sent_embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_legal", "en")
      .setInputCols("sentence")
      .setOutputCol("bert_sentence")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, sent_embeddings))
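
The snippets above refer to document_assembler and sentence_detector without defining them. The following Python sketch shows one way the full pipeline might be assembled and how two resulting sentence vectors could be compared. The input column name "text", the sample sentences, and the helper names build_pipeline, cosine_similarity, and main are assumptions made for illustration; they are not part of the model card.

```python
import math


def build_pipeline():
    """Assemble the full pipeline; requires pyspark and spark-nlp to be installed."""
    # Imports are local so cosine_similarity below can be used without Spark.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import SentenceDetector, BertSentenceEmbeddings

    # Assumption: the raw text lives in a column named "text".
    document_assembler = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

    sentence_detector = SentenceDetector() \
        .setInputCols(["document"]) \
        .setOutputCol("sentence")

    sent_embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_legal", "en") \
        .setInputCols(["sentence"]) \
        .setOutputCol("bert_sentence")

    return Pipeline(stages=[document_assembler, sentence_detector, sent_embeddings])


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors given as plain Python lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def main():
    """Hypothetical end-to-end run; downloads the pretrained model on first use."""
    import sparknlp

    spark = sparknlp.start()
    # Illustrative input only: two sentences in a single document.
    data = spark.createDataFrame(
        [["The lessee shall pay rent monthly. The tenant must pay rent each month."]]
    ).toDF("text")

    result = build_pipeline().fit(data).transform(data)

    # Each annotation in "bert_sentence" carries its vector in the "embeddings" field.
    annotations = result.select("bert_sentence").first()[0]
    vectors = [annotation.embeddings for annotation in annotations]
    print(cosine_similarity(vectors[0], vectors[1]))
```

Calling main() in an environment where Spark NLP is installed should print a single similarity score for the two sample sentences. The Spark imports are kept inside the functions so that the similarity helper can be reused on vectors that have already been collected.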

Model Information

Model Name: sent_bert_base_uncased_legal
Compatibility: Spark NLP 3.2.2+
License: Open Source
Edition: Official
Input Labels: [sentence]
Output Labels: [bert_sentence]
Language: en
Case sensitive: true

Data Source

The model is imported from: