Legal Compliance With Laws Clause Binary Classifier (LEDGAR)

Description

The LEDGAR dataset is designed for contract provision (paragraph) classification. The provisions come from contracts in US Securities and Exchange Commission (SEC) filings, which are publicly available through EDGAR. Each label represents the single main topic (theme) of the corresponding contract provision.

This model is a Binary Classifier (True, False) for the Compliance_With_Laws clause type. To use it, make sure you provide enough context as input. Adding a Sentence Splitter to the pipeline makes the model see individual sentences rather than the whole text, so it is better to skip it unless you want Binary Classification at sentence level.

If you have large legal documents and want to look for clauses, we recommend splitting the documents using any of the techniques available in our Legal NLP Workshop Tokenization & Splitting Tutorial (link here), namely:

  • Paragraph splitting (by multiline);
  • Splitting by headers / subheaders;
  • etc.
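
The two splitting strategies listed above can be sketched in plain Python. This is only an illustration and not the tutorial's code: the function names and the numbered-header pattern (lines starting with "1. ", "2. ", and so on) are assumptions made for the example, and real contracts may need a different pattern.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split on one or more blank lines (paragraph splitting by multiline)."""
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]


def split_by_headers(text: str) -> List[str]:
    """Split before every line that starts with a numbered header.

    The header pattern is a hypothetical example; adapt it to the
    numbering scheme of your own documents.
    """
    parts = re.split(r"(?m)^(?=\d+\.\s)", text)
    return [p.strip() for p in parts if p.strip()]


sample = (
    "1. Compliance with Laws. The Company shall comply with all applicable laws.\n"
    "\n"
    "2. Governing Law. This Agreement is governed by the laws of New York.\n"
)

paragraphs = split_paragraphs(sample)
sections = split_by_headers(sample)
```

Each resulting piece can then be passed to the classifier as its own row of text.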

Keep in mind that the embeddings used by this model accept up to 512 tokens. If your text is longer than that, consider splitting it into smaller pieces (the tutorial mentioned above covers this as well).
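
As a rough sketch of keeping inputs under the token limit, the helper below cuts a long text into pieces of a fixed number of whitespace-separated words. Note the assumption: BERT counts word pieces rather than words, so the word budget has to sit well below 512. The default of 300 is an arbitrary conservative choice for illustration, and the function name is hypothetical.

```python
from typing import List


def chunk_by_words(text: str, max_words: int = 300) -> List[str]:
    """Cut text into consecutive chunks of at most max_words words.

    Words are only a proxy for BERT tokens, so max_words is kept
    well under the 512-token limit (assumed safety margin).
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


long_text = " ".join(f"word{i}" for i in range(700))
chunks = chunk_by_words(long_text)
```

Splitting on paragraph or header boundaries first, and only falling back to fixed-size chunks for oversized pieces, keeps more of the clause context together.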

This model can be combined with any of the other “hundreds” of Legal Clauses Classifiers you will find in Models Hub, producing as output a series of True/False values, one for each legal clause model you have added.
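
To illustrate what such a combined output could look like, the sketch below turns the labels predicted by several clause classifiers into True/False flags. It assumes each classifier returns either its own clause name or Other, as this model does. The helper name and the second clause type (Governing_Law) are hypothetical and used only for the example.

```python
from typing import Dict


def to_flags(predictions: Dict[str, str]) -> Dict[str, bool]:
    """Map {clause type: predicted label} to {clause type: True/False}.

    A clause is flagged True only when its classifier predicted the
    clause name itself rather than "Other".
    """
    return {clause: label == clause for clause, label in predictions.items()}


# Hypothetical predictions for one paragraph from two clause classifiers
predictions = {
    "Compliance_With_Laws": "Compliance_With_Laws",
    "Governing_Law": "Other",
}
flags = to_flags(predictions)
```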

Predicted Entities

Compliance_With_Laws, Other

How to use


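from johnsnowlabs import nlp, legal

# Start a Spark session (the licensed Legal NLP library must be installed)
spark = nlp.start()

# Turn the raw text column into a document annotation
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Embed the whole document; no sentence splitter, so the model sees the full text
embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en")\
    .setInputCols("document")\
    .setOutputCol("sentence_embeddings")

# Binary clause classifier: Compliance_With_Laws vs. Other
doc_classifier = legal.ClassifierDLModel.pretrained("legclf_compliance_with_laws_bert", "en", "legal/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("category")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    embeddings,
    doc_classifier])

df = spark.createDataFrame([["YOUR TEXT HERE"]]).toDF("text")

model = nlpPipeline.fit(df)

result = model.transform(df)

# Show the predicted label for each input row
result.select("category.result").show(truncate=False)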

Results


+----------------------+
|result                |
+----------------------+
|[Compliance_With_Laws]|
|[Other]               |
|[Other]               |
|[Compliance_With_Laws]|
+----------------------+

Model Information

Model Name: legclf_compliance_with_laws_bert
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Size: 22.8 MB

References

Train dataset available here

Benchmarking


               label precision recall  f1-score  support
Compliance_With_Laws      0.93   0.91      0.92      123
               Other      0.93   0.94      0.93      151
            accuracy         -      -      0.93      274
           macro-avg      0.93   0.93      0.93      274
        weighted-avg      0.93   0.93      0.93      274
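
As a quick consistency check on the table above, the F1 scores and averages can be recomputed from the reported precision, recall, and support. This is a sketch that works from the rounded values in the table, so the last digit could differ from the original evaluation; here the recomputed figures agree with the reported ones after rounding to two decimals.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) as reported in the benchmarking table
rows = {
    "Compliance_With_Laws": (0.93, 0.91, 123),
    "Other": (0.93, 0.94, 151),
}

f1_scores = {label: f1(p, r) for label, (p, r, _) in rows.items()}
total_support = sum(n for _, _, n in rows.values())

# Macro average: plain mean over classes
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Weighted average: mean over classes weighted by support
weighted_f1 = sum(f1_scores[label] * rows[label][2] for label in rows) / total_support
```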