Legal Agreement Document Binary Classifier (BERT Sentence Embeddings)

Description

The legclf_agreement_bert model is a binary document classifier built on BERT sentence embeddings. It predicts whether a document belongs to the agreement class or not (other).

Compared with the Longformer model, this model is lighter and has a lower inference time.

Predicted Entities

agreement, other


How to use


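from johnsnowlabs import nlp, legal

# Start a Spark session (requires valid John Snow Labs Legal NLP credentials)
spark = nlp.start()

# Convert raw text into Spark NLP "document" annotations
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Compute one BERT sentence embedding per input document
embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en")\
    .setInputCols("document")\
    .setOutputCol("sentence_embeddings")

# Binary classifier: predicts "agreement" or "other"
doc_classifier = legal.ClassifierDLModel.pretrained("legclf_agreement_bert", "en", "legal/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("category")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    embeddings,
    doc_classifier])

df = spark.createDataFrame([["YOUR TEXT HERE"]]).toDF("text")

# All stages are pretrained, so fit() only assembles the PipelineModel
model = nlpPipeline.fit(df)

result = model.transform(df)

# Show the predicted label for each document
result.select("category.result").show(truncate=False)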

Results


+-----------+
|result     |
+-----------+
|[agreement]|
|[other]    |
|[other]    |
|[agreement]|
+-----------+
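The `result` column holds one label array per input document. Below is a minimal pure-Python sketch of post-processing those predictions. It assumes the arrays have already been collected to the driver, for example with `[row[0] for row in result.select("category.result").collect()]`. The helper name `summarize_predictions` is hypothetical and not part of Legal NLP.

```python
from collections import Counter
from typing import List, Tuple


def summarize_predictions(labels: List[List[str]]) -> Tuple[Counter, List[int]]:
    """Count predicted labels and return the row indices predicted as 'agreement'.

    `labels` is assumed to hold one list per document, as in the `result`
    column above (e.g. ["agreement"] or ["other"]).
    """
    counts: Counter = Counter()
    agreement_rows: List[int] = []
    for i, doc_labels in enumerate(labels):
        # A document-level classifier yields one label per document,
        # but iterate defensively in case a row holds more than one.
        for label in doc_labels:
            counts[label] += 1
        if "agreement" in doc_labels:
            agreement_rows.append(i)
    return counts, agreement_rows


# Example input mirroring the four rows shown above
predictions = [["agreement"], ["other"], ["other"], ["agreement"]]
counts, agreement_rows = summarize_predictions(predictions)
print(counts)
print(agreement_rows)
```

The returned indices can be used to select the documents classified as agreements for further processing.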

Model Information

Model Name: legclf_agreement_bert
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Size: 22.9 MB

References

Legal documents scraped from the Internet and classified in-house, plus SEC documents.

Benchmarking


        label    precision    recall    f1-score    support 
    agreement         0.75      0.71        0.73         90 
        other         0.88      0.90        0.89        204 
     accuracy            -         -        0.84        294 
    macro-avg         0.81      0.80        0.81        294 
 weighted-avg         0.84      0.84        0.84        294
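The aggregate rows follow the usual classification-report definitions: macro-avg is the unweighted mean over the two classes, weighted-avg weights each class by its support, and accuracy equals the support-weighted recall (the sum of correctly classified documents divided by all documents). A short Python sketch that recomputes these from the per-class rows above; since the per-class figures are already rounded, recomputed values may differ from the table in the last digit.

```python
# Per-class rows from the benchmark table: (precision, recall, f1, support)
per_class = {
    "agreement": (0.75, 0.71, 0.73, 90),
    "other": (0.88, 0.90, 0.89, 204),
}

# Total number of evaluated documents
total = sum(row[3] for row in per_class.values())


def macro_avg(metric_idx: int) -> float:
    """Unweighted mean of a metric over the classes."""
    return sum(row[metric_idx] for row in per_class.values()) / len(per_class)


def weighted_avg(metric_idx: int) -> float:
    """Mean of a metric weighted by each class's support."""
    return sum(row[metric_idx] * row[3] for row in per_class.values()) / total


# Accuracy = correct documents / all documents = support-weighted recall
accuracy = weighted_avg(1)

for name, idx in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:>9}: macro={macro_avg(idx):.3f} weighted={weighted_avg(idx):.3f}")
print(f"accuracy: {accuracy:.3f}")
```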