Legal Indenture Document Binary Classifier (Bert Sentence Embeddings)

Description

The legclf_indenture_agreement_bert model is a document classifier built on Bert Sentence Embeddings. It predicts whether a document belongs to the class indenture or not (binary classification).

Compared with the Longformer model, this model is faster at inference time.

Predicted Entities

indenture, other

How to use

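from johnsnowlabs import nlp, legal

# Assumption: a licensed Spark session is started this way; skip if `spark` already exists.
spark = nlp.start()

# Turn the raw text column into a document annotation.
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Embed the whole document with Bert Sentence Embeddings.
embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")

# Binary classifier: predicts indenture or other.
doc_classifier = legal.ClassifierDLModel.pretrained("legclf_indenture_agreement_bert", "en", "legal/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("category")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    embeddings,
    doc_classifier])

df = spark.createDataFrame([["YOUR TEXT HERE"]]).toDF("text")

model = nlpPipeline.fit(df)

result = model.transform(df)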
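
The pipeline writes its predictions to the category column as a list of annotations per row. As a minimal sketch of how the labels might be pulled out, the helpers below assume each annotation exposes a result field, as Spark NLP annotations usually do; the names labels_from_annotations and collect_labels are hypothetical and not part of any library.

```python
def labels_from_annotations(annotations):
    """Return the predicted label of each annotation in one output cell.

    Works with any mapping-like annotation that supports ann["result"],
    which is assumed to hold the predicted class name.
    """
    return [ann["result"] for ann in annotations]


def collect_labels(result_df, output_col="category"):
    """Collect the predicted labels for every row of a transformed DataFrame.

    `result_df` is assumed to be the DataFrame returned by model.transform(df),
    and `output_col` the classifier's output column.
    """
    rows = result_df.select(output_col).collect()
    return [labels_from_annotations(row[output_col]) for row in rows]
```

Calling collect_labels(result) would return one list of labels per input row. For a quick look without collecting to the driver, result.select("category.result").show() displays the same labels directly.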

Results

+-----------+
|     result|
+-----------+
|[indenture]|
|    [other]|
|    [other]|
|[indenture]|
+-----------+

Model Information

Model Name:	legclf_indenture_agreement_bert
Compatibility:	Legal NLP 1.0.0+
License:	Licensed
Edition:	Official
Input Labels:	[sentence_embeddings]
Output Labels:	[class]
Language:	en
Size:	22.9 MB

References

Legal documents scraped from the Internet and classified in-house, plus SEC documents

Benchmarking

        label    precision    recall    f1-score    support 
    indenture         0.96      0.93        0.94         97 
        other         0.97      0.98        0.97        204 
     accuracy            -         -        0.96        301 
    macro-avg         0.96      0.95        0.96        301 
 weighted-avg         0.96      0.96        0.96        301
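
As a sanity check on the table, the per-class f1-score can be recomputed from the reported precision and recall as their harmonic mean. The sketch below uses only the rounded figures from the table, so the averaged rows may differ from it in the last digit; the names report and f1 are illustrative.

```python
# Per-class precision, recall and support copied from the benchmark table.
report = {
    "indenture": {"precision": 0.96, "recall": 0.93, "support": 97},
    "other": {"precision": 0.97, "recall": 0.98, "support": 204},
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute the f1-score of each class, rounded to two decimals like the table.
f1_scores = {
    label: round(f1(m["precision"], m["recall"]), 2)
    for label, m in report.items()
}

# The supports of the two classes add up to the size of the test set.
total_support = sum(m["support"] for m in report.values())

print(f1_scores)      # {'indenture': 0.94, 'other': 0.97}
print(total_support)  # 301
```

Both recomputed scores match the f1-score column, and the supports sum to the 301 documents listed in the accuracy and average rows.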
