MNDA / NDA Agreement Document Classifier (Bert Sentence Embeddings)

Description

The legclf_nda_agreements model is a Bert Sentence Embeddings document classifier that determines whether a document is an NDA (class nda) or not (class other). It is a binary classifier.

Compared with the Longformer-based model, this model is lighter and has a shorter inference time.

Predicted Entities

nda, other
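Because the model is a binary classifier, each document receives exactly one of these two labels. The sketch below illustrates that final decision as an argmax over per-class scores. It assumes the scores are available as a plain Python dict (for example softmax probabilities), which is an illustration and not the model's documented output format; `pick_label` and `is_nda` are hypothetical helpers.

```python
# Hypothetical sketch: the two labels this model can predict
LABELS = ("nda", "other")


def pick_label(scores):
    """Return the label with the highest score (argmax over the two classes).

    `scores` is an assumed dict mapping each label to a score, for example
    softmax probabilities; this is not the model's documented output format.
    """
    if set(scores) != set(LABELS):
        raise ValueError(f"expected scores for exactly {LABELS}, got {sorted(scores)}")
    return max(LABELS, key=lambda label: scores[label])


def is_nda(label):
    """Map the predicted label to a yes/no answer for the binary task."""
    return label == "nda"


print(pick_label({"nda": 0.91, "other": 0.09}))
```

With only two classes, taking the argmax is the same as checking whether the nda score exceeds the other score.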


How to use


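from johnsnowlabs import nlp, legal

# Assumes `spark` is an active Spark session with a valid Legal NLP license,
# for example one created with nlp.start().

# Convert raw text into Spark NLP documents
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Compute a BERT sentence embedding for each document
embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en")\
    .setInputCols("document")\
    .setOutputCol("sentence_embeddings")

# Classify each document embedding as "nda" or "other"
doc_classifier = legal.ClassifierDLModel.pretrained("legclf_nda_agreements", "en", "legal/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("category")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    embeddings,
    doc_classifier])

df = spark.createDataFrame([["YOUR TEXT HERE"]]).toDF("text")

model = nlpPipeline.fit(df)

result = model.transform(df)

# Show the predicted label for each input document
result.select("category.result").show(truncate=False)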

Results

Example output for four input documents:

+-------+
|result |
+-------+
|[nda]  |
|[other]|
|[other]|
|[nda]  |
+-------+
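Each row of the `result` column holds a list with one label per document. Once the predictions are collected into Python (for example with `[row.result for row in result.select("category.result").collect()]`), they can be paired back with the input texts. The sketch below assumes that list-of-lists shape; `group_by_label` and the `doc_a`...`doc_d` texts are hypothetical and not part of Legal NLP.

```python
def group_by_label(texts, predictions, labels=("nda", "other")):
    """Group input texts by predicted label.

    Each prediction is assumed to be a one-element list such as ["nda"],
    matching one row of the `result` column in the Results table.
    """
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    groups = {label: [] for label in labels}
    for text, prediction in zip(texts, predictions):
        if len(prediction) != 1 or prediction[0] not in groups:
            raise ValueError(f"unexpected prediction {prediction!r} for {text!r}")
        groups[prediction[0]].append(text)
    return groups


# Hypothetical texts paired with the four predictions from the Results table
texts = ["doc_a", "doc_b", "doc_c", "doc_d"]
predictions = [["nda"], ["other"], ["other"], ["nda"]]
print(group_by_label(texts, predictions))
```

With these four predictions, the first and fourth texts are grouped under nda and the other two under other.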

Model Information

Model Name: legclf_nda_agreements
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [category]
Language: en
Size: 22.9 MB

References

Legal documents scraped from the Internet and classified in-house, plus SEC documents

Benchmarking

       label   precision    recall  f1-score   support
         nda       0.93      0.96      0.95       135
       other       0.97      0.95      0.96       181
    accuracy        -          -         0.95       316
   macro-avg       0.95      0.95      0.95       316
weighted-avg       0.95      0.95      0.95       316
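The table follows the usual classification-report layout: F1 is the harmonic mean of precision and recall, the macro average weights both classes equally, and the weighted average weights each class by its support (135 nda and 181 other documents, 316 in total). The sketch below recomputes these quantities from paired gold and predicted labels. The toy labels are made up for illustration and are not the benchmark data; `classification_report` here is a hypothetical pure-Python helper, not a Legal NLP function.

```python
def classification_report(y_true, y_pred, labels=("nda", "other")):
    """Per-class precision/recall/F1/support plus accuracy, macro and weighted averages."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    unknown = (set(y_true) | set(y_pred)) - set(labels)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    pairs = list(zip(y_true, y_pred))

    report = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}

    total = len(pairs)
    report["accuracy"] = sum(t == p for t, p in pairs) / total
    report["macro-avg"] = {}
    report["weighted-avg"] = {}
    for metric in ("precision", "recall", "f1"):
        values = [report[label][metric] for label in labels]
        supports = [report[label]["support"] for label in labels]
        # Macro average: every class counts the same
        report["macro-avg"][metric] = sum(values) / len(labels)
        # Weighted average: each class counts in proportion to its support
        report["weighted-avg"][metric] = sum(v * s for v, s in zip(values, supports)) / total
    return report


# Toy labels for illustration only (not the benchmark data)
y_true = ["nda", "nda", "nda", "other", "other"]
y_pred = ["nda", "nda", "other", "other", "nda"]
report = classification_report(y_true, y_pred)
print(round(report["nda"]["f1"], 2), round(report["weighted-avg"]["f1"], 2))
```

Because the table reports values rounded to two decimals, recomputing F1 or the averages from the rounded precision and recall figures may differ from the printed values in the last digit.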