Multilabel Classification of NDA Clauses (sentences, small)

Description

This model should be run on each sentence of the NDA clauses, and returns one or more (1..N) labels for each sentence. The clause types this model detects in NDA / MNDA agreements are:

Parties to the Agreement - Names of the Parties Clause
Identification of What Information Is Confidential - Definition of Confidential Information Clause
Use of Confidential Information: Permitted Use Clause and Obligations of the Recipient
Time Frame of the Agreement - Termination Clause
Return of Confidential Information Clause
Remedies for Breaches of Agreement - Remedies Clause
Non-Solicitation Clause
Dispute Resolution Clause
Exceptions Clause
Non-Competition Clause

Predicted Entities

APPLIC_LAW, ASSIGNMENT, DEF_OF_CONF_INFO, DISPUTE_RESOL, EXCEPTIONS, NAMES_OF_PARTIES, NON_COMP, NON_SOLIC, PREAMBLE, REMEDIES, REQ_DISCL, RETURN_OF_CONF_INFO, TERMINATION, USE_OF_CONF_INFO
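In multi-label classification, each label is scored independently (typically with a sigmoid rather than a softmax). Every label whose score clears a cutoff is returned, so a single sentence can carry several of the labels above at once. The plain-Python sketch below illustrates only that decoding step. The hypothetical helper `decode_multilabel`, the raw scores, the 0.5 cutoff and the fallback to the top-scoring label when nothing passes are all assumptions for illustration. They do not describe this model's internals, although Spark NLP's `MultiClassifierDLModel` does expose a `threshold` parameter for a comparable cutoff.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function: maps a raw score to an independent probability."""
    return 1.0 / (1.0 + math.exp(-z))


def decode_multilabel(
    logits: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.5,
    ensure_one: bool = True,
) -> List[str]:
    """Hypothetical helper: keep every label whose sigmoid score reaches the threshold.

    Each label is scored on its own, so any number of labels can pass.
    If none pass and ensure_one is True, fall back to the single
    highest-scoring label (an assumption, so each sentence gets 1..N labels).
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    probs = [sigmoid(z) for z in logits]
    chosen = [lab for lab, p in zip(labels, probs) if p >= threshold]
    if not chosen and ensure_one and labels:
        best = max(range(len(probs)), key=probs.__getitem__)
        chosen = [labels[best]]
    return chosen


if __name__ == "__main__":
    labels = ["APPLIC_LAW", "DISPUTE_RESOL", "REMEDIES"]
    # Made-up scores: two labels pass the 0.5 cutoff
    print(decode_multilabel([2.0, 0.5, -1.0], labels))
    # Made-up scores: none pass, so the top-scoring label is returned
    print(decode_multilabel([-2.0, -0.5, -1.0], labels))
```

Setting `ensure_one=False` gives pure thresholding, under which a sentence can receive no label at all.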


How to use

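from johnsnowlabs import nlp

# Start a Spark session with Legal NLP
# (assumes your John Snow Labs license is already installed/configured)
spark = nlp.start()

# Wrap the raw "text" column into Spark NLP document annotations
document_assembler = (
    nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
)

# Split each document into sentences; "\n" is added as an extra boundary
# so that headings are classified separately from the body text
sentence_splitter = (
    nlp.SentenceDetector()
    .setInputCols(["document"])
    .setOutputCol("sentence")
    .setCustomBounds(["\n"])
)

# Encode every sentence with the Universal Sentence Encoder
embeddings = (
    nlp.UniversalSentenceEncoder.pretrained()
    .setInputCols(["sentence"])
    .setOutputCol("sentence_embeddings")
)

# Multi-label classifier: returns one or more clause labels per sentence
classifierdl_pred = (
    nlp.MultiClassifierDLModel.pretrained("legmulticlf_mnda_sections", "en", "legal/models")
    .setInputCols(["sentence_embeddings"])
    .setOutputCol("class")
)

clf_pipeline = nlp.Pipeline(
    stages=[document_assembler, sentence_splitter, embeddings, classifierdl_pred]
)

df = spark.createDataFrame([["Governing Law.\nThis Agreement shall be govern..."]]).toDF("text")

# All stages are pretrained, so fit() only assembles the PipelineModel
res = clf_pipeline.fit(df).transform(df)

# Show the input text next to the predicted labels
res.select("text", "class.result").show(truncate=False)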
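The pipeline classifies sentences, not whole documents. `setCustomBounds(["\n"])` adds the newline as an extra sentence boundary, so a heading is classified separately from the text that follows it even when the heading has no closing punctuation. The sketch below is a rough plain-Python approximation of that splitting. `split_sentences` is a hypothetical helper, and its naive punctuation rule is far simpler than Spark NLP's `SentenceDetector`.

```python
import re
from typing import List, Sequence


def split_sentences(text: str, custom_bounds: Sequence[str] = ("\n",)) -> List[str]:
    """Hypothetical approximation: cut on custom bounds first, then after . ! ? + whitespace."""
    pieces = [text]
    # Custom bounds always split, regardless of punctuation
    for bound in custom_bounds:
        pieces = [part for chunk in pieces for part in chunk.split(bound)]
    sentences: List[str] = []
    for piece in pieces:
        # Naive rule: a sentence ends at ., ! or ? followed by whitespace
        for sent in re.split(r"(?<=[.!?])\s+", piece.strip()):
            if sent:
                sentences.append(sent)
    return sentences


if __name__ == "__main__":
    sample = "Governing Law\nThis Agreement applies."  # hypothetical input
    print(split_sentences(sample))                    # heading becomes its own sentence
    print(split_sentences(sample, custom_bounds=()))  # heading stays glued to the body
```

Without the newline bound, the unpunctuated heading would be merged with the first body sentence and the two would receive a single set of labels.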

Results

[APPLIC_LAW]	Governing Law.\nThis Agreement shall be govern...

Model Information

Model Name:	legmulticlf_mnda_sections
Compatibility:	Legal NLP 1.0.0+
License:	Licensed
Edition:	Official
Input Labels:	[sentence_embeddings]
Output Labels:	[class]
Language:	en
Size:	12.9 MB

References

In-house MNDA

Benchmarking

              label    precision    recall  f1-score   support
         APPLIC_LAW       0.93      0.96      0.95        28
         ASSIGNMENT       0.95      0.91      0.93        22
   DEF_OF_CONF_INFO       0.92      0.80      0.86        30
      DISPUTE_RESOL       0.76      0.89      0.82        28
         EXCEPTIONS       0.77      0.91      0.83        11
   NAMES_OF_PARTIES       0.94      0.88      0.91        33
           NON_COMP       1.00      0.91      0.95        23
          NON_SOLIC       0.88      0.94      0.91        16
           PREAMBLE       0.79      0.85      0.81        26
           REMEDIES       0.91      0.91      0.91        32
          REQ_DISCL       0.92      0.92      0.92        13
RETURN_OF_CONF_INFO       1.00      0.96      0.98        24
        TERMINATION       1.00      0.77      0.87        13
   USE_OF_CONF_INFO       0.85      0.88      0.86        32
          micro-avg       0.89      0.89      0.89       331
          macro-avg       0.90      0.89      0.89       331
       weighted-avg       0.90      0.89      0.89       331
        samples-avg       0.87      0.89      0.88       331
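The summary rows follow from the per-class rows. macro-avg is the unweighted mean over the 14 labels, and weighted-avg weights each label by its support (331 in total). The short check below recomputes both from the rounded per-class values in the table, so agreement is only expected to within about ±0.005. samples-avg cannot be reproduced this way because it needs the per-sentence predictions, which are not given.

```python
from typing import Dict, Tuple

# (precision, recall, f1, support) per label, copied from the table above
PER_CLASS: Dict[str, Tuple[float, float, float, int]] = {
    "APPLIC_LAW": (0.93, 0.96, 0.95, 28),
    "ASSIGNMENT": (0.95, 0.91, 0.93, 22),
    "DEF_OF_CONF_INFO": (0.92, 0.80, 0.86, 30),
    "DISPUTE_RESOL": (0.76, 0.89, 0.82, 28),
    "EXCEPTIONS": (0.77, 0.91, 0.83, 11),
    "NAMES_OF_PARTIES": (0.94, 0.88, 0.91, 33),
    "NON_COMP": (1.00, 0.91, 0.95, 23),
    "NON_SOLIC": (0.88, 0.94, 0.91, 16),
    "PREAMBLE": (0.79, 0.85, 0.81, 26),
    "REMEDIES": (0.91, 0.91, 0.91, 32),
    "REQ_DISCL": (0.92, 0.92, 0.92, 13),
    "RETURN_OF_CONF_INFO": (1.00, 0.96, 0.98, 24),
    "TERMINATION": (1.00, 0.77, 0.87, 13),
    "USE_OF_CONF_INFO": (0.85, 0.88, 0.86, 32),
}


def macro_avg(idx: int) -> float:
    """Unweighted mean of one metric column over all labels."""
    return sum(row[idx] for row in PER_CLASS.values()) / len(PER_CLASS)


def weighted_avg(idx: int) -> float:
    """Mean of one metric column, weighted by each label's support."""
    total = sum(row[3] for row in PER_CLASS.values())
    return sum(row[idx] * row[3] for row in PER_CLASS.values()) / total


if __name__ == "__main__":
    for name, idx in (("precision", 0), ("recall", 1), ("f1", 2)):
        print(f"{name:9s} macro={macro_avg(idx):.2f} weighted={weighted_avg(idx):.2f}")
```

Running it prints 0.90/0.90 for precision and 0.89/0.89 for recall and F1, matching the macro-avg and weighted-avg rows.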
