Description
This models is a version of legmulticlf_mnda_sections_other
(sentence, medium) but expecting a bigger-than-sentence context, ideally between 2 and 4-5 sentences, or a small paragraph, to provide with more context.
It should be run on sentences of the NDA clauses, and will retrieve a series of 1..N labels for each of them. The possible clause types detected my this model in NDA / MNDA aggrements are:
- Parties to the Agreement - Names of the Parties Clause
- Identification of What Information Is Confidential - Definition of Confidential Information Clause
- Use of Confidential Information: Permitted Use Clause and Obligations of the Recipient
- Time Frame of the Agreement - Termination Clause
- Return of Confidential Information Clause
- Remedies for Breaches of Agreement - Remedies Clause
- Non-Solicitation Clause
- Dispute Resolution Clause
- Exceptions Clause
- Non-competition clause
- Other: Nothing of the above (synonym to
[]
)-
Predicted Entities
APPLIC_LAW
, ASSIGNMENT
, DEF_OF_CONF_INFO
, DISPUTE_RESOL
, EXCEPTIONS
, NAMES_OF_PARTIES
, NON_COMP
, NON_SOLIC
, PREAMBLE
, REMEDIES
, REQ_DISCL
, RETURN_OF_CONF_INFO
, TERMINATION
, USE_OF_CONF_INFO
, OTHER
How to use
document = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")\
.setCleanupMode("shrink")
embeddings = (
nlp.E5Embeddings.pretrained(
"legembedding_e5_base", "en", "legal/models")
.setInputCols(["document"])
.setOutputCol("sentence_embeddings")
)
paragraph_classifier = (
nlp.MultiClassifierDLModel.load("legmulticlf_mnda_sections_paragraph_other_le", "en", "legal/models")
.setInputCols(["sentence_embeddings"])
.setOutputCol("class")
)
sentence_pipeline = nlp.Pipeline(
stages=[document,
embeddings,
paragraph_classifier])
df = spark.createDataFrame([["'Destruction of Confidential Information. \xa0 Promptly (and in any event within five days) after the earlier of"]]).toDF("text")
model = sentence_pipeline.fit(df)
result = model.transform(df)
result.select("text", "class.result").show(truncate=False)
Results
+-------------------------------------------------------------------------------------------------------------+---------------------+
|text |result |
+-------------------------------------------------------------------------------------------------------------+---------------------+
|'Destruction of Confidential Information. Promptly (and in any event within five days) after the earlier of|[RETURN_OF_CONF_INFO]|
+-------------------------------------------------------------------------------------------------------------+---------------------+
Model Information
Model Name: | legmulticlf_mnda_sections_paragraph_other_le |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence_embeddings] |
Output Labels: | [class] |
Language: | en |
Size: | 14.0 MB |
References
In-house MNDA
Benchmarking
| | precision | recall | f1-score | support |
|---------------|-----------|--------|----------|---------|
| APPLIC_LAW | 0.91 | 0.91 | 0.91 | 58 |
| ASSIGNMENT | 0.96 | 0.87 | 0.91 | 52 |
| DEF_OF_CONF_INFO | 0.91 | 0.83 | 0.87 | 89 |
| DISPUTE_RESOL | 0.90 | 0.72 | 0.80 | 64 |
| EXCEPTIONS | 0.97 | 0.92 | 0.95 | 144 |
| NAMES_OF_PARTIES | 0.95 | 0.85 | 0.89 | 84 |
| NON_COMP | 0.80 | 0.80 | 0.80 | 25 |
| NON_SOLIC | 0.92 | 0.80 | 0.86 | 60 |
| PREAMBLE | 0.79 | 0.82 | 0.80 | 186 |
| REMEDIES | 0.91 | 0.79 | 0.85 | 76 |
| REQ_DISCL | 0.88 | 0.86 | 0.87 | 73 |
| RETURN_OF_CONF_INFO | 0.91 | 0.89 | 0.90 | 83 |
| TERMINATION | 0.98 | 0.86 | 0.92 | 96 |
| USE_OF_CONF_INFO | 0.85 | 0.85 | 0.85 | 47 |
| OTHER | 0.86 | 0.77 | 0.81 | 87 |
| **micro avg** | **0.90** | **0.84** | **0.87** | **1224**|
| **macro avg** | **0.90** | **0.84** | **0.87** | **1224**|
| **weighted avg** | **0.90** | **0.84** | **0.87** | **1224**|
| **samples avg** | **0.85** | **0.85** | **0.85** | **1224**|