Description
IMPORTANT: Do not run this model on the whole legal agreement. Instead:
- Split the agreement into paragraphs. You can use notebook 1 in Finance or Legal as inspiration;
- Use the legclf_cuad_obligations_clause Text Classifier to select only the paragraphs that contain obligations, and run this model on those.
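The two preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation: it assumes paragraphs are separated by blank lines, and `keyword_stub` is a hypothetical stand-in for the legclf_cuad_obligations_clause classifier, which would be run through Spark NLP in practice.

```python
import re
from typing import Callable, List


def split_paragraphs(agreement: str) -> List[str]:
    """Split on blank lines and normalize whitespace inside each paragraph."""
    chunks = re.split(r"\n\s*\n", agreement)
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def select_paragraphs(paragraphs: List[str],
                      is_obligation: Callable[[str], bool]) -> List[str]:
    """Keep only the paragraphs that the classifier flags as obligations."""
    return [p for p in paragraphs if is_obligation(p)]


def keyword_stub(paragraph: str) -> bool:
    # Hypothetical placeholder for the real text classifier:
    # flags paragraphs that contain a typical obligation modal verb.
    return bool(re.search(r"\b(shall|will|must)\b", paragraph, flags=re.IGNORECASE))


agreement = """This Agreement is made between the Seller and the Buyer.

The Buyer shall use such materials and supplies only in
accordance with the present agreement.

Section headings are for convenience only."""

paragraphs = split_paragraphs(agreement)
obligation_paragraphs = select_paragraphs(paragraphs, keyword_stub)

for paragraph in obligation_paragraphs:
    print(paragraph)
```

Only the paragraphs returned by the selection step would then be passed to the NER pipeline.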
This Named Entity Recognition model aims to extract what the different parties of an agreement commit to do. We call these “obligations”, but they could also be called “commitments” or “agreements”.
This model extracts the subject (who commits to doing something), the action (the verb: will provide, shall sign…) and the object (what the subject will provide, what the subject shall sign, etc.). Also, if the recipient of the obligation is a third party (a subject will provide to the Company X …), then that third party (Company X) is extracted as an indirect object.
This model is accompanied by a Relation Extraction model, which can be used to connect the entities together.
The object is usually very diverse (will provide with technology? documents? people? items? etc.) and is often a very long clause. For that reason, we include a more advanced way to extract objects, using Automatic Question Generation (what will [subject] [action]? Example: What will the Company provide?) and Question Answering (using that question and a context, we retrieve the answer from the text). Please check the Question Answering notebook in the Spark NLP Workshop for more information about this approach.
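The question generation step can be illustrated with a simple template. This is a sketch under stated assumptions, not the generator used in the Workshop notebook: `build_question` and the `MODALS` set are hypothetical names, and the function only covers the "What will [subject] [action]?" pattern described above. The resulting question would then be passed, together with the paragraph, to a Question Answering model.

```python
# Hypothetical list of modal verbs that may open an extracted action span.
MODALS = {"will", "shall", "must", "may"}


def build_question(subject: str, action: str) -> str:
    """Build a 'What will [subject] [action]?' question from NER output."""
    words = action.strip().split()
    if words and words[0].lower() in MODALS:
        # Reuse the modal found in the action span ("shall use" -> "shall", "use").
        modal, verb = words[0].lower(), " ".join(words[1:])
    else:
        # No modal in the span: fall back to the generic "will" template.
        modal, verb = "will", " ".join(words)
    if not verb:
        verb = "do"  # Assumption: generic verb when the span holds only a modal.
    return f"What {modal} {subject.strip()} {verb}?"


print(build_question("the Company", "provide"))
print(build_question("the Buyer", "shall use"))
```

The first call reproduces the example question from the description; the second shows how a modal already present in the OBLIGATION_ACTION span could be reused.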
Predicted Entities
OBLIGATION_SUBJECT, OBLIGATION_ACTION, OBLIGATION, OBLIGATION_INDIRECT_OBJECT
How to use
from johnsnowlabs import nlp, legal
import pandas as pd

# Assumes an active Spark session named `spark` with the Legal NLP license loaded.

# Turn raw text into Spark NLP documents
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into tokens
sparktokenizer = nlp.Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

# Pretrained token classifier that tags the obligation entities
tokenClassifier = legal.BertForTokenClassification.pretrained("legner_obligations", "en", "legal/models")\
    .setInputCols("token", "document")\
    .setOutputCol("label")\
    .setCaseSensitive(True)

pipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sparktokenizer,
    tokenClassifier
])

# Fitting on an empty DataFrame is enough: no stage needs training here
p_model = pipeline.fit(spark.createDataFrame(pd.DataFrame({'text': ['']})))

text = """The Buyer shall use such materials and supplies only in accordance with the present agreement"""
res = p_model.transform(spark.createDataFrame([[text]]).toDF("text"))
Results
+----------+--------------------+
| token| ner_label|
+----------+--------------------+
| The| O|
| Buyer|B-OBLIGATION_SUBJECT|
| shall| B-OBLIGATION_ACTION|
| use| I-OBLIGATION_ACTION|
| such| B-OBLIGATION|
| materials| I-OBLIGATION|
| and| I-OBLIGATION|
| supplies| I-OBLIGATION|
| only| I-OBLIGATION|
| in| I-OBLIGATION|
|accordance| I-OBLIGATION|
| with| I-OBLIGATION|
| the| I-OBLIGATION|
| present| I-OBLIGATION|
| agreement| I-OBLIGATION|
+----------+--------------------+
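The token-level labels above follow the B-/I-/O tagging scheme, so consecutive tokens can be merged into entity chunks. The following is a minimal sketch in plain Python, assuming the tokens and labels have already been collected from the pipeline output into two lists; `group_entities` is a hypothetical helper, not part of the library.

```python
from typing import List, Tuple


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge B-/I- tagged tokens into (chunk text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    def flush() -> None:
        # Store the chunk collected so far, if any.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))

    for token, label in zip(tokens, labels):
        prefix, _, entity_type = label.partition("-")
        if prefix == "B" or (prefix == "I" and entity_type != current_type):
            # A new chunk starts (an orphan I- tag is treated as a start too).
            flush()
            current_tokens, current_type = [token], entity_type
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O" closes any open chunk.
            flush()
            current_tokens, current_type = [], None
    flush()
    return entities


tokens = ["The", "Buyer", "shall", "use", "such", "materials", "and", "supplies",
          "only", "in", "accordance", "with", "the", "present", "agreement"]
labels = ["O", "B-OBLIGATION_SUBJECT", "B-OBLIGATION_ACTION", "I-OBLIGATION_ACTION",
          "B-OBLIGATION"] + ["I-OBLIGATION"] * 10

for chunk, entity_type in group_entities(tokens, labels):
    print(f"{entity_type}: {chunk}")
```

For the example sentence this yields three chunks: the subject "Buyer", the action "shall use", and the obligation "such materials and supplies only in accordance with the present agreement".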
Model Information
| Model Name: | legner_obligations |
| Type: | legal |
| Compatibility: | Legal NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sentence, token] |
| Output Labels: | [ner] |
| Language: | en |
| Size: | 412.2 MB |
| Case sensitive: | true |
| Max sentence length: | 128 |
References
In-house annotated documents from the CUAD dataset
Benchmarking
label                         precision  recall  f1-score  support
B-OBLIGATION                  0.61       0.44    0.51      93
B-OBLIGATION_ACTION           0.88       0.89    0.89      85
B-OBLIGATION_INDIRECT_OBJECT  0.69       0.71    0.70      34
B-OBLIGATION_SUBJECT          0.80       0.87    0.84      87
I-OBLIGATION                  0.72       0.77    0.75      1251
I-OBLIGATION_ACTION           0.80       0.79    0.79      167
I-OBLIGATION_SUBJECT          0.75       0.43    0.55      14
O                             0.87       0.84    0.85      2395
accuracy                      -          -       0.81      4126
macro-avg                     0.76       0.72    0.73      4126
weighted-avg                  0.81       0.81    0.81      4126
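As a reading aid for the table, the sketch below shows how the summary rows relate to the per-label rows: F1 is the harmonic mean of precision and recall, the macro average weighs every label equally, and the weighted average weighs each label by its support. The figures are copied from the table; because they are already rounded to two decimals, recomputed values may differ from the reported ones in the last digit.

```python
# (precision, recall, f1, support) per label, copied from the benchmark table.
rows = {
    "B-OBLIGATION": (0.61, 0.44, 0.51, 93),
    "B-OBLIGATION_ACTION": (0.88, 0.89, 0.89, 85),
    "B-OBLIGATION_INDIRECT_OBJECT": (0.69, 0.71, 0.70, 34),
    "B-OBLIGATION_SUBJECT": (0.80, 0.87, 0.84, 87),
    "I-OBLIGATION": (0.72, 0.77, 0.75, 1251),
    "I-OBLIGATION_ACTION": (0.80, 0.79, 0.79, 167),
    "I-OBLIGATION_SUBJECT": (0.75, 0.43, 0.55, 14),
    "O": (0.87, 0.84, 0.85, 2395),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


total_support = sum(support for _, _, _, support in rows.values())

# Macro average: plain mean over labels, regardless of how frequent they are.
macro_f1 = sum(f1 for _, _, f1, _ in rows.values()) / len(rows)

# Weighted average: each label contributes in proportion to its support.
weighted_f1 = sum(f1 * support for _, _, f1, support in rows.values()) / total_support

print(f"total support: {total_support}")
print(f"macro F1:      {macro_f1:.3f}")
print(f"weighted F1:   {weighted_f1:.3f}")
```

The gap between the two averages reflects the label imbalance: the frequent O and I-OBLIGATION labels dominate the weighted figure, while rarer labels such as I-OBLIGATION_SUBJECT pull the macro figure down.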