Extract Information from Termination Clauses (Md)

Description

IMPORTANT: Don’t run this model on the whole legal agreement. Instead:

  • Split by paragraphs. You can use notebook 1 in Finance or Legal as inspiration;
  • Use the legclf_termination_clause Text Classifier to keep only the paragraphs that contain termination clauses.
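
The two steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: `split_paragraphs`, `keep_termination`, and `keyword_stub` are hypothetical names, paragraphs are assumed to be separated by blank lines, and the `is_termination` callable stands in for a prediction from legclf_termination_clause (replaced here by a naive keyword check purely for demonstration). The sample agreement text is made up, apart from the clause reused from the example further down.

```python
import re
from typing import Callable, List


def split_paragraphs(agreement: str) -> List[str]:
    """Split on blank lines (an assumed layout) and drop empty chunks."""
    chunks = re.split(r"\n\s*\n", agreement)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def keep_termination(paragraphs: List[str],
                     is_termination: Callable[[str], bool]) -> List[str]:
    """Keep only the paragraphs the classifier callable flags as termination clauses."""
    return [p for p in paragraphs if is_termination(p)]


def keyword_stub(paragraph: str) -> bool:
    """Placeholder for the real classifier: a naive keyword match, for illustration only."""
    return "terminat" in paragraph.lower()


agreement = (
    "1. Term. This Agreement begins on the Effective Date.\n\n"
    "(b) Either Party may terminate this Agreement\n\n"
    "3. Notices. All notices shall be in writing."
)

paragraphs = split_paragraphs(agreement)
selected = keep_termination(paragraphs, keyword_stub)
print(selected)
```

Only the paragraphs in `selected` would then be passed to the NER pipeline, one row per paragraph.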

This is an NER model that extracts information from Termination Clauses: the subject (Who? Which party?), the action (verb), the object (What?), and the indirect object (to Whom?).

This is the md (medium) version of the model. It was trained on more data and is more resistant to false positives outside termination clauses, which may help when running it at whole-document level (although this is still not recommended).

Predicted Entities

TERMINATION_SUBJECT, TERMINATION_ACTION, TERMINATION_OBJECT, TERMINATION_INDIRECT_OBJECT

How to use

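from johnsnowlabs import nlp, legal

# Start a Spark session with the licensed libraries loaded
spark = nlp.start()

documentAssembler = nlp.DocumentAssembler() \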
        .setInputCol("text")\
        .setOutputCol("document")
        
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
        .setInputCols(["document"])\
        .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
        .setInputCols(["sentence"])\
        .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

ner_model = legal.NerModel.pretrained('legner_termination_md','en','legal/models')\
        .setInputCols(["sentence", "token", "embeddings"])\
        .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
        .setInputCols(["sentence","token","ner"])\
        .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        embeddings,
        ner_model,
        ner_converter])

text = "(b) Either Party may terminate this Agreement"
data = spark.createDataFrame([[text]]).toDF("text")
model = nlpPipeline.fit(data)

# Run the fitted pipeline to get the token-level predictions
result = model.transform(data)

Results

+-----------+---------------------+
|      token|            ner_label|
+-----------+---------------------+
|          (|                    O|
|          b|                    O|
|          )|                    O|
|     Either|B-TERMINATION_SUBJECT|
|      Party|I-TERMINATION_SUBJECT|
|        may| B-TERMINATION_ACTION|
|  terminate| I-TERMINATION_ACTION|
|       this|                    O|
|  Agreement|                    O|
+-----------+---------------------+
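
To read the table above as entities rather than per-token tags, consecutive B-/I- labels can be merged into chunks, which is conceptually what the NerConverter stage does. The following is a plain-Python sketch of that grouping, not Spark NLP's implementation; `group_bio` is a hypothetical helper, and tokens are simply joined with spaces.

```python
from typing import List, Tuple


def group_bio(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (chunk_text, entity_type) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    def flush() -> None:
        # Close the chunk being built, if any
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_type))

    for token, label in zip(tokens, labels):
        prefix, _, entity = label.partition("-")
        if prefix == "B" or (prefix == "I" and entity != current_type):
            # A B- tag (or an I- tag of a different type) starts a new chunk
            flush()
            current_tokens, current_type = [token], entity
        elif prefix == "I":
            # Continuation of the current chunk
            current_tokens.append(token)
        else:
            # "O" ends any open chunk
            flush()
            current_tokens, current_type = [], None
    flush()
    return chunks


tokens = ["(", "b", ")", "Either", "Party", "may", "terminate", "this", "Agreement"]
labels = ["O", "O", "O",
          "B-TERMINATION_SUBJECT", "I-TERMINATION_SUBJECT",
          "B-TERMINATION_ACTION", "I-TERMINATION_ACTION",
          "O", "O"]

entities = group_bio(tokens, labels)
for chunk_text, entity_type in entities:
    print(entity_type, "->", chunk_text)
```

For the example sentence this yields two chunks: "Either Party" as TERMINATION_SUBJECT and "may terminate" as TERMINATION_ACTION.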

Model Information

Model Name: legner_termination_md
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.1 MB

References

In-house annotations of the CUAD dataset.

Benchmarking

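+-----------------------------+---+---+---+----------+----------+----------+
|                        label| tp| fp| fn|      prec|       rec|        f1|
+-----------------------------+---+---+---+----------+----------+----------+
|I-TERMINATION_INDIRECT_OBJECT|  4|  0|  6|       1.0|       0.4| 0.5714286|
|B-TERMINATION_INDIRECT_OBJECT|  3|  1|  4|      0.75|0.42857143| 0.5454545|
|         B-TERMINATION_OBJECT| 38| 22| 36| 0.6333333| 0.5135135| 0.5671642|
|         I-TERMINATION_ACTION| 85| 27|  5| 0.7589286| 0.9444444| 0.8415842|
|         I-TERMINATION_OBJECT|294|172|294| 0.6309013|       0.5|0.55787474|
|        B-TERMINATION_SUBJECT| 37| 10|  8|0.78723407|0.82222223| 0.8043478|
|        I-TERMINATION_SUBJECT| 26|  8|  7| 0.7647059| 0.7878788| 0.7761194|
|         B-TERMINATION_ACTION| 36|  7|  5| 0.8372093| 0.8780488| 0.8571428|
|                Macro-average|523|247|365|  0.770289| 0.6593349| 0.7105064|
|                Micro-average|523|247|365| 0.6792208|  0.588964| 0.6308806|
+-----------------------------+---+---+---+----------+----------+----------+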
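
The prec, rec, and f1 columns follow from the tp/fp/fn counts. The sketch below recomputes them under the usual definitions. Judging by the numbers, the micro-average row pools the counts over all labels, while the macro-average row averages the per-label precision and recall and then takes the harmonic mean of those two averages; that reading is an inference from the figures, not something the benchmark states. The function name `prf` is illustrative, and this is not the evaluation script used to produce the benchmark.

```python
def prf(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1) from raw counts, guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# (tp, fp, fn) per label, copied from the benchmark
counts = {
    "I-TERMINATION_INDIRECT_OBJECT": (4, 0, 6),
    "B-TERMINATION_INDIRECT_OBJECT": (3, 1, 4),
    "B-TERMINATION_OBJECT": (38, 22, 36),
    "I-TERMINATION_ACTION": (85, 27, 5),
    "I-TERMINATION_OBJECT": (294, 172, 294),
    "B-TERMINATION_SUBJECT": (37, 10, 8),
    "I-TERMINATION_SUBJECT": (26, 8, 7),
    "B-TERMINATION_ACTION": (36, 7, 5),
}

per_label = {label: prf(*c) for label, c in counts.items()}

# Micro-average: pool the counts, then compute the metrics once
micro = prf(*(sum(c[i] for c in counts.values()) for i in range(3)))

# Macro-average (assumed scheme): mean precision, mean recall, harmonic mean of the two
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

print("micro:", micro)
print("macro:", (macro_p, macro_r, macro_f1))
```

Under these assumptions the recomputed values agree with the reported Macro-average and Micro-average rows to the printed precision.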