Legal Relation Extraction (Whereas)

Description

IMPORTANT: Don’t run this model on the whole legal agreement. Instead:

  • Split by paragraphs. You can use notebook 1 in Finance or Legal as inspiration;
  • Use the legclf_cuad_wheras_clause Text Classifier to select only these paragraphs;

This is a Relation Extraction model to infer relations between elements in WHEREAS clauses, more specifically the SUBJECT, the ACTION and the OBJECT. There are two relations possible: has_subject and has_object. You can also use legpipe_whereas which includes this model and its NER and also depedency parsing, to carry out chunk extraction using grammatical features (the dependency tree). This model requires legner_whereas as an NER in the pipeline. It’s a md model with Unidirectional Relations, meaning that the model retrieves in chunk1 the left side of the relation (source), and in chunk2 the right side (target).

Predicted Entities

has_subject, has_object

Copy S3 URI

How to use


documentAssembler = nlp.DocumentAssembler()\
  .setInputCol("text")\
  .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
  .setInputCols("document")\
  .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

ner_model = legal.NerModel.pretrained('legner_whereas', 'en', 'legal/models')\
        .setInputCols(["document", "token", "embeddings"])\
        .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
        .setInputCols(["document","token","ner"])\
        .setOutputCol("ner_chunk")

reDL = legal.RelationExtractionDLModel\
    .pretrained("legre_whereas_md", "en", "legal/models")\
    .setPredictionThreshold(0.9)\
    .setInputCols(["ner_chunk", "document"])\
    .setOutputCol("relations")
    
pipeline = nlp.Pipeline(stages=[
    documentAssembler,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter,
    reDL
])

empty_df = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_df)

text = """
Central Expressway, Suite 200, Dallas, TX 75080.

Background

The Supplier wishes to appoint the Distributor as its non-exclusive distributor for the promotion and sale of the Products within the Territory (both as defined below), and the Distributor wishes to promote and sell the Products within the Territory on the terms of this agreement.

Agreed terms

1. """

data = spark.createDataFrame([[text]]).toDF("text")
model = pipeline.fit(data)
res = model.transform(data)

Results

+-----------+---------------+-------------+-----------+------------------------------------------------+---------------+-------------+-----------+------------------------------------------------+----------+
|relation   |entity1        |entity1_begin|entity1_end|chunk1                                          |entity2        |entity2_begin|entity2_end|chunk2                                          |confidence|
+-----------+---------------+-------------+-----------+------------------------------------------------+---------------+-------------+-----------+------------------------------------------------+----------+
|has_subject|WHEREAS_ACTION |76           |92         |wishes to appoint                               |WHEREAS_SUBJECT|63           |74         |The Supplier                                    |0.9994367 |
|has_subject|WHEREAS_OBJECT |94           |141        |the Distributor as its non-exclusive distributor|WHEREAS_SUBJECT|63           |74         |The Supplier                                    |0.92683166|
|has_subject|WHEREAS_SUBJECT|236          |250        |the Distributor                                 |WHEREAS_SUBJECT|63           |74         |The Supplier                                    |0.9829159 |
|has_object |WHEREAS_ACTION |76           |92         |wishes to appoint                               |WHEREAS_OBJECT |94           |141        |the Distributor as its non-exclusive distributor|0.900727  |
|has_object |WHEREAS_OBJECT |94           |141        |the Distributor as its non-exclusive distributor|WHEREAS_OBJECT |279          |290        |the Products                                    |0.96618503|
|has_subject|WHEREAS_ACTION |252          |268        |wishes to promote                               |WHEREAS_SUBJECT|236          |250        |the Distributor                                 |0.99969923|
+-----------+---------------+-------------+-----------+------------------------------------------------+---------------+-------------+-----------+------------------------------------------------+----------+

Model Information

Model Name: legre_whereas_md
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Language: en
Size: 402.2 MB

References

Manual annotations on CUAD dataset

Benchmarking

       label    Recall Precision        F1   Support
  has_object     0.974     0.991     0.983       116
 has_subject     0.977     0.986     0.981       213
       other     0.993     0.978     0.985       271
         Avg     0.981     0.985     0.983       -
Weighted-Avg     0.983     0.983     0.983       -