Legal Indemnification Relation Extraction (sm, Bidirectional)

Description

IMPORTANT: Don’t run this model on the whole legal agreement. Instead:

  • Split the agreement into paragraphs. You can use notebook 1 in Finance or Legal as inspiration.
  • Use the legclf_indemnification_clause Text Classifier to keep only the paragraphs that contain indemnification clauses.
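
The paragraph-splitting step can be sketched in plain Python. This is a minimal illustration that assumes paragraphs are separated by blank lines; `split_paragraphs` and `min_chars` are hypothetical names, not part of any library, and the classifier filtering step is not shown.

```python
import re

def split_paragraphs(agreement_text, min_chars=1):
    """Split raw agreement text into paragraphs on blank lines (assumed layout)."""
    # Normalise Windows line endings, then split on one or more blank lines
    normalised = agreement_text.replace("\r\n", "\n")
    blocks = re.split(r"\n\s*\n", normalised)
    # Collapse hard line wraps so each paragraph becomes a single line
    paragraphs = [" ".join(block.split()) for block in blocks]
    # Drop empty fragments (and anything shorter than min_chars)
    return [p for p in paragraphs if len(p) >= min_chars]

sample = (
    "ARTICLE 7\n\n"
    "The Company shall indemnify\nand hold harmless HOC.\n\n\n"
    "Notices shall be in writing."
)
paragraphs = split_paragraphs(sample)
for p in paragraphs:
    print(p)
```

Each resulting paragraph can then be passed to the clause classifier, and only the paragraphs it accepts go on to this model.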

This is a Relation Extraction model that groups the entities extracted by the Indemnification NER model (see legner_bert_indemnifications in Models Hub). It is a small (sm) model whose relations carry no meaningful direction: it was not trained to tell whether a relation runs from left to right or from right to left.

Bigger models trained with directed relations are also available in Models Hub.
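
Because the relations carry no meaningful direction, downstream code may want to treat (A, B) and (B, A) as the same pair. The sketch below shows one way to do that; `undirected_key` is a hypothetical helper, and the (begin, end, text) tuple layout is an assumption made for illustration.

```python
def undirected_key(relation, span_a, span_b):
    """Build a key that is identical for (a, b) and (b, a)."""
    # Sorting the two (begin, end, text) spans makes the argument order irrelevant
    first, second = sorted([span_a, span_b])
    return (relation, first, second)

company = (4, 10, "Company")
hoc = (46, 48, "HOC")

left_to_right = undirected_key("is_indemnification_indobject", company, hoc)
right_to_left = undirected_key("is_indemnification_indobject", hoc, company)

# Both orderings collapse to the same key, so duplicates can be dropped with a set
unique_relations = {left_to_right, right_to_left}
print(len(unique_relations))
```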

Predicted Entities

is_indemnification_subject, is_indemnification_object, is_indemnification_indobject


How to use

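from johnsnowlabs import nlp, legal

# Start a Spark session (requires a licensed johnsnowlabs installation)
spark = nlp.start()

documentAssembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")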

sentencizer = nlp.SentenceDetectorDLModel\
        .pretrained("sentence_detector_dl", "en") \
        .setInputCols(["document"])\
        .setOutputCol("sentence")
                      
tokenizer = nlp.Tokenizer()\
        .setInputCols(["sentence"])\
        .setOutputCol("token")

tokenClassifier = legal.BertForTokenClassification.pretrained("legner_bert_indemnifications", "en", "legal/models")\
  .setInputCols("token", "sentence")\
  .setOutputCol("label")\
  .setCaseSensitive(True)

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence","token","label"])\
    .setOutputCol("ner_chunk")

# ONLY NEEDED IF YOU WANT TO FILTER RELATION PAIRS OR SYNTACTIC DISTANCE
# =================
pos_tagger = nlp.PerceptronModel()\
    .pretrained() \
    .setInputCols(["sentence", "token"])\
    .setOutputCol("pos_tags")

dependency_parser = nlp.DependencyParserModel() \
    .pretrained("dependency_conllu", "en") \
    .setInputCols(["sentence", "pos_tags", "token"]) \
    .setOutputCol("dependencies")

# Restrict relation candidates to the listed pairs of named entities
re_filter = legal.RENerChunksFilter()\
    .setInputCols(["ner_chunk", "dependencies"])\
    .setOutputCol("re_ner_chunks")\
    .setMaxSyntacticDistance(20)\
    .setRelationPairs(['INDEMNIFICATION_SUBJECT-INDEMNIFICATION_ACTION', 'INDEMNIFICATION_SUBJECT-INDEMNIFICATION_INDIRECT_OBJECT', 'INDEMNIFICATION_ACTION-INDEMNIFICATION', 'INDEMNIFICATION_ACTION-INDEMNIFICATION_INDIRECT_OBJECT'])
# =================

reDL = legal.RelationExtractionDLModel()\
    .pretrained("legre_indemnifications", "en", "legal/models")\
    .setPredictionThreshold(0.5)\
    .setInputCols(["re_ner_chunks", "sentence"])\
    .setOutputCol("relations")

nlpPipeline = nlp.Pipeline(stages=[
        documentAssembler,
        sentencizer,
        tokenizer,
        tokenClassifier,
        ner_converter,
        pos_tagger,
        dependency_parser,
        re_filter,
        reDL])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

text='''The Company shall indemnify and hold harmless HOC against any losses, claims, damages or liabilities to which it may become subject under the 1933 Act or otherwise, insofar as such losses, claims, damages or liabilities (or actions in respect thereof) arise out of or are based upon '''

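# The pipeline was already fitted on empty_data above; every stage is pretrained, so no second fit is needed
lmodel = nlp.LightPipeline(model)
# fullAnnotate (unlike annotate) keeps the offsets and confidence metadata shown in Results
res = lmodel.fullAnnotate(text)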
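
To turn relation annotations into rows like those under Results, a small helper along these lines may be useful. It is a sketch that assumes each annotation exposes a `result` string and a `metadata` dictionary with the keys listed in `FIELDS`, as returned by `fullAnnotate`; `relations_to_rows` is a hypothetical name, and the `SimpleNamespace` object is only a stand-in so the snippet runs without Spark.

```python
from types import SimpleNamespace

# Metadata keys assumed to be present on each relation annotation
FIELDS = ["entity1", "entity1_begin", "entity1_end", "chunk1",
          "entity2", "entity2_begin", "entity2_end", "chunk2", "confidence"]

def relations_to_rows(annotations):
    """Flatten relation annotations into plain dictionaries, one per relation."""
    rows = []
    for ann in annotations:
        row = {"relation": ann.result}
        for field in FIELDS:
            # .get() leaves None for any key the annotation does not carry
            row[field] = ann.metadata.get(field)
        rows.append(row)
    return rows

# Stand-in for one relation annotation (values copied from the Results table)
fake_relation = SimpleNamespace(
    result="is_indemnification_indobject",
    metadata={"entity1": "INDEMNIFICATION_SUBJECT", "entity1_begin": "4",
              "entity1_end": "10", "chunk1": "Company",
              "entity2": "INDEMNIFICATION_INDIRECT_OBJECT", "entity2_begin": "46",
              "entity2_end": "48", "chunk2": "HOC", "confidence": "0.96191925"},
)
rows = relations_to_rows([fake_relation])
print(rows[0]["chunk1"], "->", rows[0]["chunk2"])
```

With a real pipeline, the input would be the list stored under the "relations" key of the first `fullAnnotate` result.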

Results

    relation                      entity1                  entity1_begin  entity1_end  chunk1           entity2                          entity2_begin  entity2_end  chunk2         confidence
 1  is_indemnification_subject    INDEMNIFICATION_SUBJECT  4              10           Company          INDEMNIFICATION_ACTION           32             44           hold harmless  0.8847967
 2  is_indemnification_indobject  INDEMNIFICATION_SUBJECT  4              10           Company          INDEMNIFICATION_INDIRECT_OBJECT  46             48           HOC            0.96191925
 3  is_indemnification_indobject  INDEMNIFICATION_ACTION   12             26           shall indemnify  INDEMNIFICATION_INDIRECT_OBJECT  46             48           HOC            0.7332646
10  is_indemnification_object     INDEMNIFICATION_ACTION   32             44           hold harmless    INDEMNIFICATION                  70             75           claims         0.9728908
11  is_indemnification_object     INDEMNIFICATION_ACTION   32             44           hold harmless    INDEMNIFICATION                  78             84           damages        0.9727499
12  is_indemnification_object     INDEMNIFICATION_ACTION   32             44           hold harmless    INDEMNIFICATION                  89             99           liabilities    0.964168
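
Rows like the ones above can be condensed into a per-clause summary of who indemnifies whom and against what. The following is a rough sketch: `summarise_clause` and the 0.9 cut-off are illustrative choices, not part of the model, and the input rows are a subset copied from the table.

```python
from collections import defaultdict

def summarise_clause(rows, min_confidence=0.9):
    """Collect the chunks seen for each entity label, keeping confident relations only."""
    roles = defaultdict(set)
    for row in rows:
        # Skip low-confidence relations (the threshold is an arbitrary example value)
        if float(row["confidence"]) < min_confidence:
            continue
        roles[row["entity1"]].add(row["chunk1"])
        roles[row["entity2"]].add(row["chunk2"])
    return {label: sorted(chunks) for label, chunks in roles.items()}

table_rows = [
    {"entity1": "INDEMNIFICATION_SUBJECT", "chunk1": "Company",
     "entity2": "INDEMNIFICATION_INDIRECT_OBJECT", "chunk2": "HOC",
     "confidence": "0.96191925"},
    {"entity1": "INDEMNIFICATION_ACTION", "chunk1": "shall indemnify",
     "entity2": "INDEMNIFICATION_INDIRECT_OBJECT", "chunk2": "HOC",
     "confidence": "0.7332646"},
    {"entity1": "INDEMNIFICATION_ACTION", "chunk1": "hold harmless",
     "entity2": "INDEMNIFICATION", "chunk2": "claims",
     "confidence": "0.9728908"},
]
summary = summarise_clause(table_rows)
for label, chunks in summary.items():
    print(label, chunks)
```

In this example the "shall indemnify" relation falls below the cut-off and is left out; lowering `min_confidence` would keep it.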

Model Information

Model Name: legre_indemnifications
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Language: en
Size: 405.9 MB

References

In-house annotated examples from the CUAD legal dataset

Benchmarking

                       label    Recall Precision        F1   Support
is_indemnification_indobject     0.966     1.000     0.982        29
is_indemnification_object        0.929     0.929     0.929        42
is_indemnification_subject       0.931     0.931     0.931        29
no_rel                           0.950     0.941     0.945       100
Avg.                             0.944     0.950     0.947        -
Weighted-Avg.                    0.945     0.945     0.945        -
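
As a sanity check on the two summary rows, the sketch below recomputes the Recall column, assuming "Avg." is the unweighted mean over the four labels and "Weighted-Avg." weights each label by its support. The helper names are hypothetical; the inputs are the rounded values from the table, so other columns may differ in the last digit.

```python
# (label, recall, support) copied from the benchmark table
benchmark = [
    ("is_indemnification_indobject", 0.966, 29),
    ("is_indemnification_object", 0.929, 42),
    ("is_indemnification_subject", 0.931, 29),
    ("no_rel", 0.950, 100),
]

def macro_average(values):
    """Unweighted mean: every label counts equally."""
    return sum(values) / len(values)

def weighted_average(values, weights):
    """Mean weighted by support: frequent labels count more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

recalls = [recall for _, recall, _ in benchmark]
supports = [support for _, _, support in benchmark]

macro_recall = macro_average(recalls)
weighted_recall = weighted_average(recalls, supports)

print(round(macro_recall, 3))     # 0.944
print(round(weighted_recall, 3))  # 0.945
```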