Legal Relation Extraction (Parties, Alias, Dates, Document Type) (Lg, Unidirectional)

Description

IMPORTANT: Don’t run this model on the whole legal agreement. Instead:

  • Split by paragraphs. You can use notebook 1 in Finance or Legal as inspiration;
  • Use the legclf_introduction_clause Text Classifier to select only the introduction paragraphs;
  • Run the legner_contract_doc_parties NER model, which extracts Parties, Document Types, Effective Dates and Aliases, and then apply this Legal Relation Extraction model to its output.
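The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, assuming paragraphs are separated by blank lines; `keyword_stub` is a hypothetical stand-in for the legclf_introduction_clause classifier and is not part of any library.

```python
import re
from typing import Callable, List


def split_paragraphs(agreement: str) -> List[str]:
    """Split an agreement into paragraphs on blank lines (an assumption)."""
    parts = re.split(r"\n\s*\n", agreement)
    return [p.strip() for p in parts if p.strip()]


def select_paragraphs(paragraphs: List[str],
                      is_introduction: Callable[[str], bool]) -> List[str]:
    """Keep only the paragraphs the classifier accepts."""
    return [p for p in paragraphs if is_introduction(p)]


def keyword_stub(paragraph: str) -> bool:
    """Hypothetical heuristic; in practice use legclf_introduction_clause."""
    lowered = paragraph.lower()
    return "by and between" in lowered or "entered into" in lowered


agreement = (
    "THIS Lease Agreement is made and entered into by and between A and B.\n\n"
    "1. TERM. The term of this lease is five years.\n\n"
    "2. RENT. Tenant shall pay rent monthly."
)

paragraphs = split_paragraphs(agreement)
selected = select_paragraphs(paragraphs, keyword_stub)
print(selected)
```

Only the selected paragraphs would then be passed to the NER and Relation Extraction pipeline.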

As output, you get the relations linking the different concepts together, where such a relation exists. The list of relations is:

  • dated_as: A Document has an Effective Date
  • has_alias: A Party is referred to by an Alias throughout the document
  • has_collective_alias: An Alias held by several Parties at the same time
  • signed_by: A Document is signed by a Party

This is an lg model with Unidirectional Relations: the model returns the left side of the relation (source) in chunk1 and the right side (target) in chunk2.
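To make the direction concrete, here is a small sketch that checks whether a relation has the expected source and target entity types. The `Relation` class and `EXPECTED_DIRECTION` mapping are hypothetical helpers, not library objects; the entity types are read off the Results table on this page, and has_collective_alias is left out because no example row shows its types.

```python
from dataclasses import dataclass

# (source entity in chunk1, target entity in chunk2) per relation label,
# as observed in the Results table. has_collective_alias is not listed
# because the page shows no example row for it.
EXPECTED_DIRECTION = {
    "dated_as": ("DOC", "EFFDATE"),
    "signed_by": ("DOC", "PARTY"),
    "has_alias": ("PARTY", "ALIAS"),
}


@dataclass(frozen=True)
class Relation:
    label: str
    entity1: str  # entity type of chunk1 (source)
    chunk1: str
    entity2: str  # entity type of chunk2 (target)
    chunk2: str


def follows_expected_direction(rel: Relation) -> bool:
    """True if the relation's source/target types match the mapping."""
    return EXPECTED_DIRECTION.get(rel.label) == (rel.entity1, rel.entity2)


rel = Relation("has_alias", "PARTY", "Apple, Inc", "ALIAS", "Landlord")
reversed_rel = Relation("has_alias", "ALIAS", "Landlord", "PARTY", "Apple, Inc")

print(follows_expected_direction(rel))
print(follows_expected_direction(reversed_rel))
```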

Predicted Entities

dated_as, has_alias, has_collective_alias, signed_by

How to use

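from johnsnowlabs import nlp, legal

# Start a Spark session with the licensed libraries loaded
spark = nlp.start()

document_assembler = nlp.DocumentAssembler()\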
  .setInputCol("text")\
  .setOutputCol("document")

sen = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols("sentence")\
    .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en") \
    .setInputCols("sentence", "token")\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

pos_tagger = nlp.PerceptronModel()\
    .pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token"])\
    .setOutputCol("pos_tags")
    
dependency_parser = nlp.DependencyParserModel()\
    .pretrained("dependency_conllu", "en")\
    .setInputCols(["sentence", "pos_tags", "token"])\
    .setOutputCol("dependencies")

ner_model = legal.NerModel.pretrained('legner_contract_doc_parties_lg', 'en', 'legal/models')\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence","token","ner"])\
    .setOutputCol("ner_chunk")

# The first input column must match the NerConverter output column ("ner_chunk")
re_ner_chunk_filter = legal.RENerChunksFilter() \
    .setInputCols(["ner_chunk", "dependencies"])\
    .setOutputCol("re_ner_chunks")\
    .setMaxSyntacticDistance(7)\
    .setRelationPairs(["DOC-EFFDATE", "DOC-PARTY", "PARTY-FORMER_NAME", "PARTY-ALIAS"])

re_model = legal.RelationExtractionDLModel().pretrained('legre_contract_doc_parties_lg', 'en', 'legal/models')\
    .setPredictionThreshold(0.5)\
    .setInputCols(["re_ner_chunks", "sentence"])\
    .setOutputCol("relations")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    sen,
    tokenizer,
    embeddings,
    pos_tagger,
    dependency_parser,
    ner_model,
    ner_converter,
    re_ner_chunk_filter,
    re_model
    ])

empty_df = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_df)

text="""THIS Lease Agreement ,  is made and  entered  into this  _____day of May,  2006 by and between Apple, Inc.,  (hereinafter called "Landlord"),  and IMI Global,  Inc., with a mailing address of ___,  (hereinafter referred as "Tenant")."""

data = spark.createDataFrame([[text]]).toDF("text")

result = model.transform(data)
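The chunk filter stage in the pipeline above is configured with a list of allowed relation pairs and a maximum syntactic distance of 7. The sketch below illustrates that idea in plain Python. It is a conceptual illustration only, not the library's implementation: `Candidate` and `keep_candidates` are hypothetical names, the pair is treated as ordered (entity1 then entity2), and the last two candidates are made up to show rejected cases.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    entity1: str
    entity2: str
    syntactic_distance: int


# Values copied from the pipeline configuration above
ALLOWED_PAIRS = {"DOC-EFFDATE", "DOC-PARTY", "PARTY-FORMER_NAME", "PARTY-ALIAS"}
MAX_SYNTACTIC_DISTANCE = 7


def keep_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Keep pairs whose types are allowed and that are close enough."""
    kept = []
    for c in candidates:
        pair = f"{c.entity1}-{c.entity2}"
        if pair in ALLOWED_PAIRS and c.syntactic_distance <= MAX_SYNTACTIC_DISTANCE:
            kept.append(c)
    return kept


candidates = [
    Candidate("DOC", "EFFDATE", 6),
    Candidate("DOC", "PARTY", 7),
    Candidate("PARTY", "PARTY", 3),  # hypothetical: pair type not allowed
    Candidate("DOC", "PARTY", 9),    # hypothetical: distance above the limit
]

kept = keep_candidates(candidates)
print(kept)
```

The distance check is inclusive here, which is an assumption consistent with a distance of 7 appearing in the example output.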

Results


+---------+-----------------+--------------------+-----------------+----------------+----------+------------------+
|relations|relations_entity1|    relations_chunk1|relations_entity2|relations_chunk2|confidence|syntactic_distance|
+---------+-----------------+--------------------+-----------------+----------------+----------+------------------+
| dated_as|              DOC|THIS Lease Agreement|          EFFDATE|   of May,  2006| 0.9999546|                 6|
|signed_by|              DOC|THIS Lease Agreement|            PARTY|      Apple, Inc|  0.988555|                 5|
|signed_by|              DOC|THIS Lease Agreement|            PARTY|IMI Global,  Inc| 0.9568861|                 7|
|has_alias|            PARTY|          Apple, Inc|            ALIAS|        Landlord|0.99999475|                 4|
|has_alias|            PARTY|    IMI Global,  Inc|            ALIAS|          Tenant| 0.9999893|                 4|
+---------+-----------------+--------------------+-----------------+----------------+----------+------------------+
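As one possible way to consume these relations, the following sketch folds the rows of the table above into a single summary per document. The `summarize` function and its `min_confidence` parameter are hypothetical; the rows are copied from the table, keeping only the label, the two chunks, and the confidence.

```python
# (relation, chunk1, chunk2, confidence), copied from the table above
rows = [
    ("dated_as", "THIS Lease Agreement", "of May,  2006", 0.9999546),
    ("signed_by", "THIS Lease Agreement", "Apple, Inc", 0.988555),
    ("signed_by", "THIS Lease Agreement", "IMI Global,  Inc", 0.9568861),
    ("has_alias", "Apple, Inc", "Landlord", 0.99999475),
    ("has_alias", "IMI Global,  Inc", "Tenant", 0.9999893),
]


def summarize(rows, min_confidence=0.5):
    """Group relations into document, effective date, and parties with aliases."""
    summary = {"document": None, "effective_date": None, "parties": {}}
    for label, chunk1, chunk2, confidence in rows:
        if confidence < min_confidence:
            continue  # drop low-confidence relations
        if label == "dated_as":
            summary["document"] = chunk1
            summary["effective_date"] = chunk2
        elif label == "signed_by":
            summary["document"] = chunk1
            summary["parties"].setdefault(chunk2, [])
        elif label == "has_alias":
            summary["parties"].setdefault(chunk1, []).append(chunk2)
    return summary


summary = summarize(rows)
print(summary)
```

Because relations are unidirectional, the sketch can rely on chunk1 always holding the source (the document or the party) and chunk2 the target.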

Model Information

Model Name: legre_contract_doc_parties_lg
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Language: en
Size: 406.0 MB

Benchmarking

              
Label                         Recall  Precision     F1   Support
dated_as                      1.000     1.000     1.000     19
has_alias                     1.000     1.000     1.000     29
has_collective_alias          1.000     1.000     1.000     25
signed_by                     1.000     1.000     1.000     47
Avg.                          1.000     1.000     1.000     -
Weighted-Avg.                 1.000     1.000     1.000     -
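For readers who want to check how the summary rows relate to the per-label rows, here is a short sketch that recomputes F1 from precision and recall and then averages it. It assumes that "Avg." is an unweighted mean over labels and that "Weighted-Avg." weights each label by its support; the page does not state this explicitly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# label: (recall, precision, support), copied from the table above
benchmark = {
    "dated_as": (1.0, 1.0, 19),
    "has_alias": (1.0, 1.0, 29),
    "has_collective_alias": (1.0, 1.0, 25),
    "signed_by": (1.0, 1.0, 47),
}

per_label_f1 = {
    label: f1_score(precision, recall)
    for label, (recall, precision, _) in benchmark.items()
}
total_support = sum(support for _, _, support in benchmark.values())

# Unweighted mean over labels (assumed meaning of "Avg.")
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)

# Support-weighted mean (assumed meaning of "Weighted-Avg.")
weighted_f1 = sum(
    per_label_f1[label] * support
    for label, (_, _, support) in benchmark.items()
) / total_support

print(total_support, macro_f1, weighted_f1)
```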