Legal NER - Whereas Clauses (sm)

Description

IMPORTANT: Don’t run this model on the whole legal agreement. Instead:

  • Split the agreement into paragraphs. You can use notebook 1 in Finance or Legal as inspiration.
  • Use the legclf_cuad_whereas_clause Text Classifier to keep only the paragraphs that contain WHEREAS clauses, and run this model on those paragraphs.
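
The paragraph-splitting step can be sketched in plain Python. This is a minimal illustration, not the method used in the referenced notebook: it assumes paragraphs are separated by blank lines, and the helper name `split_paragraphs` is hypothetical. Each returned paragraph would then be passed to the legclf_cuad_whereas_clause classifier before running this NER model.

```python
import re
from typing import List


def split_paragraphs(agreement_text: str) -> List[str]:
    """Split raw agreement text into paragraphs (assumption: blank lines separate them)."""
    # A paragraph break is a newline, optional whitespace, then another newline.
    chunks = re.split(r"\n\s*\n", agreement_text)
    # Collapse line breaks inside each paragraph and drop empty chunks.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


sample = (
    "RECITALS\n\n"
    "WHEREAS, Seller and Buyer\nhave entered into an agreement;\n\n"
    "NOW, THEREFORE, the parties agree."
)
for paragraph in split_paragraphs(sample):
    print(paragraph)
```

Real agreements often need a more robust splitter (numbered sections, page breaks, hard-wrapped lines), so treat the blank-line rule as a starting point only.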

This Legal NER model processes WHEREAS clauses and detects the SUBJECT (who?), the ACTION, the OBJECT (what?) and, in some cases, the INDIRECT OBJECT (to whom?) of each clause.

Predicted Entities

WHEREAS_SUBJECT, WHEREAS_OBJECT, WHEREAS_ACTION


How to use

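from johnsnowlabs import nlp, legal

# Start a Spark session with the licensed John Snow Labs libraries loaded
spark = nlp.start()

documentAssembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")
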
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
        .setInputCols(["document"])\
        .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
        .setInputCols(["sentence"])\
        .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en")\
        .setInputCols(["sentence", "token"])\
        .setOutputCol("embeddings")

ner_model = legal.NerModel.pretrained('legner_whereas', 'en', 'legal/models')\
        .setInputCols(["sentence", "token", "embeddings"])\
        .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
        .setInputCols(["sentence","token","ner"])\
        .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        embeddings,
        ner_model,
        ner_converter])

# Every stage is pretrained, so fitting on an empty DataFrame only assembles the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

text = ["""WHEREAS, Seller and Buyer have entered into that certain Stock Purchase Agreement, dated November 14, 2018 (the "Stock Purchase Agreement"); WHEREAS, pursuant to the Stock Purchase Agreement, Seller has agreed to sell and transfer, and Buyer has agreed to purchase and acquire, all of Seller's right, title and interest in and to Armstrong Wood Products, Inc., a Delaware corporation ("AWP") and its Subsidiaries, the Company and HomerWood Hardwood Flooring Company, a Delaware corporation ("HHFC," and together with the Company, the "Company Subsidiaries" and together with AWP, the "Company Entities" and each a "Company Entity") by way of a purchase by Buyer and sale by Seller of the Shares, all upon the terms and condition set forth therein;"""]

res = model.transform(spark.createDataFrame([text]).toDF("text"))

Results

+------------+-----------------+
|       token|        ner_label|
+------------+-----------------+
|     WHEREAS|                O|
|           ,|                O|
|      Seller|B-WHEREAS_SUBJECT|
|         and|                O|
|       Buyer|B-WHEREAS_SUBJECT|
|        have| B-WHEREAS_ACTION|
|     entered| I-WHEREAS_ACTION|
|        into| I-WHEREAS_ACTION|
|        that| B-WHEREAS_OBJECT|
|     certain| I-WHEREAS_OBJECT|
|       Stock| I-WHEREAS_OBJECT|
|    Purchase| I-WHEREAS_OBJECT|
|   Agreement| I-WHEREAS_OBJECT|
|           ,|                O|
|       dated|                O|
|    November|                O|
|          14|                O|
|           ,|                O|
|        2018|                O|
|           (|                O|
|         the|                O|
|           "|                O|
|       Stock|                O|
|    Purchase|                O|
|   Agreement|                O|
|         ");|                O|
|     WHEREAS|                O|
|           ,|                O|
|    pursuant|                O|
|          to|                O|
|         the|                O|
|       Stock|                O|
|    Purchase|                O|
|   Agreement|                O|
|           ,|                O|
|      Seller|B-WHEREAS_SUBJECT|
|         has| B-WHEREAS_ACTION|
|      agreed| I-WHEREAS_ACTION|
|          to| I-WHEREAS_ACTION|
|        sell| I-WHEREAS_ACTION|
|         and|                O|
|    transfer|                O|
|           ,|                O|
|         and|                O|
|       Buyer|B-WHEREAS_SUBJECT|
|         has| B-WHEREAS_ACTION|
|      agreed| I-WHEREAS_ACTION|
|          to| I-WHEREAS_ACTION|
|    purchase| I-WHEREAS_ACTION|
|         and|                O|
|     acquire|                O|
|           ,|                O|
|         all|                O|
|          of|                O|
|    Seller's|                O|
|       right|                O|
|           ,|                O|
|       title|                O|
|         and|                O|
|    interest|                O|
|          in|                O|
|         and|                O|
|          to|                O|
|   Armstrong|                O|
|        Wood|                O|
|    Products|                O|
|           ,|                O|
|         Inc|                O|
|          .,|                O|
|           a|                O|
|    Delaware|                O|
| corporation|                O|
|          ("|                O|
|         AWP|                O|
|          ")|                O|
|         and|                O|
|         its|                O|
|Subsidiaries|                O|
|           ,|                O|
|         the|                O|
|     Company|                O|
|         and|                O|
|   HomerWood|                O|
|    Hardwood|                O|
|    Flooring|                O|
|     Company|                O|
|           ,|                O|
|           a|                O|
|    Delaware|                O|
| corporation|                O|
|          ("|                O|
|        HHFC|                O|
|          ,"|                O|
+------------+-----------------+
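
The token-level B-/I- labels above can be grouped into entity chunks, which is the role the NerConverter stage plays in the pipeline. The following is a minimal pure-Python sketch of that grouping, not the library's implementation; the helper name `bio_to_chunks` is hypothetical, and the input is copied from the first rows of the table.

```python
from typing import List, Optional, Tuple


def bio_to_chunks(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (chunk_text, entity_type) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the chunk collected so far, if any.
        if current_tokens and current_type is not None:
            chunks.append((" ".join(current_tokens), current_type))

    for token, label in zip(tokens, labels):
        prefix, _, entity = label.partition("-")
        if prefix == "I" and entity == current_type:
            # Continuation of the open chunk.
            current_tokens.append(token)
        elif prefix in ("B", "I"):
            # "B-" (or a stray "I-" of a different type) starts a new chunk.
            flush()
            current_tokens, current_type = [token], entity
        else:
            # "O" closes any open chunk.
            flush()
            current_tokens, current_type = [], None
    flush()
    return chunks


tokens = ["WHEREAS", ",", "Seller", "and", "Buyer", "have", "entered", "into",
          "that", "certain", "Stock", "Purchase", "Agreement"]
labels = ["O", "O", "B-WHEREAS_SUBJECT", "O", "B-WHEREAS_SUBJECT",
          "B-WHEREAS_ACTION", "I-WHEREAS_ACTION", "I-WHEREAS_ACTION",
          "B-WHEREAS_OBJECT", "I-WHEREAS_OBJECT", "I-WHEREAS_OBJECT",
          "I-WHEREAS_OBJECT", "I-WHEREAS_OBJECT"]

chunks = bio_to_chunks(tokens, labels)
for text, entity in chunks:
    print(f"{entity}: {text}")
# WHEREAS_SUBJECT: Seller
# WHEREAS_SUBJECT: Buyer
# WHEREAS_ACTION: have entered into
# WHEREAS_OBJECT: that certain Stock Purchase Agreement
```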

Model Information

Model Name: legner_whereas
Type: legal
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.5 MB

References

Manual annotations on the CUAD dataset.

Benchmarking

label                tp      fp     fn      prec          rec            f1
B-WHEREAS_SUBJECT    191     14     15      0.9317073     0.92718446     0.9294404
I-WHEREAS_ACTION     202     38     59      0.84166664    0.77394634     0.8063872
I-WHEREAS_SUBJECT    52      8      16      0.8666667     0.7647059      0.8125
B-WHEREAS_OBJECT     101     63     68      0.61585367    0.5976331      0.6066066
B-WHEREAS_ACTION     152     19     16      0.8888889     0.9047619      0.89675516
I-WHEREAS_OBJECT     361     194    194     0.65045047    0.65045047     0.65045047
Macro-average        1059    336    368     0.7992056     0.76978034     0.784217
Micro-average        1059    336    368     0.7591398     0.74211633     0.75053155
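
The figures in the table can be recomputed from the tp/fp/fn counts. The sketch below uses the standard definitions of precision, recall and F1. The averaging conventions are inferred from the numbers rather than stated in the source: micro-averaging pools the counts, while the macro row matches the mean of the per-label precision and recall, with macro F1 taken as the harmonic mean of those two averages (not the mean of per-label F1 scores).

```python
from typing import Dict, Tuple

# (tp, fp, fn) per label, copied from the benchmarking table
counts: Dict[str, Tuple[int, int, int]] = {
    "B-WHEREAS_SUBJECT": (191, 14, 15),
    "I-WHEREAS_ACTION": (202, 38, 59),
    "I-WHEREAS_SUBJECT": (52, 8, 16),
    "B-WHEREAS_OBJECT": (101, 63, 68),
    "B-WHEREAS_ACTION": (152, 19, 16),
    "I-WHEREAS_OBJECT": (361, 194, 194),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)


per_label = {label: prf(*c) for label, c in counts.items()}

# Micro-average: pool the counts over all labels, then compute the metrics once.
micro = prf(*(sum(c[i] for c in counts.values()) for i in range(3)))

# Macro-average (inferred convention): mean precision and mean recall over labels,
# with F1 computed from those two means.
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro = (macro_p, macro_r, f1_score(macro_p, macro_r))

for name, (p, r, f) in {**per_label, "Macro-average": macro, "Micro-average": micro}.items():
    print(f"{name:<20} {p:.4f}  {r:.4f}  {f:.4f}")
```

The WHEREAS_OBJECT labels are clearly the weakest (F1 around 0.61 to 0.65), which is worth keeping in mind when relying on extracted objects downstream.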