Legal NER - Whereas Clauses (Md)

Description

IMPORTANT: Don’t run this model on the whole legal agreement. Instead:

  • Split the agreement into paragraphs. You can use notebook 1 in Finance or Legal as inspiration;
  • Use the legclf_cuad_whereas_clause Text Classifier to keep only the paragraphs that contain WHEREAS clauses, and run this model on those.
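
The two preprocessing steps above can be sketched in plain Python. This is only an illustration: `split_paragraphs`, `select_paragraphs` and `keyword_stub` are hypothetical helper names, and the keyword check is a crude stand-in for the legclf_cuad_whereas_clause classifier, not its actual behaviour.

```python
import re
from typing import Callable, List


def split_paragraphs(document: str) -> List[str]:
    """Split a document on blank lines and drop empty paragraphs."""
    parts = re.split(r"\n\s*\n", document)
    return [p.strip() for p in parts if p.strip()]


def select_paragraphs(paragraphs: List[str],
                      is_whereas: Callable[[str], bool]) -> List[str]:
    """Keep only the paragraphs accepted by the given predicate."""
    return [p for p in paragraphs if is_whereas(p)]


def keyword_stub(paragraph: str) -> bool:
    """Assumption: placeholder for the text classifier, keyword match only."""
    return paragraph.lstrip().upper().startswith("WHEREAS")


doc = (
    "RECITALS\n\n"
    "WHEREAS, Seller and Buyer have entered into an agreement;\n\n\n"
    "NOW, THEREFORE, the parties agree as follows."
)
paragraphs = split_paragraphs(doc)
whereas_paragraphs = select_paragraphs(paragraphs, keyword_stub)
print(whereas_paragraphs)
```

In a real pipeline the predicate would wrap the classifier's prediction for each paragraph, and only `whereas_paragraphs` would be passed to the NER pipeline.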

This is a Legal NER Model, able to process WHEREAS clauses, to detect the SUBJECT (Who?), the ACTION, the OBJECT (what?) and, in some cases, the INDIRECT OBJECT (to whom?) of the clause.

This is the md (medium) version of the model. It was trained with more data and is more resistant to false positives outside the WHEREAS section, which may help if you run it at whole-document level (although this is not recommended).

Predicted Entities

WHEREAS_SUBJECT, WHEREAS_OBJECT, WHEREAS_ACTION


How to use

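from johnsnowlabs import nlp, legal

# Start (or reuse) a Spark session with the licensed libraries loaded
spark = nlp.start()

documentAssembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")
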
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
        .setInputCols(["document"])\
        .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
        .setInputCols(["sentence"])\
        .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

ner_model = legal.NerModel.pretrained('legner_whereas_md', 'en', 'legal/models')\
        .setInputCols(["sentence", "token", "embeddings"])\
        .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
        .setInputCols(["sentence","token","ner"])\
        .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        embeddings,
        ner_model,
        ner_converter])

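# All stages are pretrained or rule-based, so fitting on an empty DataFrame
# only assembles the pipeline; nothing is learned from this data.
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)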

text = ["""WHEREAS, Seller and Buyer have entered into that certain Stock Purchase Agreement, dated November 14, 2018 (the "Stock Purchase Agreement"); WHEREAS, pursuant to the Stock Purchase Agreement, Seller has agreed to sell and transfer, and Buyer has agreed to purchase and acquire, all of Seller's right, title and interest in and to Armstrong Wood Products, Inc., a Delaware corporation ("AWP") and its Subsidiaries, the Company and HomerWood Hardwood Flooring Company, a Delaware corporation ("HHFC," and together with the Company, the "Company Subsidiaries" and together with AWP, the "Company Entities" and each a "Company Entity") by way of a purchase by Buyer and sale by Seller of the Shares, all upon the terms and condition set forth therein;"""]

res = model.transform(spark.createDataFrame([text]).toDF("text"))

Results

+------------+-----------------+
|       token|        ner_label|
+------------+-----------------+
|     WHEREAS|                O|
|           ,|                O|
|      Seller|B-WHEREAS_SUBJECT|
|         and|                O|
|       Buyer|B-WHEREAS_SUBJECT|
|        have| B-WHEREAS_ACTION|
|     entered| I-WHEREAS_ACTION|
|        into| I-WHEREAS_ACTION|
|        that| B-WHEREAS_OBJECT|
|     certain| I-WHEREAS_OBJECT|
|       Stock| I-WHEREAS_OBJECT|
|    Purchase| I-WHEREAS_OBJECT|
|   Agreement| I-WHEREAS_OBJECT|
|           ,|                O|
|       dated|                O|
|    November|                O|
|          14|                O|
|           ,|                O|
|        2018|                O|
|           (|                O|
|         the|                O|
|           "|                O|
|       Stock|                O|
|    Purchase|                O|
|   Agreement|                O|
|         ");|                O|
|     WHEREAS|                O|
|           ,|                O|
|    pursuant|                O|
|          to|                O|
|         the|                O|
|       Stock|                O|
|    Purchase|                O|
|   Agreement|                O|
|           ,|                O|
|      Seller|B-WHEREAS_SUBJECT|
|         has| B-WHEREAS_ACTION|
|      agreed| I-WHEREAS_ACTION|
|          to| I-WHEREAS_ACTION|
|        sell| I-WHEREAS_ACTION|
|         and|                O|
|    transfer|                O|
|           ,|                O|
|         and|                O|
|       Buyer|B-WHEREAS_SUBJECT|
|         has| B-WHEREAS_ACTION|
|      agreed| I-WHEREAS_ACTION|
|          to| I-WHEREAS_ACTION|
|    purchase| I-WHEREAS_ACTION|
|         and|                O|
|     acquire|                O|
|           ,|                O|
|         all|                O|
|          of|                O|
|    Seller's|                O|
|       right|                O|
|           ,|                O|
|       title|                O|
|         and|                O|
|    interest|                O|
|          in|                O|
|         and|                O|
|          to|                O|
|   Armstrong|                O|
|        Wood|                O|
|    Products|                O|
|           ,|                O|
|         Inc|                O|
|          .,|                O|
|           a|                O|
|    Delaware|                O|
| corporation|                O|
|          ("|                O|
|         AWP|                O|
|          ")|                O|
|         and|                O|
|         its|                O|
|Subsidiaries|                O|
|           ,|                O|
|         the|                O|
|     Company|                O|
|         and|                O|
|   HomerWood|                O|
|    Hardwood|                O|
|    Flooring|                O|
|     Company|                O|
|           ,|                O|
|           a|                O|
|    Delaware|                O|
| corporation|                O|
|          ("|                O|
|        HHFC|                O|
|          ,"|                O|
+------------+-----------------+
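
The table above shows token-level BIO tags. The ner_chunk column produced by the NerConverter stage holds the merged entities. As a rough illustration of that merging step, here is a minimal pure-Python sketch. `bio_to_chunks` is a hypothetical helper, not the library's implementation, and it simply drops any I- tag that does not continue a chunk of the same type.

```python
from typing import List, Tuple


def bio_to_chunks(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (chunk_text, entity_type) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    def flush() -> None:
        # Close the chunk being built, if any
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_type))

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag always starts a new chunk
            flush()
            current_tokens = [token]
            current_type = label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type extends the open chunk
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag ends the open chunk
            flush()
            current_tokens = []
            current_type = None
    flush()
    return chunks


tokens = ["WHEREAS", ",", "Seller", "and", "Buyer", "have", "entered", "into",
          "that", "certain", "Stock", "Purchase", "Agreement", ","]
labels = ["O", "O", "B-WHEREAS_SUBJECT", "O", "B-WHEREAS_SUBJECT",
          "B-WHEREAS_ACTION", "I-WHEREAS_ACTION", "I-WHEREAS_ACTION",
          "B-WHEREAS_OBJECT", "I-WHEREAS_OBJECT", "I-WHEREAS_OBJECT",
          "I-WHEREAS_OBJECT", "I-WHEREAS_OBJECT", "O"]

for text, entity in bio_to_chunks(tokens, labels):
    print(f"{entity}: {text}")
# WHEREAS_SUBJECT: Seller
# WHEREAS_SUBJECT: Buyer
# WHEREAS_ACTION: have entered into
# WHEREAS_OBJECT: that certain Stock Purchase Agreement
```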

Model Information

Model Name: legner_whereas_md
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.1 MB

References

Manual annotations on CUAD dataset

Benchmarking

label	tp	fp	fn	prec	rec	f1
B-WHEREAS_SUBJECT	95	12	5	0.88785046	0.95	0.9178744
I-WHEREAS_ACTION	112	36	13	0.7567568	0.896	0.82051283
I-WHEREAS_SUBJECT	31	6	6	0.8378378	0.8378378	0.8378378
B-WHEREAS_OBJECT	59	33	30	0.6413044	0.66292137	0.6519337
B-WHEREAS_ACTION	87	12	3	0.8787879	0.96666664	0.9206349
I-WHEREAS_OBJECT	221	108	65	0.67173254	0.77272725	0.71869916
Macro-average	605	207	122	0.77904505	0.8476922	0.81192017
Micro-average	605	207	122	0.7450739	0.83218706	0.78622484
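
The averaged rows can be recomputed from the tp, fp and fn counts in the table. The sketch below is a plain-Python check, not the evaluation code used for the model, and `prf` is a hypothetical helper name. Note that the macro F1 in the table matches the harmonic mean of macro precision and macro recall, not the mean of the per-label F1 scores.

```python
from typing import Dict, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# (tp, fp, fn) per label, copied from the benchmark table
counts: Dict[str, Tuple[int, int, int]] = {
    "B-WHEREAS_SUBJECT": (95, 12, 5),
    "I-WHEREAS_ACTION": (112, 36, 13),
    "I-WHEREAS_SUBJECT": (31, 6, 6),
    "B-WHEREAS_OBJECT": (59, 33, 30),
    "B-WHEREAS_ACTION": (87, 12, 3),
    "I-WHEREAS_OBJECT": (221, 108, 65),
}

per_label = {label: prf(*c) for label, c in counts.items()}

# Macro: average the per-label precision and recall, then combine into F1
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

# Micro: pool the counts over all labels first, then compute the metrics
total_tp = sum(c[0] for c in counts.values())
total_fp = sum(c[1] for c in counts.values())
total_fn = sum(c[2] for c in counts.values())
micro_p, micro_r, micro_f1 = prf(total_tp, total_fp, total_fn)

print(f"macro: {macro_p:.4f} {macro_r:.4f} {macro_f1:.4f}")
print(f"micro: {micro_p:.4f} {micro_r:.4f} {micro_f1:.4f}")
```

The micro-average is lower than the macro-average here because I-WHEREAS_OBJECT, the label with the most tokens, also has one of the lowest precision scores.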