Description
IMPORTANT: Don’t run this model on the whole legal agreement. Instead:
- Split by paragraphs. You can use notebook 1 in Finance or Legal as inspiration;
- Use the legclf_warranty_clause Text Classifier to select only the paragraphs that contain warranty clauses, and run this model on those.
This is a Legal Named Entity Recognition (NER) model that identifies the Subject (who), Action (what), Object (the warranty) and Indirect Object (to whom) in Warranty clauses.
This is the md (medium) version of the model. It was trained with more data and is more resistant to false positives outside the warranty section, which may help when running it at whole-document level (although this is not recommended).
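The paragraph-splitting step recommended above can be sketched in plain Python. This is a minimal illustration, not the method used in the referenced notebook: it assumes paragraphs are separated by blank lines, and `split_paragraphs` is a hypothetical helper name. The second paragraph of the sample text is made up for the example.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split raw agreement text into paragraphs on blank lines (assumed layout)."""
    # One or more whitespace-only lines mark a paragraph boundary.
    parts = re.split(r"\n\s*\n", text)
    # Collapse line breaks inside each paragraph and drop empty pieces.
    return [" ".join(p.split()) for p in parts if p.strip()]


agreement = """8. Representations and Warranties
SONY hereby makes the following representations and warranties to PURCHASER.

9. Notices
All notices under this agreement shall be in writing."""

paragraphs = split_paragraphs(agreement)
for p in paragraphs:
    print(p)
```

Each resulting paragraph would then go through the clause classifier, and only the paragraphs it labels as warranty clauses would be passed to this NER model.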
Predicted Entities
WARRANTY, WARRANTY_ACTION, WARRANTY_SUBJECT, WARRANTY_INDIRECT_OBJECT
How to use
from johnsnowlabs import nlp, legal

# Start a Spark session (requires a valid Legal NLP license)
spark = nlp.start()

# Turn the raw text column into document annotations
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split each document into sentences
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split each sentence into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Legal-domain RoBERTa token embeddings expected by the NER model
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Warranty NER model (token-level IOB tags)
ner_model = legal.NerModel.pretrained("legner_warranty_md", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge IOB-tagged tokens into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, embeddings, ner_model, ner_converter])

data = spark.createDataFrame([["""8 . Representations and Warranties SONY hereby makes the following representations and warranties to PURCHASER , each of which shall be true and correct as of the date hereof and as of the Closing Date , and shall be unaffected by any investigation heretofore or hereafter made : 8.1 Power and Authority SONY has the right and power to enter into this IP Agreement and to transfer the Transferred Patents and to grant the license set forth in Section 3.1 ."""]]).toDF("text")

result = nlpPipeline.fit(data).transform(data)
Results
+--------------------------------------------------------------------------+------------------------+
|chunk                                                                     |entity                  |
+--------------------------------------------------------------------------+------------------------+
|SONY                                                                      |WARRANTY_SUBJECT        |
|makes the following representations and warranties                        |WARRANTY_ACTION         |
|PURCHASER                                                                 |WARRANTY_INDIRECT_OBJECT|
|shall be true and correct as of the date hereof and as of the Closing Date|WARRANTY                |
|shall be unaffected by any investigation                                  |WARRANTY                |
|SONY                                                                      |WARRANTY_SUBJECT        |
|has the right and power to enter into this IP Agreement                   |WARRANTY                |
+--------------------------------------------------------------------------+------------------------+
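Once the chunks are collected on the Python side, they can be grouped by entity type for downstream use. The sketch below assumes the results have already been exported as `(chunk, entity)` tuples, copied here from the table above. `group_by_entity` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (chunk, entity) pairs copied from the Results table.
rows: List[Tuple[str, str]] = [
    ("SONY", "WARRANTY_SUBJECT"),
    ("makes the following representations and warranties", "WARRANTY_ACTION"),
    ("PURCHASER", "WARRANTY_INDIRECT_OBJECT"),
    ("shall be true and correct as of the date hereof and as of the Closing Date", "WARRANTY"),
    ("shall be unaffected by any investigation", "WARRANTY"),
    ("SONY", "WARRANTY_SUBJECT"),
    ("has the right and power to enter into this IP Agreement", "WARRANTY"),
]


def group_by_entity(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group chunk texts by entity label, keeping first-seen order and dropping repeats."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for chunk, entity in pairs:
        if chunk not in grouped[entity]:
            grouped[entity].append(chunk)
    return dict(grouped)


grouped = group_by_entity(rows)
for entity, chunks in grouped.items():
    print(entity, "->", chunks)
```

Dropping repeated chunks is a design choice for this illustration. Keep the duplicates if you need one record per mention, for example to link each WARRANTY chunk to the nearest preceding WARRANTY_SUBJECT.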
Model Information
| Model Name: | legner_warranty_md | 
| Compatibility: | Legal NLP 1.0.0+ | 
| License: | Licensed | 
| Edition: | Official | 
| Input Labels: | [sentence, token, embeddings] | 
| Output Labels: | [ner] | 
| Language: | en | 
| Size: | 16.1 MB | 
References
In-house annotated examples from CUAD legal dataset
Benchmarking
label	 tp	 fp	 fn	 prec	 rec	 f1
I-WARRANTY_SUBJECT	 23	 9	 19	 0.71875	 0.54761904	 0.62162155
B-WARRANTY	 111	 36	 34	 0.75510204	 0.76551723	 0.760274
B-WARRANTY_SUBJECT	 55	 31	 33	 0.6395349	 0.625	 0.6321839
I-WARRANTY_INDIRECT_OBJECT	 18	 6	 3	 0.75	 0.85714287	 0.79999995
I-WARRANTY_ACTION	 77	 8	 14	 0.90588236	 0.84615386	 0.875
B-WARRANTY_ACTION	 36	 4	 4	 0.9	 0.9	 0.9
I-WARRANTY	 1686	 487	 313	 0.7758859	 0.8434217	 0.8082455
B-WARRANTY_INDIRECT_OBJECT	 34	 12	 6	 0.73913044	 0.85	 0.79069775
Macro-average	 2040	 593	 426	 0.7730357	 0.7793569	 0.7761834
Micro-average	 2040	 593	 426	 0.77478164	 0.8272506	 0.80015695
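The averages can be recomputed from the per-label counts, which shows how the two rows differ. The sketch below is an illustration in plain Python, not the evaluation code used to produce the table. The reported macro F1 is consistent with taking the harmonic mean of the macro precision and macro recall, not the mean of the per-label F1 scores.

```python
# (tp, fp, fn) per label, copied from the benchmark table.
counts = {
    "I-WARRANTY_SUBJECT": (23, 9, 19),
    "B-WARRANTY": (111, 36, 34),
    "B-WARRANTY_SUBJECT": (55, 31, 33),
    "I-WARRANTY_INDIRECT_OBJECT": (18, 6, 3),
    "I-WARRANTY_ACTION": (77, 8, 14),
    "B-WARRANTY_ACTION": (36, 4, 4),
    "I-WARRANTY": (1686, 487, 313),
    "B-WARRANTY_INDIRECT_OBJECT": (34, 12, 6),
}


def precision_recall(tp: int, fp: int, fn: int):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    return tp / (tp + fp), tp / (tp + fn)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Macro: average the per-label precision and recall, every label weighted equally.
per_label = [precision_recall(*c) for c in counts.values()]
macro_p = sum(p for p, _ in per_label) / len(per_label)
macro_r = sum(r for _, r in per_label) / len(per_label)
macro_f1 = f1(macro_p, macro_r)

# Micro: pool the counts first, so frequent labels such as I-WARRANTY dominate.
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())
micro_p, micro_r = precision_recall(tp, fp, fn)
micro_f1 = f1(micro_p, micro_r)

print(f"macro: {macro_p:.4f} {macro_r:.4f} {macro_f1:.4f}")
print(f"micro: {micro_p:.4f} {micro_r:.4f} {micro_f1:.4f}")
```

The micro recall is higher than the macro recall because I-WARRANTY accounts for most of the tokens and has a recall of about 0.84, while the weakest label, I-WARRANTY_SUBJECT, has few examples.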