Description
IMPORTANT: Don’t run this model on the whole legal agreement. Instead:
- Split the agreement by paragraphs. You can use notebook 1 in Finance or Legal as inspiration;
- Use the legclf_warranty_clause Text Classifier to select only the paragraphs that contain warranty clauses, and run this model on those.
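The paragraph-splitting step above can be sketched in plain Python. This is a minimal sketch, assuming paragraphs are separated by one or more blank lines; `split_paragraphs` is a hypothetical helper, not part of the Legal NLP library, and the referenced notebook may split differently.

```python
import re

def split_paragraphs(text):
    """Split raw agreement text on blank lines (assumed paragraph separator)."""
    parts = re.split(r"\n\s*\n", text)
    # Collapse internal line breaks and drop empty fragments.
    return [" ".join(p.split()) for p in parts if p.strip()]

# Made-up two-paragraph agreement, for illustration only.
agreement = """8. Representations and Warranties
SONY hereby makes the following representations and warranties to PURCHASER.

9. Term
This Agreement remains in force until terminated."""

paragraphs = split_paragraphs(agreement)
for p in paragraphs:
    print(p)
```

Each resulting paragraph would then go through the legclf_warranty_clause classifier, and only the paragraphs it selects would be passed to this NER model.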
This is a Legal Named Entity Recognition model that identifies the Subject (who), Action (what), Object (the warranty) and Indirect Object (to whom) in Warranty clauses.
This is the md (medium) version of the model. It was trained with more data and is more resistant to false positives outside the specific section, which may help when running it at whole-document level (although this is not recommended).
Predicted Entities
WARRANTY, WARRANTY_ACTION, WARRANTY_SUBJECT, WARRANTY_INDIRECT_OBJECT
How to use
# Assumes the johnsnowlabs package is installed with a valid Legal NLP license.
from johnsnowlabs import nlp, legal

# Start (or attach to) a Spark session with the licensed libraries loaded.
spark = nlp.start()

# Wrap the raw text into Spark NLP documents.
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences with the multilingual sentence detector.
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Legal RoBERTa token embeddings, which the NER model takes as input.
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = legal.NerModel.pretrained("legner_warranty_md", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level B-/I- tags into entity chunks.
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, embeddings, ner_model, ner_converter])

data = spark.createDataFrame([["""8 . Representations and Warranties SONY hereby makes the following representations and warranties to PURCHASER , each of which shall be true and correct as of the date hereof and as of the Closing Date , and shall be unaffected by any investigation heretofore or hereafter made : 8.1 Power and Authority SONY has the right and power to enter into this IP Agreement and to transfer the Transferred Patents and to grant the license set forth in Section 3.1 ."""]]).toDF("text")

result = nlpPipeline.fit(data).transform(data)
Results
+--------------------------------------------------------------------------+------------------------+
|chunk                                                                     |entity                  |
+--------------------------------------------------------------------------+------------------------+
|SONY                                                                      |WARRANTY_SUBJECT        |
|makes the following representations and warranties                        |WARRANTY_ACTION         |
|PURCHASER                                                                 |WARRANTY_INDIRECT_OBJECT|
|shall be true and correct as of the date hereof and as of the Closing Date|WARRANTY                |
|shall be unaffected by any investigation                                  |WARRANTY                |
|SONY                                                                      |WARRANTY_SUBJECT        |
|has the right and power to enter into this IP Agreement                   |WARRANTY                |
+--------------------------------------------------------------------------+------------------------+
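One way to consume these results is to group the chunks into one record per warranty statement. The sketch below is an illustration only: it assumes the (chunk, entity) pairs have already been collected into a Python list in document order, and it relies on the heuristic that each WARRANTY_SUBJECT starts a new record. `group_by_subject` is a hypothetical helper, not a library function.

```python
from collections import defaultdict

# (chunk, entity) pairs copied from the results table, in document order.
rows = [
    ("SONY", "WARRANTY_SUBJECT"),
    ("makes the following representations and warranties", "WARRANTY_ACTION"),
    ("PURCHASER", "WARRANTY_INDIRECT_OBJECT"),
    ("shall be true and correct as of the date hereof and as of the Closing Date", "WARRANTY"),
    ("shall be unaffected by any investigation", "WARRANTY"),
    ("SONY", "WARRANTY_SUBJECT"),
    ("has the right and power to enter into this IP Agreement", "WARRANTY"),
]

def group_by_subject(pairs):
    """Start a new record at every WARRANTY_SUBJECT (assumed heuristic)."""
    records = []
    for chunk, entity in pairs:
        if entity == "WARRANTY_SUBJECT" or not records:
            records.append(defaultdict(list))
        records[-1][entity].append(chunk)
    return [dict(r) for r in records]

records = group_by_subject(rows)
for record in records:
    print(record)
```

The heuristic is deliberately simple: clauses in which the subject follows the action, or is omitted, would need a different grouping rule.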
Model Information
| Model Name:    | legner_warranty_md            |
| Compatibility: | Legal NLP 1.0.0+              |
| License:       | Licensed                      |
| Edition:       | Official                      |
| Input Labels:  | [sentence, token, embeddings] |
| Output Labels: | [ner]                         |
| Language:      | en                            |
| Size:          | 16.1 MB                       |
References
In-house annotated examples from CUAD legal dataset
Benchmarking
label                       tp    fp   fn   prec        rec         f1
I-WARRANTY_SUBJECT          23    9    19   0.71875     0.54761904  0.62162155
B-WARRANTY                  111   36   34   0.75510204  0.76551723  0.760274
B-WARRANTY_SUBJECT          55    31   33   0.6395349   0.625       0.6321839
I-WARRANTY_INDIRECT_OBJECT  18    6    3    0.75        0.85714287  0.79999995
I-WARRANTY_ACTION           77    8    14   0.90588236  0.84615386  0.875
B-WARRANTY_ACTION           36    4    4    0.9         0.9         0.9
I-WARRANTY                  1686  487  313  0.7758859   0.8434217   0.8082455
B-WARRANTY_INDIRECT_OBJECT  34    12   6    0.73913044  0.85        0.79069775
Macro-average               2040  593  426  0.7730357   0.7793569   0.7761834
Micro-average               2040  593  426  0.77478164  0.8272506   0.80015695
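The prec, rec and f1 columns follow from the tp, fp and fn counts. As a worked check, the sketch below recomputes both average rows, assuming the standard definitions: precision = tp / (tp + fp), recall = tp / (tp + fn), and F1 as their harmonic mean. It also assumes the macro row averages the per-label precision and recall and takes F1 from those two averages, while the micro row is computed from the summed counts. The function and variable names are illustrative, not taken from the evaluation code used for this model.

```python
# (tp, fp, fn) per label, copied from the benchmark table.
counts = {
    "I-WARRANTY_SUBJECT": (23, 9, 19),
    "B-WARRANTY": (111, 36, 34),
    "B-WARRANTY_SUBJECT": (55, 31, 33),
    "I-WARRANTY_INDIRECT_OBJECT": (18, 6, 3),
    "I-WARRANTY_ACTION": (77, 8, 14),
    "B-WARRANTY_ACTION": (36, 4, 4),
    "I-WARRANTY": (1686, 487, 313),
    "B-WARRANTY_INDIRECT_OBJECT": (34, 12, 6),
}

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    return 2 * p * r / (p + r)

# Macro: unweighted mean of per-label precision and recall.
per_label = [(precision(tp, fp), recall(tp, fn)) for tp, fp, fn in counts.values()]
macro_p = sum(p for p, _ in per_label) / len(per_label)
macro_r = sum(r for _, r in per_label) / len(per_label)
macro_f1 = f1(macro_p, macro_r)

# Micro: pool the counts over all labels first, then compute the scores.
tp_sum = sum(c[0] for c in counts.values())
fp_sum = sum(c[1] for c in counts.values())
fn_sum = sum(c[2] for c in counts.values())
micro_p = precision(tp_sum, fp_sum)
micro_r = recall(tp_sum, fn_sum)
micro_f1 = f1(micro_p, micro_r)

print(round(macro_p, 4), round(macro_r, 4), round(macro_f1, 4))
print(round(micro_p, 4), round(micro_r, 4), round(micro_f1, 4))
```

Because I-WARRANTY accounts for most of the tokens, the micro-average tracks that label closely, while the macro-average is pulled down by the weaker WARRANTY_SUBJECT labels.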