Description
IMPORTANT: Don’t run this model on the whole legal agreement. Instead:
- Split the agreement into paragraphs. You can use notebook 1 in Finance or Legal as inspiration;
- Use the legclf_cuad_confidentiality_clause Text Classifier to select only the paragraphs that contain confidentiality clauses.
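The paragraph-splitting step can be sketched in plain Python. This is a minimal illustration, not the approach from the referenced notebook: it assumes paragraphs are separated by blank lines, and both the helper name `split_paragraphs` and the sample agreement text are hypothetical.

```python
import re
from typing import List


def split_paragraphs(agreement_text: str) -> List[str]:
    """Split raw agreement text into paragraphs.

    Assumption: paragraphs are separated by one or more blank lines.
    Real agreements may need a different rule (numbered clauses, headings).
    """
    parts = re.split(r"\n\s*\n", agreement_text)
    # Collapse line breaks and repeated spaces inside each paragraph,
    # and drop fragments that contain only whitespace.
    return [" ".join(part.split()) for part in parts if part.strip()]


# Hypothetical sample text, used only to exercise the helper.
agreement = """1. Term. This Agreement begins on the Effective Date.

2. Confidentiality. Each party will promptly return to the other upon request
any Confidential Information of the other party.
"""

paragraphs = split_paragraphs(agreement)
for paragraph in paragraphs:
    print(paragraph)
```

Each resulting paragraph would then be passed to the clause classifier, and only the paragraphs it selects would be sent to this NER model.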
This is a Legal Named Entity Recognition (NER) model that identifies the Subject (who), Action (verb), Object (the confidential information) and Indirect Object (to whom) in confidentiality clauses.
Predicted Entities
CONFIDENTIALITY, CONFIDENTIALITY_ACTION, CONFIDENTIALITY_INDIRECT_OBJECT, CONFIDENTIALITY_SUBJECT
How to use
from johnsnowlabs import nlp, legal

# Assumption: a licensed Spark session started through the johnsnowlabs library
spark = nlp.start()

documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split each document into sentences
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Legal RoBERTa embeddings consumed by the NER model
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Token-level tagger for the confidentiality entities
ner_model = legal.NerModel.pretrained("legner_confidentiality", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler, sentenceDetector, tokenizer,
    embeddings, ner_model, ner_converter])

text = "Each party will promptly return to the other upon request any Confidential Information of the other party then in its possession or under its control."
data = spark.createDataFrame([[text]]).toDF("text")

result = nlpPipeline.fit(data).transform(data)
Results
+------------------------+-------------------------------+
|chunk                   |entity                         |
+------------------------+-------------------------------+
|Each party              |CONFIDENTIALITY_SUBJECT        |
|will promptly return    |CONFIDENTIALITY_ACTION         |
|other                   |CONFIDENTIALITY_INDIRECT_OBJECT|
|Confidential Information|CONFIDENTIALITY                |
+------------------------+-------------------------------+
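The model tags individual tokens with B- (begin), I- (inside) and O (outside) labels, and the chunks in the table come from merging consecutive tagged tokens. The sketch below illustrates that grouping idea in plain Python. It is not the library's NerConverter implementation; the function name `group_bio` is hypothetical, and the tags are written by hand to match the table above rather than taken from actual model output.

```python
from typing import List, Optional, Tuple


def group_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge B-/I-/O token tags into (chunk text, entity label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Emit the chunk collected so far, if there is one.
        if current_tokens and current_label is not None:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag, or an I- tag with a different label, starts a new chunk.
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the current chunk.
            current_tokens.append(token)
        else:
            # "O" closes any open chunk.
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


tokens = ["Each", "party", "will", "promptly", "return", "to", "the", "other",
          "upon", "request", "any", "Confidential", "Information"]
# Hand-written tags for illustration only.
tags = ["B-CONFIDENTIALITY_SUBJECT", "I-CONFIDENTIALITY_SUBJECT",
        "B-CONFIDENTIALITY_ACTION", "I-CONFIDENTIALITY_ACTION", "I-CONFIDENTIALITY_ACTION",
        "O", "O", "B-CONFIDENTIALITY_INDIRECT_OBJECT",
        "O", "O", "O", "B-CONFIDENTIALITY", "I-CONFIDENTIALITY"]

chunks = group_bio(tokens, tags)
for chunk, entity in chunks:
    print(f"{chunk} | {entity}")
```

Treating a stray I- tag as the start of a new chunk is a design choice of this sketch; other converters may discard such tokens instead.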
Model Information
| Model Name:    | legner_confidentiality        |
| Compatibility: | Legal NLP 1.0.0+              |
| License:       | Licensed                      |
| Edition:       | Official                      |
| Input Labels:  | [sentence, token, embeddings] |
| Output Labels: | [ner]                         |
| Language:      | en                            |
| Size:          | 16.3 MB                       |
References
In-house annotated examples from the CUAD legal dataset.
Benchmarking
label                              precision  recall  f1-score  support
B-CONFIDENTIALITY                  0.9077     0.9219  0.9147    64
B-CONFIDENTIALITY_ACTION           1.0000     1.0000  1.0000    53
B-CONFIDENTIALITY_INDIRECT_OBJECT  0.9419     0.9529  0.9474    85
B-CONFIDENTIALITY_SUBJECT          0.9697     1.0000  0.9846    32
I-CONFIDENTIALITY                  0.9302     0.9091  0.9195    88
I-CONFIDENTIALITY_ACTION           1.0000     0.9825  0.9912    57
I-CONFIDENTIALITY_INDIRECT_OBJECT  0.9744     0.8444  0.9048    45
I-CONFIDENTIALITY_SUBJECT          1.0000     1.0000  1.0000    25
O                                  0.9913     0.9950  0.9932    1604
accuracy                           -          -       0.9839    2053
macro-avg                          0.9683     0.9562  0.9617    2053
weighted-avg                       0.9839     0.9839  0.9838    2053
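As a reading aid for the table, the sketch below recomputes each F1 score as the harmonic mean of precision and recall, then the macro average and the total support. The figures are copied from the table above. The helper name `f1_score` is local to this example, and because the published precision and recall values are already rounded, the recomputed values are only expected to agree within about 1e-4.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# label: (precision, recall, reported f1, support), copied from the benchmark table
rows = {
    "B-CONFIDENTIALITY": (0.9077, 0.9219, 0.9147, 64),
    "B-CONFIDENTIALITY_ACTION": (1.0000, 1.0000, 1.0000, 53),
    "B-CONFIDENTIALITY_INDIRECT_OBJECT": (0.9419, 0.9529, 0.9474, 85),
    "B-CONFIDENTIALITY_SUBJECT": (0.9697, 1.0000, 0.9846, 32),
    "I-CONFIDENTIALITY": (0.9302, 0.9091, 0.9195, 88),
    "I-CONFIDENTIALITY_ACTION": (1.0000, 0.9825, 0.9912, 57),
    "I-CONFIDENTIALITY_INDIRECT_OBJECT": (0.9744, 0.8444, 0.9048, 45),
    "I-CONFIDENTIALITY_SUBJECT": (1.0000, 1.0000, 1.0000, 25),
    "O": (0.9913, 0.9950, 0.9932, 1604),
}

# Largest gap between a recomputed F1 and the reported one.
max_deviation = max(abs(f1_score(p, r) - f1) for p, r, f1, _ in rows.values())

# Macro average: unweighted mean of the per-label F1 scores.
macro_f1 = sum(f1 for _, _, f1, _ in rows.values()) / len(rows)

# Total number of evaluated tokens.
total_support = sum(support for _, _, _, support in rows.values())

print(f"max deviation: {max_deviation:.6f}")
print(f"macro-avg f1:  {macro_f1:.4f}")
print(f"total support: {total_support}")
```

Note that the O label accounts for most of the support, so the weighted average and accuracy are dominated by non-entity tokens; the macro average gives a more even view across entity labels.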