Legal NER for NDA (Termination Clause)

Description

This NER model is intended to be run only on paragraphs that a suitable classifier has already identified as TERMINATION clauses (use legmulticlf_mnda_sections_paragraph_other for that purpose). It extracts two entities: TERM_DATE and REF_TERM_DATE.
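The classify-then-extract flow described above can be sketched in plain Python. This is only an illustration of the gating logic: `extract_from_termination_clauses`, `classify`, and `extract` are hypothetical names standing in for the clause classifier and this NER pipeline, and the sketch assumes the classifier returns a list of labels per paragraph.

```python
from typing import Callable, Dict, List


def extract_from_termination_clauses(
    paragraphs: List[str],
    classify: Callable[[str], List[str]],
    extract: Callable[[str], List[Dict[str, str]]],
    target_label: str = "TERMINATION",
) -> List[Dict[str, str]]:
    """Run `extract` only on paragraphs that `classify` tags as TERMINATION.

    `classify` and `extract` are hypothetical wrappers: the first stands in
    for the clause classifier, the second for the NER pipeline.
    """
    entities: List[Dict[str, str]] = []
    for paragraph in paragraphs:
        # Skip every paragraph the classifier did not label as a termination clause
        if target_label in classify(paragraph):
            entities.extend(extract(paragraph))
    return entities
```

Keeping the two stages behind plain callables makes the gate easy to test with stubs before wiring in the real Spark pipelines.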

Predicted Entities

TERM_DATE, REF_TERM_DATE


How to use

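from johnsnowlabs import nlp, legal

# Start a Spark session with the licensed Legal NLP libraries loaded
spark = nlp.start()

# Wrap the raw text column into document annotations
document_assembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")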
        
sentence_detector = nlp.SentenceDetector()\
        .setInputCols(["document"])\
        .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
        .setInputCols(["sentence"])\
        .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
        .setInputCols(["sentence", "token"]) \
        .setOutputCol("embeddings")\
        .setMaxSentenceLength(512)\
        .setCaseSensitive(True)

ner_model = legal.NerModel.pretrained("legner_nda_termination", "en", "legal/models")\
        .setInputCols(["sentence", "token", "embeddings"])\
        .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
        .setInputCols(["sentence", "token", "ner"])\
        .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        embeddings,
        ner_model,
        ner_converter])

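# No stage is trained on data here, so fitting on an empty DataFrame
# only assembles the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)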

text = ["""Except as otherwise specified herein, the obligations of the parties set forth in this Agreement shall terminate and be of no further force and effect eighteen months from the date hereof."""]

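result = model.transform(spark.createDataFrame([text]).toDF("text"))

# Flatten the ner_chunk annotations into one row per extracted entity
from pyspark.sql import functions as F

result.select(F.explode("ner_chunk").alias("c")) \
        .select(F.col("c.result").alias("chunk"),
                F.col("c.metadata").getItem("entity").alias("ner_label")) \
        .show(truncate=False)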

Results

+---------------+-------------+
|chunk          |ner_label    |
+---------------+-------------+
|eighteen months|TERM_DATE    |
|date hereof    |REF_TERM_DATE|
+---------------+-------------+
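The chunks above are built from token-level B-/I- tags (the labels reported under Benchmarking). The following is a minimal pure-Python sketch of that merging step, not the library's implementation. The function name `merge_bio` is hypothetical, and the token-level tags in the example are assumed for illustration, since the pipeline output at token level is not shown here.

```python
from typing import List, Optional, Tuple


def merge_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse token-level B-/I- tags into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the chunk that is currently open, if any
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the open chunk
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open chunk
            # (simplification: such stray I- tokens are dropped)
            flush()
    flush()
    return chunks


# Assumed token-level tags for part of the example sentence
tokens = ["eighteen", "months", "from", "the", "date", "hereof", "."]
tags = ["B-TERM_DATE", "I-TERM_DATE", "O", "O",
        "B-REF_TERM_DATE", "I-REF_TERM_DATE", "O"]
print(merge_bio(tokens, tags))
```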

Model Information

Model Name: legner_nda_termination
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.2 MB

References

In-house annotations on non-disclosure agreements

Benchmarking

label            precision  recall  f1-score  support 
B-TERM_DATE      1.00       0.92    0.96      12      
I-TERM_DATE      0.97       1.00    0.98      28      
B-REF_TERM_DATE  0.91       0.91    0.91      11      
I-REF_TERM_DATE  1.00       0.90    0.95      10      
micro-avg        0.97       0.95    0.96      61      
macro-avg        0.97       0.93    0.95      61      
weighted-avg     0.97       0.95    0.96      61
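As a sanity check, the macro and weighted averages can be recomputed from the per-label rows. The sketch below uses only the rounded values printed in the table, so it reproduces the averages to two decimals. The micro average is left out because it needs the raw true-positive, false-positive, and false-negative counts, which the table does not include. The helper names are illustrative.

```python
# Per-label rows from the table: (precision, recall, f1, support)
rows = {
    "B-TERM_DATE": (1.00, 0.92, 0.96, 12),
    "I-TERM_DATE": (0.97, 1.00, 0.98, 28),
    "B-REF_TERM_DATE": (0.91, 0.91, 0.91, 11),
    "I-REF_TERM_DATE": (1.00, 0.90, 0.95, 10),
}


def macro_avg(metric_index: int) -> float:
    """Unweighted mean of one metric across labels."""
    return sum(r[metric_index] for r in rows.values()) / len(rows)


def weighted_avg(metric_index: int) -> float:
    """Mean of one metric across labels, weighted by support."""
    total_support = sum(r[3] for r in rows.values())
    return sum(r[metric_index] * r[3] for r in rows.values()) / total_support


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, index in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(name, round(macro_avg(index), 2), round(weighted_avg(index), 2))
```

Under these inputs the macro averages round to 0.97, 0.93, and 0.95 and the weighted averages to 0.97, 0.95, and 0.96, matching the table. The per-label f1-score column is likewise consistent with `f1(precision, recall)` at two decimals.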