Legal NER for NDA (Required Disclosure Clauses)

Description

This NER model is intended to be run only on clauses that a classifier has already identified as REQ_DISCL (use the legmulticlf_mnda_sections_paragraph_other model for that purpose). It extracts the following entities: DISCLOSURE_BASIS, REQ_DISCLOSURE_CONFID, REQ_DISCLOSURE_COOPERATION, REQ_DISCLOSURE_LEGAL, REQ_DISCLOSURE_NOTICE, REQ_DISCLOSURE_PARTY, REQ_DISCLOSURE_REMEDY, and REQ_OBLIGATION_ACTION.
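The classify-then-extract order described above can be sketched in plain Python. This is only an illustration of the gating logic, not the library's API: `run_ner_on_req_discl`, `fake_classify`, `fake_extract`, and the `OTHER` label are hypothetical stand-ins for the real classifier and NER pipelines.

```python
from typing import Callable, Iterable, List, Set, Tuple

TARGET_LABEL = "REQ_DISCL"  # clause label named in the description

Entities = List[Tuple[str, str]]  # (chunk, ner_label) pairs


def run_ner_on_req_discl(
    paragraphs: Iterable[str],
    classify: Callable[[str], Set[str]],  # hypothetical: paragraph -> predicted clause labels
    extract: Callable[[str], Entities],   # hypothetical: paragraph -> extracted entities
) -> List[Tuple[str, Entities]]:
    """Run the extractor only on paragraphs the classifier tags as REQ_DISCL."""
    gated = []
    for paragraph in paragraphs:
        if TARGET_LABEL in classify(paragraph):
            gated.append((paragraph, extract(paragraph)))
    return gated


# Stand-in functions so the sketch runs without Spark (assumptions, not real models)
def fake_classify(paragraph: str) -> Set[str]:
    return {"REQ_DISCL"} if "legally required" in paragraph else {"OTHER"}


def fake_extract(paragraph: str) -> Entities:
    return [("legally required", "REQ_DISCLOSURE_LEGAL")]


paragraphs = [
    "This Agreement is governed by the laws of the State of New York.",
    "The Recipient will furnish only that portion that is legally required to be disclosed.",
]
gated = run_ner_on_req_discl(paragraphs, fake_classify, fake_extract)
print(len(gated))
```

In practice the two callables would wrap the fitted classifier and NER pipelines, and the first paragraph here would never reach the NER stage.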

Predicted Entities

DISCLOSURE_BASIS, REQ_DISCLOSURE_CONFID, REQ_DISCLOSURE_COOPERATION, REQ_DISCLOSURE_LEGAL, REQ_DISCLOSURE_NOTICE, REQ_DISCLOSURE_PARTY, REQ_DISCLOSURE_REMEDY, REQ_OBLIGATION_ACTION


How to use

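from johnsnowlabs import nlp, legal

# Start a Spark session (assumes the johnsnowlabs package and a valid Legal NLP license are set up)
spark = nlp.start()

# Wrap the raw text column into Spark NLP document annotations
document_assembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")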
        
sentence_detector = nlp.SentenceDetector()\
        .setInputCols(["document"])\
        .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
        .setInputCols(["sentence"])\
        .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
        .setInputCols(["sentence", "token"]) \
        .setOutputCol("embeddings")\
        .setMaxSentenceLength(512)\
        .setCaseSensitive(True)

ner_model = legal.NerModel.pretrained("legner_nda_req_discl", "en", "legal/models")\
        .setInputCols(["sentence", "token", "embeddings"])\
        .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
        .setInputCols(["sentence", "token", "ner"])\
        .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        embeddings,
        ner_model,
        ner_converter
])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

text = ["""If the Discloser waives the Recipient’s compliance with the agreement or fails to obtain a protective order or other appropriate remedies, the Recipient will furnish only that portion of the Confidential Information that is legally required to be disclosed and will use its best efforts to obtain confidential treatment for such Confidential Information."""]

result = model.transform(spark.createDataFrame([text]).toDF("text"))
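To get from the transformed DataFrame to (chunk, label) pairs, one option is to collect the `ner_chunk` column and flatten it in Python. The sketch below assumes each annotation exposes a `result` field and an `entity` key in its `metadata`; `flatten_chunks` is a hypothetical helper, and `sample_rows` is stand-in data shaped like collected rows so the example runs without Spark.

```python
def flatten_chunks(rows):
    """Turn collected rows with a `ner_chunk` column into (chunk, label) pairs."""
    pairs = []
    for row in rows:
        for annotation in row["ner_chunk"]:
            pairs.append((annotation["result"], annotation["metadata"]["entity"]))
    return pairs


# With the pipeline output, the call would be:
#   flatten_chunks(result.select("ner_chunk").collect())
# Stand-in data mimicking two chunks from one collected row (assumption, not real output):
sample_rows = [
    {
        "ner_chunk": [
            {"result": "Discloser", "metadata": {"entity": "REQ_DISCLOSURE_PARTY"}},
            {"result": "protective order", "metadata": {"entity": "REQ_DISCLOSURE_REMEDY"}},
        ]
    }
]

for chunk, label in flatten_chunks(sample_rows):
    print(f"{chunk:<22}{label}")
```

Subscript access is used throughout so the same function works on both Spark `Row` objects and plain dictionaries.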

Results

+----------------------+--------------------------+
|chunk                 |ner_label                 |
+----------------------+--------------------------+
|Discloser             |REQ_DISCLOSURE_PARTY      |
|obtain                |REQ_OBLIGATION_ACTION     |
|protective order      |REQ_DISCLOSURE_REMEDY     |
|appropriate remedies  |REQ_DISCLOSURE_REMEDY     |
|furnish               |REQ_OBLIGATION_ACTION     |
|legally required      |REQ_DISCLOSURE_LEGAL      |
|best efforts          |REQ_DISCLOSURE_COOPERATION|
|obtain                |REQ_OBLIGATION_ACTION     |
|confidential treatment|REQ_DISCLOSURE_CONFID     |
+----------------------+--------------------------+

Model Information

Model Name: legner_nda_req_discl
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.3 MB

References

In-house annotations on non-disclosure agreements

Benchmarking

label                       precision  recall  f1-score  support 
DISCLOSURE_BASIS            0.77       0.70    0.73      57      
REQ_DISCLOSURE_CONFID       0.96       0.93    0.95      29      
REQ_DISCLOSURE_COOPERATION  1.00       0.94    0.97      17      
REQ_DISCLOSURE_LEGAL        0.93       0.77    0.84      35      
REQ_DISCLOSURE_NOTICE       0.89       0.89    0.89      19      
REQ_DISCLOSURE_PARTY        1.00       0.89    0.94      38      
REQ_DISCLOSURE_REMEDY       1.00       1.00    1.00      52      
REQ_OBLIGATION_ACTION       0.95       0.86    0.90      121     
micro-avg                   0.94       0.86    0.90      368     
macro-avg                   0.94       0.87    0.90      368     
weighted-avg                0.93       0.86    0.90      368
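As a sanity check on the summary rows, the macro and weighted averages can be recomputed from the per-label figures. This is a minimal sketch using the rounded values in the table, so a recomputed average may differ from the reported one in the last digit; `macro_avg` and `weighted_avg` are illustrative helper names.

```python
# Per-label rows copied from the table: (precision, recall, f1, support)
rows = {
    "DISCLOSURE_BASIS":           (0.77, 0.70, 0.73, 57),
    "REQ_DISCLOSURE_CONFID":      (0.96, 0.93, 0.95, 29),
    "REQ_DISCLOSURE_COOPERATION": (1.00, 0.94, 0.97, 17),
    "REQ_DISCLOSURE_LEGAL":       (0.93, 0.77, 0.84, 35),
    "REQ_DISCLOSURE_NOTICE":      (0.89, 0.89, 0.89, 19),
    "REQ_DISCLOSURE_PARTY":       (1.00, 0.89, 0.94, 38),
    "REQ_DISCLOSURE_REMEDY":      (1.00, 1.00, 1.00, 52),
    "REQ_OBLIGATION_ACTION":      (0.95, 0.86, 0.90, 121),
}

SUPPORT = 3  # index of the support column


def macro_avg(rows, idx):
    """Unweighted mean of one metric across labels."""
    return sum(r[idx] for r in rows.values()) / len(rows)


def weighted_avg(rows, idx):
    """Mean of one metric across labels, weighted by support."""
    total = sum(r[SUPPORT] for r in rows.values())
    return sum(r[idx] * r[SUPPORT] for r in rows.values()) / total


total_support = sum(r[SUPPORT] for r in rows.values())
macro = tuple(round(macro_avg(rows, i), 2) for i in range(3))
weighted = tuple(round(weighted_avg(rows, i), 2) for i in range(3))

print("support :", total_support)
print("macro   :", macro)
print("weighted:", weighted)
```

The macro average treats every label equally, while the weighted average is dominated by frequent labels such as REQ_OBLIGATION_ACTION, which is why the two recall figures differ.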