Legal NER for NDA (Names of the Parties Clauses)

Description

This is a NER model, aimed to be run only after detecting the NAMES_OF_PARTIES clause with a proper classifier (use legmulticlf_mnda_sections_paragraph_other model for that purpose). It will extract the following entities: ALIAS, EFFDATE_NUMERIC, LOCATION, and PARTY.

Predicted Entities

ALIAS, EFFDATE_NUMERIC, LOCATION, PARTY

Download Copy S3 URI

How to use

document_assembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")
        
sentence_detector = nlp.SentenceDetector()\
        .setInputCols(["document"])\
        .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
        .setInputCols(["sentence"])\
        .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
        .setInputCols(["sentence", "token"]) \
        .setOutputCol("embeddings")\
        .setMaxSentenceLength(512)\
        .setCaseSensitive(True)

ner_model = legal.NerModel.pretrained("legner_nda_names_of_parties", "en", "legal/models")\
        .setInputCols(["sentence", "token", "embeddings"])\
        .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
        .setInputCols(["sentence", "token", "ner"])\
        .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        embeddings,
        ner_model,
        ner_converter
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

text = ["""This Confidentiality Agreement (this "Agreement") is dated effective as of the 4th day of June 2001, between Amerada Hess Corporation, a Delaware corporation ("AHC"), and Triton Energy Limited, a Cayman Islands company (the "Company")."""]

result = model.transform(spark.createDataFrame([text]).toDF("text"))

Results

+------------------------+---------------+
|chunk                   |ner_label      |
+------------------------+---------------+
|4th day of June 2001    |EFFDATE_NUMERIC|
|Amerada Hess Corporation|PARTY          |
|Delaware                |LOCATION       |
|AHC                     |ALIAS          |
|Triton Energy Limited   |PARTY          |
|Cayman Islands          |LOCATION       |
|Company                 |ALIAS          |
+------------------------+---------------+

Model Information

Model Name: legner_nda_names_of_parties
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.3 MB

References

In-house annotations on the Non-disclosure Agreements

Benchmarking

label            precision  recall  f1-score  support 
ALIAS            0.92       0.96    0.94      25      
EFFDATE_NUMERIC  0.90       0.96    0.93      27      
LOCATION         1.00       0.93    0.96      14      
PARTY            0.77       0.88    0.82      26      
micro-avg        0.88       0.93    0.91      92      
macro-avg        0.90       0.93    0.91      92      
weighted-avg     0.88       0.93    0.91      92