Description
This is an NER model intended to be run only after detecting the TERMINATION clause with a proper classifier (use legmulticlf_mnda_sections_paragraph_other for that purpose). It will extract the following entities: TERM_DATE and REF_TERM_DATE.
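When chaining this model after the classifier, only paragraphs predicted as TERMINATION need to be passed to it. Below is a minimal sketch of that filtering step; the classified DataFrame and its category column are hypothetical placeholders, since the actual output schema of legmulticlf_mnda_sections_paragraph_other is documented on its own model card.

from pyspark.sql import functions as F

# Hypothetical: `classified` is the output of the
# legmulticlf_mnda_sections_paragraph_other pipeline, with the predicted
# section label in a string column named `category` (placeholder name).
termination_paragraphs = classified.filter(F.col("category") == "TERMINATION")

# `model` is the fitted NER pipeline built in "How to use" below.
term_entities = model.transform(termination_paragraphs)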
Predicted Entities
TERM_DATE, REF_TERM_DATE
How to use
from johnsnowlabs import nlp, legal

# A Spark session with a valid Legal NLP license is assumed to have been
# started already, e.g. via nlp.start().

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Legal-domain RoBERTa embeddings the NER model was trained with.
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

ner_model = legal.NerModel.pretrained("legner_nda_termination", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge IOB tags into entity chunks (e.g. B-TERM_DATE + I-TERM_DATE -> one chunk).
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

text = ["""Except as otherwise specified herein, the obligations of the parties set forth in this Agreement shall terminate and be of no further force and effect eighteen months from the date hereof."""]

result = model.transform(spark.createDataFrame([text]).toDF("text"))
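To turn result into the chunk/label table shown under Results, one option is to explode the ner_chunk column and read each annotation's text and metadata. This is a minimal sketch assuming the standard Spark NLP annotation schema, where NerConverter stores the entity label in the metadata map under the key "entity".

from pyspark.sql import functions as F

# One row per detected entity: the chunk text next to the entity label
# kept in the chunk's metadata map.
result.select(F.explode("ner_chunk").alias("c"))\
      .select(F.col("c.result").alias("chunk"),
              F.col("c.metadata").getItem("entity").alias("ner_label"))\
      .show(truncate=False)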
Results
+---------------+-------------+
|chunk |ner_label |
+---------------+-------------+
|eighteen months|TERM_DATE |
|date hereof |REF_TERM_DATE|
+---------------+-------------+
Model Information
Model Name: legner_nda_termination
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.2 MB
References
In-house annotations on non-disclosure agreements (NDAs)
Benchmarking
label            precision  recall  f1-score  support
B-TERM_DATE           1.00    0.92      0.96       12
I-TERM_DATE           0.97    1.00      0.98       28
B-REF_TERM_DATE       0.91    0.91      0.91       11
I-REF_TERM_DATE       1.00    0.90      0.95       10
micro-avg             0.97    0.95      0.96       61
macro-avg             0.97    0.93      0.95       61
weighted-avg          0.97    0.95      0.96       61