Description
This is an NER model intended to run only after the RETURN_OF_CONF_INFO clause has been detected by a suitable classifier (use the legmulticlf_mnda_sections_paragraph_other model for that purpose). It extracts the following entities: ARCHIVAL_PURPOSE and LEGAL_PURPOSE.
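In practice this is a two-stage workflow: classify each paragraph first, then run this NER model only on the paragraphs labelled RETURN_OF_CONF_INFO. The sketch below shows that gating logic in plain Python. `classify_paragraph` and `extract_entities` are hypothetical stand-ins for the legmulticlf_mnda_sections_paragraph_other pipeline and this NER pipeline, and the assumption that the classifier returns a list of labels per paragraph is ours, not the model card's.

```python
from typing import Callable, Dict, List, Tuple

# Label the section classifier is expected to emit for the clause this NER model targets.
TARGET_LABEL = "RETURN_OF_CONF_INFO"

Entity = Tuple[str, str]  # (chunk text, entity label)


def run_gated_ner(
    paragraphs: List[str],
    classify_paragraph: Callable[[str], List[str]],  # hypothetical: paragraph -> predicted labels
    extract_entities: Callable[[str], List[Entity]],  # hypothetical: paragraph -> NER chunks
) -> Dict[int, List[Entity]]:
    """Run NER only on paragraphs the classifier tags as RETURN_OF_CONF_INFO.

    Returns a mapping from paragraph index to the entities found in it.
    Paragraphs with any other label are skipped and never reach the NER step.
    """
    results: Dict[int, List[Entity]] = {}
    for idx, paragraph in enumerate(paragraphs):
        if TARGET_LABEL in classify_paragraph(paragraph):
            results[idx] = extract_entities(paragraph)
    return results
```

Gating this way keeps the NER model from being applied to clauses it was not trained on, which would otherwise risk spurious ARCHIVAL_PURPOSE or LEGAL_PURPOSE matches elsewhere in the agreement.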
Predicted Entities
ARCHIVAL_PURPOSE, LEGAL_PURPOSE
How to use
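from johnsnowlabs import nlp, legal

# Start a Spark session; assumes the licensed Legal NLP library and its credentials are installed
spark = nlp.start()

# Wrap the raw text column into Spark NLP document annotations
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")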
# Split each document into sentences
sentence_detector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split sentences into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Legal-domain RoBERTa embeddings, used as input features by the NER model
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

# Token-level NER tagger for ARCHIVAL_PURPOSE / LEGAL_PURPOSE
ner_model = legal.NerModel.pretrained("legner_nda_return_of_conf_info", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter
])
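# All stages are pretrained, so fitting on an empty DataFrame only assembles the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

text = ["""Notwithstanding the foregoing, the Recipient and its Representatives may retain copies of the Confidential Information to the extent that such retention is required to demonstrate compliance with applicable law or governmental rule or regulation, to the extent included in any board or executive documents relating to the proposed business relationship, and in its archives for backup purposes subject to the confidentiality provisions of this Agreement."""]
result = model.transform(spark.createDataFrame([text]).toDF("text"))

# Show one row per extracted chunk with its entity label
from pyspark.sql import functions as F
result.select(F.explode("ner_chunk").alias("chunk_ann")) \
    .select(F.col("chunk_ann.result").alias("chunk"),
            F.col("chunk_ann.metadata").getItem("entity").alias("ner_label")) \
    .show(truncate=False)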
Results
+--------------+----------------+
|chunk         |ner_label       |
+--------------+----------------+
|applicable law|LEGAL_PURPOSE   |
|governmental  |LEGAL_PURPOSE   |
|regulation    |LEGAL_PURPOSE   |
|archives      |ARCHIVAL_PURPOSE|
|backup        |ARCHIVAL_PURPOSE|
+--------------+----------------+
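The table lists merged chunks rather than per-token tags: "applicable law", for instance, is a single LEGAL_PURPOSE chunk spanning two tokens. Merging token tags into chunks is the job of NerConverter. The sketch below is a simplified, hypothetical illustration of that idea, assuming the model emits IOB-style tags (B-LABEL, I-LABEL, O). It is not NerConverter's actual implementation (which works from character offsets rather than re-joining tokens with spaces), and the tag sequences used to exercise it are illustrative, not real model output.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (chunk text, entity label) pairs.

    A chunk starts at a B-<LABEL> tag (or at an I-<LABEL> whose label differs
    from the open chunk) and extends over following I-<LABEL> tags with the
    same label. An "O" tag closes any open chunk.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Emit the open chunk (if any) and reset state.
        nonlocal current_tokens, current_label
        if current_tokens:
            # Simplification: re-join tokens with single spaces.
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, label = tag.partition("-")  # "B-LEGAL_PURPOSE" -> ("B", "-", "LEGAL_PURPOSE")
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)
    flush()
    return chunks
```

For a token sequence such as `with applicable law or governmental rule` tagged `O B-LEGAL_PURPOSE I-LEGAL_PURPOSE O B-LEGAL_PURPOSE O`, this yields the chunks "applicable law" and "governmental", mirroring the first two rows above.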
Model Information
Model Name:     legner_nda_return_of_conf_info
Compatibility:  Legal NLP 1.0.0+
License:        Licensed
Edition:        Official
Input Labels:   [sentence, token, embeddings]
Output Labels:  [ner]
Language:       en
Size:           16.3 MB
References
In-house annotations on non-disclosure agreements (NDAs)
Benchmarking
label             precision  recall  f1-score  support
ARCHIVAL_PURPOSE       0.94    1.00      0.97       16
LEGAL_PURPOSE          0.78    0.85      0.81       33
micro-avg              0.83    0.90      0.86       49
macro-avg              0.86    0.92      0.89       49
weighted-avg           0.83    0.90      0.86       49
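The averaged rows can be cross-checked against the per-label rows. Per-label F1 is the harmonic mean of precision and recall; macro-avg is the unweighted mean over labels; weighted-avg weights each label by its support. The sketch below recomputes these from the rounded values in the table, so small rounding differences are expected (for example, (1.00 + 0.85) / 2 = 0.925 against a reported macro recall of 0.92, presumably because the report used unrounded scores). The micro-avg row cannot be reproduced this way: it needs the raw true-positive, false-positive, and false-negative counts, which are not published here.

```python
from typing import Dict, Tuple

# Per-label (precision, recall, support), copied from the Benchmarking table.
PER_LABEL: Dict[str, Tuple[float, float, int]] = {
    "ARCHIVAL_PURPOSE": (0.94, 1.00, 16),
    "LEGAL_PURPOSE": (0.78, 0.85, 33),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


def macro_and_weighted(
    per_label: Dict[str, Tuple[float, float, int]],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (macro, weighted) averages of precision, recall and F1."""
    n = len(per_label)
    support_total = sum(s for _, _, s in per_label.values())
    macro = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    weighted = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for p, r, s in per_label.values():
        for key, value in (("precision", p), ("recall", r), ("f1", f1(p, r))):
            macro[key] += value / n                      # unweighted mean over labels
            weighted[key] += value * s / support_total   # mean weighted by support
    return macro, weighted


if __name__ == "__main__":
    for label, (p, r, _) in PER_LABEL.items():
        print(f"{label}: f1 = {f1(p, r):.3f}")
    macro, weighted = macro_and_weighted(PER_LABEL)
    print("macro   ", {k: round(v, 3) for k, v in macro.items()})
    print("weighted", {k: round(v, 3) for k, v in weighted.items()})
```

Because LEGAL_PURPOSE has roughly twice the support of ARCHIVAL_PURPOSE, the weighted averages sit closer to its lower scores than the macro averages do.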