Description
This is a NER model, aimed to be run only after detecting the EXCEPTIONS
clause with a proper classifier (use legmulticlf_mnda_sections_paragraph_other for that purpose). It will extract the following entities: EXCLUDED_INFO
, and EXCLUSION_GROUND
.
Predicted Entities
EXCLUDED_INFO
, EXCLUSION_GROUND
How to use
document_assembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = nlp.SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
.setInputCols(["sentence", "token"]) \
.setOutputCol("embeddings")\
.setMaxSentenceLength(512)\
.setCaseSensitive(True)
ner_model = legal.NerModel.pretrained("legner_nda_exceptions", "en", "legal/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
ner_converter = nlp.NerConverter()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")
nlpPipeline = nlp.Pipeline(stages=[
document_assembler,
sentence_detector,
tokenizer,
embeddings,
ner_model,
ner_converter])
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
text = ["""( ii ) was within the Recipient’s or its Recipient Representatives possession prior to its being furnished to the Recipient or its Recipient Representatives by or on behalf of the Provider pursuant here to , provided that the source of such information was not bound by a confidentiality agreement with, or other contractual, legal or fiduciary obligation of confidentiality to, the Provider or any other party with respect to such information."""]
result = model.transform(spark.createDataFrame([text]).toDF("text"))
Results
+----------+----------------+
|chunk |ner_label |
+----------+----------------+
|possession|EXCLUDED_INFO |
|prior to |EXCLUSION_GROUND|
+----------+----------------+
Model Information
Model Name: | legner_nda_exceptions |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 16.3 MB |
References
In-house annotations on the Non-disclosure Agreements
Benchmarking
label precision recall f1-score support
B-EXCLUDED_INFO 0.84 0.91 0.87 34
B-EXCLUSION_GROUND 0.85 0.91 0.88 32
I-EXCLUSION_GROUND 0.91 0.76 0.83 51
I-EXCLUDED_INFO 1.00 0.50 0.67 4
micro-avg 0.87 0.83 0.85 121
macro-avg 0.90 0.77 0.81 121
weighted-avg 0.88 0.83 0.85 121