Description
This is a NER model, aimed to be run only after detecting the NON_SOLIC
clause with a proper classifier (use legmulticlf_mnda_sections_paragraph_other
model for that purpose). It will extract the following entities: NON_SOLIC_ACTION
, NON_SOLIC_OBJECT
, NON_SOLIC_IND_OBJECT
, and NON_SOLIC_SUBJECT
.
Predicted Entities
NON_SOLIC_ACTION
, NON_SOLIC_OBJECT
, NON_SOLIC_IND_OBJECT
, NON_SOLIC_SUBJECT
How to use
document_assembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = nlp.SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
.setInputCols(["sentence", "token"]) \
.setOutputCol("embeddings")\
.setMaxSentenceLength(512)\
.setCaseSensitive(True)
ner_model = legal.NerModel.pretrained("legner_nda_non_solicitation", "en", "legal/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
ner_converter = nlp.NerConverter()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")
nlpPipeline = nlp.Pipeline(stages=[
document_assembler,
sentence_detector,
tokenizer,
embeddings,
ner_model,
ner_converter
])
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
text = ["""During the 12-month period commencing on the date of this Agreement, each Party agrees that it will not permit any of its officers, directors, or employees or any direct or indirect subsidiary or any employment agency retained by a Party or any such subsidiary, in each case who is or becomes aware of the negotiation of a possible Transaction between the Parties, to solicit for employment with such Party or with any of its direct or indirect subsidiaries any employee of the other Party or any of its direct or indirect subsidiaries;"""]
result = model.transform(spark.createDataFrame([text]).toDF("text"))
Results
+----------+--------------------+
|chunk |ner_label |
+----------+--------------------+
|Party |NON_SOLIC_SUBJECT |
|officers |NON_SOLIC_IND_OBJECT|
|directors |NON_SOLIC_IND_OBJECT|
|employees |NON_SOLIC_IND_OBJECT|
|agency |NON_SOLIC_SUBJECT |
|solicit |NON_SOLIC_ACTION |
|employment|NON_SOLIC_OBJECT |
|employee |NON_SOLIC_IND_OBJECT|
+----------+--------------------+
Model Information
Model Name: | legner_nda_non_solicitation |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 16.3 MB |
References
In-house annotations on the Non-disclosure Agreements
Benchmarking
label precision recall f1-score support
NON_SOLIC_ACTION 1.00 0.95 0.98 22
NON_SOLIC_IND_OBJECT 0.67 0.90 0.76 29
NON_SOLIC_OBJECT 0.81 0.94 0.87 18
NON_SOLIC_SUBJECT 0.86 0.86 0.86 21
micro-avg 0.80 0.91 0.85 90
macro-avg 0.83 0.91 0.87 90
weighted-avg 0.82 0.91 0.86 90