Description
This NER model is intended to be run only on clauses that a suitable classifier has already identified as USE_OF_CONF_INFO (use legmulticlf_mnda_sections_paragraph_other for that purpose). It extracts the following entities: RESTRICTED_ACTION, RESTRICTED_SUBJECT, RESTRICTED_OBJECT, and RESTRICTED_IND_OBJECT.
Predicted Entities
RESTRICTED_ACTION, RESTRICTED_SUBJECT, RESTRICTED_OBJECT, RESTRICTED_IND_OBJECT
How to use
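from johnsnowlabs import nlp, legal

# Start (or attach to) a licensed Spark session
spark = nlp.start()

# Wrap raw text in a Spark NLP document annotation
document_assembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")
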
sentence_detector = nlp.SentenceDetector()\
        .setInputCols(["document"])\
        .setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
        .setInputCols(["sentence"])\
        .setOutputCol("token")
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
        .setInputCols(["sentence", "token"]) \
        .setOutputCol("embeddings")\
        .setMaxSentenceLength(512)\
        .setCaseSensitive(True)
ner_model = legal.NerModel.pretrained("legner_nda_confidential_information_restricted", "en", "legal/models")\
        .setInputCols(["sentence", "token", "embeddings"])\
        .setOutputCol("ner")
ner_converter = nlp.NerConverter()\
        .setInputCols(["sentence", "token", "ner"])\
        .setOutputCol("ner_chunk")
nlpPipeline = nlp.Pipeline(stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        embeddings,
        ner_model,
        ner_converter])
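# Every stage is pretrained or rule-based, so fitting on an empty DataFrame only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
text = ["""The recipient may use the proprietary information solely for the purpose of performing its obligations under a separate agreement with the disclosing party, and may not disclose such information to any third party without the prior written consent of the disclosing party."""]
result = model.transform(spark.createDataFrame([text]).toDF("text"))
# Show one row per extracted chunk with its entity label
result.selectExpr("explode(ner_chunk) as c")\
        .selectExpr("c.result as chunk", "c.metadata['entity'] as ner_label")\
        .show(truncate=False)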
Results
+-----------+---------------------+
|chunk      |ner_label            |
+-----------+---------------------+
|recipient  |RESTRICTED_SUBJECT   |
|disclose   |RESTRICTED_ACTION    |
|information|RESTRICTED_OBJECT    |
|third party|RESTRICTED_IND_OBJECT|
+-----------+---------------------+
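The chunks in the table come from the NerConverter stage, which merges token-level tags from the ner column into labelled spans. The following is a minimal pure-Python sketch of that merging step, assuming the model emits IOB-style tags (B-/I-/O). The merge_iob helper and the hand-written tags are illustrative only; they are not the library's implementation or actual model output.

```python
from typing import List, Optional, Tuple


def merge_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level IOB tags into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the chunk being built, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag extends the open chunk of the same label.
            current_tokens.append(token)
        else:
            # "O" tags and stray I- tags close any open chunk.
            flush()
    flush()
    return chunks


# Hand-written tags for part of the example sentence (illustrative, not model output).
tokens = ["may", "not", "disclose", "such", "information", "to", "any", "third", "party"]
tags = [
    "O", "O", "B-RESTRICTED_ACTION", "O", "B-RESTRICTED_OBJECT",
    "O", "O", "B-RESTRICTED_IND_OBJECT", "I-RESTRICTED_IND_OBJECT",
]

for chunk, label in merge_iob(tokens, tags):
    print(f"{chunk}\t{label}")
```

With these tags, "third" and "party" are joined into the single chunk "third party", which is how multi-token entities end up as one row in the results table.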
Model Information
| Model Name: | legner_nda_confidential_information_restricted | 
| Compatibility: | Legal NLP 1.0.0+ | 
| License: | Licensed | 
| Edition: | Official | 
| Input Labels: | [sentence, token, embeddings] | 
| Output Labels: | [ner] | 
| Language: | en | 
| Size: | 16.3 MB | 
References
In-house annotations on non-disclosure agreements
Benchmarking
label                  precision  recall  f1-score  support 
RESTRICTED_ACTION      0.92       0.94    0.93      36      
RESTRICTED_IND_OBJECT  1.00       0.93    0.97      15      
RESTRICTED_OBJECT      0.74       1.00    0.85      26      
RESTRICTED_SUBJECT     0.72       0.90    0.80      29      
micro-avg              0.82       0.94    0.88      106     
macro-avg              0.85       0.94    0.89      106     
weighted-avg           0.83       0.94    0.88      106
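As a sanity check on the averaged rows, the macro and weighted averages can be recomputed from the per-label rows. This is a small sketch that assumes the usual definitions (macro is the unweighted mean over labels, weighted is the support-weighted mean); the inputs are the rounded values from the table, so results match only to within rounding. The micro average is not recomputed because it needs the raw prediction counts, which the table does not report.

```python
from typing import Dict, Tuple

# (precision, recall, f1, support) per label, copied from the table above.
rows: Dict[str, Tuple[float, float, float, int]] = {
    "RESTRICTED_ACTION": (0.92, 0.94, 0.93, 36),
    "RESTRICTED_IND_OBJECT": (1.00, 0.93, 0.97, 15),
    "RESTRICTED_OBJECT": (0.74, 1.00, 0.85, 26),
    "RESTRICTED_SUBJECT": (0.72, 0.90, 0.80, 29),
}


def macro_average(rows: Dict[str, Tuple[float, float, float, int]]) -> Tuple[float, ...]:
    """Unweighted mean of precision, recall and F1 over labels."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows.values()) / n for i in range(3))


def weighted_average(rows: Dict[str, Tuple[float, float, float, int]]) -> Tuple[float, ...]:
    """Mean of precision, recall and F1 weighted by each label's support."""
    total = sum(r[3] for r in rows.values())
    return tuple(sum(r[i] * r[3] for r in rows.values()) / total for i in range(3))


macro = macro_average(rows)
weighted = weighted_average(rows)
total_support = sum(r[3] for r in rows.values())

print("support     ", total_support)
print("macro-avg   ", " ".join(f"{v:.3f}" for v in macro))
print("weighted-avg", " ".join(f"{v:.3f}" for v in weighted))
```

The recomputed values agree with the macro-avg and weighted-avg rows once rounded to two decimals, and the supports sum to the reported 106.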