Description
This Legal NER model extracts dispute-resolution entities from legal agreements: the names of courts or arbitration bodies, the names of the applicable rules, and the means of resolution.
Predicted Entities
RESOLUT_MEANS, RULES_NAME, COURT_NAME
How to use
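from johnsnowlabs import nlp, legal

# Start a Spark session with the licensed libraries loaded
# (assumes a valid John Snow Labs license is already configured)
spark = nlp.start()

# Wrap the raw text into Spark NLP documents
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split sentences into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Legal-domain RoBERTa embeddings consumed by the NER model
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Token-level BIO tagger for dispute clauses
ner_model = legal.NerModel.pretrained("legner_dispute_clauses", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge BIO-tagged tokens into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

text = ["""The contract includes a dispute clause that requires the parties to follow the rules of judicial arbitration set forth by the United Nations Commission on International Trade Law (UNCITRAL) Rules of Arbitration and the jurisdiction of the International Chamber of Commerce court in the event of a dispute."""]
res = model.transform(spark.createDataFrame([text]).toDF("text"))

# Print each token next to its predicted label
row = res.selectExpr("token.result as token", "ner.result as ner_label").first()
for token, label in zip(row["token"], row["ner_label"]):
    print(token, label)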
Results
+-------------+---------------+
|        token|      ner_label|
+-------------+---------------+
|          The|              O|
|     contract|              O|
|     includes|              O|
|            a|              O|
|      dispute|              O|
|       clause|              O|
|         that|              O|
|     requires|              O|
|          the|              O|
|      parties|              O|
|           to|              O|
|       follow|              O|
|          the|              O|
|        rules|              O|
|           of|              O|
|     judicial|B-RESOLUT_MEANS|
|  arbitration|I-RESOLUT_MEANS|
|          set|              O|
|        forth|              O|
|           by|              O|
|          the|              O|
|       United|   B-RULES_NAME|
|      Nations|   I-RULES_NAME|
|   Commission|   I-RULES_NAME|
|           on|   I-RULES_NAME|
|International|   I-RULES_NAME|
|        Trade|   I-RULES_NAME|
|          Law|   I-RULES_NAME|
|            (|   I-RULES_NAME|
|     UNCITRAL|   I-RULES_NAME|
|            )|   I-RULES_NAME|
|        Rules|   I-RULES_NAME|
|           of|   I-RULES_NAME|
|  Arbitration|   I-RULES_NAME|
|          and|              O|
|          the|              O|
| jurisdiction|              O|
|           of|              O|
|          the|              O|
|International|   B-COURT_NAME|
|      Chamber|   I-COURT_NAME|
|           of|   I-COURT_NAME|
|     Commerce|   I-COURT_NAME|
|        court|              O|
|           in|              O|
|          the|              O|
|        event|              O|
|           of|              O|
|            a|              O|
|      dispute|              O|
|            .|              O|
+-------------+---------------+
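The token-level labels above follow the BIO scheme: a `B-` tag opens an entity and the consecutive `I-` tags of the same type extend it. The `NerConverter` stage of the pipeline merges such runs into chunks in the `ner_chunk` column. The block below is a minimal pure-Python sketch of that merging idea, not the Spark NLP implementation. The function name `bio_to_chunks` is hypothetical, and tokens are simply joined with single spaces, so the spacing around parentheses differs from the original sentence.

```python
def bio_to_chunks(tokens, tags):
    """Merge BIO-tagged tokens into (chunk_text, entity_label) pairs."""
    chunks = []
    words, label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, entity = tag.partition("-")
        if prefix == "I" and entity == label:
            # Continuation of the entity that is currently open
            words.append(token)
            continue
        if words:
            # Any other tag closes the open entity
            chunks.append((" ".join(words), label))
        if prefix in ("B", "I"):
            # Start a new entity (a stray I- tag is treated like B-)
            words, label = [token], entity
        else:
            words, label = [], None
    if words:
        chunks.append((" ".join(words), label))
    return chunks


# Excerpt of the tokens and labels from the results table
tokens = (
    "rules of judicial arbitration set forth by the United Nations Commission "
    "on International Trade Law ( UNCITRAL ) Rules of Arbitration and the "
    "jurisdiction of the International Chamber of Commerce court"
).split()
tags = (
    ["O", "O", "B-RESOLUT_MEANS", "I-RESOLUT_MEANS"]
    + ["O"] * 4
    + ["B-RULES_NAME"] + ["I-RULES_NAME"] * 12
    + ["O"] * 5
    + ["B-COURT_NAME"] + ["I-COURT_NAME"] * 3
    + ["O"]
)

chunks = bio_to_chunks(tokens, tags)
for chunk_text, entity_label in chunks:
    print(f"{entity_label}: {chunk_text}")
```

On this excerpt the sketch yields three chunks, one per predicted entity type: `judicial arbitration` (RESOLUT_MEANS), the full UNCITRAL rules name (RULES_NAME), and `International Chamber of Commerce` (COURT_NAME).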
Model Information
| Model Name: | legner_dispute_clauses | 
| Compatibility: | Legal NLP 1.0.0+ | 
| License: | Licensed | 
| Edition: | Official | 
| Input Labels: | [sentence, token, embeddings] | 
| Output Labels: | [ner] | 
| Language: | en | 
| Size: | 16.2 MB | 
References
In-house annotations of the CUAD dataset
Benchmarking
label	 tp	 fp	 fn	 prec	 rec	 f1
B-RESOLUT_MEANS	 14	 4	 6	 0.7777778	 0.7	 0.73684216
B-RULES_NAME	 15	 0	 5	 1.0	 0.75	 0.85714287
I-RESOLUT_MEANS	 12	 0	 3	 1.0	 0.8	 0.88888896
B-COURT_NAME	 26	 6	 6	 0.8125	 0.8125	 0.8125
I-RULES_NAME	 101	 7	 19	 0.9351852	 0.84166664	 0.8859649
I-COURT_NAME	 166	 23	 24	 0.87830687	 0.8736842	 0.87598944
Macro-average	 334	 40	 63	 0.9006283	 0.7963085	 0.8452619
Micro-average	 334	 40	 63	 0.8930481	 0.84130985	 0.8664072
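The averaged rows can be recomputed from the per-label tp/fp/fn counts, which also shows how the two averages differ. The sketch below assumes the usual definitions: precision = tp / (tp + fp), recall = tp / (tp + fn), and F1 as their harmonic mean. The helper `prf` and the variable names are illustrative only. The micro-average pools the counts before computing the metrics; the macro-average takes the unweighted mean of the per-label precision and recall.

```python
# (tp, fp, fn) per label, copied from the benchmarking table
counts = {
    "B-RESOLUT_MEANS": (14, 4, 6),
    "B-RULES_NAME": (15, 0, 5),
    "I-RESOLUT_MEANS": (12, 0, 3),
    "B-COURT_NAME": (26, 6, 6),
    "I-RULES_NAME": (101, 7, 19),
    "I-COURT_NAME": (166, 23, 24),
}


def prf(tp, fp, fn):
    """Return (precision, recall, f1) for one set of counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_label = {label: prf(*c) for label, c in counts.items()}

# Micro-average: pool the counts over all labels, then compute the metrics once
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_p, micro_r, micro_f1 = prf(tp, fp, fn)

# Macro-average: unweighted mean of the per-label precision and recall
n = len(per_label)
macro_p = sum(p for p, _, _ in per_label.values()) / n
macro_r = sum(r for _, r, _ in per_label.values()) / n
# Harmonic mean of the macro precision and macro recall
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)
# Alternative convention: plain mean of the per-label F1 scores
mean_f1 = sum(f for _, _, f in per_label.values()) / n

print(f"micro: {micro_p:.4f} {micro_r:.4f} {micro_f1:.4f}")
print(f"macro: {macro_p:.4f} {macro_r:.4f} {macro_f1:.4f}")
print(f"mean of per-label F1: {mean_f1:.4f}")
```

Under these definitions the reported macro F1 (0.8452619) matches the harmonic mean of the macro precision and macro recall. The plain mean of the six per-label F1 scores is slightly lower, about 0.8429, which is worth keeping in mind when comparing against tools that use that convention.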