Description
This Legal NER model extracts courts and arbitration bodies, rule sets, and means of dispute resolution from legal agreements.
Predicted Entities
RESOLUT_MEANS, RULES_NAME, COURT_NAME
How to use
from johnsnowlabs import nlp, legal

# Start a Spark session (assumes the johnsnowlabs package is installed with a valid Legal NLP license)
spark = nlp.start()

# Convert raw text into document annotations
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split each document into sentences
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split each sentence into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Legal-domain RoBERTa embeddings expected by the NER model
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Token-level NER tagger for dispute clauses
ner_model = legal.NerModel.pretrained("legner_dispute_clauses", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge B-/I- tagged tokens into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

text = ["""The contract includes a dispute clause that requires the parties to follow the rules of judicial arbitration set forth by the United Nations Commission on International Trade Law (UNCITRAL) Rules of Arbitration and the jurisdiction of the International Chamber of Commerce court in the event of a dispute."""]

res = model.transform(spark.createDataFrame([text]).toDF("text"))
Results
+-------------+---------------+
|        token|      ner_label|
+-------------+---------------+
|          The|              O|
|     contract|              O|
|     includes|              O|
|            a|              O|
|      dispute|              O|
|       clause|              O|
|         that|              O|
|     requires|              O|
|          the|              O|
|      parties|              O|
|           to|              O|
|       follow|              O|
|          the|              O|
|        rules|              O|
|           of|              O|
|     judicial|B-RESOLUT_MEANS|
|  arbitration|I-RESOLUT_MEANS|
|          set|              O|
|        forth|              O|
|           by|              O|
|          the|              O|
|       United|   B-RULES_NAME|
|      Nations|   I-RULES_NAME|
|   Commission|   I-RULES_NAME|
|           on|   I-RULES_NAME|
|International|   I-RULES_NAME|
|        Trade|   I-RULES_NAME|
|          Law|   I-RULES_NAME|
|            (|   I-RULES_NAME|
|     UNCITRAL|   I-RULES_NAME|
|            )|   I-RULES_NAME|
|        Rules|   I-RULES_NAME|
|           of|   I-RULES_NAME|
|  Arbitration|   I-RULES_NAME|
|          and|              O|
|          the|              O|
| jurisdiction|              O|
|           of|              O|
|          the|              O|
|International|   B-COURT_NAME|
|      Chamber|   I-COURT_NAME|
|           of|   I-COURT_NAME|
|     Commerce|   I-COURT_NAME|
|        court|              O|
|           in|              O|
|          the|              O|
|        event|              O|
|           of|              O|
|            a|              O|
|      dispute|              O|
|            .|              O|
+-------------+---------------+
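The token-level labels in the table follow the B-/I-/O scheme: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. The `ner_converter` stage groups these tags into entity chunks in the `ner_chunk` column. As a rough illustration of that grouping, here is a plain-Python sketch. It is not the library's implementation, the helper name `bio_to_chunks` is hypothetical, and the sample input is an excerpt of the tokens and labels shown in the table.

```python
from typing import List, Optional, Tuple


def bio_to_chunks(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge B-/I- tagged tokens into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration only; it is not part of Spark NLP.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag always starts a new chunk, closing any open one first.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type extends the open chunk.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag with no matching open chunk) closes the open
            # chunk; the token itself is not added to any chunk.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Flush a chunk that runs to the end of the sequence.
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_type))
    return chunks


# Excerpt of the tokens and labels from the results table.
tokens = ["rules", "of", "judicial", "arbitration", "set", "forth",
          "the", "International", "Chamber", "of", "Commerce", "court"]
labels = ["O", "O", "B-RESOLUT_MEANS", "I-RESOLUT_MEANS", "O", "O",
          "O", "B-COURT_NAME", "I-COURT_NAME", "I-COURT_NAME", "I-COURT_NAME", "O"]

chunks = bio_to_chunks(tokens, labels)
for text, entity_type in chunks:
    print(f"{entity_type}: {text}")
```

On this excerpt the sketch yields two chunks: "judicial arbitration" as RESOLUT_MEANS and "International Chamber of Commerce" as COURT_NAME.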
Model Information
Model Name: legner_dispute_clauses
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.2 MB
References
In-house annotations of the CUAD dataset
Benchmarking
label            tp   fp  fn  prec        rec         f1
B-RESOLUT_MEANS  14   4   6   0.7777778   0.7         0.73684216
B-RULES_NAME     15   0   5   1.0         0.75        0.85714287
I-RESOLUT_MEANS  12   0   3   1.0         0.8         0.88888896
B-COURT_NAME     26   6   6   0.8125      0.8125      0.8125
I-RULES_NAME     101  7   19  0.9351852   0.84166664  0.8859649
I-COURT_NAME     166  23  24  0.87830687  0.8736842   0.87598944
Macro-average    334  40  63  0.9006283   0.7963085   0.8452619
Micro-average    334  40  63  0.8930481   0.84130985  0.8664072
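The precision, recall, and F1 columns can be recomputed from the tp/fp/fn counts. The sketch below does that in plain Python, with the counts copied from the table and a hypothetical helper `prf`. One assumption is inferred from the numbers rather than stated by the source: the reported macro F1 appears to be the harmonic mean of the macro-averaged precision and recall, not the mean of the per-label F1 scores.

```python
from typing import Dict, Tuple

# (tp, fp, fn) per label, copied from the benchmarking table.
counts: Dict[str, Tuple[int, int, int]] = {
    "B-RESOLUT_MEANS": (14, 4, 6),
    "B-RULES_NAME": (15, 0, 5),
    "I-RESOLUT_MEANS": (12, 0, 3),
    "B-COURT_NAME": (26, 6, 6),
    "I-RULES_NAME": (101, 7, 19),
    "I-COURT_NAME": (166, 23, 24),
}


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for one set of counts (hypothetical helper)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


per_label = {label: prf(*c) for label, c in counts.items()}

# Micro-average: pool the counts across labels, then compute the metrics once.
total_tp = sum(c[0] for c in counts.values())
total_fp = sum(c[1] for c in counts.values())
total_fn = sum(c[2] for c in counts.values())
micro = prf(total_tp, total_fp, total_fn)

# Macro-average: mean of per-label precision and recall.
# Assumption inferred from the table: macro F1 is derived from these two means.
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

for label, (p, r, f1) in per_label.items():
    print(f"{label:16s} prec={p:.4f} rec={r:.4f} f1={f1:.4f}")
print(f"{'Macro-average':16s} prec={macro_p:.4f} rec={macro_r:.4f} f1={macro_f1:.4f}")
print(f"{'Micro-average':16s} prec={micro[0]:.4f} rec={micro[1]:.4f} f1={micro[2]:.4f}")
```

The micro-average weights every token equally, so the frequent I-COURT_NAME and I-RULES_NAME labels dominate it, while the macro-average gives each label the same weight regardless of how often it occurs.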