Description
This NER model extracts legal roles of parties in an agreement, such as Borrower, Supplier, Agent, Attorney, Pursuant, etc.
Predicted Entities
ROLE, O
How to use
from johnsnowlabs import nlp, legal

# Start a Spark session with the licensed Legal NLP libraries
# (requires a valid John Snow Labs license).
spark = nlp.start()

documenter = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencizer = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner = legal.NerModel.pretrained("legner_roles", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[documenter, sentencizer, tokenizer, embeddings, ner, ner_converter])

# Fit on an empty DataFrame to initialize the pipeline, then annotate the
# agreement text held in `example`.
empty = spark.createDataFrame([[""]]).toDF("text")
model = pipeline.fit(empty)

tr_results = model.transform(spark.createDataFrame([[example]]).toDF("text"))
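The ner_chunk column can then be flattened into one row per detected role, as in the Results table below. This is a minimal post-processing sketch, not part of the original card; the metadata keys "sentence" and "entity" are assumed from the standard Spark NLP NerConverter output.

from pyspark.sql import functions as F

# Explode the array of chunk annotations and pull out the sentence index,
# the chunk text, and the predicted label from each annotation's metadata.
tr_results.select(F.explode("ner_chunk").alias("chunk_ann")) \
    .select(F.col("chunk_ann.metadata")["sentence"].alias("sent_id"),
            F.col("chunk_ann.result").alias("chunk"),
            F.col("chunk_ann.metadata")["entity"].alias("ner_label")) \
    .show(truncate=False)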
Results
+-------+---------+---------+
|sent_id|chunk |ner_label|
+-------+---------+---------+
|1 |Lender |ROLE |
|1 |Lender's |ROLE |
|1 |principal|ROLE |
|1 |Lender |ROLE |
|2 |pursuant |ROLE |
|3 |Lenders |ROLE |
|3 |Lenders |ROLE |
|3 |Lenders |ROLE |
|4 |Lenders |ROLE |
|7 |Agent |ROLE |
|14 |Lenders |ROLE |
|14 |Borrowers|ROLE |
|14 |Lender |ROLE |
|14 |Agent |ROLE |
|14 |Lender |ROLE |
|15 |Agent |ROLE |
|15 |Lender |ROLE |
|15 |pursuant |ROLE |
|15 |Agent |ROLE |
|15 |Borrowers|ROLE |
+-------+---------+---------+
Model Information
| Model Name:    | legner_roles |
| Compatibility: | Legal NLP 1.0.0+ |
| License:       | Licensed |
| Edition:       | Official |
| Input Labels:  | [sentence, token, embeddings] |
| Output Labels: | [ner] |
| Language:      | en |
| Size:          | 16.2 MB |
References
CUAD dataset and synthetic data
Benchmarking
label          tp     fp  fn  prec       rec         f1
B-ROLE         19095  16  77  0.9991628  0.9959837   0.9975707
I-ROLE         162    1   0   0.993865   1.0         0.9969231
Macro-average  19257  17  77  0.9965139  0.99799186  0.9972523
Micro-average  19257  17  77  0.999118   0.9960174   0.9975653
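The precision, recall, and F1 figures follow directly from the tp/fp/fn counts; below is a quick sanity check of the micro-average row (a minimal sketch, not part of the benchmark itself).

# Recompute the micro-average metrics from the aggregated counts above.
tp, fp, fn = 19257, 17, 77
precision = tp / (tp + fp)                          # ~0.999118
recall = tp / (tp + fn)                             # ~0.996017
f1 = 2 * precision * recall / (precision + recall)  # ~0.997565
print(precision, recall, f1)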