Legal Roles NER

Description

This NER model extracts legal roles mentioned in an agreement, such as Borrower, Supplier, Agent, Attorney, Pursuant, etc.

Predicted Entities

ROLE, O


How to use

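from johnsnowlabs import nlp, legal

# Start a Spark session with the licensed libraries loaded
spark = nlp.start()

# Wrap the raw text column as Spark NLP documents
documenter = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")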

sentencizer = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

ner = legal.NerModel.pretrained('legner_roles', 'en', 'legal/models')\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[documenter, sentencizer, tokenizer, embeddings, ner, ner_converter])

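# Every stage is pretrained or rule-based, so fitting on an empty DataFrame is enough
empty = spark.createDataFrame([[""]]).toDF("text")
model = pipeline.fit(empty)

# `example` is a string holding the agreement text to annotate
tr_results = model.transform(spark.createDataFrame([[example]]).toDF("text"))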
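
The NER stage emits one tag per token, and the converter stage groups those tags into the chunks shown under Results. The grouping can be sketched in plain Python as follows, assuming IOB-style tags (B-ROLE, I-ROLE, O) as listed under Benchmarking. This is a simplified illustration, not Spark NLP's actual implementation; the function name `iob_to_chunks` and the sample sentence are hypothetical.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag (or an I- tag with no matching open chunk) starts a new chunk.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the chunk that is currently open.
            current_tokens.append(token)
        else:
            # An O tag closes any open chunk.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

    # Flush a chunk that runs to the end of the sentence.
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Hypothetical sentence and tags, for illustration only.
sample_tokens = ["The", "Administrative", "Agent", "notifies", "each", "Lender"]
sample_tags = ["O", "B-ROLE", "I-ROLE", "O", "O", "B-ROLE"]
sample_chunks = iob_to_chunks(sample_tokens, sample_tags)
print(sample_chunks)
```

Treating a stray I- tag as the start of a chunk is a design choice made here for robustness; other converters may drop such tokens instead.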

Results

+-------+---------+---------+
|sent_id|chunk    |ner_label|
+-------+---------+---------+
|1      |Lender   |ROLE     |
|1      |Lender's |ROLE     |
|1      |principal|ROLE     |
|1      |Lender   |ROLE     |
|2      |pursuant |ROLE     |
|3      |Lenders  |ROLE     |
|3      |Lenders  |ROLE     |
|3      |Lenders  |ROLE     |
|4      |Lenders  |ROLE     |
|7      |Agent    |ROLE     |
|14     |Lenders  |ROLE     |
|14     |Borrowers|ROLE     |
|14     |Lender   |ROLE     |
|14     |Agent    |ROLE     |
|14     |Lender   |ROLE     |
|15     |Agent    |ROLE     |
|15     |Lender   |ROLE     |
|15     |pursuant |ROLE     |
|15     |Agent    |ROLE     |
|15     |Borrowers|ROLE     |
+-------+---------+---------+
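
The same role often appears in several surface forms (Lender, Lender's, Lenders). A small post-processing step can fold these together and count mentions per role. The sketch below works on the rows of the table above, copied by hand; the `normalize` helper is a hypothetical, deliberately naive English-only rule and not part of the library.

```python
from collections import Counter

# (sent_id, chunk) pairs copied from the Results table.
rows = [
    (1, "Lender"), (1, "Lender's"), (1, "principal"), (1, "Lender"),
    (2, "pursuant"),
    (3, "Lenders"), (3, "Lenders"), (3, "Lenders"),
    (4, "Lenders"),
    (7, "Agent"),
    (14, "Lenders"), (14, "Borrowers"), (14, "Lender"), (14, "Agent"), (14, "Lender"),
    (15, "Agent"), (15, "Lender"), (15, "pursuant"), (15, "Agent"), (15, "Borrowers"),
]


def normalize(chunk: str) -> str:
    """Lowercase a chunk and strip a trailing possessive or plural 's'."""
    text = chunk.lower()
    if text.endswith("'s"):
        return text[:-2]
    if text.endswith("s"):
        return text[:-1]
    return text


role_counts = Counter(normalize(chunk) for _, chunk in rows)
for role, count in role_counts.most_common():
    print(f"{role}: {count}")
```

A tally like this also makes it easy to spot chunks such as "pursuant" or "principal" that may need filtering downstream.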

Model Information

Model Name: legner_roles
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.2 MB

References

CUAD dataset and synthetic data

Benchmarking

label          tp     fp  fn  prec       rec         f1
B-ROLE         19095  16  77  0.9991628  0.9959837   0.9975707
I-ROLE         162    1   0   0.993865   1.0         0.9969231
Macro-average  19257  17  77  0.9965139  0.99799186  0.9972523
Micro-average  19257  17  77  0.999118   0.9960174   0.9975653
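
The precision, recall, and F1 figures follow from the tp/fp/fn counts. The sketch below recomputes them with the standard definitions; the helper name `prf` is hypothetical. The micro-average pools the counts across labels. The reported macro-average F1 appears to be the harmonic mean of the averaged precision and recall, rather than the mean of the per-label F1 scores, so the code follows that convention.

```python
from typing import Dict, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# (tp, fp, fn) per label, taken from the benchmark table.
label_counts: Dict[str, Tuple[int, int, int]] = {
    "B-ROLE": (19095, 16, 77),
    "I-ROLE": (162, 1, 0),
}

per_label = {label: prf(*c) for label, c in label_counts.items()}

# Micro-average: sum tp, fp, fn over labels, then compute the metrics once.
total_tp, total_fp, total_fn = (sum(col) for col in zip(*label_counts.values()))
micro_p, micro_r, micro_f1 = prf(total_tp, total_fp, total_fn)

# Macro-average: mean of per-label precision and recall,
# with F1 taken as the harmonic mean of those two averages.
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

for label, (p, r, f1) in per_label.items():
    print(f"{label}: prec={p:.7f} rec={r:.7f} f1={f1:.7f}")
print(f"macro: prec={macro_p:.7f} rec={macro_r:.7f} f1={macro_f1:.7f}")
print(f"micro: prec={micro_p:.7f} rec={micro_r:.7f} f1={micro_f1:.7f}")
```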