Legal NER - License / Permission Clauses (Bert, sm)

Description

This model detects license grants and permissions in agreements, given by a Subject (PERMISSION_SUBJECT) to a Recipient (PERMISSION_INDIRECT_OBJECT). The permission itself is tagged as PERMISSION.

A lighter, non-transformer-based version of this model is available as legner_grants_md.

Predicted Entities

PERMISSION, PERMISSION_SUBJECT, PERMISSION_OBJECT, PERMISSION_INDIRECT_OBJECT

How to use

from johnsnowlabs import nlp, legal
from pyspark.sql import functions as F

# Start a Spark session with the licensed Legal NLP libraries loaded
spark = nlp.start()

# Wrap the raw text column into a document annotation
documentAssembler = nlp.DocumentAssembler()\
  .setInputCol("text")\
  .setOutputCol("document")

# Split each document into tokens
tokenizer = nlp.Tokenizer()\
  .setInputCols("document")\
  .setOutputCol("token")

# Pretrained BERT token classifier: one BIO label per token
tokenClassifier = legal.BertForTokenClassification.pretrained("legner_bert_grants", "en", "legal/models")\
  .setInputCols("token", "document")\
  .setOutputCol("label")\
  .setCaseSensitive(True)

pipeline = nlp.Pipeline(stages=[
  documentAssembler,
  tokenizer,
  tokenClassifier
])

# Every stage is pretrained, so fitting on an empty DataFrame is enough
p_model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

text = """Fox grants to Licensee a limited, exclusive (except as otherwise may be provided in this Agreement), 
non-transferable (except as permitted in Paragraph 17(d)) right and license"""

res = p_model.transform(spark.createDataFrame([[text]]).toDF("text"))

# Pair each token with its predicted label and show the first 20 rows
res.select(F.explode(F.arrays_zip('token.result', 'label.result')).alias("cols")) \
   .select(F.expr("cols['0']").alias("token"),
           F.expr("cols['1']").alias("ner_label"))\
   .show(20, truncate=100)

Results

+----------------+----------------------------+
|           token|                   ner_label|
+----------------+----------------------------+
|             Fox|        B-PERMISSION_SUBJECT|
|          grants|                           O|
|              to|                           O|
|        Licensee|B-PERMISSION_INDIRECT_OBJECT|
|               a|                           O|
|         limited|                B-PERMISSION|
|               ,|                I-PERMISSION|
|       exclusive|                I-PERMISSION|
|               (|                I-PERMISSION|
|          except|                I-PERMISSION|
|              as|                I-PERMISSION|
|       otherwise|                I-PERMISSION|
|             may|                I-PERMISSION|
|              be|                I-PERMISSION|
|        provided|                I-PERMISSION|
|              in|                I-PERMISSION|
|            this|                I-PERMISSION|
|       Agreement|                I-PERMISSION|
|              ),|                I-PERMISSION|
|non-transferable|                I-PERMISSION|
+----------------+----------------------------+
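The token-level BIO labels above can be merged into entity spans. The following is a minimal sketch in plain Python, assuming the tokens and labels have already been collected into two parallel lists. `bio_to_spans` is a hypothetical helper written for illustration and is not part of Legal NLP.

```python
def bio_to_spans(tokens, labels):
    """Merge parallel token / BIO-label lists into (entity, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        # "B-PERMISSION" -> ("B", "-", "PERMISSION"); "O" -> ("O", "", "")
        prefix, _, entity = label.partition("-")
        if prefix == "B" or (prefix == "I" and entity != current_type):
            # A new entity starts (a stray I- tag is treated as a start too)
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = entity, [token]
        elif prefix == "I":
            # Continuation of the currently open entity
            current_tokens.append(token)
        else:
            # "O" closes any open entity
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# First eight rows of the results table above
tokens = ["Fox", "grants", "to", "Licensee", "a", "limited", ",", "exclusive"]
labels = ["B-PERMISSION_SUBJECT", "O", "O", "B-PERMISSION_INDIRECT_OBJECT",
          "O", "B-PERMISSION", "I-PERMISSION", "I-PERMISSION"]

spans = bio_to_spans(tokens, labels)
for entity, text in spans:
    print(f"{entity}: {text}")
```

Tokens are joined with single spaces, so the span text is an approximation of the original wording (punctuation ends up space-separated). Recovering exact substrings would require the token begin/end offsets instead.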

Model Information

Model Name: legner_bert_grants
Type: legal
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token]
Output Labels: [ner]
Language: en
Size: 412.2 MB
Case sensitive: true
Max sentence length: 128

References

Manual annotations on CUAD dataset

Benchmarking

                       label  precision    recall  f1-score   support
                B-PERMISSION       0.88      0.79      0.83        38
B-PERMISSION_INDIRECT_OBJECT       0.85      0.94      0.89        36
        B-PERMISSION_SUBJECT       0.89      0.85      0.87        40
                I-PERMISSION       0.80      0.69      0.74       342
                           O       0.94      0.97      0.95      1827
                    accuracy         -         -       0.92      2292
                   macro-avg       0.85      0.81      0.86      2292
                weighted-avg       0.91      0.92      0.91      2292
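As a quick sanity check on the table above, the per-label F1 scores can be recomputed from the listed precision and recall using F1 = 2PR / (P + R). This is a small sketch in plain Python; the `report` dictionary simply transcribes the rows above, and `f1` is a helper defined here for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# label: (precision, recall, support), copied from the benchmark table
report = {
    "B-PERMISSION": (0.88, 0.79, 38),
    "B-PERMISSION_INDIRECT_OBJECT": (0.85, 0.94, 36),
    "B-PERMISSION_SUBJECT": (0.89, 0.85, 40),
    "I-PERMISSION": (0.80, 0.69, 342),
    "O": (0.94, 0.97, 1827),
}

f1_scores = {label: round(f1(p, r), 2) for label, (p, r, _) in report.items()}
listed_support = sum(support for _, _, support in report.values())

for label, score in f1_scores.items():
    print(f"{label:>28}  f1={score:.2f}")
print("support of listed labels:", listed_support)
```

The recomputed F1 values match the f1-score column. The listed supports add up to 2283 rather than the 2292 total, so the remaining tokens presumably belong to labels that are not shown in the table; for that reason the macro and weighted averages cannot be reproduced exactly from the listed rows alone.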