Legal Zero-shot Relation Extraction

Description

This is a Zero-shot Relation Extraction Model: it does not require any training data, only a few example sentences for each relation type you are looking for.

Make sure you keep the proper syntax of the relations you want to extract. For example:

re_model.setRelationalCategories({
    "GRANTS_TO": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}"],
    "GRANTS": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}"]
})
  • The keys of the dictionary are the names of the relations (GRANTS_TO, GRANTS).
  • The values are lists of example sentences that express the relation.
  • The names in curly brackets are the NER labels produced by an NER component earlier in the pipeline.
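
To make the template syntax concrete, here is a minimal pure-Python sketch of how a template could be turned into a hypothesis sentence by substituting entity chunks for the NER labels. It is an illustration only, not the library's internal implementation: `build_hypotheses` and the hand-written `chunks` list are hypothetical, and the chunk texts and labels are copied from the sample output on this page.

```python
import re

# Matches a placeholder such as {OBLIGATION_SUBJECT} and captures the label.
PLACEHOLDER = re.compile(r"\{(\w+)\}")

def build_hypotheses(categories, chunks):
    """Fill each two-placeholder template with every matching ordered chunk pair.

    Hypothetical helper for illustration; assumes exactly two placeholders
    per template.
    """
    hypotheses = []
    for relation, templates in categories.items():
        for template in templates:
            first_label, second_label = PLACEHOLDER.findall(template)
            for text1, label1 in chunks:
                for text2, label2 in chunks:
                    if (label1, label2) != (first_label, second_label):
                        continue
                    # Replace the placeholders left to right with the chunk texts.
                    fills = iter([text1, text2])
                    filled = PLACEHOLDER.sub(lambda m: next(fills), template)
                    hypotheses.append((relation, filled))
    return hypotheses

relation_categories = {
    "GRANTS_TO": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}"],
    "GRANTS": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}"],
}

# (chunk text, NER label) pairs, written by hand from the sample output.
chunks = [
    ("Fox", "OBLIGATION_SUBJECT"),
    ("grants", "OBLIGATION_ACTION"),
    ("Licensee", "OBLIGATION_INDIRECT_OBJECT"),
]

for relation, hypothesis in build_hypotheses(relation_categories, chunks):
    print(relation, "->", hypothesis)
# GRANTS_TO -> Fox grants Licensee
# GRANTS -> Fox grants grants
```

The two printed sentences match the `hypothesis` values in the Results section, which is why a label that does not exist in the NER output can never produce a relation.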

Predicted Entities

How to use

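from johnsnowlabs import nlp, legal

# Start a Spark session (assumes a John Snow Labs license is already configured)
spark = nlp.start()

document_assembler = nlp.DocumentAssembler()\
  .setInputCol("text")\
  .setOutputCol("document")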

sparktokenizer = nlp.Tokenizer()\
  .setInputCols("document")\
  .setOutputCol("token")

tokenClassifier = legal.BertForTokenClassifier.pretrained('legner_obligations','en', 'legal/models')\
  .setInputCols("token", "document")\
  .setOutputCol("ner")\
  .setCaseSensitive(True)

ner_converter = nlp.NerConverter()\
    .setInputCols(["document", "token", "ner"])\
    .setOutputCol("ner_chunk")

re_model = legal.ZeroShotRelationExtractionModel.pretrained("legre_zero_shot", "en", "legal/models")\
    .setInputCols(["ner_chunk", "document"])\
    .setOutputCol("relations")

# On Spark NLP < 4.0, use double curly brackets ({{LABEL}}) instead of single ones ({LABEL})
re_model.setRelationalCategories({
    "GRANTS_TO": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}"],
    "GRANTS": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}"]
})

pipeline = nlp.Pipeline() \
    .setStages([document_assembler,  
                sparktokenizer,
                tokenClassifier, 
                ner_converter,
                re_model
               ])
               
# create Spark DF

sample_text = """Fox grants to Licensee a limited, exclusive right and license"""

data = spark.createDataFrame([[sample_text]]).toDF("text")
model = pipeline.fit(data)
results = model.transform(data)

# ner output
results.selectExpr("explode(ner_chunk) as ner").show(truncate=False)

# relations output
results.selectExpr("explode(relations) as relation").show(truncate=False)
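
The comment in the snippet above says that Spark NLP versions before 4.0 expect double curly brackets in the templates. If the same relation dictionary has to serve both cases, the conversion could be sketched as below. `escape_braces` is a hypothetical helper, not part of the library, and it assumes the templates contain no other curly brackets.

```python
def escape_braces(categories):
    """Return a copy of the relation dictionary with every curly bracket doubled.

    Hypothetical helper for illustration only.
    """
    return {
        relation: [t.replace("{", "{{").replace("}", "}}") for t in templates]
        for relation, templates in categories.items()
    }

relation_categories = {
    "GRANTS_TO": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}"],
    "GRANTS": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}"],
}

legacy_categories = escape_braces(relation_categories)
print(legacy_categories["GRANTS"][0])
# {{OBLIGATION_SUBJECT}} grants {{OBLIGATION_ACTION}}
```

The result would then be passed to `setRelationalCategories` in place of the single-bracket dictionary.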

Results

+----------------------------------------------------------------------------------------------------------------------------+
|ner                                                                                                                         |
+----------------------------------------------------------------------------------------------------------------------------+
|[chunk, 0, 2, Fox, [entity -> OBLIGATION_SUBJECT, sentence -> 0, chunk -> 0, confidence -> 0.6905101], []]                  |
|[chunk, 4, 9, grants, [entity -> OBLIGATION_ACTION, sentence -> 0, chunk -> 1, confidence -> 0.7512371], []]                |
|[chunk, 14, 21, Licensee, [entity -> OBLIGATION_INDIRECT_OBJECT, sentence -> 0, chunk -> 2, confidence -> 0.8294538], []]   |
|[chunk, 23, 31, a limited, [entity -> OBLIGATION, sentence -> 0, chunk -> 3, confidence -> 0.7429814], []]                  |
|[chunk, 34, 60, exclusive right and license, [entity -> OBLIGATION, sentence -> 0, chunk -> 4, confidence -> 0.9236847], []]|
+----------------------------------------------------------------------------------------------------------------------------+

+-------------+
|relation     |
+-------------+
|[category, 0, 91, GRANTS, [entity1_begin -> 0, relation -> GRANTS, hypothesis -> Fox grants grants, confidence -> 0.7592092, nli_prediction -> entail, entity1 -> OBLIGATION_SUBJECT, syntactic_distance -> undefined, chunk2 -> grants, entity2_end -> 9, entity1_end -> 2, entity2_begin -> 4, entity2 -> OBLIGATION_ACTION, chunk1 -> Fox, sentence -> 0], []]                       |
|[category, 92, 185, GRANTS_TO, [entity1_begin -> 0, relation -> GRANTS_TO, hypothesis -> Fox grants Licensee, confidence -> 0.9822127, nli_prediction -> entail, entity1 -> OBLIGATION_SUBJECT, syntactic_distance -> undefined, chunk2 -> Licensee, entity2_end -> 21, entity1_end -> 2, entity2_begin -> 14, entity2 -> OBLIGATION_INDIRECT_OBJECT, chunk1 -> Fox, sentence -> 0], []]|
+-------------+
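
Each relation row carries a `confidence` score and an `nli_prediction`, so a common follow-up step is to keep only the confident, entailed relations. The sketch below works on plain Python dictionaries copied by hand from the two rows above. `keep_confident` and the 0.9 threshold are illustrative assumptions, and in the actual Spark output the metadata values may arrive as strings and need a cast to float first.

```python
# Fields copied by hand from the two relation rows in the Results table.
relations = [
    {"relation": "GRANTS", "chunk1": "Fox", "chunk2": "grants",
     "confidence": 0.7592092, "nli_prediction": "entail"},
    {"relation": "GRANTS_TO", "chunk1": "Fox", "chunk2": "Licensee",
     "confidence": 0.9822127, "nli_prediction": "entail"},
]

def keep_confident(rows, threshold):
    """Keep entailed relations whose confidence is at or above the threshold.

    Hypothetical helper; the threshold is a value the user has to choose.
    """
    return [
        row for row in rows
        if row["nli_prediction"] == "entail" and row["confidence"] >= threshold
    ]

for row in keep_confident(relations, threshold=0.9):
    print(row["relation"], row["chunk1"], row["chunk2"], row["confidence"])
# GRANTS_TO Fox Licensee 0.9822127
```

With this threshold the lower-scoring GRANTS relation, whose hypothesis is the repetitive "Fox grants grants", is dropped.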

Model Information

Model Name: legre_zero_shot
Type: legal
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Language: en
Size: 406.4 MB
Case sensitive: true

References

Bert Base (cased) trained on the GLUE MNLI dataset.