Description
IMPORTANT: Don’t run this model on the whole legal agreement. Instead:
- Split by paragraphs. You can use notebook 1 in Finance or Legal as inspiration;
- Use the legclf_cuad_whereas_clause Text Classifier to keep only the paragraphs that contain WHEREAS clauses.
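The two preparation steps above can be sketched in plain Python. This is only an illustration under stated assumptions: split_paragraphs, select_paragraphs and looks_like_whereas are hypothetical names, the sample agreement is made up, and the keyword check merely stands in for the legclf_cuad_whereas_clause classifier, which should make the real decision.

```python
import re
from typing import Callable, List


def split_paragraphs(agreement: str) -> List[str]:
    """Split an agreement into paragraphs on blank lines."""
    parts = re.split(r"\n\s*\n", agreement)
    return [p.strip() for p in parts if p.strip()]


def looks_like_whereas(paragraph: str) -> bool:
    """Naive stand-in for the classifier: accept paragraphs starting with WHEREAS."""
    return paragraph.upper().startswith("WHEREAS")


def select_paragraphs(paragraphs: List[str], keep: Callable[[str], bool]) -> List[str]:
    """Keep only the paragraphs accepted by the `keep` predicate."""
    return [p for p in paragraphs if keep(p)]


# Made-up sample text, for illustration only
agreement = """CO-BRANDING AGREEMENT

WHEREAS VerticalNet owns and operates a series of online communities ;

1. DEFINITIONS . Capitalized terms have the meanings given below ."""

paragraphs = split_paragraphs(agreement)
whereas_paragraphs = select_paragraphs(paragraphs, looks_like_whereas)
print(whereas_paragraphs)
```

Passing the predicate as an argument keeps the splitting logic unchanged when the keyword check is swapped for the classifier's prediction.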
This is a Relation Extraction model that infers relations between the elements of WHEREAS clauses, specifically the SUBJECT, the ACTION and the OBJECT. Two relations are possible: has_subject and has_object.
You can also use legpipe_whereas, which includes this model and its NER as well as dependency parsing, to carry out chunk extraction using grammatical features (the dependency tree).
This is a sm model without meaningful directions in its relations: it was not trained to tell whether a relation runs from left to right or from right to left.
Bigger models, also trained with directed relations, are available in Models Hub.
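Because the relations carry no reliable direction, downstream code may want to orient each pair itself using the NER labels of the two chunks. A minimal sketch, assuming each relation has been collected into a dict with the keys shown in the Results section; orient_relation is a hypothetical helper, not part of the library:

```python
from typing import Dict, Tuple

ACTION_LABEL = "WHEREAS_ACTION"


def orient_relation(rel: Dict[str, str]) -> Tuple[str, str, str]:
    """Return (head, relation, argument), putting the ACTION chunk first when present."""
    first = (rel["entity1"], rel["chunk1"])
    second = (rel["entity2"], rel["chunk2"])
    # Swap only when the ACTION sits in the second position
    if second[0] == ACTION_LABEL and first[0] != ACTION_LABEL:
        first, second = second, first
    return first[1], rel["relation"], second[1]


left_to_right = {
    "relation": "has_subject",
    "entity1": "WHEREAS_SUBJECT", "chunk1": "VerticalNet",
    "entity2": "WHEREAS_ACTION", "chunk2": "operates",
}
right_to_left = {
    "relation": "has_subject",
    "entity1": "WHEREAS_ACTION", "chunk1": "operates",
    "entity2": "WHEREAS_SUBJECT", "chunk2": "VerticalNet",
}

# Both orderings produce the same oriented tuple
print(orient_relation(left_to_right))
print(orient_relation(right_to_left))
```

Pairs that contain no ACTION chunk are returned in their original order, since the labels alone do not suggest a head.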
Predicted Entities
has_subject, has_object
How to use
from johnsnowlabs import nlp, legal

# Start a Spark session (assumes a valid Legal NLP license is configured)
spark = nlp.start()

# Convert raw text into documents
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

# Legal RoBERTa embeddings required by the NER model
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["document", "token"])\
    .setOutputCol("embeddings")

# NER for the WHEREAS subject, action and object
ner_model = legal.NerModel.pretrained("legner_whereas", "en", "legal/models")\
    .setInputCols(["document", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["document", "token", "ner"])\
    .setOutputCol("ner_chunk")

# Relation extraction between the detected chunks
reDL = legal.RelationExtractionDLModel\
    .pretrained("legre_whereas", "en", "legal/models")\
    .setPredictionThreshold(0.5)\
    .setInputCols(["ner_chunk", "document"])\
    .setOutputCol("relations")

pipeline = nlp.Pipeline(stages=[
    documentAssembler,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter,
    reDL
])

text = """
WHEREAS VerticalNet owns and operates a series of online communities ( as defined below ) that are accessible via the world wide web , each of which is designed to be an online gathering place for businesses of a certain type or within a certain industry ;
"""

data = spark.createDataFrame([[text]]).toDF("text")
model = pipeline.fit(data)
res = model.transform(data)
Results
| relation | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end | chunk2 | confidence |
|---|---|---|---|---|---|---|---|---|---|
| has_subject | WHEREAS_SUBJECT | 11 | 21 | VerticalNet | WHEREAS_ACTION | 32 | 39 | operates | 0.9982886 |
| has_subject | WHEREAS_SUBJECT | 11 | 21 | VerticalNet | WHEREAS_OBJECT | 41 | 70 | a series of online communities | 0.9890683 |
| has_object | WHEREAS_ACTION | 32 | 39 | operates | WHEREAS_OBJECT | 41 | 70 | a series of online communities | 0.7831568 |
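The rows above can be combined into subject/action/object triples. A minimal sketch, assuming the rows have already been collected into Python dicts; build_triples and its min_confidence parameter are hypothetical names, and the values are copied from the results above:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

rows = [
    {"relation": "has_subject", "entity1": "WHEREAS_SUBJECT", "chunk1": "VerticalNet",
     "entity2": "WHEREAS_ACTION", "chunk2": "operates", "confidence": 0.9982886},
    {"relation": "has_subject", "entity1": "WHEREAS_SUBJECT", "chunk1": "VerticalNet",
     "entity2": "WHEREAS_OBJECT", "chunk2": "a series of online communities",
     "confidence": 0.9890683},
    {"relation": "has_object", "entity1": "WHEREAS_ACTION", "chunk1": "operates",
     "entity2": "WHEREAS_OBJECT", "chunk2": "a series of online communities",
     "confidence": 0.7831568},
]

Triple = Tuple[Optional[str], str, Optional[str]]


def build_triples(rows: List[Dict], min_confidence: float = 0.5) -> List[Triple]:
    """Group relations by ACTION chunk into (subject, action, object) triples."""
    slots: Dict[str, Dict[str, str]] = defaultdict(dict)
    for row in rows:
        if row["confidence"] < min_confidence:
            continue  # drop low-confidence relations
        chunks = {row["entity1"]: row["chunk1"], row["entity2"]: row["chunk2"]}
        action = chunks.get("WHEREAS_ACTION")
        if action is None:
            continue  # a pair without an ACTION cannot anchor a triple
        if row["relation"] == "has_subject" and "WHEREAS_SUBJECT" in chunks:
            slots[action]["subject"] = chunks["WHEREAS_SUBJECT"]
        elif row["relation"] == "has_object" and "WHEREAS_OBJECT" in chunks:
            slots[action]["object"] = chunks["WHEREAS_OBJECT"]
    return [(s.get("subject"), action, s.get("object")) for action, s in slots.items()]


print(build_triples(rows))
```

Looking chunks up by entity label rather than by position means the result does not depend on which side of the pair the ACTION appears.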
Model Information
| Model Name: | legre_whereas |
| Type: | legal |
| Compatibility: | Legal NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 409.9 MB |
References
Manual annotations on CUAD dataset
Benchmarking
| label | Recall | Precision | F1 | Support |
|---|---|---|---|---|
| has_object | 0.946 | 0.981 | 0.964 | 56 |
| has_subject | 0.952 | 0.988 | 0.969 | 83 |
| no_rel | 1.000 | 0.970 | 0.985 | 161 |
| Avg. | 0.966 | 0.980 | 0.973 | - |
| Weighted-Avg. | 0.977 | 0.977 | 0.977 | - |
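The two summary rows can be reproduced from the per-label rows. The sketch below assumes that "Avg." is the unweighted (macro) mean and that "Weighted-Avg." weights each label by its Support; the source does not state this, but the arithmetic matches the reported values. The helper names are hypothetical.

```python
from typing import Dict, Tuple

# (recall, precision, f1, support) per label, copied from the benchmark above
scores: Dict[str, Tuple[float, float, float, int]] = {
    "has_object": (0.946, 0.981, 0.964, 56),
    "has_subject": (0.952, 0.988, 0.969, 83),
    "no_rel": (1.000, 0.970, 0.985, 161),
}


def macro_average(scores: Dict[str, Tuple[float, float, float, int]], idx: int) -> float:
    """Unweighted mean of one metric across labels."""
    return sum(row[idx] for row in scores.values()) / len(scores)


def weighted_average(scores: Dict[str, Tuple[float, float, float, int]], idx: int) -> float:
    """Mean of one metric across labels, weighted by support."""
    total = sum(row[3] for row in scores.values())
    return sum(row[idx] * row[3] for row in scores.values()) / total


for name, idx in [("Recall", 0), ("Precision", 1), ("F1", 2)]:
    macro = round(macro_average(scores, idx), 3)
    weighted = round(weighted_average(scores, idx), 3)
    print(f"{name}: macro={macro:.3f} weighted={weighted:.3f}")
```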