Description
This model requires legner_bert_grants
as an NER in the pipeline. It’s a md
model with Unidirectional Relations, meaning that the model retrieves in chunk1 the left side of the relation (source), and in chunk2 the right side (target).
Predicted Entities
allows
, is_allowed_to
How to use
documentAssembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentencizer = nlp.SentenceDetectorDLModel\
.pretrained("sentence_detector_dl", "en") \
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols("sentence")\
.setOutputCol("token")
pos_tagger = nlp.PerceptronModel()\
.pretrained() \
.setInputCols(["sentence", "token"])\
.setOutputCol("pos_tags")
dependency_parser = nlp.DependencyParserModel() \
.pretrained("dependency_conllu", "en") \
.setInputCols(["sentence", "pos_tags", "token"]) \
.setOutputCol("dependencies")
ner_model = legal.BertForTokenClassification.pretrained("legner_bert_grants", "en", "legal/models")\
.setInputCols("token", "sentence")\
.setOutputCol("ner")\
.setCaseSensitive(True)
ner_converter = nlp.NerConverter() \
.setInputCols(["sentence","token","ner"]) \
.setOutputCol("ner_chunk")
re_filter = legal.RENerChunksFilter()\
.setInputCols(["ner_chunk", "dependencies"])\
.setOutputCol("re_ner_chunks")\
.setMaxSyntacticDistance(10)\
.setRelationPairs(['PERMISSION_SUBJECT-PERMISSION_INDIRECT_OBJECT','PERMISSION_INDIRECT_OBJECT-PERMISSION'])
reDL = legal.RelationExtractionDLModel.pretrained("legre_grants_md", "en", "legal/models") \
.setPredictionThreshold(0.9) \
.setInputCols(["re_ner_chunks", "sentence"]) \
.setOutputCol("relations")
pipeline = nlp.Pipeline(stages=[documentAssembler,sentencizer, tokenizer,pos_tagger,dependency_parser, ner_model, ner_converter,re_filter, reDL])
text = """Appointment Subject to payment of the Annual Minimum Commitment ("AMC" - defined herein), Diversinet hereby grants to Reseller an exclusive, non- transferable and non-assignable right to market, sell, and sub-license those Diversinet products listed in Schedule 2 (the "Products") within the territory listed in Schedule 3 (the "Territory") to Canadian headquartered companies, and governmental and broader public sector entities located in Canada. """
data = spark.createDataFrame([[text]]).toDF("text")
model = pipeline.fit(data)
res = model.transform(data)
Results
+-------------+--------------------------+-------------+-----------+----------+--------------------------+-------------+-----------+----------------------------------------------------------------------------+----------+
|relation |entity1 |entity1_begin|entity1_end|chunk1 |entity2 |entity2_begin|entity2_end|chunk2 |confidence|
+-------------+--------------------------+-------------+-----------+----------+--------------------------+-------------+-----------+----------------------------------------------------------------------------+----------+
|allows |PERMISSION_SUBJECT |92 |101 |Diversinet|PERMISSION_INDIRECT_OBJECT|120 |127 |Reseller |0.99999297|
|is_allowed_to|PERMISSION_INDIRECT_OBJECT|120 |127 |Reseller |PERMISSION |132 |145 |exclusive, non |0.9999945 |
|is_allowed_to|PERMISSION_INDIRECT_OBJECT|120 |127 |Reseller |PERMISSION |148 |223 |transferable and non-assignable right to market, sell, and sub-license those|0.99987125|
+-------------+--------------------------+-------------+-----------+----------+--------------------------+-------------+-----------+----------------------------------------------------------------------------+----------+
Model Information
Model Name: | legre_grants_md |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Language: | en |
Size: | 402.2 MB |
References
Manual annotations on CUAD dataset
Benchmarking
label Recall Precision F1 Support
allows 1.000 1.000 1.000 32
is_allowed_to 1.000 1.000 1.000 36
other 1.000 1.000 1.000 32
Avg 1.000 1.000 1.000 -
Weighted-Avg 1.000 1.000 1.000 -