Description
This is a Relation Extraction model aimed to be used in notice clauses, to retrieve relations between entities as NOTICE_PARTY, ADDRESS, EMAIL, TITLE etc. Make sure you run this model only on the NER entities in notice clauses, after you filter them using legclf_notice_clause
Predicted Entities
has_notice_party
, has_address
, has_person
, has_phone
, has_fax
, has_title
, has_email
, has_department
How to use
document_assembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")\
sentence_detector = nlp.SentenceDetectorDLModel.pretrained()\
.setInputCols("document")\
.setOutputCol("sentence")\
.setCustomBounds(["\n\n"])\
.setUseCustomBoundsOnly(True)
tokenizer = nlp.Tokenizer()\
.setInputCols("sentence")\
.setOutputCol("token")
pos_tagger = nlp.PerceptronModel.pretrained()\
.setInputCols(["sentence", "token"])\
.setOutputCol("pos_tags")
dependency_parser = nlp.DependencyParserModel() \
.pretrained("dependency_conllu", "en") \
.setInputCols(["sentence", "pos_tags", "token"]) \
.setOutputCol("dependencies")
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
.setInputCols(["sentence", "token"]) \
.setOutputCol("embeddings")
ner_model = legal.NerModel.pretrained('legner_notice_clause', 'en', 'legal/models') \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = nlp.NerConverter() \
.setInputCols(["sentence","token","ner"]) \
.setOutputCol("ner_chunk")
re_filter = legal.RENerChunksFilter()\
.setInputCols(["ner_chunk", "dependencies"])\
.setOutputCol("re_ner_chunks")\
.setMaxSyntacticDistance(12)\
.setRelationPairs(['NAME-NOTICE_PARTY','NAME-ADDRESS','NAME-PERSON', 'NAME-TITLE','NAME-EMAIL','NAME-PHONE', 'NAME-FAX', 'NAME-DEPARTMENT'])
reDL = legal.RelationExtractionDLModel.pretrained("legre_notice_clause_xs", "en", "legal/models") \
.setPredictionThreshold(0.1) \
.setInputCols(["re_ner_chunks", "sentence"]) \
.setOutputCol("relations")
pipeline = nlp.Pipeline(stages=[document_assembler,
sentence_detector,
tokenizer,
pos_tagger,
dependency_parser,
embeddings,
ner_model,
ner_converter,
re_filter,
reDL])
empty_df = spark.createDataFrame([['']]).toDF("text")
re_model = pipeline.fit(empty_df)
light_model = nlp.LightPipeline(re_model)
text = """The addresses for notices shall be: IBM MSL 8501 IBM Drive 200 Baker Avenue Charlotte, NC 28262 Concord, MA 01742 Attn: MSL Project Office Attn: General Counsel Telephone: 704-594-1964 Telephone: 978-287-5630 Facsimile: 704-594-4108 Facsimile: 978-287-5635 Either Party may change its address for this section by giving written notice to the other Party."""
result = light_model.fullAnnotate(text)
Results
| relation | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end | chunk2 | confidence |
|---------------------|------------|------------------|----------------|------------|---------------|------------------|----------------|------------------------------------------------------|---------------|
| has_address | NAME | 36 | 42 | IBM MSL | ADDRESS | 44 | 112 | 8501 IBM Drive 200 Baker Avenue Charlotte, NC ... | 0.9997987 |
| has_notice_party | NAME | 36 | 42 | IBM MSL | DEPARTMENT | 120 | 137 | MSL Project Office | 0.34552842 |
| has_title | NAME | 36 | 42 | IBM MSL | TITLE | 145 | 159 | General Counsel | 0.48349348 |
| has_phone | NAME | 36 | 42 | IBM MSL | PHONE | 173 | 184 | 704-594-1964 | 0.99517375 |
| has_phone | NAME | 36 | 42 | IBM MSL | PHONE | 197 | 208 | 978-287-5630 | 0.9961247 |
| has_fax | NAME | 36 | 42 | IBM MSL | FAX | 221 | 232 | 704-594-4108 | 0.99340916 |
| has_fax | NAME | 36 | 42 | IBM MSL | FAX | 245 | 256 | 978-287-5635 | 0.97187006 |
Model Information
Model Name: | legre_notice_clause_xs |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Language: | en |
Size: | 402.6 MB |
References
In-house dataset
Benchmarking
label Recall Precision F1 Support
has_address 0.976 1.000 0.988 41
has_department 0.667 1.000 0.800 3
has_email 1.000 1.000 1.000 7
has_fax_phone 1.000 1.000 1.000 8
has_notice_party 1.000 0.955 0.977 42
has_person 1.000 0.938 0.968 15
has_title 0.875 0.933 0.903 16
other 1.000 1.000 1.000 68
Avg. 0.940 0.978 0.954 -
Weighted-Avg. 0.980 0.980 0.979 -