Description
This is a small Assertion Status model aimed at detecting temporality (PRESENT, PAST, FUTURE) or certainty (POSSIBLE) in your legal documents.
Predicted Entities
PRESENT, PAST, FUTURE, POSSIBLE
How to use
documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDetector = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

# YOUR NER HERE: plug in the legal NER model of your choice; together with its
# converter it should produce a chunk column named "entity".
# ...

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

chunk_converter = nlp.ChunkConverter() \
    .setInputCols(["entity"]) \
    .setOutputCol("ner_chunk")

assertion = legal.AssertionDLModel.pretrained("legassertion_time", "en", "legal/models") \
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")
nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner,  # your NER stage(s) producing the "entity" chunk column
    chunk_converter,
    assertion
])
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

lp = nlp.LightPipeline(model)

texts = ["The subsidiaries of Atlantic Inc will participate in a merging operation",
         "The Conditions and Warranties of this agreement might be modified"]

lp.annotate(texts)
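`annotate` only returns the predicted labels; to also get the chunk text, character offsets and entity types shown in the Results section, `fullAnnotate` can be used instead. A minimal sketch, assuming the column names defined in the pipeline above and that the chunk metadata stores the entity type under the "entity" key (this depends on the NER/converter you plug in):

import pandas as pd

rows = []
for result in lp.fullAnnotate(texts):
    # ner_chunk and assertion annotations are aligned one-to-one
    for chunk, assertion_ann in zip(result["ner_chunk"], result["assertion"]):
        rows.append({
            "chunk": chunk.result,
            "begin": chunk.begin,
            "end": chunk.end,
            "entity_type": chunk.metadata.get("entity"),
            "assertion": assertion_ann.result,
        })

print(pd.DataFrame(rows))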
Results
| chunk                     | begin | end | entity_type | assertion |
|---------------------------|-------|-----|-------------|-----------|
| Atlantic Inc              | 20    | 31  | ORG         | FUTURE    |
| Conditions and Warranties | 4     | 28  | DOC         | POSSIBLE  |
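The results above come from the LightPipeline; for batch scoring, the fitted `model` from the snippet above can also transform a Spark DataFrame directly. A minimal sketch under those assumptions:

data = spark.createDataFrame(
    [["The subsidiaries of Atlantic Inc will participate in a merging operation"]]
).toDF("text")

result = model.transform(data)

# each row carries one array of detected chunks and one array of assertion labels
result.select("ner_chunk.result", "assertion.result").show(truncate=False)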
Model Information
| Model Name:    | legassertion_time |
| Compatibility: | Legal NLP 1.0.0+ |
| License:       | Licensed |
| Edition:       | Official |
| Input Labels:  | [document, doc_chunk, embeddings] |
| Output Labels: | [assertion] |
| Language:      | en |
| Size:          | 2.2 MB |
References
In-house annotations on financial and legal corpora
Benchmarking
| label         | tp  | fp | fn | prec      | rec       | f1        |
|---------------|-----|----|----|-----------|-----------|-----------|
| PRESENT       | 201 | 11 | 16 | 0.9481132 | 0.9262672 | 0.937063  |
| POSSIBLE      | 171 | 3  | 6  | 0.9827586 | 0.9661017 | 0.974359  |
| FUTURE        | 119 | 6  | 4  | 0.952     | 0.9674796 | 0.959677  |
| PAST          | 270 | 16 | 10 | 0.9440559 | 0.9642857 | 0.954063  |
| Macro-average | 761 | 36 | 36 | 0.9567319 | 0.9560336 | 0.9563826 |
| Micro-average | 761 | 36 | 36 | 0.9548306 | 0.9548306 | 0.9548306 |