Description
This is a medium (md) Assertion Status model that detects temporality (PRESENT, PAST, FUTURE) or certainty (POSSIBLE) in legal documents. It may improve on the results of the small legassertion_time model.
Predicted Entities
PRESENT, PAST, FUTURE, POSSIBLE
How to use
from johnsnowlabs import nlp, legal

# `spark` is assumed to be an active Spark session (for example, one returned by nlp.start()).

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# The same embeddings feed both the NER model and the assertion model
embeddings_ner = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings_ner")

ner_model = legal.NerModel.pretrained("legner_contract_doc_parties", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings_ner"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(["DOC", "EFFDATE", "PARTY"])  # Assert temporality only for these entity types

assertion = legal.AssertionDLModel.pretrained("legassertion_time_md", "en", "legal/models")\
    .setInputCols(["sentence", "ner_chunk", "embeddings_ner"])\
    .setOutputCol("assertion")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings_ner,
    ner_model,
    ner_converter,
    assertion
])

# Fit on an empty DataFrame: every stage is pretrained, so no training data is needed
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

# LightPipeline runs the fitted pipeline on plain Python strings
lp = nlp.LightPipeline(model)

texts = ["The subsidiaries of Atlantic Inc will participate in a merging operation",
         "The Conditions and Warranties of this agreement might be modified"]

lp.annotate(texts)
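To get rows shaped like the Results below (chunk, offsets, entity type, assertion), the chunk and assertion annotations have to be paired up. The following is a minimal sketch, not part of the original card. It assumes the pipeline is run with `fullAnnotate`, that each annotation exposes `result`, `begin`, `end` and `metadata`, that the chunk's entity label sits under the `"entity"` metadata key, and that chunks and assertions come back in the same order. The helper name `assertion_rows` is hypothetical, and the demo uses stand-in objects so that it runs without Spark.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def assertion_rows(result: Dict[str, List[Any]],
                   chunk_col: str = "ner_chunk",
                   assertion_col: str = "assertion") -> List[Dict[str, Any]]:
    """Pair each NER chunk with its assertion status (hypothetical helper).

    Assumes chunks and assertions are aligned by position.
    """
    rows = []
    for chunk, status in zip(result[chunk_col], result[assertion_col]):
        rows.append({
            "chunk": chunk.result,
            "begin": chunk.begin,
            "end": chunk.end,
            "entity_type": chunk.metadata.get("entity"),
            "assertion": status.result,
        })
    return rows


# Stand-in annotations mimicking one fullAnnotate result; values are copied
# from the second row of the Results section, not produced by a real run.
demo_result = {
    "ner_chunk": [SimpleNamespace(result="Conditions and Warranties",
                                  begin=4, end=28,
                                  metadata={"entity": "DOC"})],
    "assertion": [SimpleNamespace(result="POSSIBLE", begin=4, end=28,
                                  metadata={})],
}

demo_rows = assertion_rows(demo_result)
print(demo_rows)
```

With the real pipeline, the same helper would be applied to each element of `lp.fullAnnotate(texts)`, again under the assumptions stated above.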
Results
chunk,begin,end,entity_type,assertion
Atlantic Inc,20,31,ORG,FUTURE
Conditions and Warranties,4,28,DOC,POSSIBLE
Model Information
Model Name: legassertion_time_md
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, doc_chunk, embeddings]
Output Labels: [assertion]
Language: en
Size: 2.2 MB
References
In-house annotations augmented with synthetic data.
Benchmarking
label tp fp fn prec rec f1
PRESENT 115 11 5 0.9126984 0.9583333 0.9349593
POSSIBLE 79 5 4 0.9404762 0.9518072 0.9461077
PAST 54 5 11 0.91525424 0.83076924 0.8709678
FUTURE 77 3 4 0.9625 0.9506173 0.95652175
Macro-average 325 24 24 0.9327322 0.9228818 0.92778087
Micro-average 325 24 24 0.9312321 0.9312321 0.9312321
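The figures in the table can be re-derived from the tp/fp/fn counts. The sketch below is not the evaluation code used by the authors. It uses the standard definitions of precision, recall and F1, and it assumes that the macro-average F1 is the harmonic mean of macro precision and macro recall, which is an inference from the reported numbers rather than something the card states.

```python
from typing import Dict, Tuple

# (tp, fp, fn) per label, copied from the benchmark table
counts: Dict[str, Tuple[int, int, int]] = {
    "PRESENT": (115, 11, 5),
    "POSSIBLE": (79, 5, 4),
    "PAST": (54, 5, 11),
    "FUTURE": (77, 3, 4),
}


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return prec, rec, f1


per_label = {label: prf(*c) for label, c in counts.items()}

# Micro-average: pool the counts across labels, then compute the metrics once
total_tp, total_fp, total_fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro = prf(total_tp, total_fp, total_fn)

# Macro-average: unweighted mean of per-label precision and recall.
# Assumption: macro F1 is the harmonic mean of those two means.
macro_prec = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_rec = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_prec * macro_rec / (macro_prec + macro_rec)

for label, (p, r, f) in per_label.items():
    print(f"{label:<9} prec={p:.4f} rec={r:.4f} f1={f:.4f}")
print(f"micro     prec={micro[0]:.4f} rec={micro[1]:.4f} f1={micro[2]:.4f}")
print(f"macro     prec={macro_prec:.4f} rec={macro_rec:.4f} f1={macro_f1:.4f}")
```

Micro precision and recall coincide here because the pooled false positives and false negatives are both 24.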