Description
This is an assertion status model that detects temporality (PRESENT, PAST, FUTURE) or certainty (POSSIBLE) for entities extracted from financial documents.
Predicted Entities
PRESENT, PAST, FUTURE, POSSIBLE
How to use
from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed Finance NLP libraries available
spark = nlp.start()

# Wrap raw text into Spark NLP documents
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

# Token embeddings consumed by the assertion model
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["document", "token"])\
    .setOutputCol("embeddings")

# NER model that tags role mentions
ner = finance.BertForTokenClassification.pretrained("finner_bert_roles", "en", "finance/models")\
    .setInputCols(["document", "token"])\
    .setOutputCol("ner")\
    .setCaseSensitive(True)

# Merge token-level NER tags into entity chunks
chunk_converter = nlp.NerConverter()\
    .setInputCols(["document", "token", "ner"])\
    .setOutputCol("ner_chunk")

# Assign PRESENT, PAST, FUTURE or POSSIBLE to each chunk
assertion = finance.AssertionDLModel.pretrained("finassertion_time", "en", "finance/models")\
    .setInputCols(["document", "ner_chunk", "embeddings"])\
    .setOutputCol("assertion")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    embeddings,
    ner,
    chunk_converter,
    assertion
])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

# LightPipeline runs the fitted pipeline on plain Python strings
lp = nlp.LightPipeline(model)
texts = ["John Crawford will be hired by Atlantic Inc as CTO"]
lp.annotate(texts)
Results
chunk  begin  end  entity_type  assertion
CTO    47     49   ROLE         FUTURE
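The rows above pair each NER chunk with its assertion label. The following is a minimal sketch of how such rows could be assembled, assuming a `fullAnnotate`-style result: a dict keyed by output column, whose values are lists of annotation objects exposing `result`, `begin`, `end` and `metadata`, with chunks and assertions in the same order. The helper name `assertion_rows` and the stand-in objects are hypothetical; they only mimic those fields so the sketch runs without Spark.

```python
from types import SimpleNamespace


def assertion_rows(annotated):
    """Flatten one annotated document into (chunk, begin, end, entity_type, assertion) rows."""
    rows = []
    # Assumption: one assertion per chunk, emitted in the same order as the chunks.
    for chunk, status in zip(annotated["ner_chunk"], annotated["assertion"]):
        rows.append((
            chunk.result,
            chunk.begin,
            chunk.end,
            chunk.metadata.get("entity"),  # entity label attached by the NER converter
            status.result,
        ))
    return rows


# Stand-in annotations (hypothetical) mirroring the example sentence.
text = "John Crawford will be hired by Atlantic Inc as CTO"
begin = text.index("CTO")
end = begin + len("CTO") - 1  # end offsets are inclusive

annotated = {
    "ner_chunk": [SimpleNamespace(result="CTO", begin=begin, end=end,
                                  metadata={"entity": "ROLE"})],
    "assertion": [SimpleNamespace(result="FUTURE", begin=begin, end=end,
                                  metadata={})],
}

rows = assertion_rows(annotated)
print(rows)  # [('CTO', 47, 49, 'ROLE', 'FUTURE')]
```

With a real pipeline, the same helper would presumably be applied to each element of `lp.fullAnnotate(texts)`. The stand-in data also shows that `begin` and `end` are zero-based, inclusive character offsets into the input text.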
Model Information
| Model Name: | finassertion_time |
| Compatibility: | Finance NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document, doc_chunk, embeddings] |
| Output Labels: | [assertion] |
| Language: | en |
| Size: | 2.2 MB |
References
In-house annotations on financial and legal corpora
Benchmarking
label          tp   fp  fn  prec        rec         f1
PRESENT        201  11  16  0.9481132   0.92626727  0.937063
POSSIBLE       171  3   6   0.98275864  0.9661017   0.9743589
FUTURE         119  6   4   0.952       0.96747965  0.95967746
PAST           270  16  10  0.9440559   0.96428573  0.9540636
Macro-average  761  36  36  0.9567319   0.9560336   0.95638263
Micro-average  761  36  36  0.9548306   0.9548306   0.9548306
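The precision, recall and F1 columns follow from the tp, fp and fn counts. The sketch below recomputes them in plain Python as a sanity check. The function and variable names are illustrative. Judging from the numbers alone, the macro-average F1 in the table matches the harmonic mean of the macro precision and macro recall rather than the mean of the per-label F1 scores; that is an inference from the figures, not something the model card states.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# (tp, fp, fn) per label, copied from the benchmarking table
counts = {
    "PRESENT": (201, 11, 16),
    "POSSIBLE": (171, 3, 6),
    "FUTURE": (119, 6, 4),
    "PAST": (270, 16, 10),
}

per_label = {label: prf(*c) for label, c in counts.items()}

# Micro-average: pool the counts across labels, then compute the metrics once
total_tp, total_fp, total_fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro = prf(total_tp, total_fp, total_fn)

# Macro-average: unweighted mean of per-label precision and recall
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
# Assumption inferred from the table: macro F1 is the harmonic mean of macro_p and macro_r
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

for label, (p, r, f) in per_label.items():
    print(f"{label:<9} prec={p:.4f} rec={r:.4f} f1={f:.4f}")
print(f"micro     prec={micro[0]:.4f} rec={micro[1]:.4f} f1={micro[2]:.4f}")
print(f"macro     prec={macro_p:.4f} rec={macro_r:.4f} f1={macro_f1:.4f}")
```

Because the total fp and fn counts are both 36, micro precision, recall and F1 coincide, which is why the last row of the table repeats a single value.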