Description
This is an Assertion Status model that detects the temporality (PRESENT, PAST, FUTURE) or certainty (POSSIBLE) of entities mentioned in financial documents.
Predicted Entities
PRESENT, PAST, FUTURE, POSSIBLE
How to use
from johnsnowlabs import nlp, finance

# Start a Spark session; requires a licensed Finance NLP installation
spark = nlp.start()

# Turn raw text into document annotations
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

# Token embeddings consumed by the assertion model
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["document", "token"])\
    .setOutputCol("embeddings")

# NER model that tags the role entities to be asserted
ner = finance.BertForTokenClassification.pretrained("finner_bert_roles", "en", "finance/models")\
    .setInputCols(["document", "token"])\
    .setOutputCol("ner")\
    .setCaseSensitive(True)

# Group the token-level NER tags into entity chunks
chunk_converter = nlp.NerConverter()\
    .setInputCols(["document", "token", "ner"])\
    .setOutputCol("ner_chunk")

# Assign an assertion status to each chunk
assertion = finance.AssertionDLModel.pretrained("finassertion_time", "en", "finance/models")\
    .setInputCols(["document", "ner_chunk", "embeddings"])\
    .setOutputCol("assertion")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    embeddings,
    ner,
    chunk_converter,
    assertion
])

# All stages are pretrained, so fitting on an empty DataFrame only builds the pipeline model
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

lp = nlp.LightPipeline(model)
texts = ["John Crawford will be hired by Atlantic Inc as CTO"]
result = lp.annotate(texts)
Results
| chunk | begin | end | entity_type | assertion |
|-------|-------|-----|-------------|-----------|
| CTO | 47 | 49 | ROLE | FUTURE |
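The result row above pairs each NER chunk with its assertion label. A minimal sketch of how such rows could be assembled follows. It assumes the pipeline is run with `fullAnnotate`, that this returns one dictionary per text keyed by output column, that each annotation exposes `begin`, `end`, `result`, and `metadata`, that chunk metadata carries an `entity` key, and that the chunk and assertion lists are aligned one-to-one. The `assertion_rows` helper and the `Annotation` stand-in are hypothetical names used here so the example runs without Spark.

```python
from collections import namedtuple

# Stand-in for an annotation object; only the attributes used below are modeled.
Annotation = namedtuple("Annotation", ["begin", "end", "result", "metadata"])


def assertion_rows(full_result, chunk_col="ner_chunk", assertion_col="assertion"):
    """Pair each chunk with its assertion label (assumes both lists are aligned)."""
    rows = []
    for chunk, status in zip(full_result[chunk_col], full_result[assertion_col]):
        rows.append((
            chunk.result,                  # chunk text
            chunk.begin,                   # start character offset
            chunk.end,                     # end character offset (inclusive)
            chunk.metadata.get("entity"),  # entity type, assumed to be in metadata
            status.result,                 # assertion label
        ))
    return rows


# Hand-built input mirroring the result row shown above (not real pipeline output).
example = {
    "ner_chunk": [Annotation(47, 49, "CTO", {"entity": "ROLE"})],
    "assertion": [Annotation(47, 49, "FUTURE", {})],
}
rows = assertion_rows(example)
print(rows)
```

With a fitted pipeline, the same helper would be applied to each element of `lp.fullAnnotate(texts)`, provided the assumptions above hold for the installed version.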
Model Information
| Model Name: | finassertion_time |
| Compatibility: | Finance NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document, doc_chunk, embeddings] |
| Output Labels: | [assertion] |
| Language: | en |
| Size: | 2.2 MB |
References
In-house annotations on financial and legal corpora
Benchmarking
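| label | tp | fp | fn | precision | recall | f1 |
|-------|----|----|----|-----------|--------|----|
| PRESENT | 201 | 11 | 16 | 0.9481132 | 0.92626727 | 0.937063 |
| POSSIBLE | 171 | 3 | 6 | 0.98275864 | 0.9661017 | 0.9743589 |
| FUTURE | 119 | 6 | 4 | 0.952 | 0.96747965 | 0.95967746 |
| PAST | 270 | 16 | 10 | 0.9440559 | 0.96428573 | 0.9540636 |
| Macro-average | 761 | 36 | 36 | 0.9567319 | 0.9560336 | 0.95638263 |
| Micro-average | 761 | 36 | 36 | 0.9548306 | 0.9548306 | 0.9548306 |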
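The precision, recall, and F1 figures above follow from the tp/fp/fn counts. The sketch below recomputes them as a sanity check. The counts are copied from the benchmark; the variable and function names are illustrative. Note that the reported macro F1 appears to be the harmonic mean of the macro precision and macro recall rather than the mean of the per-label F1 scores, so the sketch computes it that way.

```python
# tp, fp, fn per label, copied from the benchmark above
counts = {
    "PRESENT": (201, 11, 16),
    "POSSIBLE": (171, 3, 6),
    "FUTURE": (119, 6, 4),
    "PAST": (270, 16, 10),
}


def prf(tp, fp, fn):
    """Return (precision, recall, f1) for one set of counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_label = {label: prf(*c) for label, c in counts.items()}

# Micro-average: pool the counts across labels, then compute the metrics once.
total_tp, total_fp, total_fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro = prf(total_tp, total_fp, total_fn)

# Macro-average: mean of per-label precision and recall;
# F1 is taken from those two means, which matches the reported value.
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

for label, (p, r, f1) in per_label.items():
    print(f"{label:<9} precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
print(f"macro     precision={macro_p:.4f} recall={macro_r:.4f} f1={macro_f1:.4f}")
print(f"micro     precision={micro[0]:.4f} recall={micro[1]:.4f} f1={micro[2]:.4f}")
```

Because the total false positives and false negatives are both 36, micro precision, recall, and F1 coincide.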