Description
This is a text classification model that detects, at the sentence or paragraph level, whether a text contains a forward-looking statement (FLS).
FLS are beliefs and opinions about a firm’s future events or results, usually found in documents such as financial reports. Identifying forward-looking statements in corporate reports can assist investors in financial analysis.
This model was originally trained on 3,500 manually annotated sentences from the Management Discussion and Analysis section of annual reports of Russell 3000 firms, and then fine-tuned in-house by JSL on examples where it performed poorly.
Predicted Entities
Specific FLS, Non-specific FLS, Not FLS
How to use
from johnsnowlabs import nlp, finance

# Assumes a valid Finance NLP license is configured; nlp.start() returns the Spark session
spark = nlp.start()

# Wrap the raw text column into Spark NLP documents
document_assembler = nlp.DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = nlp.Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained FLS classifier
sequenceClassifier = finance.BertForSequenceClassification.pretrained("finclf_bert_fls", "en", "finance/models") \
    .setInputCols(["document", 'token']) \
    .setOutputCol("class")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier
])

# A simple one-sentence example
example = spark.createDataFrame([["Global economy will increase during the next year."]]).toDF("text")
result = pipeline.fit(example).transform(example)

# result is a DataFrame; class.result holds the predicted label for each row
result.select("text", "class.result").show()
Results
+--------------------+--------------+
| text| result|
+--------------------+--------------+
|Global economy wi...|[Specific FLS]|
+--------------------+--------------+
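Since class.result comes back as an array of label strings per row, downstream code usually needs to unpack it. The following is a minimal sketch in plain Python, not part of the model's API: it assumes the rows have already been collected (for example with result.select("text", "class.result").collect()), that each row unpacks as (text, labels), and that the label spellings match the Results table. The names rows, FLS_LABELS, and summarize are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for collected rows; each row is (text, list_of_labels),
# mirroring the columns shown in the Results table.
rows = [
    ("Global economy will increase during the next year.", ["Specific FLS"]),
]

# Assumption: these two labels are the ones that mark a forward-looking statement.
FLS_LABELS = {"Specific FLS", "Non-specific FLS"}


def summarize(rows):
    """Return (label counts, texts flagged as forward-looking)."""
    counts = Counter()
    flagged = []
    for text, labels in rows:
        counts.update(labels)
        if any(label in FLS_LABELS for label in labels):
            flagged.append(text)
    return counts, flagged


counts, flagged = summarize(rows)
print(counts)
print(flagged)
```

Keeping the forward-looking labels in one set makes it easy to adjust if the deployed model emits a different spelling (for example with underscores).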
Model Information
| Model Name:          | finclf_bert_fls    |
| Type:                | finance            |
| Compatibility:       | Finance NLP 1.0.0+ |
| License:             | Licensed           |
| Edition:             | Official           |
| Input Labels:        | [document, token]  |
| Output Labels:       | [class]            |
| Language:            | en                 |
| Size:                | 412.2 MB           |
| Case sensitive:      | true               |
| Max sentence length: | 512                |
References
In-house annotations on 10-K financial reports and reports from Russell 3000 firms
Benchmarking
label              precision    recall  f1-score   support
Specific_FLS            0.96      0.93      0.94       311
Non-specific_FLS        0.91      0.94      0.92       215
Not_FLS                 0.84      0.87      0.85        70
accuracy                   -         -      0.92       596
macro-avg               0.90      0.91      0.91       596
weighted-avg            0.93      0.92      0.92       596
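To make the relationship between the per-class rows and the two average rows explicit, here is a small sketch that recomputes them from the figures in the table. It assumes the usual definitions: the macro average is the unweighted mean over classes, and the weighted average weights each class by its support. Because the per-class inputs are already rounded to two decimals, the recomputed values can differ from the reported rows by about 0.01. The names per_class and averages are hypothetical.

```python
# Per-class rows copied from the benchmark table: (precision, recall, f1, support)
per_class = {
    "Specific_FLS": (0.96, 0.93, 0.94, 311),
    "Non-specific_FLS": (0.91, 0.94, 0.92, 215),
    "Not_FLS": (0.84, 0.87, 0.85, 70),
}


def averages(per_class):
    """Return (macro, weighted, total_support) for precision, recall, f1."""
    rows = list(per_class.values())
    total = sum(row[3] for row in rows)
    # Macro: plain mean over classes, ignoring class size
    macro = tuple(sum(row[i] for row in rows) / len(rows) for i in range(3))
    # Weighted: each class contributes in proportion to its support
    weighted = tuple(sum(row[i] * row[3] for row in rows) / total for i in range(3))
    return macro, weighted, total


macro, weighted, total = averages(per_class)
print("macro   ", [round(v, 3) for v in macro])
print("weighted", [round(v, 3) for v in weighted])
print("support ", total)
```

The gap between the macro and weighted rows reflects class imbalance: Not_FLS has the lowest scores and the smallest support, so it pulls the macro average down more than the weighted one.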