Description
This is a text classification model that identifies whether a document is an Earning Call, a Broker Report, a 10K filing, or something else.
Predicted Entities
earning_call, broker_report, 10k, other
How to use
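from johnsnowlabs import nlp, finance

# Start (or reuse) a Spark session; assumes a licensed Finance NLP environment
spark = nlp.start()

# Wrap the raw text column in a document annotation
documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Embed the whole document with a sentence-level BERT model
embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")

# Classify the embedded document into one of the four labels
docClassifier = finance.ClassifierDLModel.pretrained("finclf_earning_broker_10k", "en", "finance/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("label")

nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    embeddings,
    docClassifier])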
text = """Varun Beverages
Investors are advised to refer through important disclosures made at the last page of the Research Report.
Motilal Oswal research is available on www.motilaloswal.com/Institutional -Equities, Bloomberg, Thomson Reuters, Factset and S&P Capital. Research Analyst: Sumant Kumar (Sumant.Kumar@MotilalOswal.com)
Research Analyst: Meet Jain (Meet.Jain@ Motilal Oswal.com) / Omkar Shintre (Omkar.Shintre @Motilal Oswal.com)"""
sdf = spark.createDataFrame([[text]]).toDF("text")
fit = nlpPipeline.fit(sdf)
res = fit.transform(sdf)
res = res.select('label.result')
Results
[broker_report]
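The `label.result` column is an array, so each document comes back as a one-element list such as the one shown above. A minimal sketch of unpacking it in plain Python follows. The helper name `first_label` and its `default` fallback are hypothetical, not part of the library, and the `collected` list stands in for rows gathered from the pipeline output.

```python
# The four labels listed under "Predicted Entities".
KNOWN_LABELS = {"earning_call", "broker_report", "10k", "other"}


def first_label(result, default="other"):
    """Return the single predicted label from a `label.result` array.

    `result` is a list such as ["broker_report"]. `default` is returned for
    an empty list; this fallback is an assumption, not model behaviour.
    """
    if not result:
        return default
    label = result[0]
    if label not in KNOWN_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label


# Stand-in for something like [row.result for row in res.collect()].
collected = [["broker_report"]]
labels = [first_label(r) for r in collected]
print(labels)  # ['broker_report']
```

Validating against the known label set makes it easier to notice when a different classifier has been loaded by mistake.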
Model Information
Model Name:     finclf_earning_broker_10k
Compatibility:  Finance NLP 1.0.0+
License:        Licensed
Edition:        Official
Input Labels:   [sentence_embeddings]
Output Labels:  [label]
Language:       en
Size:           22.8 MB
References
- Scraped broker reports, earning calls, and 10K filings from the internet
- Other financial documents
Benchmarking
label          precision  recall  f1-score  support
10k                 1.00    1.00      1.00       17
broker_report       1.00    1.00      1.00       18
earning_call        1.00    1.00      1.00       19
other               1.00    1.00      1.00       98
accuracy               -       -      1.00      152
macro-avg           1.00    1.00      1.00      152
weighted-avg        1.00    1.00      1.00      152
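As a rough illustration of how the two summary rows relate to the per-class rows, the sketch below recomputes the macro and weighted averages from the figures in the table. The variable and function names are illustrative only. It also reports the share of the test set taken up by the `other` class, which is worth keeping in mind when reading the scores.

```python
# Per-class rows copied from the benchmark table:
# (precision, recall, f1-score, support)
report = {
    "10k": (1.00, 1.00, 1.00, 17),
    "broker_report": (1.00, 1.00, 1.00, 18),
    "earning_call": (1.00, 1.00, 1.00, 19),
    "other": (1.00, 1.00, 1.00, 98),
}

SUPPORT = 3  # index of the support column
total = sum(row[SUPPORT] for row in report.values())


def macro_avg(col):
    """Unweighted mean of a metric column across classes."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted_avg(col):
    """Mean of a metric column weighted by each class's support."""
    return sum(row[col] * row[SUPPORT] for row in report.values()) / total


macro_f1 = macro_avg(2)
weighted_f1 = weighted_avg(2)
other_share = report["other"][SUPPORT] / total

print(total)                                  # 152
print(f"{macro_f1:.2f} {weighted_f1:.2f}")    # 1.00 1.00
print(f"{other_share:.1%}")                   # 64.5%
```

Because every per-class score is 1.00, the two averages coincide here. They would diverge on a model whose errors were concentrated in the smaller classes.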