Description
This is a Multilabel classification model trained on different news scrapped from the internet and in-house annotations and label grouping. As this model is Multilabel, you can get as an output of a financial new, an array of 0 (no classes detected), 1(one class) or N (n classes detected).
The available classes are:
- acq: Acquisition / Purchase operations
- finance: Generic financial news
- fuel: News about fuel and energy sources
- jobs: News about jobs, employment rates, etc.
- livestock: News about animales and livestock
- mineral: News about mineral as copper, gold, silver, coal, etc.
- plant: News about greens, plants, cereals, etc
- trade: Trading news
Predicted Entities
acq
, finance
, fuel
, jobs
, livestock
, mineral
, plant
, trade
How to use
documentAssembler = nlp.DocumentAssembler() \
.setInputCol("text") \
.setOutputCol("document") \
.setCleanupMode("shrink")
embeddings = nlp.UniversalSentenceEncoder.pretrained() \
.setInputCols("document") \
.setOutputCol("embeddings")
docClassifier = nlp.MultiClassifierDLModel.pretrained("finmulticlf_news", "en","finance/models")\
.setInputCols("embeddings") \
.setOutputCol("category")
pipeline = nlp.Pipeline() \
.setStages(
[
documentAssembler,
embeddings,
docClassifier
]
)
empty_data = spark.createDataFrame([[""]]).toDF("text")
pipelineModel = pipeline.fit(empty_data)
text = ["""
ECUADOR HAS TRADE SURPLUS IN FIRST FOUR MONTHS Ecuador posted a trade surplus of 10.6 mln dlrs in the first four months of 1987 compared with a surplus of 271.7 mln in the same period in 1986, the central bank of Ecuador said in its latest monthly report. Ecuador suspended sales of crude oil, its principal export product, in March after an earthquake destroyed part of its oil-producing infrastructure. Exports in the first four months of 1987 were around 639 mln dlrs and imports 628.3 mln, compared with 771 mln and 500 mln respectively in the same period last year. Exports of crude and products in the first four months were around 256.1 mln dlrs, compared with 403.3 mln in the same period in 1986. The central bank said that between January and May Ecuador sold 16.1 mln barrels of crude and 2.3 mln barrels of products, compared with 32 mln and 2.7 mln respectively in the same period last year. Ecuador's international reserves at the end of May were around 120.9 mln dlrs, compared with 118.6 mln at the end of April and 141.3 mln at the end of May 1986, the central bank said. gold reserves were 165.7 mln dlrs at the end of May compared with 124.3 mln at the end of April.
"""]
lmodel = LightPipeline(pipelineModel)
results = lmodel.annotate(text)
Results
['finance', 'trade']
Model Information
Model Name: | finmulticlf_news |
Compatibility: | Finance NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [embeddings] |
Output Labels: | [category] |
Language: | en |
Size: | 12.9 MB |
References
News scrapped from the Internet and manual in-house annotations
Benchmarking
label precision recall f1-score support
acq 0.94 0.92 0.93 718
finance 0.95 0.96 0.96 1499
fuel 0.91 0.86 0.88 286
jobs 0.86 0.57 0.69 21
livestock 0.93 0.44 0.60 57
mineral 0.87 0.62 0.72 121
plant 0.89 0.88 0.89 301
trade 0.79 0.72 0.75 113
micro-avg 0.93 0.90 0.92 3116
macro-avg 0.89 0.75 0.80 3116
weighted-avg 0.93 0.90 0.91 3116
samples-avg 0.91 0.91 0.91 3116