Description
This is a Named Entity Recognition (NER) model that detects the names and abbreviations of stock exchanges and stock markets. It was trained on Wikipedia texts about companies.
Predicted Entities
STOCK_EXCHANGE, O
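The model assigns an IOB tag to each token. B-STOCK_EXCHANGE marks the first token of a name, I-STOCK_EXCHANGE marks the tokens that continue it, and O marks everything else; both B- and I- tags appear in the benchmark. The NerConverterInternal stage in the pipeline then groups these tags into chunks. The function below is a simplified, hypothetical plain-Python illustration of that grouping. It is not the converter's actual implementation, and the tokens and tags in the example are made up rather than produced by the model.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (chunk_text, entity) pairs.

    Simplified illustration: tokens are joined with single spaces, and an
    I- tag that does not continue a chunk of the same entity starts a new one.
    """
    chunks: List[Tuple[str, str]] = []
    current: List[str] = []
    entity: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Outside any entity: close the open chunk, if there is one
            if current and entity is not None:
                chunks.append((" ".join(current), entity))
            current, entity = [], None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "B" or label != entity:
            # A new chunk begins: flush the previous one first
            if current and entity is not None:
                chunks.append((" ".join(current), entity))
            current, entity = [token], label
        else:
            # I- tag continuing the current chunk
            current.append(token)
    if current and entity is not None:
        chunks.append((" ".join(current), entity))
    return chunks


if __name__ == "__main__":
    tokens = ["Shares", "trade", "on", "the", "New", "York", "Stock", "Exchange", "."]
    tags = ["O", "O", "O", "O", "B-STOCK_EXCHANGE", "I-STOCK_EXCHANGE",
            "I-STOCK_EXCHANGE", "I-STOCK_EXCHANGE", "O"]
    print(iob_to_chunks(tokens, tags))
```

A B- tag always opens a new chunk, so two adjacent exchange names such as "NYSE NASDAQ", each tagged B-, stay separate.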
How to use
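# Requires a licensed John Snow Labs Finance NLP installation
from johnsnowlabs import nlp, finance
from pyspark.sql import functions as F

# Start a Spark session (license keys must be configured)
spark = nlp.start()

# Input: a Spark DataFrame with a string column named "text" (placeholder text)
df = spark.createDataFrame([["Your text here"]]).toDF("text")

documenter = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences, then sentences into tokens
sentencizer = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# SEC-BERT (base) word embeddings that feed the NER model
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

# Token-level IOB tagging (B-STOCK_EXCHANGE, I-STOCK_EXCHANGE, O)
ner = finance.NerModel.pretrained("finner_wiki_stockexchange", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge IOB tags into entity chunks
chunks = finance.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipe = nlp.Pipeline(stages=[documenter, sentencizer, tokenizer, embeddings, ner, chunks])
model = pipe.fit(df)
res = model.transform(df)

# One row per chunk: sentence index, chunk text, end offset, entity label
res.select(F.explode(F.arrays_zip(res.ner_chunk.result, res.ner_chunk.begin,
                                  res.ner_chunk.end, res.ner_chunk.metadata)).alias("cols"))\
    .select(F.expr("cols['3']['sentence']").alias("sentence_id"),
            F.expr("cols['0']").alias("chunk"),
            F.expr("cols['2']").alias("end"),
            F.expr("cols['3']['entity']").alias("ner_label"))\
    .filter("ner_label!='O'")\
    .show(truncate=False)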
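Each stage reads columns that an earlier stage writes, so the order of the `stages` list matters: the converter needs the `ner` column, which only the NER model produces. The sketch below checks that wiring in plain Python using the column names from the pipeline above. `check_wiring` and `STAGES` are hypothetical helpers for illustration, not part of Spark NLP or Finance NLP.

```python
from typing import Iterable, List, Set, Tuple

# (stage name, input columns, output column), copied from the pipeline above
Stage = Tuple[str, List[str], str]

STAGES: List[Stage] = [
    ("documenter", ["text"], "document"),
    ("sentencizer", ["document"], "sentence"),
    ("tokenizer", ["sentence"], "token"),
    ("embeddings", ["sentence", "token"], "embeddings"),
    ("ner", ["sentence", "token", "embeddings"], "ner"),
    ("chunks", ["sentence", "token", "ner"], "ner_chunk"),
]


def check_wiring(stages: List[Stage], initial_columns: Iterable[str]) -> Set[str]:
    """Raise ValueError if a stage reads a column that no earlier stage produced.

    Returns the set of all columns available after the last stage.
    """
    available = set(initial_columns)
    for name, inputs, output in stages:
        missing = [col for col in inputs if col not in available]
        if missing:
            raise ValueError(f"stage '{name}' is missing input columns {missing}")
        available.add(output)
    return available


if __name__ == "__main__":
    print(sorted(check_wiring(STAGES, ["text"])))
    # Placing the converter before the NER model breaks the wiring
    swapped = STAGES[:4] + [STAGES[5], STAGES[4]]
    try:
        check_wiring(swapped, ["text"])
    except ValueError as err:
        print(err)
```

The order in which the stages are *defined* in the script does not matter. Only the order of the `stages` list passed to `nlp.Pipeline` does.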
Results
+-----------+------+---+--------------+
|sentence_id|chunk |end|ner_label |
+-----------+------+---+--------------+
|0 |NASDAQ|126|STOCK_EXCHANGE|
+-----------+------+---+--------------+
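In Spark NLP annotations, `begin` and `end` are character offsets, and `end` is inclusive. The chunk therefore covers `text[begin:end + 1]`, and its begin offset is `end - len(chunk) + 1`. For the NASDAQ row above that is 126 - 6 + 1 = 121. The sketch below assumes the chunk string is copied verbatim from the input text. The sample sentence is hypothetical and is not the text that produced the Results table.

```python
def chunk_text(text: str, begin: int, end: int) -> str:
    """Substring covered by an annotation whose end offset is inclusive."""
    return text[begin:end + 1]


def begin_from_end(chunk: str, end: int) -> int:
    """Recover a chunk's begin offset from its inclusive end offset."""
    return end - len(chunk) + 1


if __name__ == "__main__":
    sample = "The company is listed on NASDAQ."  # hypothetical input text
    begin = sample.find("NASDAQ")
    end = begin + len("NASDAQ") - 1
    print(begin, end, chunk_text(sample, begin, end))  # 25 30 NASDAQ
    print(begin_from_end("NASDAQ", 126))  # 121, the begin offset for the Results row
```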
Model Information
Model Name: finner_wiki_stockexchange
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 1.1 MB
References
Wikipedia
Benchmarking
label              tp  fp  fn  precision   recall  f1
I-STOCK_EXCHANGE   21   0   0  1.0         1.0     1.0
B-STOCK_EXCHANGE   18   1   0  0.94736844  1.0     0.972973
Macro-average      39   1   0  0.9736842   1.0     0.9866667
Micro-average      39   1   0  0.975       1.0     0.98734176

tp, fp, fn: true positives, false positives and false negatives for each IOB tag.
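The scores can be reproduced from the tp/fp/fn counts. Precision is tp / (tp + fp), recall is tp / (tp + fn), and F1 is their harmonic mean. The Micro-average row pools the counts (39 / 40 = 0.975). The Macro-average F1 of 0.9866667 equals the harmonic mean of the macro precision and macro recall. Averaging the per-label F1 scores instead would give about 0.9865. The sketch below reconstructs these numbers from the table. It is not the evaluation script that produced the table.

```python
from typing import Dict, List, Tuple

Counts = Tuple[int, int, int]  # (tp, fp, fn)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


def macro_average(counts: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Average per-label precision and recall, then take their harmonic mean."""
    scores: List[Tuple[float, float, float]] = [prf(*c) for c in counts.values()]
    prec = sum(s[0] for s in scores) / len(scores)
    rec = sum(s[1] for s in scores) / len(scores)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


def micro_average(counts: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Pool tp/fp/fn across labels, then compute the metrics once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


# Counts copied from the Benchmarking table
COUNTS: Dict[str, Counts] = {
    "I-STOCK_EXCHANGE": (21, 0, 0),
    "B-STOCK_EXCHANGE": (18, 1, 0),
}

if __name__ == "__main__":
    for label, c in COUNTS.items():
        print(label, prf(*c))
    print("Macro-average", macro_average(COUNTS))
    print("Micro-average", micro_average(COUNTS))
```

With only 40 predicted tags, the single false positive accounts for the entire gap from a perfect score. This is a small evaluation set.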