Resolve Company Names to Tickers using Wikidata

Description

This model retrieves the ticker symbol of a company from an ORG entity previously detected by an NER model.

It also returns the company's normalized name as recorded in Wikidata. This name is stored in the result metadata as an auxiliary label (the all_k_aux_labels field).
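The resolver keeps its top-k candidates in the annotation metadata as ":::"-separated strings. Below is a minimal sketch of pairing each candidate ticker with its normalized name. It assumes the metadata keys all_k_results and all_k_aux_labels line up index by index, which is worth checking against your Finance NLP version. The helper pair_candidates and the sample dictionary are hypothetical and are not real model output.

```python
def pair_candidates(metadata: dict[str, str], sep: str = ":::") -> list[tuple[str, str]]:
    """Pair each candidate ticker with its normalized Wikidata name (hypothetical helper)."""
    # Assumption: both fields are ":::"-separated and aligned by position
    tickers = metadata.get("all_k_results", "").split(sep)
    names = metadata.get("all_k_aux_labels", "").split(sep)
    # zip stops at the shorter list, so a malformed entry cannot raise IndexError
    return list(zip(tickers, names))


# Hypothetical metadata shaped like a resolver annotation's metadata
sample = {
    "all_k_results": "GOOGL:::ABCD",
    "all_k_aux_labels": "Alphabet Inc.:::Example Corp.",
}
for ticker, name in pair_candidates(sample):
    print(ticker, "->", name)
```

Reading the fields by position keeps the ticker and name together, so downstream code does not have to re-split two strings in parallel.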

Predicted Entities


How to use

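from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed Finance NLP libraries
spark = nlp.start()

# Each input string is a company name, standing in for an ORG chunk detected by NER
documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("ner_chunk")

# Encode each company name as a sentence embedding
embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en") \
      .setInputCols("ner_chunk") \
      .setOutputCol("sentence_embeddings")

# Resolve the embedding to the closest ticker, ranking candidates by Euclidean distance
resolver = finance.SentenceEntityResolverModel.pretrained("finel_wiki_parentorgs_tickers", "en", "finance/models")\
      .setInputCols(["sentence_embeddings"]) \
      .setOutputCol("normalized_name")\
      .setDistanceFunction("EUCLIDEAN")

pipeline = nlp.Pipeline(
      stages = [
          documentAssembler,
          embeddings,
          resolver
      ])

# All stages are pretrained, so fitting on an empty DataFrame only builds the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")
pipelineModel = pipeline.fit(empty_data)

lp = nlp.LightPipeline(pipelineModel)
test_pred = lp.fullAnnotate('Alphabet Incorporated')

# Top resolved ticker
print(test_pred[0]['normalized_name'][0].result)
# Normalized Wikidata name of the top candidate
print("Aux data:", test_pred[0]['normalized_name'][0].metadata['all_k_aux_labels'].split(':::')[0])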
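setDistanceFunction("EUCLIDEAN") tells the resolver to rank candidates by Euclidean distance between the input's sentence embedding and the embeddings stored in the model. The sketch below illustrates only that ranking idea in plain Python. The resolve helper and the toy 3-dimensional vectors are hypothetical, and this is not the library's implementation. Real Universal Sentence Encoder embeddings are 512-dimensional.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    # Straight-line distance between two vectors of equal length
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resolve(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    # Rank every indexed label by distance to the query, smallest distance first
    ranked = sorted(
        ((label, euclidean(query, vec)) for label, vec in index.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


# Toy vectors standing in for embeddings of known company names (hypothetical)
index = {
    "GOOGL": [0.9, 0.1, 0.0],
    "MSFT": [0.1, 0.9, 0.0],
    "AAPL": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
print(resolve(query, index))
```

Here the query lies closest to the "GOOGL" vector (distance about 0.17), so that label comes first. The second entry plays the role of the next candidate in the all_k_* metadata.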

Results

GOOGL
Aux data: Alphabet Inc.

Model Information

Model Name: finel_wiki_parentorgs_tickers
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [original_company_name]
Language: en
Size: 2.8 MB
Case sensitive: false

References

Wikipedia dump about company subsidiaries