Resolve Company Names to Tickers using Wikidata

Description

This model retrieves the stock ticker of a company from an ORG entity previously detected with a NER model.

It also returns the company's normalized name as per Wikidata, which is stored as the aux label in the resolver's annotation metadata (the all_k_aux_labels field).
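The resolver's metadata keeps its top-k candidates as strings joined with ":::", which is what the split(':::') call in the usage example relies on. The helper below is a minimal sketch that pairs each candidate ticker with its normalized name. It assumes a metadata dict whose all_k_results and all_k_aux_labels fields are ":::"-joined and listed in the same order; the helper name and the sample values (including "ABC" / "Example Corp.") are hypothetical.

```python
def pair_candidates(metadata: dict, sep: str = ":::") -> list:
    """Pair each candidate ticker with its aux label (normalized company name).

    Assumption: both fields are `sep`-joined strings in the same candidate order.
    """
    tickers_field = metadata.get("all_k_results", "")
    names_field = metadata.get("all_k_aux_labels", "")
    if not tickers_field or not names_field:
        return []  # nothing to pair
    # zip stops at the shorter list if the fields ever disagree in length
    return list(zip(tickers_field.split(sep), names_field.split(sep)))


# Hypothetical metadata shaped like the resolver output described above
sample_metadata = {
    "all_k_results": "GOOGL:::ABC",
    "all_k_aux_labels": "Alphabet Inc.:::Example Corp.",
}

for ticker, name in pair_candidates(sample_metadata):
    print(ticker, "->", name)
```

The first pair corresponds to the best match, which is the same value the usage example reads with split(':::')[0].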

Predicted Entities


How to use

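from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed Finance NLP library
spark = nlp.start()

# Treat each input text (e.g. an ORG chunk found by NER) as one document
documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("ner_chunk")

# Sentence-level embedding of the company name
embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en") \
      .setInputCols("ner_chunk") \
      .setOutputCol("sentence_embeddings")

# Resolve the embedding to the closest ticker using Euclidean distance
resolver = finance.SentenceEntityResolverModel.pretrained("finel_wiki_parentorgs_tickers", "en", "finance/models")\
      .setInputCols(["sentence_embeddings"]) \
      .setOutputCol("normalized_name")\
      .setDistanceFunction("EUCLIDEAN")

pipeline = nlp.Pipeline(
      stages = [
          documentAssembler,
          embeddings,
          resolver
      ])

# All stages are pretrained, so fitting on an empty DataFrame yields a PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")
pipelineModel = pipeline.fit(empty_data)

lp = nlp.LightPipeline(pipelineModel)
test_pred = lp.fullAnnotate('Alphabet Incorporated')
# Best-matching ticker
print(test_pred[0]['normalized_name'][0].result)
# Normalized Wikidata name of the best match (first aux label)
print("Aux data:", test_pred[0]['normalized_name'][0].metadata['all_k_aux_labels'].split(':::')[0])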

Results

GOOGL
Aux data: Alphabet Inc.
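Conceptually, the resolver returns the index entry whose embedding lies closest to the input's sentence embedding under the distance chosen with setDistanceFunction("EUCLIDEAN"). The following is a simplified pure-Python sketch of that nearest-neighbour lookup, not the library's implementation; the function name, the three-dimensional vectors and the ABC/XYZ entries are hypothetical (real Universal Sentence Encoder embeddings have far more dimensions).

```python
import math


def resolve(query, index, k=2):
    """Return the k (ticker, aux_label, distance) entries closest to query."""
    scored = [(code, aux, math.dist(query, vec)) for code, aux, vec in index]
    scored.sort(key=lambda entry: entry[2])  # smallest Euclidean distance first
    return scored[:k]


# Hypothetical index: (ticker, normalized name, embedding)
toy_index = [
    ("GOOGL", "Alphabet Inc.", [0.9, 0.1, 0.0]),
    ("ABC", "Example Corp.", [0.0, 0.8, 0.6]),
    ("XYZ", "Sample Holdings", [0.1, 0.1, 0.9]),
]
# Stand-in for the embedding of "Alphabet Incorporated"
query = [0.85, 0.15, 0.05]

best = resolve(query, toy_index)
print(best[0][0], best[0][1])  # prints "GOOGL Alphabet Inc."
```

The top-k list plays the same role as the all_k_* metadata fields: the first entry is the reported result, and the rest are runner-up candidates.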

Model Information

Model Name:	finel_wiki_parentorgs_ticker
Compatibility:	Finance NLP 1.0.0+
License:	Licensed
Edition:	Official
Input Labels:	[sentence_embeddings]
Output Labels:	[original_company_name]
Language:	en
Size:	2.8 MB
Case sensitive:	false

References

Wikipedia dump about company subsidiaries
