Normalize Parent Company Names using Wikidata

Description

This is an Entity Resolution model that normalizes a previously extracted ORG entity to its reference name in Wikidata. The normalized name can then be passed to the finel_wiki_parentorgs Chunk Mapping model to get information about subsidiaries, countries, stock exchanges, etc.

The model also returns the company's TICKER, which can be read from the aux_label column in the metadata (all_k_aux_labels in the example below).

Predicted Entities

How to use

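from johnsnowlabs import nlp, finance

# Start a Spark session (requires a valid Finance NLP license)
spark = nlp.start()

# The input text is the already extracted ORG chunk, so it is assembled directly as "ner_chunk"
documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("ner_chunk")

# Embed the chunk with Universal Sentence Encoder
embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en")\
      .setInputCols("ner_chunk")\
      .setOutputCol("sentence_embeddings")

# Resolve the embedding to the Wikidata reference name
resolver = finance.SentenceEntityResolverModel.pretrained("finel_wiki_parentorgs", "en", "finance/models")\
      .setInputCols(["sentence_embeddings"])\
      .setOutputCol("normalized_name")\
      .setDistanceFunction("EUCLIDEAN")

# All stages are ready-to-use transformers, so build a PipelineModel directly;
# LightPipeline expects a PipelineModel, not an unfitted Pipeline
pipelineModel = nlp.PipelineModel(
      stages = [
          documentAssembler,
          embeddings,
          resolver
      ])

lp = nlp.LightPipeline(pipelineModel)
test_pred = lp.fullAnnotate('ALPHABET')
# Best-matching reference name
print(test_pred[0]['normalized_name'][0].result)
# Ticker: first item of the ':::'-separated auxiliary labels
print('Aux data: ' + test_pred[0]['normalized_name'][0].metadata['all_k_aux_labels'].split(':::')[0])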

Results

Alphabet Inc.
Aux data: GOOGL
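
The example above reads only the first item of the ':::'-separated all_k_aux_labels string. As a minimal sketch, the same metadata could be unpacked into several (name, ticker) candidates. The helper name top_candidates and the sample values other than "Alphabet Inc." and "GOOGL" are made up for illustration, and the all_k_resolutions key is assumed to hold the candidate names in the same order as the auxiliary labels; check the metadata of your own results before relying on it.

```python
from typing import Dict, List, Tuple

SEP = ":::"  # separator used in the resolver metadata strings


def top_candidates(metadata: Dict[str, str], k: int = 3) -> List[Tuple[str, str]]:
    """Pair each candidate name with its auxiliary label (ticker), best match first.

    Assumes 'all_k_resolutions' and 'all_k_aux_labels' are aligned,
    ':::'-separated strings (an assumption, not confirmed by the model card).
    """
    names = metadata.get("all_k_resolutions", "").split(SEP)
    tickers = metadata.get("all_k_aux_labels", "").split(SEP)
    # Drop empty entries, which appear when a key is missing or blank
    pairs = [(n.strip(), t.strip()) for n, t in zip(names, tickers) if n.strip()]
    return pairs[:k]


# Hypothetical metadata shaped like test_pred[0]['normalized_name'][0].metadata
sample = {
    "all_k_resolutions": "Alphabet Inc.:::Candidate B:::Candidate C",
    "all_k_aux_labels": "GOOGL:::TICKER_B:::TICKER_C",
}

for name, ticker in top_candidates(sample, k=2):
    print(f"{name} -> {ticker}")
```

With a real prediction, the dictionary passed in would be test_pred[0]['normalized_name'][0].metadata from the pipeline above.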

Model Information

Model Name: finel_wiki_parentorgs
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [original_company_name]
Language: en
Size: 2.8 MB
Case sensitive: false

References

Wikidata dump of company holdings, retrieved using SPARQL