Description
This is an Entity Resolution model that normalizes a previously extracted ORG entity to its reference name in Wikidata. The normalized name can then be passed to the finel_wiki_parentorgs Chunk Mapping model to get information about subsidiaries, countries, stock exchanges, etc.
It also retrieves the TICKER, which is available in the aux_label field of the metadata.
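As a rough intuition for what resolution over sentence embeddings means, the sketch below picks the reference name whose embedding is closest to the mention under Euclidean distance (the distance function configured in How to use). It is a toy illustration, not the model's implementation: the 3-dimensional vectors, the two placeholder company names, and the `resolve` helper are all assumptions made up for this example.

```python
import math

# Toy reference index: canonical name -> made-up 3-d embedding (hypothetical values;
# real sentence embeddings have hundreds of dimensions).
REFERENCE = {
    "Alphabet Inc.": (0.9, 0.1, 0.0),
    "Example Holdings Corp.": (0.1, 0.8, 0.3),   # placeholder name
    "Sample Industries Ltd.": (0.0, 0.2, 0.9),   # placeholder name
}

def resolve(query_vec, reference=REFERENCE):
    """Return (canonical_name, distance) for the closest reference embedding."""
    return min(
        ((name, math.dist(query_vec, vec)) for name, vec in reference.items()),
        key=lambda pair: pair[1],
    )

# Hypothetical embedding of the mention "ALPHABET".
name, distance = resolve((0.85, 0.15, 0.05))
print(name, round(distance, 3))
```

In this sketch the mention resolves to the nearest entry only; a real resolver would typically also keep the next-best candidates and their distances.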
Predicted Entities
How to use
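from johnsnowlabs import nlp, finance

# Start (or reuse) the Spark session that the annotators run on
spark = nlp.start()

# The input text is expected to be an already extracted ORG mention,
# so the document column is named "ner_chunk"
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("ner_chunk")

embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en")\
    .setInputCols("ner_chunk")\
    .setOutputCol("sentence_embeddings")

resolver = finance.SentenceEntityResolverModel.pretrained("finel_wiki_parentorgs", "en", "finance/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("normalized_name")\
    .setDistanceFunction("EUCLIDEAN")

pipeline = nlp.Pipeline(
    stages = [
        documentAssembler,
        embeddings,
        resolver
    ])

# LightPipeline needs a fitted PipelineModel; every stage is pretrained,
# so fitting on an empty DataFrame is enough
empty_df = spark.createDataFrame([[""]]).toDF("text")
pipelineModel = pipeline.fit(empty_df)

lp = nlp.LightPipeline(pipelineModel)
test_pred = lp.fullAnnotate('ALPHABET')

# Normalized company name
print(test_pred[0]['normalized_name'][0].result)
# First entry of the ':::'-separated auxiliary labels (the ticker)
print(test_pred[0]['normalized_name'][0].metadata['all_k_aux_labels'].split(':::')[0])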
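The last line of the example reads the ticker by splitting the `all_k_aux_labels` metadata string on `:::`. A small helper can make that step more defensive. This is a minimal sketch that runs without Spark: `split_aux_labels` is a hypothetical helper, the sample dictionary is made up (only `GOOGL` comes from the results shown here; the other entries are placeholders), and it assumes that candidates are ordered best match first, as the use of index 0 above suggests.

```python
def split_aux_labels(metadata, key="all_k_aux_labels", sep=":::"):
    """Split a ':::'-delimited metadata field into a list of candidate values."""
    raw = metadata.get(key, "")
    return [part.strip() for part in raw.split(sep) if part.strip()]

# Hypothetical metadata, shaped like test_pred[0]['normalized_name'][0].metadata.
# "TICKER_B" and "TICKER_C" are placeholders, not real model output.
sample_metadata = {"all_k_aux_labels": "GOOGL:::TICKER_B:::TICKER_C"}

candidates = split_aux_labels(sample_metadata)
# Assumption: the first candidate belongs to the best-matching resolution.
best_ticker = candidates[0] if candidates else None
print(best_ticker)
print(candidates)
```

Returning an empty list when the key is missing or empty avoids an IndexError on mentions for which no auxiliary label is available.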
Results
Alphabet Inc.
Aux data: GOOGL
Model Information
Model Name: | finel_wiki_parentorgs |
Compatibility: | Finance NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence_embeddings] |
Output Labels: | [original_company_name] |
Language: | en |
Size: | 2.8 MB |
Case sensitive: | false |
References
Wikidata dump of company holdings, obtained using SPARQL.