Company Name Normalization using Nasdaq Stock Screener

Description

This Financial Entity Resolver model maps company names to their normalized form as listed in the NASDAQ Stock Screener. Run it on a company name extracted by any NER model to obtain the official name of the company as it appears in the NASDAQ Stock Screener.
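To make the idea of resolution concrete, here is a minimal sketch of nearest-neighbor matching under Euclidean distance, the distance function selected in the pipeline below. It is not the library's implementation: the `REFERENCE` table, its three-dimensional vectors, and the `resolve` helper are hypothetical and chosen by hand for illustration, whereas the real model compares high-dimensional sentence embeddings against its own index.

```python
import math

# Hypothetical reference index: official name -> toy embedding.
# Real embeddings have hundreds of dimensions; these values are made up.
REFERENCE = {
    "Nike Inc. Common Stock": (0.9, 0.1, 0.0),
    "Apple Inc. Common Stock": (0.1, 0.9, 0.0),
    "Tesla Inc. Common Stock": (0.0, 0.1, 0.9),
}


def resolve(query_vec, reference=REFERENCE):
    """Return (name, distance) for the reference entry closest to query_vec."""
    return min(
        ((name, math.dist(query_vec, vec)) for name, vec in reference.items()),
        key=lambda pair: pair[1],
    )


# A query vector that sits near the toy "Nike" embedding.
name, distance = resolve((0.8, 0.2, 0.1))
print(name, round(distance, 3))
```

The smaller the distance, the closer the extracted mention is to a known official name, so the entry with the minimum distance is taken as the normalized form.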

Once the name is normalized, you can pass it to finmapper_nasdaq_company_name_stock_screener to retrieve additional information about the company from the NASDAQ Stock Screener, such as ticker, sector, and country.
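The reason normalization comes before mapping can be sketched with a plain dictionary lookup. This is only a conceptual illustration, not how the pretrained mapper works internally: the `SCREENER` table, its field names, and the `enrich` helper are hypothetical, and the field values were typed by hand for the example rather than retrieved from the screener.

```python
from typing import Optional

# Hypothetical lookup table keyed by the official (normalized) name.
# Values are illustrative only, not pulled from the NASDAQ Stock Screener.
SCREENER = {
    "Nike Inc. Common Stock": {
        "ticker": "NKE",
        "sector": "Consumer Discretionary",
        "country": "United States",
    },
}


def enrich(normalized_name: str, table: dict = SCREENER) -> Optional[dict]:
    """Return the record for an exact name match, or None if there is none."""
    record = table.get(normalized_name)
    if record is None:
        return None
    return {"name": normalized_name, **record}


print(enrich("Nike Inc. Common Stock"))  # exact official name: record found
print(enrich("NIKE"))                    # raw NER mention: no match
```

Because the lookup is keyed on the exact official name, a raw mention such as "NIKE" finds nothing until the resolver has normalized it.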

How to use

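from johnsnowlabs import nlp, finance

# Start a Spark session (assumes a Finance NLP license is already configured)
spark = nlp.start()

documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")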

tokenizer = nlp.Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

ner_model = finance.NerModel.pretrained("finner_orgs_prods_alias", "en", "finance/models")\
    .setInputCols(["document", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["document","token","ner"])\
    .setOutputCol("ner_chunk")

chunkToDoc = nlp.Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_chunk_doc")

bge_embeddings = nlp.BGEEmbeddings.pretrained("finance_bge_base_embeddings", "en", "finance/models")\
    .setInputCols("ner_chunk_doc")\
    .setOutputCol("sentence_embeddings")

fe_er_model = finance.SentenceEntityResolverModel.pretrained("finel_nasdaq_company_name_stock_screener_fe", "en", "finance/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("normalized")\
    .setDistanceFunction("EUCLIDEAN")

nlpPipeline = nlp.Pipeline(stages=[
     documentAssembler,
     tokenizer,
     embeddings,
     ner_model,
     ner_converter,
     chunkToDoc,
     bge_embeddings,
     fe_er_model
])

text = """NIKE is an American multinational corporation that is engaged in the design, development, manufacturing, and worldwide marketing and sales of footwear, apparel, equipment, accessories, and services."""

test_data = spark.createDataFrame([[text]]).toDF("text")

model = nlpPipeline.fit(test_data)

lp = nlp.LightPipeline(model)

result = lp.annotate(text)

result["normalized"]

Results

['Nike Inc. Common Stock']
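When a text contains several companies, it helps to see which extracted mention produced which normalized name. The sketch below assumes the dictionary returned by `lp.annotate` holds one list of strings per output column, keyed by the column names used above (`ner_chunk` and `normalized`). The `pair_chunks` helper is hypothetical, and the sample input assumes the NER step extracted the chunk "NIKE".

```python
def pair_chunks(result, chunk_key="ner_chunk", resolved_key="normalized"):
    """Pair each extracted chunk with its normalized name, in order."""
    chunks = result.get(chunk_key, [])
    resolved = result.get(resolved_key, [])
    if len(chunks) != len(resolved):
        # One resolution is expected per chunk; anything else is suspicious.
        raise ValueError(
            f"{len(chunks)} chunks but {len(resolved)} normalized names"
        )
    return list(zip(chunks, resolved))


# Sample shaped like the annotate() output; the chunk text is an assumption.
sample = {"ner_chunk": ["NIKE"], "normalized": ["Nike Inc. Common Stock"]}
print(pair_chunks(sample))  # [('NIKE', 'Nike Inc. Common Stock')]
```

With the pipeline above, the call would be `pair_chunks(result)`.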

Model Information

Model Name:	finel_nasdaq_company_name_stock_screener_fe
Compatibility:	Finance NLP 1.0.0+
License:	Licensed
Edition:	Official
Input Labels:	[sentence_embeddings]
Output Labels:	[normalized]
Language:	en
Size:	115.7 MB
Case sensitive:	false

References

https://www.nasdaq.com/market-activity/stocks/screener
