Company Name Normalization (Edgar Database)

Description

This is an Entity Linking / Entity Resolution model that maps a company name extracted by any NER model to the name the SEC uses for that company in the Edgar database. You can then pass the resolved name to the Edgar Chunk Mappers for data augmentation, that is, to retrieve additional information about the company stored in Edgar. For more information about data augmentation, see the Chunk Mapping task in Models Hub.
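Conceptually, the resolver embeds the extracted name and returns the Edgar names whose embeddings lie closest to it, ranked by distance. The usage example sets that distance to EUCLIDEAN. The following is a minimal, library-free sketch of this nearest-neighbour ranking under stated assumptions. The index, the 3-dimensional vectors and the helpers `euclidean` and `resolve` are hypothetical and exist only for illustration. The real model compares finance_bge_base_embeddings vectors against its own pretrained index of Edgar names.

```python
import math

# Hypothetical toy index: Edgar-style company names mapped to made-up 3-d vectors.
# The real model uses finance_bge_base_embeddings vectors and its own index instead.
EDGAR_INDEX = {
    "Contact Gold Corp.": [0.90, 0.10, 0.20],
    "Contact Minerals Corp.": [0.70, 0.40, 0.20],
    "Source Gold Corp.": [0.60, 0.10, 0.70],
    "GENERAL GOLD CORP": [0.50, 0.30, 0.50],
}


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resolve(query_vec, index, k=3):
    """Return the k closest (name, distance) pairs, nearest first."""
    scored = [(name, euclidean(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance = best match
    return scored[:k]


# Pretend (hypothetically) this is the embedding of the NER chunk "CONTACT GOLD".
query = [0.88, 0.12, 0.22]
for name, dist in resolve(query, EDGAR_INDEX):
    print(f"{name}: {dist:.4f}")
```

The ascending distances mirror the all_distances column in the Results table, where the top candidate (Contact Gold Corp.) has the smallest distance.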

Predicted Entities


How to use

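from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed Finance NLP library (requires a valid license)
spark = nlp.start()

# Treat the whole input string (e.g. a company name extracted by NER) as one chunk
documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("ner_chunk")

# Embed the company-name chunk with finance-domain BGE embeddings
embeddings = nlp.BGEEmbeddings.pretrained("finance_bge_base_embeddings", "en", "finance/models")\
      .setInputCols("ner_chunk")\
      .setOutputCol("sentence_embeddings")

# Resolve the embedding to the closest company names in Edgar
resolver = finance.SentenceEntityResolverModel.pretrained("finel_edgar_company_name_fe", "en", "finance/models")\
      .setInputCols(["sentence_embeddings"])\
      .setOutputCol("normalized")\
      .setDistanceFunction("EUCLIDEAN")

pipeline = nlp.Pipeline(
      stages = [
          documentAssembler,
          embeddings,
          resolver
      ])

# All stages are pretrained, so fitting on an empty DataFrame just builds the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")
pipelineModel = pipeline.fit(empty_data)

lp = nlp.LightPipeline(pipelineModel)

lp.fullAnnotate("AmeriCann Inc")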

Results

| chunks | begin | end | code | all_codes | resolutions | all_distances |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| CONTACT GOLD | 0 | 11 | Contact Gold Corp. | [Contact Gold Corp., Contact Minerals Corp., Source Gold Corp., GENERAL GOLD CORP, Gold Alan D, INTERNET GOLD GOLDEN LINES LTD, METALINE CONTACT MINES, GOLD STEPHEN J, AuRico Gold Inc., ISHARES GOLD TRUST, GLOBAL GOLD CORP, Golden Minerals Co, Sprott Physical Gold Trust, FOCUS GOLD Corp, GOLDEN CYCLE GOLD CORP] | [Contact Gold Corp., Contact Minerals Corp., Source Gold Corp., GENERAL GOLD CORP, Gold Alan D, INTERNET GOLD GOLDEN LINES LTD, METALINE CONTACT MINES, GOLD STEPHEN J, AuRico Gold Inc., ISHARES GOLD TRUST, GLOBAL GOLD CORP, Golden Minerals Co, Sprott Physical Gold Trust, FOCUS GOLD Corp, GOLDEN CYCLE GOLD CORP] | [0.0684, 0.3294, 0.3476, 0.3541, 0.3548, 0.3635, 0.3698, 0.3879, 0.3902, 0.3916, 0.3933, 0.3958, 0.3964, 0.3969, 0.3974] |

Model Information

Model Name: finel_edgar_company_name_fe
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [original_company_name]
Language: en
Size: 1.2 GB
Case sensitive: false

References

In-house scraping and post-processing of the SEC Edgar Database