Company Name Normalization to Edgar Database

Description

This is an Entity Linking / Entity Resolution model that maps a company name extracted by any NER model to the name the SEC uses for that company in the Edgar database. The resolved name can then be passed to the Edgar Chunk Mappers for data augmentation, retrieving additional information about the company stored in Edgar. For more information about data augmentation, check the Chunk Mapping task in Models Hub.

Predicted Entities


How to use

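from johnsnowlabs import nlp, legal

# Start a Spark session with the licensed Legal NLP library
spark = nlp.start()

# Treat the whole input text as a single chunk (e.g. a company name returned by an NER model)
documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("ner_chunk")

# Embed the chunk with the Universal Sentence Encoder
embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en") \
      .setInputCols("ner_chunk") \
      .setOutputCol("sentence_embeddings")

# Resolve the embedding to the closest company names in Edgar, ranked by Euclidean distance
resolver = legal.SentenceEntityResolverModel.pretrained("legel_edgar_company_name", "en", "legal/models")\
      .setInputCols(["ner_chunk", "sentence_embeddings"]) \
      .setOutputCol("irs_code")\
      .setDistanceFunction("EUCLIDEAN")

pipeline = nlp.Pipeline(
      stages = [
          documentAssembler,
          embeddings,
          resolver])

# All stages are pretrained, so fitting on an empty DataFrame only builds the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")
pipelineModel = pipeline.fit(empty_data)

lp = nlp.LightPipeline(pipelineModel)

lp.fullAnnotate("CONTACT GOLD")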
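Roughly speaking, a sentence entity resolver embeds the input chunk and ranks pre-embedded candidate names by their distance to it, which is why the results list several candidates with increasing distances. The sketch below illustrates that ranking with the Euclidean distance selected through setDistanceFunction("EUCLIDEAN"). It is a minimal sketch under stated assumptions: the 3-dimensional vectors and the rank_candidates helper are hypothetical stand-ins, not real Universal Sentence Encoder embeddings or Legal NLP's internal implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_candidates(query_vec, candidates, k=3):
    """Return the k candidate names closest to query_vec, with their distances.

    Hypothetical helper: candidates maps a canonical company name to its embedding.
    """
    scored = [(name, euclidean(query_vec, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1])  # smallest distance = best match
    return scored[:k]


# Toy 3-d vectors standing in for real sentence embeddings (illustrative only).
candidates = {
    "Contact Gold Corp": [0.9, 0.1, 0.0],
    "Guskin Gold Corp": [0.6, 0.5, 0.1],
    "Star Gold Corp": [0.2, 0.8, 0.3],
}
query = [0.85, 0.15, 0.05]  # stand-in embedding for the chunk "CONTACT GOLD"

for name, dist in rank_candidates(query, candidates):
    print(f"{name}: {dist:.4f}")
```

Sorting every candidate is fine for a toy example; a resolver over a large name catalogue would typically rely on a more efficient nearest-neighbour search.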

Results

+--------------+----------+---------------------------------------------------------+--------------------------------------------------------------------------------------------+-------------------------------------------+
|        chunk |    code  |                                               all_codes |                                                                                resolutions |                             all_distances |
+--------------+----------+---------------------------------------------------------+--------------------------------------------------------------------------------------------+-------------------------------------------+
| CONTACT GOLD | 981369960| [981369960, 271989147, 208531222, 273566922, 270348508] |[Contact Gold Corp, Guskin Gold Corp, Yinfu Gold Corp, MAGELLAN GOLD Corp, Star Gold Corp]  |  [0.1733, 0.3700, 0.3867, 0.4103, 0.4121] |
+--------------+----------+---------------------------------------------------------+--------------------------------------------------------------------------------------------+-------------------------------------------+
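Because the resolver always returns its closest candidates, downstream code may want to accept the top match only when it is close enough. The sketch below assumes the annotation has already been flattened into the parallel lists shown in the table (values copied from it). The best_match helper and the 0.25 cut-off are hypothetical choices for illustration, not part of Legal NLP; a real threshold should be tuned on validated data.

```python
# Values copied from the result table above.
all_codes = ["981369960", "271989147", "208531222", "273566922", "270348508"]
resolutions = ["Contact Gold Corp", "Guskin Gold Corp", "Yinfu Gold Corp",
               "MAGELLAN GOLD Corp", "Star Gold Corp"]
all_distances = [0.1733, 0.3700, 0.3867, 0.4103, 0.4121]

MAX_DISTANCE = 0.25  # hypothetical cut-off, not a recommended value


def best_match(codes, names, distances, max_distance=MAX_DISTANCE):
    """Return (code, name, distance) of the closest candidate, or None if it is too far."""
    code, name, dist = min(zip(codes, names, distances), key=lambda t: t[2])
    if dist > max_distance:
        return None
    return code, name, dist


print(best_match(all_codes, resolutions, all_distances))
# ('981369960', 'Contact Gold Corp', 0.1733)
```

Returning None rather than the nearest name keeps a weak match from silently flowing into the Edgar Chunk Mappers.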

Model Information

Model Name: legel_edgar_company_name
Type: legal
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [original_company_name]
Language: en
Size: 315.1 MB
Case sensitive: false

References

In-house scraping and post-processing of the SEC Edgar database