Company Name to IRS (Edgar database)

Description

This is an Entity Linking / Entity Resolution model that retrieves the IRS number of a company given its name, using the SEC Edgar database.
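Entity resolution of this kind can be pictured as a nearest-neighbour search. The company name is embedded and compared against pre-embedded reference entries, and the codes of the closest entries are returned together with their distances. The toy sketch below illustrates that idea in plain Python under Euclidean distance, which is the distance function the pipeline in "How to use" selects. It is not the library's implementation, and the index, vectors, and codes are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resolve(query: Sequence[float],
            index: Dict[str, Sequence[float]],
            k: int = 5) -> Tuple[str, List[str], List[float]]:
    """Return (best_code, top-k codes, top-k distances), nearest first.

    `index` maps a code to its (hypothetical) reference embedding.
    """
    if not index:
        raise ValueError("index must not be empty")
    # Rank every reference entry by its distance to the query embedding;
    # ties are broken by the code string.
    ranked = sorted((euclidean(query, vec), code) for code, vec in index.items())
    top = ranked[:k]
    codes = [code for _, code in top]
    distances = [round(d, 4) for d, _ in top]
    return codes[0], codes, distances


# Hypothetical 2-D "embeddings" standing in for real sentence embeddings.
toy_index = {
    "code_a": [0.0, 0.0],
    "code_b": [3.0, 4.0],
    "code_c": [1.0, 0.0],
}
best, all_codes, all_distances = resolve([0.0, 0.0], toy_index, k=2)
print(best, all_codes, all_distances)
```

Real sentence embeddings have hundreds of dimensions and the reference set is large, so a production resolver would rely on an optimized search rather than this linear scan.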

Predicted Entities

How to use

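from johnsnowlabs import nlp, finance

# Start a Spark session (assumes a valid John Snow Labs license is already configured)
spark = nlp.start()

# Treat the whole input text (the company name) as a single chunk
documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("ner_chunk")

# Compute a sentence embedding for the company name
embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en") \
      .setInputCols(["ner_chunk"]) \
      .setOutputCol("sentence_embeddings")

# Resolve each embedding to the closest IRS numbers, using Euclidean distance
resolver = finance.SentenceEntityResolverModel.pretrained("finel_edgar_irs", "en", "finance/models")\
      .setInputCols(["ner_chunk", "sentence_embeddings"]) \
      .setOutputCol("irs_code")\
      .setDistanceFunction("EUCLIDEAN")

pipeline = nlp.Pipeline(
      stages = [
          documentAssembler,
          embeddings,
          resolver])

# All stages are pretrained, so fitting on an empty DataFrame just builds the PipelineModel
empty_df = spark.createDataFrame([[""]]).toDF("text")
pipelineModel = pipeline.fit(empty_df)

lp = nlp.LightPipeline(pipelineModel)

lp.fullAnnotate("CONTACT GOLD")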

Results

+--------------+-----------+---------------------------------------------------------+--------------------------------------------------------+-------------------------------------------+
|         chunk|     code  |                                                all_codes|                                            resolutions |                              all_distances|
+--------------+-----------+---------------------------------------------------------+--------------------------------------------------------+-------------------------------------------+
| CONTACT GOLD |  981369960| [981369960, 271989147, 208531222, 273566922, 270348508] |[981369960, 271989147, 208531222, 273566922, 270348508] |  [0.1733, 0.3700, 0.3867, 0.4103, 0.4121] |
+--------------+-----------+---------------------------------------------------------+--------------------------------------------------------+-------------------------------------------+
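The code, all_codes, resolutions and all_distances columns can be rebuilt from a single resolver annotation returned by fullAnnotate. The sketch below assumes that the top-k candidates are stored in the annotation's metadata under the keys all_k_results, all_k_resolutions and all_k_distances, joined by ":::". Treat both the keys and the separator as assumptions to check against your library version. The example metadata is hypothetical and simply mirrors the first three candidates in the table above.

```python
from typing import Dict, List

SEP = ":::"  # ASSUMPTION: separator joining the top-k values in the metadata


def _split(value: str) -> List[str]:
    """Split a SEP-joined string, treating an empty string as no values."""
    return value.split(SEP) if value else []


def parse_resolution(result: str, metadata: Dict[str, str]) -> Dict[str, object]:
    """Rebuild the Results-table columns from one resolver annotation.

    ASSUMPTION: the metadata keys used here are how the resolver exposes
    its top-k candidates.
    """
    return {
        "code": result,
        "all_codes": _split(metadata.get("all_k_results", "")),
        "resolutions": _split(metadata.get("all_k_resolutions", "")),
        "all_distances": [float(d) for d in _split(metadata.get("all_k_distances", ""))],
    }


# With a live pipeline, this would be called on a real annotation, e.g.:
#   ann = lp.fullAnnotate("CONTACT GOLD")[0]["irs_code"][0]
#   parse_resolution(ann.result, ann.metadata)
# Hypothetical metadata mirroring the first three candidates in the table:
example_meta = {
    "all_k_results": "981369960:::271989147:::208531222",
    "all_k_resolutions": "981369960:::271989147:::208531222",
    "all_k_distances": "0.1733:::0.3700:::0.3867",
}
row = parse_resolution("981369960", example_meta)
print(row)
```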

Model Information

Model Name: finel_edgar_irs
Type: finance
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [company_irs_number]
Language: en
Size: 313.8 MB
Case sensitive: false

References

In-house scraping and post-processing of the SEC Edgar database