Description
This is an Entity Linking / Entity Resolution model that retrieves the IRS number of a company from its name, using the SEC EDGAR database.
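The IRS number that EDGAR lists for a filer is its Employer Identification Number (EIN): nine digits, conventionally written as `XX-XXXXXXX`. The resolver returns the bare digit string (for example `981369960` in the results), so a small helper can present it in the usual form. This is a minimal sketch. `format_irs_number` is a hypothetical helper, not part of Finance NLP. It rejects any value that is not exactly nine digits rather than guessing at missing leading zeros.

```python
import re


def format_irs_number(code: str) -> str:
    """Format a 9-digit IRS number (EIN) as XX-XXXXXXX.

    Hypothetical helper: raises ValueError instead of padding or
    truncating values that are not exactly nine digits.
    """
    digits = code.strip()
    if not re.fullmatch(r"\d{9}", digits):
        raise ValueError(f"expected 9 digits, got {code!r}")
    return f"{digits[:2]}-{digits[2:]}"


print(format_irs_number("981369960"))  # 98-1369960
```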
Predicted Entities
How to use
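from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed Finance NLP libraries
spark = nlp.start()

# Treat the whole input text (a company name) as a single chunk
documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("ner_chunk")

# Encode the company name as a sentence embedding
embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en") \
      .setInputCols("ner_chunk") \
      .setOutputCol("sentence_embeddings")

# Find the closest known company names and return their IRS numbers
resolver = finance.SentenceEntityResolverModel.pretrained("finel_edgar_irs", "en", "finance/models")\
      .setInputCols(["ner_chunk", "sentence_embeddings"]) \
      .setOutputCol("irs_code")\
      .setDistanceFunction("EUCLIDEAN")

pipeline = nlp.Pipeline(
      stages = [
          documentAssembler,
          embeddings,
          resolver])

# All stages are pretrained, so fitting on an empty DataFrame only builds the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")
pipelineModel = pipeline.fit(empty_data)

lp = nlp.LightPipeline(pipelineModel)
lp.fullAnnotate("CONTACT GOLD")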
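With `setDistanceFunction("EUCLIDEAN")`, the resolver ranks candidate companies by the Euclidean distance between the input's sentence embedding and the embeddings of the company names it was trained on, closest first. The sketch below illustrates only that ranking idea. It is a simplified illustration, not the library's implementation. The index, its 3-dimensional vectors, and the placeholder codes are all hypothetical stand-ins for the real 512-dimensional Universal Sentence Encoder embeddings and EDGAR IRS numbers.

```python
import math

# Hypothetical index: toy embeddings of known company names -> placeholder IRS numbers
INDEX = {
    "111111111": [0.9, 0.1, 0.0],
    "222222222": [0.1, 0.9, 0.0],
    "333333333": [0.0, 0.2, 0.9],
}


def resolve(query, index, k=2):
    """Return the k closest (code, distance) pairs by Euclidean distance."""
    scored = [(code, math.dist(query, vec)) for code, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance = best match
    return scored[:k]


best = resolve([0.8, 0.2, 0.0], INDEX)
print(best[0][0])  # 111111111
```

The first entry plays the role of the `code` column in the results, and the full ranked list corresponds to `all_codes` and `all_distances`.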
Results
+--------------+-----------+---------------------------------------------------------+--------------------------------------------------------+-------------------------------------------+
|         chunk|     code  |                                                all_codes|                                            resolutions |                              all_distances|
+--------------+-----------+---------------------------------------------------------+--------------------------------------------------------+-------------------------------------------+
| CONTACT GOLD |  981369960| [981369960, 271989147, 208531222, 273566922, 270348508] |[981369960, 271989147, 208531222, 273566922, 270348508] |  [0.1733, 0.3700, 0.3867, 0.4103, 0.4121] |
+--------------+-----------+---------------------------------------------------------+--------------------------------------------------------+-------------------------------------------+
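The `all_codes` and `all_distances` columns come from the metadata of the resolver's output annotation. A minimal sketch for turning them into ranked `(code, distance)` pairs is shown below. It assumes the metadata keys `all_k_results` and `all_k_distances`, with values joined by `:::`. Treat those key names and the separator as assumptions to verify against your Finance NLP version. `top_candidates` and its optional `max_distance` cut-off are hypothetical helpers. The sample values are copied from the row above.

```python
def top_candidates(metadata, sep=":::", max_distance=None):
    """Pair candidate IRS numbers with distances, closest first.

    Assumes the metadata keys 'all_k_results' and 'all_k_distances'
    hold sep-joined strings. Optionally drops candidates whose
    distance exceeds max_distance.
    """
    codes = metadata["all_k_results"].split(sep)
    distances = [float(d) for d in metadata["all_k_distances"].split(sep)]
    pairs = sorted(zip(codes, distances), key=lambda p: p[1])
    if max_distance is not None:
        pairs = [p for p in pairs if p[1] <= max_distance]
    return pairs


# Metadata shaped like the CONTACT GOLD row (key names and separator assumed)
sample = {
    "all_k_results": "981369960:::271989147:::208531222:::273566922:::270348508",
    "all_k_distances": "0.1733:::0.3700:::0.3867:::0.4103:::0.4121",
}
print(top_candidates(sample)[0])  # ('981369960', 0.1733)
```

With the pipeline above, the input would be the `metadata` dictionary of the first annotation under the `irs_code` key in the `lp.fullAnnotate(...)` output.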
Model Information
| Model Name: | finel_edgar_irs | 
| Type: | finance | 
| Compatibility: | Finance NLP 1.0.0+ | 
| License: | Licensed | 
| Edition: | Official | 
| Input Labels: | [sentence_embeddings] | 
| Output Labels: | [company_irs_number] | 
| Language: | en | 
| Size: | 313.8 MB | 
| Case sensitive: | false | 
References
In-house scraping and post-processing of the SEC EDGAR database