Description
This Entity Linking / Entity Resolution model retrieves a company's IRS number from its name, using the SEC EDGAR database.
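Conceptually, a resolver of this kind embeds the input name and returns the codes of the closest reference entries, ranked by distance. The following is a minimal sketch of that idea only, not the model's actual implementation: the company names, two-dimensional vectors, placeholder codes, and the `resolve` helper are all hypothetical, whereas the real model works on Universal Sentence Encoder embeddings and EDGAR-derived codes.

```python
import math

# Toy reference index: hypothetical names, made-up 2-D "embeddings", placeholder codes.
INDEX = [
    ("ACME MINING", "000000001", (0.9, 0.1)),
    ("ACME GOLD", "000000002", (0.8, 0.3)),
    ("BLUE RIVER BANK", "000000003", (0.1, 0.9)),
]

def resolve(query_vec, index=INDEX, k=2):
    """Return the k (code, distance) pairs closest to query_vec, nearest first."""
    scored = [(code, math.dist(query_vec, vec)) for _, code, vec in index]
    scored.sort(key=lambda pair: pair[1])  # smaller Euclidean distance = better match
    return scored[:k]

candidates = resolve((0.85, 0.15))
best_code, best_distance = candidates[0]
```

The first candidate plays the role of the resolved code, and the full ranked list corresponds to the alternative codes and distances a resolver can report alongside it.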
Predicted Entities
How to use
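from johnsnowlabs import nlp, legal

# Start a Spark session with the licensed Legal NLP libraries loaded
spark = nlp.start()

# Treat the raw input text as a single chunk to be resolved
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("ner_chunk")

# Embed the chunk with Universal Sentence Encoder
embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en")\
    .setInputCols("ner_chunk")\
    .setOutputCol("sentence_embeddings")

# Resolve the embedding to the closest IRS number by Euclidean distance
resolver = legal.SentenceEntityResolverModel.pretrained("legel_edgar_irs", "en", "legal/models")\
    .setInputCols(["ner_chunk", "sentence_embeddings"])\
    .setOutputCol("irs_code")\
    .setDistanceFunction("EUCLIDEAN")

pipeline = nlp.Pipeline(stages=[documentAssembler, embeddings, resolver])

# All stages are pretrained, so fitting on an empty DataFrame only builds the PipelineModel
pipelineModel = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

lp = nlp.LightPipeline(pipelineModel)
lp.fullAnnotate("CONTACT GOLD")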
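The value returned by `fullAnnotate` is a nested structure of annotations, so a small helper can make it easier to read. The sketch below assumes that each resolver annotation exposes `result` and `metadata`, and that the metadata carries `all_k_results`, `all_k_resolutions`, and `all_k_distances` joined by `:::`, as Spark NLP sentence entity resolvers commonly do. Treat those key names, the separator, and the `resolution_row` helper as assumptions to check against your installed version. The stand-in annotation uses placeholder values so the snippet runs without Spark.

```python
from types import SimpleNamespace

def _split(value, sep=":::"):
    """Split a joined metadata string, dropping empty pieces."""
    return [part for part in value.split(sep) if part]

def resolution_row(annotation, sep=":::"):
    """Flatten one resolver annotation into a dict with one entry per result column."""
    meta = annotation.metadata
    return {
        "code": annotation.result,
        "all_codes": _split(meta.get("all_k_results", ""), sep),
        "resolutions": _split(meta.get("all_k_resolutions", ""), sep),
        "all_distances": [float(d) for d in _split(meta.get("all_k_distances", ""), sep)],
    }

# Stand-in for one resolver annotation; the values are placeholders, not model output.
fake_annotation = SimpleNamespace(
    result="111",
    metadata={
        "all_k_results": "111:::222",
        "all_k_resolutions": "111:::222",
        "all_k_distances": "0.10:::0.25",
    },
)
row = resolution_row(fake_annotation)
```

With a live pipeline, the same helper could be applied as `[resolution_row(a) for a in lp.fullAnnotate("CONTACT GOLD")[0]["irs_code"]]`, assuming the output column is named `irs_code` as configured above.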
Results
+--------------+-----------+---------------------------------------------------------+---------------------------------------------------------+------------------------------------------+
| chunk        | code      | all_codes                                               | resolutions                                             | all_distances                            |
+--------------+-----------+---------------------------------------------------------+---------------------------------------------------------+------------------------------------------+
| CONTACT GOLD | 981369960 | [981369960, 271989147, 208531222, 273566922, 270348508] | [981369960, 271989147, 208531222, 273566922, 270348508] | [0.1733, 0.3700, 0.3867, 0.4103, 0.4121] |
+--------------+-----------+---------------------------------------------------------+---------------------------------------------------------+------------------------------------------+
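The resolver always returns its nearest candidates, so a downstream check on the distances can help decide whether to trust the top code. The sketch below is one possible heuristic, not part of the model: the `accept_match` helper and its `max_distance` and `min_margin` thresholds are hypothetical and would need tuning on your own data. It is applied here to the codes and distances from the row above.

```python
def accept_match(codes, distances, max_distance=0.25, min_margin=0.1):
    """Return the top code if it is close enough and clearly ahead of the runner-up, else None."""
    if not codes or not distances:
        return None
    best = distances[0]
    runner_up = distances[1] if len(distances) > 1 else float("inf")
    if best <= max_distance and (runner_up - best) >= min_margin:
        return codes[0]
    return None

# Values copied from the results row for "CONTACT GOLD"
codes = ["981369960", "271989147", "208531222", "273566922", "270348508"]
distances = [0.1733, 0.3700, 0.3867, 0.4103, 0.4121]
match = accept_match(codes, distances)
```

Under these illustrative thresholds the top candidate is accepted, because its distance is well below that of the second candidate; a stricter `max_distance` would reject it.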
Model Information
| Model Name:     | legel_edgar_irs       |
| Type:           | legal                 |
| Compatibility:  | Legal NLP 1.0.0+      |
| License:        | Licensed              |
| Edition:        | Official              |
| Input Labels:   | [sentence_embeddings] |
| Output Labels:  | [company_irs_number]  |
| Language:       | en                    |
| Size:           | 313.8 MB              |
| Case sensitive: | false                 |
References
In-house scraping and postprocessing of the SEC EDGAR database