Description
This is an Entity Linking / Entity Resolution model that maps a company name extracted by any NER model to the name used by the SEC in the Edgar Database. The resolved name can then be passed to the Edgar Chunk Mappers to carry out data augmentation and retrieve the additional information that the Edgar Database stores about a company. For more information about data augmentation, check the Chunk Mapping task in Models Hub.
Predicted Entities
How to use
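from johnsnowlabs import nlp, legal

spark = nlp.start()

# The company name to resolve is passed in directly as the document text
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("ner_chunk")

# Sentence embeddings used to compare the chunk against Edgar company names
embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en")\
    .setInputCols("ner_chunk")\
    .setOutputCol("sentence_embeddings")

resolver = legal.SentenceEntityResolverModel.pretrained("legel_edgar_company_name", "en", "legal/models")\
    .setInputCols(["ner_chunk", "sentence_embeddings"])\
    .setOutputCol("irs_code")\
    .setDistanceFunction("EUCLIDEAN")

pipeline = nlp.Pipeline(stages=[documentAssembler, embeddings, resolver])

# Every stage is pretrained, so fitting on an empty DataFrame only builds the PipelineModel
pipelineModel = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

lp = nlp.LightPipeline(pipelineModel)
lp.fullAnnotate("CONTACT GOLD")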
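The resolver returns its ranked candidates as annotation metadata rather than as separate columns. The helper below is a minimal sketch of how one such annotation could be flattened into a row like the one in the Results section. It assumes the metadata keys `all_k_results`, `all_k_resolutions` and `all_k_distances` and the `:::` separator; check these against the output of your installed version. The function name `resolution_row` and the sample dictionary are hypothetical, with values copied from the Results section of this page.

```python
def resolution_row(chunk, code, metadata, sep=":::"):
    """Flatten one resolver annotation into a plain dict.

    `metadata` is expected to behave like the annotation's metadata dict.
    The key names and the separator are assumptions, not confirmed API.
    """
    def split(key):
        value = metadata.get(key, "")
        return value.split(sep) if value else []

    return {
        "chunk": chunk,
        "code": code,
        "all_codes": split("all_k_results"),
        "resolutions": split("all_k_resolutions"),
        "all_distances": [float(d) for d in split("all_k_distances")],
    }


# Hypothetical metadata, shortened to the first two candidates from the Results table
sample_metadata = {
    "all_k_results": "981369960:::271989147",
    "all_k_resolutions": "Contact Gold Corp:::Guskin Gold Corp",
    "all_k_distances": "0.1733:::0.3700",
}

row = resolution_row("CONTACT GOLD", "981369960", sample_metadata)
print(row)
```

With a real pipeline, `chunk` would come from the `ner_chunk` annotation's result and `code` and `metadata` from the `irs_code` annotation returned by `fullAnnotate`.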
Results
+--------------+-----------+---------------------------------------------------------+--------------------------------------------------------------------------------------------+------------------------------------------+
| chunk        | code      | all_codes                                               | resolutions                                                                                | all_distances                            |
+--------------+-----------+---------------------------------------------------------+--------------------------------------------------------------------------------------------+------------------------------------------+
| CONTACT GOLD | 981369960 | [981369960, 271989147, 208531222, 273566922, 270348508] | [Contact Gold Corp, Guskin Gold Corp, Yinfu Gold Corp, MAGELLAN GOLD Corp, Star Gold Corp] | [0.1733, 0.3700, 0.3867, 0.4103, 0.4121] |
+--------------+-----------+---------------------------------------------------------+--------------------------------------------------------------------------------------------+------------------------------------------+
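The candidates above are listed from smallest to largest distance, which matches the EUCLIDEAN distance function set on the resolver. The following is a conceptual sketch of that kind of ranking, not the library's implementation: the three-dimensional vectors and the company names are invented for illustration, whereas the real model compares Universal Sentence Encoder embeddings.

```python
import math


def rank_candidates(query, candidates, k=5):
    """Return up to k (name, distance) pairs, closest candidate first."""
    scored = [(name, math.dist(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy embeddings, invented for illustration only
query = [1.0, 0.0, 0.0]
candidates = {
    "Gamma Mining Inc": [0.0, 1.0, 0.0],
    "Alpha Gold Corp": [0.9, 0.1, 0.0],
    "Beta Gold Corp": [0.5, 0.5, 0.0],
}

ranked = rank_candidates(query, candidates)
for name, distance in ranked:
    print(f"{name}: {distance:.4f}")
```

The first entry of the ranked list plays the role of the `code` column, and the full list corresponds to `all_codes`, `resolutions` and `all_distances`.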
Model Information
Model Name: | legel_edgar_company_name |
Type: | legal |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence_embeddings] |
Output Labels: | [original_company_name] |
Language: | en |
Size: | 315.1 MB |
Case sensitive: | false |
References
In-house scraping and postprocessing of the SEC Edgar Database