Description
This is a Financial Chunk Mapper which will retrieve, given a ticker, extra information about the company, including:
- Company Name
- Stock Exchange
- Section
- Sic codes
- Section
- Industry
- Category
- Currency
- Location
- Previous names (first_name)
- Company type (INC, CORP, etc)
- and some more.
Predicted Entities
How to use
document_assembler = nlp.DocumentAssembler()\
.setInputCol('text')\
.setOutputCol('document')
tokenizer = nlp.Tokenizer()\
.setInputCols("document")\
.setOutputCol("token")
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
.setInputCols(["document", "token"]) \
.setOutputCol("embeddings")
ner_model = finance.NerModel.pretrained('finner_orgs_prods_alias', 'en', 'finance/models')\
.setInputCols(["document", "token", "embeddings"])\
.setOutputCol("ner")
ner_converter = nlp.NerConverter()\
.setInputCols(["document", "token", "ner"])\
.setOutputCol("ner_chunk")
# Optional: To normalize the ORG name using NASDAQ data before the mapping
##########################################################################
chunkToDoc = nlp.Chunk2Doc()\
.setInputCols("ner_chunk")\
.setOutputCol("ner_chunk_doc")
chunk_embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use_lg", "en")\
.setInputCols(["ner_chunk_doc"])\
.setOutputCol("chunk_embeddings")
use_er_model = finance.SentenceEntityResolverModel.pretrained('finel_nasdaq_data_company_name', 'en', 'finance/models')\
.setInputCols("chunk_embeddings")\
.setOutputCol('normalized')\
.setDistanceFunction("EUCLIDEAN")
##########################################################################
CM = finance.ChunkMapperModel()\
.pretrained('finmapper_nasdaq_data_company_name', 'en', 'finance/models')\
.setInputCols(["normalized"])\
.setOutputCol("mappings") #or ner_chunk without normalization
pipeline = nlp.Pipeline().setStages([document_assembler,
tokenizer,
embeddings,
ner_model,
ner_converter,
chunkToDoc, # Optional for normalization
chunk_embeddings, # Optional for normalization
use_er_model, # Optional for normalization
CM])
text = """GLEASON CORP is a company which ..."""
test_data = spark.createDataFrame([[text]]).toDF("text")
model = pipeline.fit(test_data)
lp = nlp.LightPipeline(model)
lp.fullAnnotate(text)
Results
Row(annotatorType='labeled_dependency', begin=0, end=11, relation='ticker', result='GLE1'...)
Row(annotatorType='labeled_dependency', begin=0, end=11, relation='name', result='GLEASON CORP'...)
Row(annotatorType='labeled_dependency', begin=0, end=11, relation='exchange', result='NYSE'...)
Row(annotatorType='labeled_dependency', begin=0, end=11, relation='category' result='Domestic Common Stock'...)
Model Information
Model Name: | finmapper_nasdaq_data_company_name |
Compatibility: | Finance NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [ner_chunk] |
Output Labels: | [mappings] |
Language: | en |
Size: | 989.1 KB |
References
NASDAQ Database