Financial Relation Extraction (Tickers)

Description

This relation extraction model links company or product names to their tickers. A ticker (stock symbol) is a unique series of letters assigned to a security for trading purposes. For example:

Company: Apple Inc. Ticker: AAPL

Predicted Entities

has_ticker


How to use

```python
# Requires the licensed `johnsnowlabs` library with Finance NLP credentials installed.
from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed libraries loaded.
spark = nlp.start()

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# SEC-BERT embeddings, shared by both NER models
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Organization NER; only ORG chunks are kept
ner_model_org = finance.NerModel.pretrained("finner_orgs_prods_alias", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner_org")

ner_converter_org = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner_org"])\
    .setOutputCol("ner_chunk_org")\
    .setWhiteList(["ORG"])

# Ticker NER
ner_model_ticker = finance.NerModel.pretrained("finner_ticker", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner_ticker")

ner_converter_ticker = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner_ticker"])\
    .setOutputCol("ner_chunk_ticker")

# Merge TICKER and ORG chunks into a single column
chunk_merger = finance.ChunkMergeApproach()\
    .setInputCols(["ner_chunk_ticker", "ner_chunk_org"])\
    .setOutputCol("ner_chunk")\
    .setMergeOverlapping(True)

# POS tags and dependency parse, used by the chunk-pair filter
pos = nlp.PerceptronModel.pretrained("pos_anc", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("pos")

dependency_parser = nlp.DependencyParserModel.pretrained("dependency_conllu", "en")\
    .setInputCols(["sentence", "pos", "token"])\
    .setOutputCol("dependencies")

# Keep only ORG-TICKER pairs at most 4 dependency edges apart
re_ner_chunk_filter = finance.RENerChunksFilter()\
    .setInputCols(["ner_chunk", "dependencies"])\
    .setOutputCol("re_ner_chunk")\
    .setRelationPairs(["ORG-TICKER"])\
    .setMaxSyntacticDistance(4)

# Relation classifier. As configured it reads "ner_chunk", not the
# filtered "re_ner_chunk" column produced by the stage above.
re_model = finance.RelationExtractionDLModel.pretrained("finre_has_ticker", "en", "finance/models")\
    .setInputCols(["ner_chunk", "sentence"])\
    .setOutputCol("relations")\
    .setPredictionThreshold(0.2)

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model_org,
    ner_converter_org,
    ner_model_ticker,
    ner_converter_ticker,
    chunk_merger,
    pos,
    dependency_parser,
    re_ner_chunk_filter,
    re_model])

# Fitting on an empty DataFrame is enough to assemble the PipelineModel here.
empty_df = spark.createDataFrame([[""]]).toDF("text")
pipeline_model = pipeline.fit(empty_df)

text = "MTH - Meritage Homes Corporation Reports Disappointing Revenue. RECN, Resources Connection Inc. Shareholder Raymond James Trust Has Decreased Holding"

# LightPipeline runs the fitted pipeline on plain strings without a DataFrame
light_model = nlp.LightPipeline(pipeline_model)
result = light_model.fullAnnotate(text)
```
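The filter stage is configured to keep only ORG-TICKER chunk pairs whose tokens are at most four dependency edges apart (setRelationPairs and setMaxSyntacticDistance). The plain-Python sketch below illustrates that idea on a hand-made toy parse of a shortened version of the example headline. The head indices, the use of one representative token per chunk, and the helper names tree_distance and filter_pairs are assumptions for illustration only; this is not how RENerChunksFilter is implemented internally.

```python
from collections import deque


def tree_distance(heads, a, b):
    """Number of dependency edges between tokens a and b.

    heads[i] is the index of token i's head, or -1 for the root.
    """
    # Build an undirected adjacency list from the head pointers.
    adj = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].add(head)
            adj[head].add(child)
    # Breadth-first search from a until b is reached.
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return float("inf")  # tokens are not connected


def filter_pairs(chunks, heads, relation_pairs, max_distance):
    """Keep chunk pairs whose labels match an allowed pair (either order)
    and whose representative tokens are close in the dependency tree.

    Each chunk is a hypothetical (label, text, token_index) tuple.
    """
    allowed = set()
    for pair in relation_pairs:
        left, right = pair.split("-")
        allowed.add((left, right))
        allowed.add((right, left))
    kept = []
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            c1, c2 = chunks[i], chunks[j]
            if (c1[0], c2[0]) not in allowed:
                continue
            if tree_distance(heads, c1[2], c2[2]) <= max_distance:
                kept.append((c1[1], c2[1]))
    return kept


# Toy parse (hypothetical heads) of: MTH - Meritage Homes Corporation Reports Revenue
#                  0    1    2    3    4     5    6
heads = [4, 4, 4, 4, 5, -1, 5]
chunks = [("TICKER", "MTH", 0), ("ORG", "Meritage Homes Corporation", 4)]

print(filter_pairs(chunks, heads, ["ORG-TICKER"], max_distance=4))
```

Lowering max_distance to 0 drops the pair, which mirrors how a tighter syntactic-distance threshold trades recall for fewer spurious candidate pairs.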

Results

|   relation | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end |                     chunk2 | confidence |
|-----------:|--------:|--------------:|------------:|-------:|--------:|--------------:|------------:|---------------------------:|-----------:|
| has_ticker |  TICKER |             0 |           2 |    MTH |     ORG |             6 |          31 | Meritage Homes Corporation | 0.99532026 |
| has_ticker |  TICKER |            64 |          67 |   RECN |     ORG |            70 |          93 |   Resources Connection Inc | 0.97409964 |
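The table above is a flattened view of the relation annotations returned by fullAnnotate. A minimal sketch of that flattening follows, assuming each relation annotation exposes .result (the relation label) and a .metadata dictionary of strings whose keys match the table columns. To keep it runnable without a Spark session, it uses a hypothetical FakeAnnotation stand-in built from the first Results row; relations_to_rows and its min_confidence parameter are illustrative names, and that filter is applied after the model's own setPredictionThreshold(0.2).

```python
from collections import namedtuple

# Hypothetical stand-in for the annotation objects returned by fullAnnotate();
# only the two attributes used below (.result and .metadata) are modelled.
FakeAnnotation = namedtuple("FakeAnnotation", ["result", "metadata"])

COLUMNS = ["entity1", "entity1_begin", "entity1_end", "chunk1",
           "entity2", "entity2_begin", "entity2_end", "chunk2", "confidence"]


def relations_to_rows(annotated, column="relations", min_confidence=0.0):
    """Flatten fullAnnotate() output into one dict per relation."""
    rows = []
    for doc in annotated:  # one dict per input text
        for rel in doc.get(column, []):
            # Metadata values are strings, so convert before comparing.
            if float(rel.metadata.get("confidence", "0")) < min_confidence:
                continue
            row = {"relation": rel.result}
            row.update({key: rel.metadata.get(key) for key in COLUMNS})
            rows.append(row)
    return rows


# Sample input mirroring the first row of the Results table.
sample = [{"relations": [FakeAnnotation("has_ticker", {
    "entity1": "TICKER", "entity1_begin": "0", "entity1_end": "2", "chunk1": "MTH",
    "entity2": "ORG", "entity2_begin": "6", "entity2_end": "31",
    "chunk2": "Meritage Homes Corporation", "confidence": "0.99532026"})]}]

for row in relations_to_rows(sample):
    print(row["relation"], row["chunk1"], row["chunk2"], row["confidence"])
```

With the pipeline above, the same helper could be called as relations_to_rows(light_model.fullAnnotate(text)), and the resulting list of dicts passed to pandas.DataFrame to reproduce a table like the one shown.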

Model Information

Model Name: finre_has_ticker
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Language: en
Size: 409.9 MB

References

Manual annotations on tweets

Benchmarking

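| label         | Recall | Precision |    F1 | Support |
|---------------|-------:|----------:|------:|--------:|
| has_ticker    |  0.717 |     0.827 | 0.768 |      60 |
| Avg.          |  0.717 |     0.827 | 0.768 |       - |
| Weighted-Avg. |  0.717 |     0.827 | 0.768 |       - |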
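The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Because there is only one label, the Avg. and Weighted-Avg. rows equal the has_ticker row. A quick Python check with the reported has_ticker values reproduces the F1 column; the helper name f1_score is illustrative, and the "found" count is only an estimate derived from the rounded recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision, recall, support = 0.827, 0.717, 60  # has_ticker row above

f1 = f1_score(precision, recall)
print(round(f1, 3))  # 0.768

# Recall = found / support, so roughly this many gold relations were found.
found = round(recall * support)
print(found)  # 43
```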