Financial Relation Extraction (Alias)

Description

This model can be used to extract aliases of company or product names. An “Alias” is a name used in a document to refer to the original name of a company or product. Examples:

John Snow Labs, also known as JSL
John Snow Labs (“JSL”)
etc.
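The two example patterns can be made concrete with a toy, rule-based matcher built on Python's `re` module. This only illustrates what an alias looks like in text. `finre_org_prod_alias` is a trained relation extraction model, not a set of regular expressions. The `find_aliases` helper and its "a name is a run of capitalized words" heuristic are assumptions made for this sketch.

```python
import re

# Toy, rule-based illustration of the two alias patterns shown above.
# This is NOT how finre_org_prod_alias works; the model learns these relations.
# Simplifying assumption: a name is a run of capitalized words.
NAME = r"(?P<name>[A-Z][\w.&-]*(?:\s+[A-Z][\w.&-]*)*)"

# Pattern 1: John Snow Labs, also known as JSL
AKA = re.compile(NAME + r",?\s+also known as\s+(?P<alias>[A-Z][\w.&-]*)")
# Pattern 2: John Snow Labs ("JSL"), with straight or curly quotes
PAREN = re.compile(NAME + r"\s*\(\s*[\"“](?P<alias>[^\"”]+)[\"”]\s*\)")


def find_aliases(text):
    """Return (name, alias) pairs matched by either pattern."""
    pairs = []
    for pattern in (AKA, PAREN):
        pairs.extend((m["name"], m["alias"]) for m in pattern.finditer(text))
    return pairs


for sentence in ("John Snow Labs, also known as JSL", "John Snow Labs (“JSL”)"):
    print(find_aliases(sentence))
```

A rule like this misses even small wording changes, such as `(hereinafter “JSL”)` or lowercase names.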

Predicted Entities

has_alias, has_collective_alias


How to use

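# Licensed Finance NLP imports. nlp.start() creates the Spark session and
# expects John Snow Labs license credentials to be available.
from johnsnowlabs import nlp, finance

spark = nlp.start()

# Raw text -> "document" annotations
documentAssembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")

# Split each document into tokens
tokenizer = nlp.Tokenizer()\
        .setInputCols(["document"])\
        .setOutputCol("token")

# SEC-BERT token embeddings consumed by the NER model
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
        .setInputCols(["document", "token"]) \
        .setOutputCol("embeddings")

# NER model that tags ORG, PRODUCT and ALIAS mentions
ner_model = finance.NerModel.pretrained("finner_orgs_prods_alias", "en", "finance/models")\
        .setInputCols(["document", "token", "embeddings"])\
        .setOutputCol("ner")

# Group token-level NER tags into entity chunks
ner_converter = nlp.NerConverter()\
        .setInputCols(["document","token","ner"])\
        .setOutputCol("ner_chunk")

# Relation extraction between chunks (has_alias, has_collective_alias);
# relations scoring below the prediction threshold are not emitted
reDL = finance.RelationExtractionDLModel.pretrained("finre_org_prod_alias", "en", "finance/models")\
    .setPredictionThreshold(0.1)\
    .setInputCols(["ner_chunk", "document"])\
    .setOutputCol("relations")

nlpPipeline = nlp.Pipeline(stages=[
        documentAssembler,
        tokenizer,
        embeddings,
        ner_model,
        ner_converter,
        reDL])

# Every stage is pretrained, so fitting on an empty DataFrame only builds the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

# The leading newline is part of the input, so the character offsets in Results count it
text='''
On March 12, 2020 we closed a Loan and Security Agreement with Hitachi Capital America Corp. ("Hitachi") the terms of which are described in this report which replaced our credit facility with Western Alliance Bank.
'''

# LightPipeline annotates plain Python strings without building a DataFrame.
# fullAnnotate returns a list with one dict per input; relations are under "relations".
lmodel = nlp.LightPipeline(model)
lmodel.fullAnnotate(text)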
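To turn the output of `lmodel.fullAnnotate(text)` into rows like the Results table below, a small helper can flatten the relation annotations. This is a minimal sketch, assuming each annotation under the `"relations"` key exposes `result` (the relation label) and a `metadata` dict whose keys match the Results columns, with values stored as strings. The `Annotation` dataclass and the `relations_to_rows` / `offsets_match` helpers are hypothetical stand-ins so the snippet runs without Spark.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a Spark NLP annotation: only the fields used below.
@dataclass
class Annotation:
    result: str
    metadata: dict = field(default_factory=dict)


OFFSET_KEYS = ("entity1_begin", "entity1_end", "entity2_begin", "entity2_end")
TEXT_KEYS = ("entity1", "chunk1", "entity2", "chunk2")


def relations_to_rows(relations, min_confidence=0.0, drop_labels=("no_rel",)):
    """Flatten relation annotations into one dict per relation (Results-table columns)."""
    rows = []
    for ann in relations:
        if ann.result in drop_labels:
            continue
        confidence = float(ann.metadata.get("confidence", 0.0))
        if confidence < min_confidence:
            continue
        row = {"relation": ann.result}
        row.update({k: ann.metadata[k] for k in TEXT_KEYS})
        # Offsets arrive as strings; convert them to ints for slicing
        row.update({k: int(ann.metadata[k]) for k in OFFSET_KEYS})
        row["confidence"] = confidence
        rows.append(row)
    return rows


def offsets_match(row, text):
    """Check that the inclusive [begin, end] offsets point at the reported chunks."""
    return (text[row["entity1_begin"]:row["entity1_end"] + 1] == row["chunk1"]
            and text[row["entity2_begin"]:row["entity2_end"] + 1] == row["chunk2"])


# Input text and values copied from the example on this page
text = '''
On March 12, 2020 we closed a Loan and Security Agreement with Hitachi Capital America Corp. ("Hitachi") the terms of which are described in this report which replaced our credit facility with Western Alliance Bank.
'''
example = [Annotation("has_alias", {
    "entity1": "ORG", "entity1_begin": "64", "entity1_end": "92",
    "chunk1": "Hitachi Capital America Corp.",
    "entity2": "ALIAS", "entity2_begin": "96", "entity2_end": "102",
    "chunk2": "Hitachi", "confidence": "0.9983972"})]

rows = relations_to_rows(example, min_confidence=0.5)
print(rows[0]["relation"], rows[0]["chunk1"], "->", rows[0]["chunk2"], offsets_match(rows[0], text))
```

With the pipeline above, the call would be `relations_to_rows(lmodel.fullAnnotate(text)[0]["relations"], min_confidence=0.5)`. Filtering `no_rel` is only a precaution in case such pairs are returned. A post-hoc `min_confidence` filter lets you keep the low 0.1 prediction threshold and decide on a stricter cut-off later.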

Results

relation	entity1	entity1_begin	entity1_end	chunk1	entity2	entity2_begin	entity2_end	chunk2	confidence
has_alias	ORG	64	92	Hitachi Capital America Corp.	ALIAS	96	102	Hitachi	0.9983972

Model Information

Model Name:	finre_org_prod_alias
Type:	finance
Compatibility:	Finance NLP 1.0.0+
License:	Licensed
Edition:	Official
Language:	en
Size:	409.9 MB

References

Manual annotations on the CUAD dataset and 10-K filings

Benchmarking

label                    Recall    Precision    F1        Support
has_alias                0.920     1.000        0.958     50
has_collective_alias     1.000     0.750        0.857      6
no_rel                   1.000     0.957        0.978     44
Avg.                     0.973     0.902        0.931      -
Weighted-Avg.            0.960     0.966        0.961      -
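The two summary rows can be reproduced from the per-label rows: `Avg.` matches the unweighted mean over the three labels, and `Weighted-Avg.` matches a mean weighted by `Support`. The weighting is inferred from the numbers agreeing, so treat it as an assumption. The sketch below checks this arithmetic using only the values in the table. Note that the `Avg.` F1 (0.931) is the mean of the per-label F1 scores, not the F1 of the averaged precision and recall (which would be roughly 0.936).

```python
# Per-label rows copied from the Benchmarking table: (recall, precision, f1, support)
SCORES = {
    "has_alias":            (0.920, 1.000, 0.958, 50),
    "has_collective_alias": (1.000, 0.750, 0.857, 6),
    "no_rel":               (1.000, 0.957, 0.978, 44),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def summary_rows(scores):
    """Return (macro, weighted) averages of recall, precision and F1."""
    rows = list(scores.values())
    total_support = sum(r[3] for r in rows)
    macro = tuple(sum(r[i] for r in rows) / len(rows) for i in range(3))
    weighted = tuple(sum(r[i] * r[3] for r in rows) / total_support for i in range(3))
    return macro, weighted


macro, weighted = summary_rows(SCORES)
print("Avg.          ", [round(x, 3) for x in macro])
print("Weighted-Avg. ", [round(x, 3) for x in weighted])

# Each per-label F1 is consistent with that row's precision and recall
for label, (r, p, f, _) in SCORES.items():
    print(label, round(f1(p, r), 3), f)
```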
