Description
IMPORTANT: Don’t run this model on the whole financial report. Instead:
- Split by paragraphs;
- Use the
finclf_acquisitions_item
Text Classifier to select only these paragraphs;
This model is a md
model, meaning that the directions in the relations are meaningful: chunk1
is the source of the relation, chunk2
is the target.
The aim of this model is to retrieve acquisition or subsidiary relationships between Organizations, included when the acquisition was carried out (“was_acquired”) and by whom (“was_acquired_by”). Subsidiaries are tagged with the relationship “is_subsidiary_of”.
Predicted Entities
was_acquired
, was_acquired_by
, is_subsidiary_of
, other
How to use
documentAssembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentencizer = nlp.SentenceDetectorDLModel\
.pretrained("sentence_detector_dl", "en") \
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
bert_embeddings= nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en")\
.setInputCols(["sentence", "token"])\
.setOutputCol("bert_embeddings")
ner_model_date = finance.NerModel.pretrained("finner_sec_dates", "en", "finance/models")\
.setInputCols(["sentence", "token", "bert_embeddings"])\
.setOutputCol("ner_dates")
ner_converter_date = nlp.NerConverter()\
.setInputCols(["sentence","token","ner_dates"])\
.setOutputCol("ner_chunk_date")
ner_model_org= finance.NerModel.pretrained("finner_orgs_prods_alias", "en", "finance/models")\
.setInputCols(["sentence", "token", "bert_embeddings"])\
.setOutputCol("ner_orgs")
ner_converter_org = nlp.NerConverter()\
.setInputCols(["sentence","token","ner_orgs"])\
.setOutputCol("ner_chunk_org")\
.setWhiteList(['ORG', 'PRODUCT', 'ALIAS'])
chunk_merger = finance.ChunkMergeApproach()\
.setInputCols('ner_chunk_org', "ner_chunk_date")\
.setOutputCol('ner_chunk')
reDL = finance.RelationExtractionDLModel().pretrained('finre_acquisitions_subsidiaries_md', 'en', 'finance/models')\
.setInputCols(["ner_chunk", "sentence"])\
.setOutputCol("relations")
nlpPipeline = nlp.Pipeline(stages=[
documentAssembler,
sentencizer,
tokenizer,
bert_embeddings,
ner_model_date,
ner_converter_date,
ner_model_org,
ner_converter_org,
chunk_merger,
reDL])
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
text = "Whatsapp, Inc. was acquired by Meta, Inc"
lmodel = LightPipeline(model)
results = lmodel.fullAnnotate(text)
rel_df = get_relations_df (results)
rel_df = rel_df[rel_df['relation']!='no_rel']
print(rel_df.to_string(index=False))
Results
relation entity1 entity1_begin entity1_end chunk1 entity2 entity2_begin entity2_end chunk2 confidence
was_acquired_by ORG 0 13 Whatsapp, Inc. ORG 31 34 Meta 0.9527305
Model Information
Model Name: | finre_acquisitions_subsidiaries_md |
Compatibility: | Finance NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Language: | en |
Size: | 405.7 MB |
References
In-house annotations on SEC 10K filings and Wikidata
Benchmarking
label Recall Precision F1 Support
is_subsidiary_of 0.583 0.618 0.600 36
other 0.975 0.948 0.961 243
was_acquired 0.836 0.895 0.864 61
was_acquired_by 0.767 0.780 0.773 60
Avg. 0.790 0.810 0.800 406
Weighted-Avg. 0.887 0.885 0.886 406