Financial Relation Extraction (Work Experience, Sm, Bidirectional)

Description

IMPORTANT: Don’t run this model on the whole financial report. Instead:

  • Split the report into paragraphs. You can use notebook 1 in Finance or Legal as inspiration.
  • Use the finclf_work_experience_item Text Classifier to keep only the paragraphs that describe work experience.
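
The two steps above can be sketched in plain Python. This is only an illustration of the flow, not the notebook's code: `split_paragraphs`, `select_paragraphs` and `keyword_stub` are hypothetical names, and `keyword_stub` is a stand-in for the finclf_work_experience_item classifier so the example runs without a license.

```python
import re
from typing import Callable, List


def split_paragraphs(report: str) -> List[str]:
    """Split on blank lines and collapse internal whitespace in each paragraph."""
    chunks = re.split(r"\n\s*\n", report)
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def select_paragraphs(paragraphs: List[str],
                      is_work_experience: Callable[[str], bool]) -> List[str]:
    """Keep only the paragraphs accepted by the supplied predicate."""
    return [p for p in paragraphs if is_work_experience(p)]


def keyword_stub(paragraph: str) -> bool:
    """Hypothetical placeholder for the text classifier (NOT the real model)."""
    keywords = ("appointed", "served as", "chief", "officer")
    return any(k in paragraph.lower() for k in keywords)


# Illustrative report text, made up for this sketch.
report = (
    "Item 1. Business\nWe sell restaurant franchises worldwide.\n\n"
    "Jose Cil has served as Chief Executive Officer\nsince January 2019.\n\n\n"
    "Risk factors include commodity prices."
)

paragraphs = split_paragraphs(report)
selected = select_paragraphs(paragraphs, keyword_stub)
```

In a real pipeline the predicate would wrap the classifier's prediction, and only `selected` would be passed on to the relation extraction model.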

This model allows you to analyze the present and past job positions of people by extracting relations between PERSON, ORG, ROLE and DATE entities. It requires an NER model that produces those entities, such as finner_org_per_role, and can also be combined with finassertiondl_past_roles to detect whether the entities are mentioned as having happened in the PAST (although you can also infer that from relations such as had_role_until).

This is a small (sm) model whose relations have no meaningful direction: it was not trained to tell whether a relation runs from left to right or from right to left. Bigger models in Models Hub are also trained with directed relations.
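
Because the direction is not meaningful, the same relation type can come back with its two entities in either order. One way to make downstream processing easier is to reorder each pair by entity type. The sketch below is a post-processing idea, not part of the model; `ENTITY_ORDER` is an arbitrary convention chosen for illustration and `canonicalize` is a hypothetical helper.

```python
from typing import Tuple

# Arbitrary illustrative convention: lower rank goes first.
ENTITY_ORDER = {"PERSON": 0, "ORG": 1, "ROLE": 2, "DATE": 3}

# (relation, entity1, chunk1, entity2, chunk2)
Relation = Tuple[str, str, str, str, str]


def canonicalize(rel: Relation) -> Relation:
    """Swap the two arguments when they are not in ENTITY_ORDER order."""
    label, ent1, chunk1, ent2, chunk2 = rel
    if ENTITY_ORDER[ent1] > ENTITY_ORDER[ent2]:
        return (label, ent2, chunk2, ent1, chunk1)
    return rel


date_first = ("has_role_from", "DATE", "January 2019", "ROLE", "CEO")
role_first = ("has_role_from", "ROLE", "Chief Operating Officer", "DATE", "2019")

normalized = [canonicalize(date_first), canonicalize(role_first)]
```

After normalization both `has_role_from` relations list the ROLE before the DATE, regardless of the order in which they appeared in the sentence.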

Predicted Entities

has_role, had_role_until, has_role_from, works_for, has_role_in_company


How to use

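from johnsnowlabs import nlp, finance

# Assumes an active, licensed Spark session bound to `spark` (for example, spark = nlp.start()).
document_assembler = nlp.DocumentAssembler()\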
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

ner_model = finance.NerModel.pretrained('finner_org_per_role_date', 'en', 'finance/models')\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence","token","ner"])\
    .setOutputCol("ner_chunk")

pos = nlp.PerceptronModel.pretrained()\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("pos")

dependency_parser = nlp.DependencyParserModel.pretrained("dependency_conllu", "en")\
    .setInputCols(["sentence", "pos", "token"])\
    .setOutputCol("dependencies")

re_ner_chunk_filter = finance.RENerChunksFilter()\
    .setInputCols(["ner_chunk", "dependencies"])\
    .setOutputCol("re_ner_chunk")\
    .setRelationPairs(["PERSON-ROLE", "ORG-ROLE", "DATE-ROLE", "PERSON-ORG"])\
    .setMaxSyntacticDistance(5)

re_Model = finance.RelationExtractionDLModel.pretrained("finre_work_experience", "en", "finance/models")\
    .setInputCols(["re_ner_chunk", "sentence"])\
    .setOutputCol("relations")\
    .setPredictionThreshold(0.5)

pipeline = nlp.Pipeline(stages=[
    document_assembler, 
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter,
    pos,
    dependency_parser,
    re_ner_chunk_filter,
    re_Model
])

empty_df = spark.createDataFrame([['']]).toDF("text")

re_model = pipeline.fit(empty_df)

light_model = nlp.LightPipeline(re_model)

text_list = ["""We have experienced significant changes in our senior management team over the past several years, including the appointments of Mark Schmitz as our Executive Vice President and Chief Operating Officer in 2019.""",
             """In January 2019, Jose Cil was assigned the CEO of Restaurant Brands International, and Daniel Schwartz was assigned the Executive Chairman of the company.""",
             ]

results = light_model.fullAnnotate(text_list)
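
To inspect the output, the annotations can be flattened into rows. The sketch below assumes that each element of `results` is a dict keyed by output column, and that each relation annotation exposes `.result` plus a `.metadata` dict with the keys `entity1`, `chunk1`, `entity2`, `chunk2` and `confidence`. Those key names are an assumption based on how relation annotations are usually laid out, so check them against your own output. `relations_to_rows` is a hypothetical helper, and the mock annotation only exists so the example runs without Spark.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def relations_to_rows(results: List[Dict[str, Any]],
                      column: str = "relations") -> List[Dict[str, Any]]:
    """Flatten relation annotations into plain dicts (metadata keys are assumed)."""
    rows = []
    for doc in results:
        for ann in doc.get(column, []):
            md = ann.metadata
            rows.append({
                "relation": ann.result,
                "entity1": md["entity1"],
                "chunk1": md["chunk1"],
                "entity2": md["entity2"],
                "chunk2": md["chunk2"],
                "confidence": float(md["confidence"]),
            })
    return rows


# Mock annotation standing in for one element of a real fullAnnotate() result.
mock_results = [{
    "relations": [SimpleNamespace(
        result="has_role",
        metadata={"entity1": "PERSON", "chunk1": "Mark Schmitz",
                  "entity2": "ROLE", "chunk2": "Chief Operating Officer",
                  "confidence": "0.97559035"},
    )]
}]

rows = relations_to_rows(mock_results)
```

With a real pipeline, `relations_to_rows(results)` would be called on the list returned by `light_model.fullAnnotate(text_list)`.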

Results

relation       entity1  entity1_begin  entity1_end  chunk1                    entity2  entity2_begin  entity2_end  chunk2                    confidence
has_role       PERSON   129            140          Mark Schmitz              ROLE     149            172          Executive Vice President  0.8707728
has_role       PERSON   129            140          Mark Schmitz              ROLE     178            200          Chief Operating Officer   0.97559035
has_role_from  ROLE     149            172          Executive Vice President  DATE     205            208          2019                      0.9327241
has_role_from  ROLE     178            200          Chief Operating Officer   DATE     205            208          2019                      0.90718126
has_role_from  DATE     3              14           January 2019              ROLE     43             45           CEO                       0.996639
has_role_from  DATE     3              14           January 2019              ROLE     120            137          Executive Chairman        0.9964874
has_role       PERSON   17             24           Jose Cil                  ROLE     43             45           CEO                       0.8917691
has_role       PERSON   17             24           Jose Cil                  ROLE     120            137          Executive Chairman        0.8527716
has_role       ROLE     43             45           CEO                       PERSON   87             101          Daniel Schwartz           0.5765097
has_role       PERSON   87             101          Daniel Schwartz           ROLE     120            137          Executive Chairman        0.79235893
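
The pipeline keeps every relation scored at or above the 0.5 prediction threshold. If you prefer precision over recall, a stricter cut-off can be applied afterwards. The sketch below copies the relation, the two chunks and the confidence from the results above; the 0.7 cut-off and the helper name `filter_by_confidence` are illustrative choices, not recommended values.

```python
from typing import List, Tuple

# (relation, chunk1, chunk2, confidence), copied from the results above.
Row = Tuple[str, str, str, float]

rows: List[Row] = [
    ("has_role", "Mark Schmitz", "Executive Vice President", 0.8707728),
    ("has_role", "Mark Schmitz", "Chief Operating Officer", 0.97559035),
    ("has_role_from", "Executive Vice President", "2019", 0.9327241),
    ("has_role_from", "Chief Operating Officer", "2019", 0.90718126),
    ("has_role_from", "January 2019", "CEO", 0.996639),
    ("has_role_from", "January 2019", "Executive Chairman", 0.9964874),
    ("has_role", "Jose Cil", "CEO", 0.8917691),
    ("has_role", "Jose Cil", "Executive Chairman", 0.8527716),
    ("has_role", "CEO", "Daniel Schwartz", 0.5765097),
    ("has_role", "Daniel Schwartz", "Executive Chairman", 0.79235893),
]


def filter_by_confidence(rows: List[Row], min_confidence: float) -> List[Row]:
    """Keep only the relations whose confidence reaches the cut-off."""
    return [row for row in rows if row[3] >= min_confidence]


kept = filter_by_confidence(rows, 0.7)      # illustrative cut-off
dropped = [row for row in rows if row not in kept]
```

With this cut-off only the lowest-scoring pair (CEO and Daniel Schwartz, 0.5765097) is removed. A similar effect could be obtained by raising the value passed to setPredictionThreshold in the pipeline.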

Model Information

Model Name: finre_work_experience
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Language: en
Size: 409.9 MB

References

Manual annotations on the CUAD dataset, 10-K filings and Wikidata

Benchmarking

 label                Recall   Precision  F1       Support
 had_role_until       0.972    0.972      0.972    36  
 has_role             0.986    0.980      0.983    146 
 has_role_from        0.983    0.983      0.983    58  
 has_role_in_company  0.954    0.969      0.961    65  
 works_for            0.933    0.933      0.933    15  
 Avg.                 0.966    0.967      0.966     -   
 Weighted-Avg.        0.975    0.975      0.975     -
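
The two summary rows can be reproduced from the per-label figures: "Avg." is the unweighted mean over the five labels, and "Weighted-Avg." weights each label by its support. The following sketch recomputes both from the table above; `macro_avg` and `weighted_avg` are hypothetical helper names.

```python
# label -> (recall, precision, f1, support), copied from the table above.
benchmark = {
    "had_role_until":      (0.972, 0.972, 0.972, 36),
    "has_role":            (0.986, 0.980, 0.983, 146),
    "has_role_from":       (0.983, 0.983, 0.983, 58),
    "has_role_in_company": (0.954, 0.969, 0.961, 65),
    "works_for":           (0.933, 0.933, 0.933, 15),
}


def macro_avg(metric_index: int) -> float:
    """Unweighted mean of one metric over all labels, rounded to 3 decimals."""
    values = [row[metric_index] for row in benchmark.values()]
    return round(sum(values) / len(values), 3)


def weighted_avg(metric_index: int) -> float:
    """Support-weighted mean of one metric, rounded to 3 decimals."""
    total_support = sum(row[3] for row in benchmark.values())
    weighted_sum = sum(row[metric_index] * row[3] for row in benchmark.values())
    return round(weighted_sum / total_support, 3)


# Index 0 = recall, 1 = precision, 2 = F1.
macro = [macro_avg(i) for i in range(3)]        # [0.966, 0.967, 0.966]
weighted = [weighted_avg(i) for i in range(3)]  # [0.975, 0.975, 0.975]
```

The weighted figures sit above the unweighted ones because has_role, the label with the largest support (146 of 320 examples), also has the highest scores.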