Map Companies to their Acquisitions (wikipedia, en)

Description

This model allows you to, given an extracted ORG, retrieve all of its parent companies, subsidiaries, acquired companies and/or companies in the same group, based on Wikidata.

IMPORTANT: This model requires an exact match of the company name as it appears in Wikidata. If you are not sure the name matches, please run the entity resolver finel_wikipedia_parentcompanies first to normalize the company name (it appears as the optional normalization stage in the pipeline below).
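
For reference, a minimal sketch of that normalization step on its own, assuming the input text is just the raw company name (so no NER stage is needed; column names mirror the optional stage in the pipeline below):

from johnsnowlabs import nlp, finance

# The raw company name is treated directly as the document to be resolved
documentAssembler_norm = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("ner_chunk_doc")

# Sentence embeddings consumed by the entity resolver
use_embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en")\
        .setInputCols("ner_chunk_doc")\
        .setOutputCol("sentence_embeddings")

# Resolves free-form company names to the form used in Wikidata
resolver = finance.SentenceEntityResolverModel.pretrained("finel_wikipedia_parentcompanies", "en", "finance/models")\
        .setInputCols(["ner_chunk_doc", "sentence_embeddings"])\
        .setOutputCol("normalized")\
        .setDistanceFunction("EUCLIDEAN")

# Fit and run it the same way as the full pipeline in the How to use section below
normalization_pipeline = nlp.Pipeline(stages=[documentAssembler_norm, use_embeddings, resolver])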

Predicted Entities

How to use

from johnsnowlabs import nlp, finance

# Start or reuse a Spark session (Finance NLP models require a valid license)
spark = nlp.start()

documentAssembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")
        
sentenceDetector = nlp.SentenceDetector()\
        .setInputCols(["document"])\
        .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
        .setInputCols(["sentence"])\
        .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
        .setInputCols(["sentence", "token"]) \
        .setOutputCol("embeddings")

ner_model = finance.NerModel.pretrained('finner_orgs_prods_alias', 'en', 'finance/models')\
        .setInputCols(["sentence", "token", "embeddings"])\
        .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
        .setInputCols(["sentence","token","ner"])\
        .setOutputCol("ner_chunk")

# Optional: To normalize the ORG name using Wikipedia data before the mapping
##########################################################################
chunkToDoc = nlp.Chunk2Doc()\
        .setInputCols("ner_chunk")\
        .setOutputCol("ner_chunk_doc")

chunk_embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en") \
      .setInputCols("ner_chunk_doc") \
      .setOutputCol("sentence_embeddings")
    
use_er_model = finance.SentenceEntityResolverModel.pretrained("finel_wikipedia_parentcompanies", "en", "finance/models") \
      .setInputCols(["ner_chunk_doc", "sentence_embeddings"]) \
      .setOutputCol("normalized")\
      .setDistanceFunction("EUCLIDEAN")
##########################################################################

cm = finance.ChunkMapperModel.pretrained("finmapper_wikipedia_parentcompanies", "en", "finance/models")\
      .setInputCols(["normalized"])\
      .setOutputCol("mappings") # use .setInputCols(["ner_chunk"]) instead if you skip the optional normalization stage

nlpPipeline = nlp.Pipeline(stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        embeddings,
        ner_model,
        ner_converter,
        chunkToDoc,
        chunk_embeddings,
        use_er_model,
        cm
])

text = ["""Barclays is a British multinational bank which operates worldwide."""]

test_data = spark.createDataFrame([text]).toDF("text")

model = nlpPipeline.fit(test_data)

lp = nlp.LightPipeline(model)

lp.annotate(text)
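
If you also need the relation type behind each mapped value, the LightPipeline fullAnnotate method returns full annotation objects (with a metadata dictionary) instead of plain strings. A minimal sketch, reusing the lp pipeline defined above:

# fullAnnotate returns Annotation objects with result, begin/end offsets and metadata
full_result = lp.fullAnnotate(text)

for ann in full_result[0]["mappings"]:
    print(ann.result, ann.metadata)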

Results

{'mappings': ['https://www.wikidata.org/entity/Q245343',
   'Barclays@en-ca',
   'https://www.wikidata.org/prop/direct/P355',
   'is_parent_of',
   'London Stock Exchange@en',
   'BARC',
   'בנק ברקליס@he',
   'https://www.wikidata.org/entity/Q29488227'],
...
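
The same mappings can also be inspected on the full Spark DataFrame output. A minimal sketch, assuming the model and test_data defined above:

import pyspark.sql.functions as F

# Run the fitted pipeline over the DataFrame instead of using the LightPipeline
result = model.transform(test_data)

# Each element of the mappings annotation holds one mapped value
# (Wikidata entity URIs, labels, relation names, tickers, ...)
result.select(F.explode("mappings.result").alias("mapping")).show(truncate=False)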

Model Information

Model Name: finmapper_wikipedia_parentcompanies
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [ner_chunk]
Output Labels: [mappings]
Language: en
Size: 852.6 KB

References

Wikidata