Description
Given an extracted ORG entity, this model retrieves its parent companies, subsidiaries, acquired companies, and other companies in the same group.
IMPORTANT: The mapper requires an exact match with the company name as it appears in Wikidata. If you are not sure the name matches, run the finel_wikipedia_parentcompanies entity resolver first
to normalize the company name, as shown in the optional step of the pipeline in How to use.
Predicted Entities
How to use
from johnsnowlabs import nlp, finance

# Start a Spark session (assumes the johnsnowlabs library and a valid Finance NLP license are configured)
spark = nlp.start()

# Turn raw text into document annotations
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# NER model that extracts the ORG chunks to be mapped
ner_model = finance.NerModel.pretrained("finner_orgs_prods_alias", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Group the token-level NER tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")
# Optional: normalize the ORG name using Wikipedia data before the mapping
##########################################################################
chunkToDoc = nlp.Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_chunk_doc")
chunk_embeddings = nlp.UniversalSentenceEncoder.pretrained("tfhub_use", "en")\
    .setInputCols("ner_chunk_doc")\
    .setOutputCol("sentence_embeddings")
use_er_model = finance.SentenceEntityResolverModel.pretrained("finel_wikipedia_parentcompanies", "en", "finance/models")\
    .setInputCols(["ner_chunk_doc", "sentence_embeddings"])\
    .setOutputCol("normalized")\
    .setDistanceFunction("EUCLIDEAN")
##########################################################################
# To skip normalization, use "ner_chunk" as the input column instead of "normalized"
# and drop the three optional stages above from the pipeline
cm = finance.ChunkMapperModel.pretrained("finmapper_wikipedia_parentcompanies", "en", "finance/models")\
    .setInputCols(["normalized"])\
    .setOutputCol("mappings")
nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter,
    chunkToDoc,
    chunk_embeddings,
    use_er_model,
    cm
])
text = ["""Barclays is an American multinational bank which operates worldwide."""]
test_data = spark.createDataFrame([text]).toDF("text")
model = nlpPipeline.fit(test_data)
lp = nlp.LightPipeline(model)
lp.annotate(text)
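The annotate call above returns its output keyed by column name. As a minimal sketch, assuming annotate returns one dictionary per input text (or a single dictionary when given a single string) with the output column names as keys, the mapper output can be pulled out like this. The helper name collect_mappings is hypothetical and not part of the library.

```python
def collect_mappings(annotations, column="mappings"):
    """Return one list of mapping values per annotated text.

    Assumes `annotations` is the output of LightPipeline.annotate:
    a dict for a single string, or a list of dicts for a list of strings.
    `column` is the output column of the chunk mapper ("mappings" above).
    """
    # Normalize the single-string case to a one-element list
    if isinstance(annotations, dict):
        annotations = [annotations]
    # Missing columns yield an empty list instead of raising KeyError
    return [list(doc.get(column, [])) for doc in annotations]
```

With the pipeline above, this could be called as collect_mappings(lp.annotate(text)), which yields one list of mapping values for each input text.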
Results
{'mappings': ['https://www.wikidata.org/entity/Q245343',
              'Barclays@en-ca',
              'https://www.wikidata.org/prop/direct/P355',
              'is_parent_of',
              'London Stock Exchange@en',
              'BARC',
              'בנק ברקליס@he',
              'https://www.wikidata.org/entity/Q29488227'],
...
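The values in the sample output come in a few recognizable shapes: Wikidata entity and property URIs, labels with a language tag after the @ sign, and plain strings such as the relation name or the ticker. The following is a heuristic sketch for telling them apart, assuming every value follows one of the shapes shown in the sample. The function name classify_mapping_value and the returned dictionary layout are hypothetical, chosen only for illustration.

```python
import re

# Assumption: URIs look like the ones in the sample output
# (.../entity/Q... for items, .../prop/direct/P... for properties).
_WIKIDATA_URI = re.compile(
    r"^https?://www\.wikidata\.org/(?:entity|prop/direct)/([QP]\d+)$"
)
# Assumption: labels end in "@<language tag>", e.g. "Barclays@en-ca".
_LANG_LABEL = re.compile(
    r"^(?P<text>.+)@(?P<lang>[A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)$"
)


def classify_mapping_value(value):
    """Classify one mapping value as entity, property, label, or plain text."""
    match = _WIKIDATA_URI.match(value)
    if match:
        wikidata_id = match.group(1)
        kind = "entity" if wikidata_id.startswith("Q") else "property"
        return {"kind": kind, "id": wikidata_id}
    match = _LANG_LABEL.match(value)
    if match:
        return {"kind": "label", "text": match.group("text"), "lang": match.group("lang")}
    # Anything else (relation names, tickers) is kept as plain text
    return {"kind": "plain", "text": value}


if __name__ == "__main__":
    sample = [
        "https://www.wikidata.org/entity/Q245343",
        "Barclays@en-ca",
        "https://www.wikidata.org/prop/direct/P355",
        "is_parent_of",
    ]
    for value in sample:
        print(classify_mapping_value(value))
```

Because the check is pattern-based, a plain string that happens to end in something like "@xx" would be treated as a label, so the output is best used for inspection rather than as a strict parser.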
Model Information
| Model Name: | finmapper_wikipedia_parentcompanies |
| Compatibility: | Finance NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [ner_chunk] |
| Output Labels: | [mappings] |
| Language: | en |
| Size: | 852.6 KB |
References
Wikidata