Description
This model identifies whether ORG and PRODUCT entities mentioned in the text belong to a competitor. By default, when the text does not indicate that an entity is a competitor, the model returns NO_COMPETITOR.
Predicted Entities
NO_COMPETITOR, COMPETITOR
How to use
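from johnsnowlabs import *
from pyspark.sql import functions as F  # used below for explode / arrays_zip

# Assumes an active Spark session named `spark` (for example, one returned by nlp.start())

# Annotator that transforms a text column from a dataframe into an Annotation ready for NLP
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")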
# Sentence Detector annotator, splits each document into sentences
sentenceDetector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# nlp.Tokenizer splits sentences into tokens in a format suitable for NLP
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Token-level BERT embeddings consumed by both the NER and the assertion models
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

# NER model that tags organizations and products
ner_model_org = finance.NerModel.pretrained("finner_orgs_prods_alias", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Converts NER tags into chunks, keeping only ORG and PRODUCT entities
ner_converter = finance.NerConverterInternal() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")\
    .setWhiteList(['ORG', 'PRODUCT'])

# Assertion model that labels each chunk as COMPETITOR or NO_COMPETITOR
assertion = finance.AssertionDLModel.pretrained("finassertion_competitors", "en", "finance/models")\
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")
nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model_org,
    ner_converter,
    assertion
])

text = "Our competitors include the following by general category: legacy antivirus product providers, such as McAfee LLC and Broadcom Inc."

data = spark.createDataFrame([[text]]).toDF("text")

model = nlpPipeline.fit(data)

# Pair each detected chunk with its assertion label, one pair per row
model.transform(data)\
    .select(F.explode(F.arrays_zip('ner_chunk.result', 'assertion.result')).alias('result'))\
    .show(truncate=False)
Results
+--------------------------+
|result                    |
+--------------------------+
|[McAfee LLC, COMPETITOR]  |
|[Broadcom Inc, COMPETITOR]|
+--------------------------+
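
Once the (chunk, label) pairs have been collected out of Spark, they can be grouped by assertion label in plain Python. The following is a minimal sketch, assuming the pairs are already available as Python tuples; `group_by_assertion` is a hypothetical helper for illustration and is not part of the library.

```python
from collections import defaultdict


def group_by_assertion(pairs):
    """Group chunk texts by their assertion label, preserving input order."""
    grouped = defaultdict(list)
    for chunk, label in pairs:
        grouped[label].append(chunk)
    return dict(grouped)


# Pairs copied by hand from the result table above (an assumption about
# how the rows would look after being collected to the driver).
pairs = [
    ("McAfee LLC", "COMPETITOR"),
    ("Broadcom Inc", "COMPETITOR"),
]

grouped = group_by_assertion(pairs)
competitors = grouped.get("COMPETITOR", [])
print(competitors)
```

Using `dict.get` with an empty-list default keeps the lookup safe when a text contains no chunk with a given label.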
Model Information
Model Name: | finassertion_competitors |
Type: | finance |
Compatibility: | Finance NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [document, doc_chunk, embeddings] |
Output Labels: | [assertion] |
Language: | en |
Size: | 2.2 MB |
References
In-house annotations from 10-K filings
Benchmarking
label          tp   fp  fn  prec       rec        f1
NO_COMPETITOR  158  0   1   1.0        0.9937107  0.9968454
COMPETITOR     25   1   0   0.9615384  1.0        0.9803921
Macro-average  183  1   1   0.9807692  0.9968554  0.9887469
Micro-average  183  1   1   0.9945652  0.9945652  0.9945652
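
The scores above can be reproduced from the tp/fp/fn counts. The sketch below is not the evaluation code used for the model; it only recomputes the table under the usual definitions of precision, recall, and F1. The macro F1 is computed as the harmonic mean of macro precision and macro recall, since that is the convention that matches the reported figure. The helper name `prf` is an illustrative choice.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# (tp, fp, fn) per label, taken from the benchmarking table.
counts = {
    "NO_COMPETITOR": (158, 0, 1),
    "COMPETITOR": (25, 1, 0),
}

per_label = {label: prf(*c) for label, c in counts.items()}

# Macro average: unweighted mean of per-label precision and recall,
# then F1 as the harmonic mean of those two means.
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

# Micro average: pool the counts across labels, then score once.
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_p, micro_r, micro_f1 = prf(tp, fp, fn)

for label, (p, r, f1) in per_label.items():
    print(f"{label}: prec={p:.7f} rec={r:.7f} f1={f1:.7f}")
print(f"Macro-average: prec={macro_p:.7f} rec={macro_r:.7f} f1={macro_f1:.7f}")
print(f"Micro-average: prec={micro_p:.7f} rec={micro_r:.7f} f1={micro_f1:.7f}")
```

With only two labels and a single error on each side, the micro-averaged precision, recall, and F1 coincide, which is why the last row of the table repeats the same value.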