Description
This model performs Zero-Shot Named Entity Recognition (NER): it detects arbitrary entity types without a task-specific training dataset, using only the pretrained RoBERTa embeddings (included in the model) and a few example questions for each entity type.
Predicted Entities
How to use
from johnsnowlabs import nlp, legal
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Start (or reuse) the Spark session; requires a valid John Snow Labs license.
spark = nlp.start()

# Turn the raw "text" column into document annotations.
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sparktokenizer = nlp.Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

# Each entity label maps to one or more questions that act as prompts.
zero_shot_ner = legal.ZeroShotNerModel.pretrained("legner_roberta_zeroshot", "en", "legal/models")\
    .setInputCols(["document", "token"])\
    .setOutputCol("zero_shot_ner")\
    .setEntityDefinitions(
        {
            "DATE": ["When was the company acquisition?", "When was the company purchase agreement?", "When was the agreement?"],
            "ORG": ["Which company?"],
            "STATE": ["Which state?"],
            "AGREEMENT": ["What kind of agreement?"],
            "LICENSE": ["What kind of license?"],
            "LICENSE_RECIPIENT": ["To whom the license is granted?"]
        })

# Merge token-level predictions into entity chunks.
nerconverter = nlp.NerConverter()\
    .setInputCols(["document", "token", "zero_shot_ner"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sparktokenizer,
    zero_shot_ner,
    nerconverter,
])

# Four sample sentences (the comma after the third one was missing, which silently merged it with the fourth).
sample_text = ["In March 2012, as part of a longer-term strategy, the Company acquired Vertro, Inc., which owned and operated the ALOT product portfolio.",
               "In February 2017, the Company entered into an asset purchase agreement with NetSeer, Inc.",
               "This INTELLECTUAL PROPERTY AGREEMENT, dated as of December 31, 2018 (the 'Effective Date') is entered into by and between Armstrong Flooring, Inc., a Delaware corporation ('Seller') and AFI Licensing LLC, a Delaware company('Licensing')",
               "The Company hereby grants to Seller a perpetual, non- exclusive, royalty-free license"]

# No stage needs training, so fitting on an empty DataFrame only builds the pipeline model.
p_model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

res = p_model.transform(spark.createDataFrame(sample_text, StringType()).toDF("text"))

# Explode the chunk annotations into one row per chunk and keep the text and its label.
res.select(F.explode(F.arrays_zip(res.ner_chunk.result, res.ner_chunk.begin, res.ner_chunk.end, res.ner_chunk.metadata)).alias("cols")) \
    .select(F.expr("cols['0']").alias("chunk"),
            F.expr("cols['3']['entity']").alias("ner_label"))\
    .filter("ner_label!='O'")\
    .show(truncate=False)
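Because the entity definitions are the only task-specific input to the model, it can help to sanity-check them before building the pipeline. The following is a minimal sketch in plain Python; `validate_entity_definitions` is a hypothetical helper introduced here for illustration and is not part of the Legal NLP API. It assumes the definitions are a dictionary mapping each label to a list of question strings, as in the example above.

```python
from typing import Dict, List


def validate_entity_definitions(definitions: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a cleaned copy of the definitions, or raise ValueError.

    Hypothetical helper: checks that every label has at least one
    non-empty question and strips surrounding whitespace from prompts.
    """
    cleaned: Dict[str, List[str]] = {}
    for label, questions in definitions.items():
        if not label or label != label.strip():
            raise ValueError(f"Invalid label: {label!r}")
        if isinstance(questions, str):
            # A bare string would otherwise be iterated character by character.
            raise ValueError(f"{label}: expected a list of questions, got a single string")
        prompts = [q.strip() for q in questions if q and q.strip()]
        if not prompts:
            raise ValueError(f"{label}: at least one non-empty question is required")
        cleaned[label] = prompts
    return cleaned


entity_definitions = validate_entity_definitions({
    "DATE": ["When was the agreement?"],
    "ORG": ["Which company?  "],
})
```

The cleaned dictionary could then be passed to `setEntityDefinitions` in place of the literal one.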
Results
+---------------------------------------+-----------------+
|chunk                                  |ner_label        |
+---------------------------------------+-----------------+
|March 2012                             |DATE             |
|Vertro, Inc                            |ORG              |
|February 2017                          |DATE             |
|asset purchase agreement               |AGREEMENT        |
|NetSeer                                |ORG              |
|INTELLECTUAL PROPERTY AGREEMENT        |AGREEMENT        |
|December 31, 2018                      |DATE             |
|Armstrong Flooring                     |ORG              |
|Delaware                               |STATE            |
|AFI Licensing LLC                      |ORG              |
|Delaware                               |ORG              |
|Seller                                 |LICENSE_RECIPIENT|
|perpetual, non- exclusive, royalty-free|LICENSE          |
+---------------------------------------+-----------------+
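Once the chunks are collected on the driver, a small post-processing step can make them easier to review. The sketch below is plain Python and hard-codes the rows from the table above; in a real run they would presumably come from collecting the selected DataFrame. `group_chunks_by_label` and `chunks_with_multiple_labels` are hypothetical helpers, not library functions. The second one flags chunks that received more than one label, such as the second "Delaware" tagged as ORG.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (chunk, ner_label)


def group_chunks_by_label(rows: Iterable[Row]) -> Dict[str, List[str]]:
    """Group distinct chunks under their label, keeping first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for chunk, label in rows:
        if chunk not in grouped[label]:
            grouped[label].append(chunk)
    return dict(grouped)


def chunks_with_multiple_labels(rows: Iterable[Row]) -> Dict[str, List[str]]:
    """Return chunks that were assigned more than one distinct label."""
    labels_by_chunk: Dict[str, List[str]] = defaultdict(list)
    for chunk, label in rows:
        if label not in labels_by_chunk[chunk]:
            labels_by_chunk[chunk].append(label)
    return {c: ls for c, ls in labels_by_chunk.items() if len(ls) > 1}


# Rows copied from the results table above.
rows: List[Row] = [
    ("March 2012", "DATE"),
    ("Vertro, Inc", "ORG"),
    ("February 2017", "DATE"),
    ("asset purchase agreement", "AGREEMENT"),
    ("NetSeer", "ORG"),
    ("INTELLECTUAL PROPERTY AGREEMENT", "AGREEMENT"),
    ("December 31, 2018", "DATE"),
    ("Armstrong Flooring", "ORG"),
    ("Delaware", "STATE"),
    ("AFI Licensing LLC", "ORG"),
    ("Delaware", "ORG"),
    ("Seller", "LICENSE_RECIPIENT"),
    ("perpetual, non- exclusive, royalty-free", "LICENSE"),
]

grouped = group_chunks_by_label(rows)
conflicts = chunks_with_multiple_labels(rows)
print(grouped["DATE"])
print(conflicts)
```

Reviewing the conflicting chunks is one way to decide whether a question such as "Which company?" needs to be reworded or supplemented.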
Model Information
Model Name: | legner_roberta_zeroshot |
Type: | legal |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [document_question, document_context] |
Output Labels: | [answer] |
Language: | en |
Size: | 460.2 MB |
Case sensitive: | true |
Max sentence length: | 512 |
References
Legal Roberta Embeddings