Legal ORG, PRODUCT and ALIAS NER (small)

Description

This is a large Named Entity Recognition model, trained with a subset of generic conLL, financial and legal conll, ontonotes and several in-house corpora, to detect Organizations, Products and Aliases of Companies.

Predicted Entities

ORG, PROD, ALIAS

Live Demo Copy S3 URI

How to use

documentAssembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")
        
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
        .setInputCols(["document"])\
        .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
        .setInputCols(["sentence"])\
        .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
        .setInputCols(["sentence", "token"]) \
        .setOutputCol("embeddings")

ner_model = legal.NerModel.pretrained("legner_orgs_prods_alias","en","legal/models")\
        .setInputCols(["sentence", "token", "embeddings"])\
        .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
        .setInputCols(["sentence","token","ner"])\
        .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        embeddings,
        ner_model,
        ner_converter])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

text = ["""This INTELLECTUAL PROPERTY AGREEMENT (this "Agreement"), dated as of December 31, 2018 (the "Effective Date") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation ("Seller") and AFI Licensing LLC, a Delaware limited liability company ("Licensing" and together with Seller, "Arizona") and AHF Holding, Inc. (formerly known as Tarzan HoldCo, Inc.), a Delaware corporation ("Buyer") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the "Company" and together with Buyer the "Buyer Entities") (each of Arizona on the one hand and the Buyer Entities on the other hand, a "Party" and collectively, the "Parties").
"""]

res = model.transform(spark.createDataFrame([text]).toDF("text"))
res.select(F.explode(F.arrays_zip(res.ner_chunk.result, res.ner_chunk.metadata)).alias("cols")) \
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label"),
              F.expr("cols['1']['confidence']").alias("confidence")).show(truncate=False)

Results

+-----------------------------------+---------+----------+
|chunk                              |ner_label|confidence|
+-----------------------------------+---------+----------+
|Armstrong Flooring, Inc            |ORG      |0.807575  |
|Seller                             |ALIAS    |0.997     |
|AFI Licensing LLC                  |ORG      |0.7076333 |
|Licensing                          |ALIAS    |0.9981    |
|Seller                             |ALIAS    |0.996     |
|Arizona                            |ALIAS    |0.9958    |
|AHF Holding, Inc.                  |ORG      |0.72438   |
|Tarzan HoldCo, Inc                 |ORG      |0.684675  |
|Buyer                              |ALIAS    |0.9983    |
|Armstrong Hardwood Flooring Company|ORG      |0.58274996|
|Company                            |ALIAS    |0.9989    |
|Buyer                              |ALIAS    |0.9979    |
|Buyer Entities                     |ALIAS    |0.98835003|
|Arizona                            |ALIAS    |0.9635    |
|Buyer Entities                     |ALIAS    |0.77565   |
|Party                              |ALIAS    |0.9982    |
+-----------------------------------+---------+----------+

Model Information

Model Name: legner_orgs_prods_alias
Type: legal
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.7 MB

References

ConLL-2003, FinSec ConLL, a subset of Ontonotes, In-house corpora

Benchmarking

label          tp        fp      fn      prec          rec           f1
I-ORG          12853	 2621    2685    0.8306191     0.82719785    0.828905
B-PRODUCT      2306      697     932     0.76789874    0.712168      0.7389841
I-ALIAS        14        6       13      0.7           0.5185185     0.59574467
B-ORG          8967      2078    2311    0.81186056    0.79508775    0.80338657
I-PRODUCT      2336      803     1091    0.74418604    0.68164575    0.7115443
B-ALIAS        76        14      22      0.84444445    0.7755102     0.80851066
Macro-average  26552     6219    7054    0.78316814    0.7183547     0.7493626
Micro-average  26552     6219    7054    0.8102285     0.790097      0.80003613