Legal NER from Sigma Absa Dataset (PER+Pronouns)

Description

This is a Legal NER model trained on the Sigma Absa Dataset for legal sentiment analysis on legal parties, including correference pronouns (he, him, their…). This is the first component which extracts those people names and pronouns and NER.

You have the second component, which does Assertion Status to retrieve sentiment, on legassertion_sigma_absa_sentiment

Predicted Entities

PER, O

Copy S3 URI

How to use

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

ner = legal.NerDLModel.pretrained("legner_sigma_absa_people", "en", "legal/models")\
        .setInputCols(["sentence", "token", "embeddings"])\
        .setOutputCol("label")

pipe = nlp.Pipeline(stages = [ document_assembler, sentence_detector, tokenizer, embeddings, ner])

text = "Petitioner Jae Lee moved to the United States from South Korea with his parents when he was 13."

sdf = spark.createDataFrame([[text]]).toDF("text")
res = pipe.fit(sdf).transform(sdf)

import pyspark.sql.functions as F
res.select(F.explode(F.arrays_zip(res.token.result, 
                                     res.label.result, 
                                     res.label.metadata)).alias("cols"))\
                  .select(F.expr("cols['0']").alias("token"),
                          F.expr("cols['1']").alias("ner_label"),
                          F.expr("cols['2']['confidence']").alias("confidence")).show(200, truncate=100)

Results

+----------+---------+----------+
|     token|ner_label|confidence|
+----------+---------+----------+
|Petitioner|    B-PER|    0.9997|
|       Jae|    I-PER|    0.9952|
|       Lee|    I-PER|    0.9951|
|     moved|        O|       1.0|
|        to|        O|       1.0|
|       the|        O|       1.0|
|    United|        O|       1.0|
|    States|        O|       1.0|
|      from|        O|       1.0|
|     South|        O|       1.0|
|     Korea|        O|       1.0|
|      with|        O|       1.0|
|       his|    B-PER|       1.0|
|   parents|        O|    0.9998|
|      when|        O|       1.0|
|        he|    B-PER|       1.0|
|       was|        O|       1.0|
|        13|        O|       1.0|
|         .|        O|       1.0|
+----------+---------+----------+

Model Information

Model Name: legner_sigma_absa_people
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.1 MB

References

https://metatext.io/datasets/sigmalaw-absa

Benchmarking


 label          tp   fp  fn  prec        rec         f1         
 I-PER          43   2   0   0.95555556  1.0         0.97727275 
 B-PER          777  11  15  0.9860406   0.9810606   0.98354435 
 Macro-average  820  13  15  0.9707981   0.9905303   0.9805649  
 Micro-average  820  13  15  0.9843938   0.98203593  0.9832135