Legal NER for Indian Court Documents

Description

This NER model is trained on a dataset of Indian court documents and extracts the following entities from the preamble of a court document.

Predicted Entities

COURT, PETITIONER, RESPONDENT, JUDGE, LAWYER

How to use

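from johnsnowlabs import nlp, legal

# Assumes a Spark session with Legal NLP is already running, e.g. spark = nlp.start()
document_assembler = nlp.DocumentAssembler()\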
    .setInputCol("text")\
    .setOutputCol("document")\
    .setCleanupMode("shrink")

sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_base_cased", "en")\
    .setInputCols("sentence", "token")\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

ner_model = legal.NerModel.pretrained("legner_indian_court_preamble", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter
])


data = spark.createDataFrame([["""In The High Court Of Judicature At Madras 

                Dated:  31/05/2006  
                
                
The Hon'Ble Mr. Justice V. Dhanapalan         
                
C.M.A.No.535 of 1998

1. Sahabudeen               
             ...     Claimant/Appellant
                            
Vs


1. R. Selvaraj,

2. The New India Assurance Co.Ltd., 

                        ...     Respondents


Appeal filed under Section 173 of the Motor Vehicles Act to set  aside
the  judgment  and decree dated 25.03.97 passed in Mcop No.5/95 on the file of
the I Additional District Judge-cum-Chief Judicial Magistrate, Coimbatore  and
pass  the  award  of  Rs.3,50,000/-  instead  of  Rs.1,00  ,000/-  towards the
compensation to the petitioner.


For Petitioner :  Mr.  K.Sudarsanam for M/s.  Surithi Associates

For Respondents:  Mr.  Mohd.Fiary Hussain for R1"""]]).toDF("text")
                             
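result = pipeline.fit(data).transform(data)

// Scala version of the same pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDS on the input Seq

// NerModel below is the licensed Legal NLP annotator and needs its own import from the Legal NLP jar
val document_assembler = new DocumentAssembler()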
    .setInputCol("text")
    .setOutputCol("document")
    .setCleanupMode("shrink")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_cased", "en")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")
    .setMaxSentenceLength(512)
    .setCaseSensitive(true)

val ner_model = NerModel.pretrained("legner_indian_court_preamble", "en", "legal/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter))

val data = Seq("""In The High Court Of Judicature At Madras 

                Dated:  31/05/2006  
                
                
The Hon'Ble Mr. Justice V. Dhanapalan         
                
C.M.A.No.535 of 1998

1. Sahabudeen               
             ...     Claimant/Appellant
                            
Vs


1. R. Selvaraj,

2. The New India Assurance Co.Ltd., 

                        ...     Respondents


Appeal filed under Section 173 of the Motor Vehicles Act to set  aside
the  judgment  and decree dated 25.03.97 passed in Mcop No.5/95 on the file of
the I Additional District Judge-cum-Chief Judicial Magistrate, Coimbatore  and
pass  the  award  of  Rs.3,50,000/-  instead  of  Rs.1,00  ,000/-  towards the
compensation to the petitioner.


For Petitioner :  Mr.  K.Sudarsanam for M/s.  Surithi Associates

For Respondents:  Mr.  Mohd.Fiary Hussain for R1""").toDS.toDF("text")
                             
val result = pipeline.fit(data).transform(data)
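
The last pipeline stage, NerConverter, turns token-level tags into entity chunks. The following is a minimal pure-Python sketch of that merging idea, not the library's implementation. It assumes the model emits IOB-style tags such as B-JUDGE / I-JUDGE; the tag names, the tokenization of the sample, and the helper `iob_to_chunks` are illustrative assumptions.

```python
def iob_to_chunks(tokens, tags):
    """Merge IOB-tagged tokens into (chunk_text, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new entity starts here: flush any open chunk first.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # An "O" tag closes any open chunk.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Hypothetical tokens and tags for the judge line of the sample preamble.
tokens = ["The", "Hon'Ble", "Mr.", "Justice", "V.", "Dhanapalan"]
tags = ["O", "O", "O", "O", "B-JUDGE", "I-JUDGE"]
chunks = iob_to_chunks(tokens, tags)
print(chunks)
```

In the real pipeline the merged chunks are written to the ner_chunk column, with the entity label stored in each annotation's metadata.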

Results

+----------------------------------+----------+
|chunk                             |label     |
+----------------------------------+----------+
|High Court Of Judicature At Madras|COURT     |
|V. Dhanapalan                     |JUDGE     |
|Sahabudeen                        |PETITIONER|
|Selvaraj                          |RESPONDENT|
|New India Assurance               |RESPONDENT|
|K.Sudarsanam                      |LAWYER    |
|Mohd.Fiary Hussain                |LAWYER    |
+----------------------------------+----------+
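
A preamble usually names several parties and lawyers, so downstream code often needs the chunks grouped by label. A minimal sketch in plain Python, using the rows of the table above copied by hand; the helper `group_by_label` is a hypothetical name, and in practice the pairs would be collected from the ner_chunk column.

```python
from collections import defaultdict

# (chunk, label) pairs copied from the results table.
rows = [
    ("High Court Of Judicature At Madras", "COURT"),
    ("V. Dhanapalan", "JUDGE"),
    ("Sahabudeen", "PETITIONER"),
    ("Selvaraj", "RESPONDENT"),
    ("New India Assurance", "RESPONDENT"),
    ("K.Sudarsanam", "LAWYER"),
    ("Mohd.Fiary Hussain", "LAWYER"),
]


def group_by_label(pairs):
    """Collect chunk texts under their entity label, preserving order."""
    grouped = defaultdict(list)
    for chunk, label in pairs:
        grouped[label].append(chunk)
    return dict(grouped)


preamble = group_by_label(rows)
for label, chunks in preamble.items():
    print(f"{label}: {chunks}")
```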

Model Information

Model Name: legner_indian_court_preamble
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.4 MB

References

Training data is available here.

Benchmarking

label         precision  recall  f1-score  support 
COURT         0.92       0.91    0.91      109     
JUDGE         0.96       0.92    0.94      168     
LAWYER        0.94       0.93    0.94      377     
PETITIONER    0.76       0.77    0.76      269     
RESPONDENT    0.78       0.80    0.79      356     
micro-avg     0.86       0.86    0.86      1279    
macro-avg     0.87       0.86    0.87      1279    
weighted-avg  0.86       0.86    0.86      1279
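
The macro and weighted averages can be recomputed from the per-label rows: the macro average is the unweighted mean over labels, while the weighted average weights each label by its support. A minimal sketch, with the values copied from the table above; the helper names are illustrative. Because the per-label scores are already rounded to two decimals, a recomputed average can differ from the reported one in the last digit.

```python
# label: (precision, recall, f1, support), copied from the benchmark table.
report = {
    "COURT": (0.92, 0.91, 0.91, 109),
    "JUDGE": (0.96, 0.92, 0.94, 168),
    "LAWYER": (0.94, 0.93, 0.94, 377),
    "PETITIONER": (0.76, 0.77, 0.76, 269),
    "RESPONDENT": (0.78, 0.80, 0.79, 356),
}

total_support = sum(row[3] for row in report.values())


def macro_avg(rows, idx):
    """Unweighted mean of one metric column over all labels."""
    return sum(row[idx] for row in rows.values()) / len(rows)


def weighted_avg(rows, idx):
    """Mean of one metric column, weighted by each label's support."""
    weighted_sum = sum(row[idx] * row[3] for row in rows.values())
    return weighted_sum / sum(row[3] for row in rows.values())


for idx, metric in enumerate(["precision", "recall", "f1-score"]):
    print(f"{metric}: macro={macro_avg(report, idx):.3f} "
          f"weighted={weighted_avg(report, idx):.3f}")
print("total support:", total_support)
```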