Description
This is a Named Entity Recognition (NER) model trained on a dataset of Indian court judgements. It extracts the following entities from judgement documents.
Predicted Entities
COURT, PETITIONER, RESPONDENT, JUDGE, DATE, ORG, GPE, STATUTE, PROVISION, PRECEDENT, CASE_NUMBER, WITNESS, OTHER_PERSON
How to use
# Python
# Assumes an active Spark session named `spark` (for example, one returned by nlp.start()).
from johnsnowlabs import nlp, legal

# Wrap the raw text column in Spark NLP's document annotation
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")\
    .setCleanupMode("shrink")

sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_base_cased", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

ner_model = legal.NerModel.pretrained("legner_indian_court_judgement", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter
])

# The column must be named "text" to match the DocumentAssembler input column
data = spark.createDataFrame([["""Let fresh bailable warrant of Rs.20,000/- (Rupees Twenty Thousand) be issued through Superintendent of Police, Dhar to the respondents No.1 Sikandar and No.2 Aziz for a date to be fixed by the Registry to secure the presence of the respondents No.1 and 2, made returnable within six weeks.
P.K.Jaiswal) Judge
(Jarat Kumar Jain) Judge ns.
W.P.No.1361/2013
14/12/2015
Parties through their Counsel."""]]).toDF("text")

result = pipeline.fit(data).transform(data)
// Scala
// Assumes an active SparkSession named `spark`; the implicits are needed for .toDS below.
import spark.implicits._

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
    .setCleanupMode("shrink")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_cased", "en")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")
    .setMaxSentenceLength(512)
    .setCaseSensitive(true)

val ner_model = NerModel.pretrained("legner_indian_court_judgement", "en", "legal/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter))

val data = Seq("""Let fresh bailable warrant of Rs.20,000/- (Rupees Twenty Thousand) be issued through Superintendent of Police, Dhar to the respondents No.1 Sikandar and No.2 Aziz for a date to be fixed by the Registry to secure the presence of the respondents No.1 and 2, made returnable within six weeks.
P.K.Jaiswal) Judge
(Jarat Kumar Jain) Judge ns.
W.P.No.1361/2013
14/12/2015
Parties through their Counsel.""").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
Results
+----------------+-----------+
|chunk           |label      |
+----------------+-----------+
|Dhar            |GPE        |
|Sikandar        |RESPONDENT |
|Aziz            |RESPONDENT |
|P.K.Jaiswal     |JUDGE      |
|Jarat Kumar Jain|JUDGE      |
|W.P.No.1361/2013|CASE_NUMBER|
|14/12/2015      |DATE       |
+----------------+-----------+
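Each row above is a multi-token span built from token-level tags. The following is a minimal pure-Python sketch of that merging step, assuming the tags follow the IOB2 scheme (`B-LABEL` opens a span, `I-LABEL` continues it, `O` is outside any span). The function name `merge_iob` and the token and tag lists are hypothetical, written by hand for illustration; this is not the `NerConverter` implementation and the tags are not actual model output.

```python
from typing import List, Optional, Tuple


def merge_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level IOB2 tags into (chunk, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the open chunk, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            # A B- tag always starts a new chunk.
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I" and label == current_label:
            # An I- tag with the same label extends the open chunk.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open chunk:
            # close the chunk and drop the token (a simplification).
            flush()
    flush()
    return chunks


# Hypothetical, hand-written tokens and tags (not model output).
tokens = ["respondents", "No.1", "Sikandar", "and", "No.2", "Aziz",
          "Jarat", "Kumar", "Jain"]
tags = ["O", "O", "B-RESPONDENT", "O", "O", "B-RESPONDENT",
        "B-JUDGE", "I-JUDGE", "I-JUDGE"]

chunks = merge_iob(tokens, tags)
for chunk, label in chunks:
    print(f"{chunk}\t{label}")
```

Joining tokens with a single space is a simplification: the chunks in the table keep the original character spans, so a chunk such as `P.K.Jaiswal` appears exactly as in the input text.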
Model Information
Model Name: legner_indian_court_judgement
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.4 MB
References
Training data is available here.
Benchmarking
label          precision  recall  f1-score  support
CASE_NUMBER    0.83       0.80    0.82      112
COURT          0.92       0.94    0.93      140
DATE           0.97       0.97    0.97      204
GPE            0.81       0.75    0.78      95
JUDGE          0.84       0.86    0.85      57
ORG            0.75       0.76    0.76      131
OTHER_PERSON   0.83       0.90    0.86      241
PETITIONER     0.76       0.61    0.68      36
PRECEDENT      0.84       0.84    0.84      127
PROVISION      0.90       0.94    0.92      220
RESPONDENT     0.64       0.70    0.67      23
STATUTE        0.92       0.96    0.94      157
WITNESS        0.93       0.78    0.85      87
micro-avg      0.87       0.87    0.87      1630
macro-avg      0.84       0.83    0.83      1630
weighted-avg   0.87       0.87    0.87      1630
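The table does not say how the average rows are computed. The sketch below assumes the usual classification-report conventions: the macro average is the unweighted mean over labels, and the weighted average is the mean weighted by each label's support. It recomputes the F1 averages from the per-label rows. Because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one by about 0.01. The names `ROWS`, `macro_avg`, and `weighted_avg` are illustrative only.

```python
# (label, precision, recall, f1, support), copied from the benchmark table
ROWS = [
    ("CASE_NUMBER", 0.83, 0.80, 0.82, 112),
    ("COURT", 0.92, 0.94, 0.93, 140),
    ("DATE", 0.97, 0.97, 0.97, 204),
    ("GPE", 0.81, 0.75, 0.78, 95),
    ("JUDGE", 0.84, 0.86, 0.85, 57),
    ("ORG", 0.75, 0.76, 0.76, 131),
    ("OTHER_PERSON", 0.83, 0.90, 0.86, 241),
    ("PETITIONER", 0.76, 0.61, 0.68, 36),
    ("PRECEDENT", 0.84, 0.84, 0.84, 127),
    ("PROVISION", 0.90, 0.94, 0.92, 220),
    ("RESPONDENT", 0.64, 0.70, 0.67, 23),
    ("STATUTE", 0.92, 0.96, 0.94, 157),
    ("WITNESS", 0.93, 0.78, 0.85, 87),
]


def macro_avg(values):
    """Unweighted mean: every label counts equally."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean weighted by support: frequent labels count more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


f1_scores = [row[3] for row in ROWS]
supports = [row[4] for row in ROWS]

total_support = sum(supports)
macro_f1 = macro_avg(f1_scores)
weighted_f1 = weighted_avg(f1_scores, supports)

print(f"total support: {total_support}")
print(f"macro F1:      {macro_f1:.3f}")
print(f"weighted F1:   {weighted_f1:.3f}")
```

Under these assumptions, the gap between the macro and weighted rows comes from the low-support labels: PETITIONER and RESPONDENT have the lowest F1 scores and only 36 and 23 test examples, so they pull the macro average down more than the weighted one.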