German Legal Judgement Classifier (Medium)

Description

This is the Medium version of the German Legal Judgement Text Classifier for texts written in the German legal writing style “Urteilsstil” (judgement style). It classifies a text as conclusion, definition, subsumption, or other.

Predicted Entities

conclusion, definition, subsumption, other

How to use

 
from johnsnowlabs import nlp, legal

# Turn raw text into a document annotation
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained German legal judgement classifier
classifierdl = legal.BertForSequenceClassification.pretrained("legclf_judgement_medium", "de", "legal/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

bert_clf_pipeline = nlp.Pipeline(stages=[document_assembler,
                                         tokenizer,
                                         classifierdl])

text = ["Insoweit ergibt sich tatsächlich im Ergebnis ein Verzicht der Arbeitnehmer in Höhe der RoSi-Zulage ."]

# Fit on an empty DataFrame (the pipeline has no trainable stages), then classify the sample text
empty_df = spark.createDataFrame([[""]]).toDF("text")
model = bert_clf_pipeline.fit(empty_df)
res = model.transform(spark.createDataFrame([text]).toDF("text"))
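
To display the prediction next to the input text, select the result field of the output column (a minimal sketch; class.result assumes the output column name set above):

res.select("text", "class.result").show(truncate=False)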

Results

+----------------------------------------------------------------------------------------------------+-------------+
|text                                                                                                |result       |
+----------------------------------------------------------------------------------------------------+-------------+
|Insoweit ergibt sich tatsächlich im Ergebnis ein Verzicht der Arbeitnehmer in Höhe der RoSi-Zulage .|[subsumption]|
+----------------------------------------------------------------------------------------------------+-------------+

Model Information

Model Name: legclf_judgement_medium
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: de
Size: 409.8 MB
Case sensitive: true
Max sentence length: 128
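
Inputs longer than the 128-token maximum are truncated before classification. If you need to control this explicitly, the limit can be set on the classifier stage (a sketch, assuming Spark NLP's standard setMaxSentenceLength parameter):

classifierdl.setMaxSentenceLength(128)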

References

An in-house augmented version of this dataset

Benchmarking

       label  precision    recall  f1-score   support
  conclusion       0.74      0.79      0.76       189
  definition       0.91      0.88      0.90       160
       other       0.85      0.82      0.83       163
 subsumption       0.71      0.70      0.70       159
    accuracy          -         -      0.80       671
   macro-avg       0.80      0.80      0.80       671
weighted-avg       0.80      0.80      0.80       671
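
As a quick check on the table above, the macro F1 is the unweighted mean of the four per-class F1 scores: (0.76 + 0.90 + 0.83 + 0.70) / 4 ≈ 0.80, which matches the reported value.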