Description
This is a multiclass classification model for German that classifies arguments in legal discourse into one of four classes: subsumption, definition, conclusion, other.
Predicted Entities
subsumption, definition, conclusion, other
How to use
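# Assumes the johnsnowlabs package is installed with a valid Legal NLP license
from johnsnowlabs import nlp, legal

# Start (or attach to) a Spark session with the licensed libraries loaded
spark = nlp.start()

# Wrap the raw text column in a document annotation
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split each document into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

# Compute token-level RoBERTa embeddings
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_large_german_legal", "de")\
    .setInputCols(["document", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

# Average the token embeddings into a single vector per document
embeddingsSentence = nlp.SentenceEmbeddings()\
    .setInputCols(["document", "embeddings"])\
    .setOutputCol("sentence_embeddings")\
    .setPoolingStrategy("AVERAGE")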
docClassifier = legal.ClassifierDLModel.pretrained("legclf_argument_mining_german", "de", "legal/models")\
.setInputCols(["sentence_embeddings"])\
.setOutputCol("category")
nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    tokenizer,
    embeddings,
    embeddingsSentence,
    docClassifier
])
df = spark.createDataFrame([["Folglich liegt eine Verletzung von Artikel 8 der Konvention vor ."]]).toDF("text")
model = nlpPipeline.fit(df)
result = model.transform(df)
result.select("text", "category.result").show(truncate=False)
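The SentenceEmbeddings stage in the pipeline above uses the AVERAGE pooling strategy, which reduces the token-level embeddings of a document to a single vector for the classifier. As a rough conceptual illustration only, the plain-Python sketch below averages a list of token vectors element by element. It is not Spark NLP's implementation; the helper name average_pool and the toy 3-dimensional vectors are hypothetical and chosen just to keep the arithmetic visible.

```python
def average_pool(token_vectors):
    """Average equal-length token vectors into one document-level vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    n = len(token_vectors)
    # Mean of each coordinate across all tokens
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


# Toy vectors, made up for illustration; real embeddings are much wider.
tokens = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]
pooled = average_pool(tokens)
print(pooled)  # [2.0, 3.0, 4.0]
```

The pooled vector has the same dimension as each token vector, which is why the classifier can consume one fixed-size input per document regardless of its length.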
Results
+-----------------------------------------------------------------+------------+
|text                                                             |result      |
+-----------------------------------------------------------------+------------+
|Folglich liegt eine Verletzung von Artikel 8 der Konvention vor .|[conclusion]|
+-----------------------------------------------------------------+------------+
Model Information
Model Name: legclf_argument_mining_german
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: de
Size: 24.0 MB
References
Train dataset available here
Benchmarking
label         precision  recall  f1-score  support
conclusion    0.88       0.88    0.88      52
definition    0.83       0.83    0.83      58
other         0.86       0.88    0.87      49
subsumption   0.81       0.80    0.80      64
accuracy      -          -       0.84      223
macro avg     0.85       0.85    0.85      223
weighted avg  0.84       0.84    0.84      223
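To make the two summary rows easier to read, the sketch below recomputes the macro average (unweighted mean over classes) and the weighted average (mean weighted by support) from the per-class rows of the table above. The per-class figures are copied from the table; the helper names macro_avg and weighted_avg are hypothetical. Because the inputs are already rounded to two decimals, the recomputed values should agree with the reported ones only up to rounding.

```python
# Per-class metrics copied from the benchmark table:
# (precision, recall, f1-score, support)
per_class = {
    "conclusion":  (0.88, 0.88, 0.88, 52),
    "definition":  (0.83, 0.83, 0.83, 58),
    "other":       (0.86, 0.88, 0.87, 49),
    "subsumption": (0.81, 0.80, 0.80, 64),
}


def macro_avg(rows, idx):
    """Unweighted mean of one metric across classes."""
    return sum(r[idx] for r in rows) / len(rows)


def weighted_avg(rows, idx):
    """Mean of one metric across classes, weighted by class support."""
    total = sum(r[3] for r in rows)
    return sum(r[idx] * r[3] for r in rows) / total


rows = list(per_class.values())
total_support = sum(r[3] for r in rows)

# Index 0 = precision, 1 = recall, 2 = f1-score
macro = [macro_avg(rows, i) for i in range(3)]
weighted = [weighted_avg(rows, i) for i in range(3)]

print("support:", total_support)
print("macro avg:", [f"{m:.3f}" for m in macro])
print("weighted avg:", [f"{w:.3f}" for w in weighted])
```

The macro average treats all four classes equally, while the weighted average gives more influence to subsumption, the class with the largest support.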