Legal Proposal Classification

Description

Given a proposal on a socially important issue, this model classifies it according to its topic.

Predicted Entities

Democracy, Digital, EU_In_The_World, Economy, Education, Green_Deal, Health, Migration, Other

Copy S3 URI

How to use

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
    
sentence_embeddings = nlp.UniversalSentenceEncoder.pretrained()\
    .setInputCols("document")\
    .setOutputCol("sentence_embeddings")

classifier = legal.ClassifierDLModel.pretrained("legclf_proposal_topic", "en", "legal/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("class")

clf_pipeline = nlp.Pipeline(stages=[
            document_assembler, 
            sentence_embeddings,
            classifier
            ])

empty_df = spark.createDataFrame([['']]).toDF("text")

model = clf_pipeline.fit(empty_df)

text = ["""In order to involve young people in the European Union, they need to understand the role, importance, and impact of the European Union on their lives and how they can contribute to the EU. I believe that many Europeans do not know the values of Europe, how they can contribute to the EU, etc. To do this, it was necessary to create an education program on the European Union that could cut across all countries, including a discipline on the EU, visits by young people to the European institutions, and a 'channel of communication' between young people and the EU. The same could be done for older people in senior universities."""]

data = spark.createDataFrame([text]).toDF("text")

result = model.transform(data)

Results

+-----------+
|     result|
+-----------+
|[Education]|
+-----------+

Model Information

Model Name: legclf_proposal_topic
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Size: 22.6 MB

References

Training dataset available here

Benchmarking

label            precision  recall  f1-score  support 
Democracy        0.86       0.90    0.88      62      
Digital          0.85       0.80    0.82      35      
EU_In_The_World  0.78       0.72    0.75      39      
Economy          0.82       0.77    0.80      43      
Education        0.89       0.87    0.88      46      
Green_Deal       0.85       0.92    0.88      49      
Health           0.87       0.95    0.91      21      
Migration        0.86       0.89    0.87      27      
Other            1.00       0.97    0.98      32      
accuracy         -          -       0.86      354     
macro-avg        0.86       0.87    0.86      354     
weighted-avg     0.86       0.86    0.86      354