Legal Multilabel Classifier on Covid-19 exceptions

Description

This multi-label text classification model tags legal texts about COVID-19 exception measures with any of 6 classes, to facilitate analysis, discovery and comparison of those texts. A single text can receive more than one class. The classes are as follows:

  • Closures/lockdown
  • Government_oversight
  • Restrictions_of_daily_liberties
  • Restrictions_of_fundamental_rights_and_civil_liberties
  • State_of_Emergency
  • Suspension_of_international_cooperation_and_commitments

Predicted Entities

Closures/lockdown, Government_oversight, Restrictions_of_daily_liberties, Restrictions_of_fundamental_rights_and_civil_liberties, State_of_Emergency, Suspension_of_international_cooperation_and_commitments


How to use

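from johnsnowlabs import nlp

# Start a Spark session. The classifier below is a licensed model,
# so valid John Snow Labs credentials are assumed to be configured.
spark = nlp.start()

# Wrap the raw 'text' column into Spark NLP documents
document_assembler = nlp.DocumentAssembler() \
        .setInputCol('text')\
        .setOutputCol('document')

# Split each document into tokens
tokenizer = nlp.Tokenizer() \
        .setInputCols(['document'])\
        .setOutputCol('token')

# Compute one BERT vector per token
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en") \
        .setInputCols(['document', 'token'])\
        .setOutputCol("embeddings")

# Average the token vectors into one vector per document
embeddingsSentence = nlp.SentenceEmbeddings() \
        .setInputCols(['document', 'embeddings'])\
        .setOutputCol('sentence_embeddings')\
        .setPoolingStrategy('AVERAGE')

# Multi-label classifier that consumes the pooled document vector
classifierdl = nlp.MultiClassifierDLModel.pretrained("legmulticlf_covid19_exceptions_english", "en", "legal/models") \
        .setInputCols(["sentence_embeddings"])\
        .setOutputCol("class")

clf_pipeline = nlp.Pipeline(stages=[document_assembler,
                                    tokenizer,
                                    embeddings,
                                    embeddingsSentence,
                                    classifierdl])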

df = spark.createDataFrame([["First, we must protect the NHS’s ability to cope. We must be confident that we are able to provide sufficient critical care and specialist treatment right across the UK. The NHS staff have been incredible. We must continue to support them as much as we can."]]).toDF("text")

model = clf_pipeline.fit(df)
result = model.transform(df)

result.select("text", "class.result").show(truncate=False)
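
The pipeline above sets the pooling strategy to AVERAGE, which turns the per-token vectors into a single vector for the classifier. The sketch below illustrates that idea in plain Python on toy data. The function name `average_pool` and the two-dimensional vectors are assumptions made for illustration; this is not the library's internal implementation.

```python
from typing import List


def average_pool(token_vectors: List[List[float]]) -> List[float]:
    """Return the element-wise mean of equal-length token vectors.

    Hypothetical helper for illustration only; it mimics the idea of
    average pooling, not the Spark NLP internals.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same length")
    count = len(token_vectors)
    # Average each dimension across all tokens
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# Toy example: three tokens, each with a 2-dimensional embedding
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = average_pool(toy_tokens)
print(pooled)
```

Real token embeddings have hundreds of dimensions, but the pooling step is the same: one averaged vector per document, whatever the token count.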

Results

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+
|text                                                                                                                                                                                                                                                             |result                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+
|First, we must protect the NHS’s ability to cope. We must be confident that we are able to provide sufficient critical care and specialist treatment right across the UK. The NHS staff have been incredible. We must continue to support them as much as we can.|[Government_oversight]|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+
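
For downstream analysis it is often convenient to turn the list of predicted labels into a fixed-length indicator vector. The following is a minimal sketch in plain Python, assuming the label order from the class list above. The names `LABELS` and `to_multi_hot` are hypothetical and are not part of the model or the library.

```python
from typing import List

# Label order taken from the class list in this model card
LABELS = [
    "Closures/lockdown",
    "Government_oversight",
    "Restrictions_of_daily_liberties",
    "Restrictions_of_fundamental_rights_and_civil_liberties",
    "State_of_Emergency",
    "Suspension_of_international_cooperation_and_commitments",
]


def to_multi_hot(predicted: List[str]) -> List[int]:
    """Map a list of predicted labels to a 0/1 vector aligned with LABELS."""
    chosen = set(predicted)
    unknown = chosen - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in chosen else 0 for label in LABELS]


# The prediction shown in the results table above
vector = to_multi_hot(["Government_oversight"])
print(vector)
```

An empty prediction list maps to an all-zero vector, so texts that receive no class can be kept in the same representation.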

Model Information

Model Name: legmulticlf_covid19_exceptions_english
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Size: 13.9 MB

References

Train dataset available here

Benchmarking

label                                                    precision  recall  f1-score  support 
Closures/lockdown                                        1.00       0.60    0.75      10      
Government_oversight                                     0.88       1.00    0.94      22      
Restrictions_of_daily_liberties                          0.83       0.95    0.89      21      
Restrictions_of_fundamental_rights_and_civil_liberties   1.00       0.88    0.93      8       
State_of_Emergency                                       1.00       0.89    0.94      28      
Suspension_of_international_cooperation_and_commitments  1.00       1.00    1.00      2       
micro-avg                                                0.92       0.90    0.91      91      
macro-avg                                                0.95       0.89    0.91      91      
weighted-avg                                             0.93       0.90    0.91      91      
samples-avg                                              0.91       0.91    0.91      91
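
The macro and weighted averages in the table can be recomputed from the per-label rows, which makes the difference between the two explicit: the macro average treats every label equally, while the weighted average scales each label by its support. The sketch below uses the rounded per-label values from the table, so it reproduces the averages only to two decimals. The variable and function names are illustrative assumptions.

```python
# (label, precision, recall, f1, support) copied from the table above
ROWS = [
    ("Closures/lockdown", 1.00, 0.60, 0.75, 10),
    ("Government_oversight", 0.88, 1.00, 0.94, 22),
    ("Restrictions_of_daily_liberties", 0.83, 0.95, 0.89, 21),
    ("Restrictions_of_fundamental_rights_and_civil_liberties", 1.00, 0.88, 0.93, 8),
    ("State_of_Emergency", 1.00, 0.89, 0.94, 28),
    ("Suspension_of_international_cooperation_and_commitments", 1.00, 1.00, 1.00, 2),
]


def macro_average(rows, column):
    """Unweighted mean of one metric column across labels."""
    return sum(row[column] for row in rows) / len(rows)


def weighted_average(rows, column):
    """Mean of one metric column, weighted by each label's support."""
    total_support = sum(row[4] for row in rows)
    return sum(row[column] * row[4] for row in rows) / total_support


# Columns 1, 2, 3 are precision, recall, f1-score
macro = [round(macro_average(ROWS, col), 2) for col in (1, 2, 3)]
weighted = [round(weighted_average(ROWS, col), 2) for col in (1, 2, 3)]
total = sum(row[4] for row in ROWS)

print("macro-avg   ", macro)
print("weighted-avg", weighted)
print("support     ", total)
```

With only 2 and 8 supporting examples for two of the labels, their per-label scores move the macro average more than the weighted one, which is worth keeping in mind when comparing the two rows.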