Description
This is the Multi-Label Text Classification model that can be used to identify up to 6 classes to facilitate analysis, discovery and comparison of legal texts related to COVID-19 exception measures. The classes are as follows:
- Closures/lockdown
- Government_oversight
- Restrictions_of_daily_liberties
- Restrictions_of_fundamental_rights_and_civil_liberties
- State_of_Emergency
- Suspension_of_international_cooperation_and_commitments
Predicted Entities
Closures/lockdown
, Government_oversight
, Restrictions_of_daily_liberties
, Restrictions_of_fundamental_rights_and_civil_liberties
, State_of_Emergency
, Suspension_of_international_cooperation_and_commitments
How to use
document_assembler = nlp.DocumentAssembler() \
.setInputCol('text')\
.setOutputCol('document')
tokenizer = nlp.Tokenizer() \
.setInputCols(['document'])\
.setOutputCol('token')
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en") \
.setInputCols(['document', 'token'])\
.setOutputCol("embeddings")
embeddingsSentence = nlp.SentenceEmbeddings() \
.setInputCols(['document', 'embeddings'])\
.setOutputCol('sentence_embeddings')\
.setPoolingStrategy('AVERAGE')
classifierdl = nlp.MultiClassifierDLModel.pretrained("legmulticlf_covid19_exceptions_english", "en", "legal/models") \
.setInputCols(["sentence_embeddings"])\
.setOutputCol("class")
clf_pipeline = nlp.Pipeline(stages=[document_assembler,
tokenizer,
embeddings,
embeddingsSentence,
classifierdl])
df = spark.createDataFrame([["First, we must protect the NHS’s ability to cope. We must be confident that we are able to provide sufficient critical care and specialist treatment right across the UK. The NHS staff have been incredible. We must continue to support them as much as we can."]]).toDF("text")
model = clf_pipeline.fit(df)
result = model.transform(df)
result.select("text", "class.result").show(truncate=False)
Results
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+
|text |result |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+
|First, we must protect the NHS’s ability to cope. We must be confident that we are able to provide sufficient critical care and specialist treatment right across the UK. The NHS staff have been incredible. We must continue to support them as much as we can.|[Government_oversight]|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+
Model Information
Model Name: | legmulticlf_covid19_exceptions_english |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence_embeddings] |
Output Labels: | [class] |
Language: | en |
Size: | 13.9 MB |
References
Train dataset available here
Benchmarking
label precision recall f1-score support
Closures/lockdown 1.00 0.60 0.75 10
Government_oversight 0.88 1.00 0.94 22
Restrictions_of_daily_liberties 0.83 0.95 0.89 21
Restrictions_of_fundamental_rights_and_civil_liberties 1.00 0.88 0.93 8
State_of_Emergency 1.00 0.89 0.94 28
Suspension_of_international_cooperation_and_commitments 1.00 1.00 1.00 2
micro-avg 0.92 0.90 0.91 91
macro-avg 0.95 0.89 0.91 91
weighted-avg 0.93 0.90 0.91 91
samples-avg 0.91 0.91 0.91 91