Legal Multilabel Classifier on Online Terms of Service

Description

This is the Multi-Label Text Classification model that can be used to identify potentially unfair clauses in online Terms of Service. The classes are as follows:

 - Arbitration 
 - Choice_of_law
 - Content_removal
 - Jurisdiction
 - Limitation_of_liability
 - Other
 - Unilateral_change
 - Unilateral_termination

Predicted Entities

Arbitration, Choice_of_law, Content_removal, Jurisdiction, Limitation_of_liability, Other, Unilateral_change, Unilateral_termination

Download Copy S3 URI

How to use

document_assembler = nlp.DocumentAssembler() \
        .setInputCol('text')\
        .setOutputCol('document')

tokenizer = nlp.Tokenizer() \
        .setInputCols(['document'])\
        .setOutputCol('token')

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en") \
        .setInputCols(['document', 'token'])\
        .setOutputCol("embeddings")

embeddingsSentence = nlp.SentenceEmbeddings() \
        .setInputCols(['document', 'embeddings'])\
        .setOutputCol('sentence_embeddings')\
        .setPoolingStrategy('AVERAGE')

classifierdl = nlp.MultiClassifierDLModel.pretrained('legmulticlf_online_terms_of_service_english', 'en', 'legal/models')
         .setInputCols(["sentence_embeddings"])\
         .setOutputCol("class")
  
clf_pipeline = nlp.Pipeline(stages=[document_assembler, 
                                    tokenizer, 
                                    embeddings, 
                                    embeddingsSentence, 
                                    classifierdl])

df = spark.createDataFrame([["We are not responsible or liable for (and have no obligation to verify) any wrong or misspelled email address or inaccurate or wrong (mobile) phone number or credit card number."]]).toDF("text")

model = clf_pipeline.fit(df)
result = model.transform(df)

result.select("text", "class.result").show(truncate=False)

Results

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+
|sentence                                                                                                                                                                         |result                   |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+
|We are not responsible or liable for (and have no obligation to verify) any wrong or misspelled email address or inaccurate or wrong (mobile) phone number or credit card number.|[Limitation_of_liability]|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+

Model Information

Model Name: legmulticlf_online_terms_of_service_english
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Size: 13.9 MB

References

Train dataset available here

Benchmarking

label                    precision  recall  f1-score  support 
Arbitration              1.00       0.50    0.67      4       
Choice_of_law            0.67       0.67    0.67      3       
Content_removal          1.00       0.67    0.80      3       
Jurisdiction             0.80       1.00    0.89      4       
Limitation_of_liability  0.73       0.73    0.73      15      
Other                    0.86       0.89    0.88      28      
Unilateral_change        0.86       1.00    0.92      6       
Unilateral_termination   1.00       0.80    0.89      5       
micro-avg                0.84       0.82    0.83      68      
macro-avg                0.86       0.78    0.81      68      
weighted-avg             0.85       0.82    0.83      68      
samples-avg              0.80       0.82    0.81      68