Legal Multilabel Classification on Terms of Service (UNFAIR-ToS)

Description

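This is a multilabel text classification model that flags sentences in Terms of Service matching any of 8 types of unfair contractual terms, meaning terms that potentially violate user rights under European consumer law. Because the model is multilabel, one sentence can receive several labels. The label set also includes a ninth label, Other.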

Predicted Entities

Arbitration, Choice_of_Law, Content_Removal, Contract_by_Using, Jurisdiction, Limitation_of_Liability, Unilateral_Change, Unilateral_Termination, Other


How to use

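# Assumes the johnsnowlabs package is installed and a valid license is configured.
from johnsnowlabs import nlp

spark = nlp.start()  # start (or reuse) the Spark session used below

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")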

tokenizer = nlp.Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["document", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

embeddingsSentence = nlp.SentenceEmbeddings()\
    .setInputCols(["document", "embeddings"])\
    .setOutputCol("sentence_embeddings")\
    .setPoolingStrategy("AVERAGE")

docClassifier = nlp.MultiClassifierDLModel.pretrained("legmulticlf_unfair_tos", "en", "legal/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("class")

pipeline = nlp.Pipeline(
    stages=[
        document_assembler,
        tokenizer,
        embeddings,
        embeddingsSentence,
        docClassifier
    ]
)

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

light_model = nlp.LightPipeline(model)

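result = light_model.annotate("""we may alter, suspend or discontinue any aspect of the service at any time, including the availability of any service feature, database or content.""")

# annotate() returns a dict keyed by output column; "class" holds the predicted labels
print(result["class"])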

Results

['Unilateral_Change', 'Unilateral_Termination']
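
Because the model is multilabel, a single sentence can receive several labels at once, as in the result above. Downstream code often wants a fixed-length indicator vector rather than a list of names. The following is a minimal sketch of one way to do that in plain Python. It assumes the annotation comes back as a dictionary keyed by output column (here `class`); `to_multi_hot` and the ordering of `LABELS` are illustrative choices, not part of the library.

```python
# Label set copied from the "Predicted Entities" section of this card.
# The ordering is an arbitrary choice made for this sketch.
LABELS = [
    "Arbitration", "Choice_of_Law", "Content_Removal", "Contract_by_Using",
    "Jurisdiction", "Limitation_of_Liability", "Unilateral_Change",
    "Unilateral_Termination", "Other",
]


def to_multi_hot(predicted, labels=LABELS):
    """Return a 0/1 list aligned with `labels` (hypothetical helper)."""
    unknown = set(predicted) - set(labels)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    chosen = set(predicted)
    return [1 if label in chosen else 0 for label in labels]


# Assumed shape of the annotation for the example sentence above.
result = {"class": ["Unilateral_Change", "Unilateral_Termination"]}
vector = to_multi_hot(result["class"])
print(vector)
```

Rejecting unknown label names early, instead of silently dropping them, makes a mismatch between the model's label set and the consumer's label list visible.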

Model Information

Model Name:	legmulticlf_unfair_tos
Compatibility:	Legal NLP 1.0.0+
License:	Licensed
Edition:	Official
Input Labels:	[sentence_embeddings]
Output Labels:	[class]
Language:	en
Size:	13.9 MB

References

Legal documents, scraped from the Internet, and classified in-house.

Benchmarking

label                    precision  recall  f1-score  support 
Arbitration              1.00       0.82    0.90      11      
Choice_of_Law            0.93       0.93    0.93      14      
Content_Removal          0.80       0.57    0.67      21      
Contract_by_Using        0.93       0.82    0.87      17      
Jurisdiction             1.00       1.00    1.00      16      
Limitation_of_Liability  0.81       0.80    0.81      60      
Other                    0.78       0.71    0.75      66      
Unilateral_Change        0.94       0.84    0.89      38      
Unilateral_Termination   0.78       0.81    0.79      36      
micro-avg                0.85       0.79    0.82      279     
macro-avg                0.89       0.81    0.85      279     
weighted-avg             0.85       0.79    0.82      279     
samples-avg              0.78       0.80    0.78      279 
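
The macro-avg and weighted-avg rows can be recomputed from the per-label rows, which is a handy sanity check when reading the table. The sketch below assumes the usual definitions (macro: unweighted mean over labels; weighted: mean weighted by support) and works from the rounded values printed above, so small rounding differences are possible in general. The variable names are illustrative.

```python
# (label, precision, recall, f1, support) copied from the benchmark table.
ROWS = [
    ("Arbitration",             1.00, 0.82, 0.90, 11),
    ("Choice_of_Law",           0.93, 0.93, 0.93, 14),
    ("Content_Removal",         0.80, 0.57, 0.67, 21),
    ("Contract_by_Using",       0.93, 0.82, 0.87, 17),
    ("Jurisdiction",            1.00, 1.00, 1.00, 16),
    ("Limitation_of_Liability", 0.81, 0.80, 0.81, 60),
    ("Other",                   0.78, 0.71, 0.75, 66),
    ("Unilateral_Change",       0.94, 0.84, 0.89, 38),
    ("Unilateral_Termination",  0.78, 0.81, 0.79, 36),
]

total_support = sum(row[4] for row in ROWS)

# Macro average: every label counts equally, regardless of support.
macro = [sum(row[i] for row in ROWS) / len(ROWS) for i in (1, 2, 3)]

# Weighted average: each label contributes in proportion to its support.
weighted = [
    sum(row[i] * row[4] for row in ROWS) / total_support for i in (1, 2, 3)
]

print("support:", total_support)
print("macro-avg:", [round(value, 2) for value in macro])
print("weighted-avg:", [round(value, 2) for value in weighted])
```

With the values in the table, this reproduces the reported macro-avg row (0.89, 0.81, 0.85) and weighted-avg row (0.85, 0.79, 0.82). The macro scores are higher than the weighted ones because the two largest classes, Limitation_of_Liability and Other, score below several of the small classes.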
