Legal Advice Class Identifier

Description

This is a Multiclass classification model which retrieves the topic/class of an informal message from a legal forum, including the following classes: digital, business, insurance, contract, driving, school, family, wills, employment, housing, criminal.

Predicted Entities

digital, business, insurance, contract, driving, school, family, wills, employment, housing

Copy S3 URI

How to use

documentAssembler = nlp.DocumentAssembler() \
    .setInputCols(["text"]) \
    .setOutputCols("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

seq_classifier = legal.BertForSequenceClassification.pretrained("legclf_reddit_advice", "en", "legal/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")
    
pipeline = nlp.Pipeline(stages=[documentAssembler, tokenizer, seq_classifier])

data = spark.createDataFrame([["Mother of my child took my daughter and moved (without notice), won't let me see her or tell me where she is."]]).toDF("text")

result = pipeline.fit(data).transform(data)

Results

+--------+
|  result|
+--------+
|[family]|
+--------+

Model Information

Model Name: legclf_reddit_advice
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 406.4 MB
Case sensitive: true
Max sentence length: 512

References

Train dataset available here

Benchmarking

label         precision  recall  f1-score  support 
business      0.76       0.67    0.72      239     
contract      0.80       0.68    0.73      207     
criminal      0.82       0.77    0.80      209     
digital       0.76       0.74    0.75      223     
driving       0.86       0.85    0.86      223     
employment    0.76       0.92    0.83      222     
family        0.88       0.95    0.92      216     
housing       0.89       0.95    0.92      221     
insurance     0.83       0.80    0.81      221     
school        0.87       0.91    0.89      207     
wills         0.95       0.96    0.96      199     
accuracy      -          -       0.83      2387    
macro-avg     0.84       0.84    0.83      2387    
weighted-avg  0.83       0.83    0.83      2387