Legal Marketing Document Classifier (EURLEX)

Description

European Union (EU) legislation is published on the EUR-Lex portal. The EU’s Publications Office annotates every EU law with multiple concepts from EuroVoc, a multilingual thesaurus that the Office also maintains.

The legclf_marketing_bert model is a document classifier built on BERT sentence embeddings. Given a document, it predicts whether the document belongs to the Marketing class or not (binary classification), according to EuroVoc labels.

Predicted Entities

Marketing, Other

How to use

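from johnsnowlabs import nlp, legal

# Start (or reuse) a Spark session with the licensed Legal NLP libraries
spark = nlp.start()

# Wrap the raw text column as Spark NLP documents
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")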

embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en")\
    .setInputCols("document")\
    .setOutputCol("sentence_embeddings")

doc_classifier = legal.ClassifierDLModel.pretrained("legclf_marketing_bert", "en", "legal/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("category")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler, 
    embeddings,
    doc_classifier])

df = spark.createDataFrame([["YOUR TEXT HERE"]]).toDF("text")

model = nlpPipeline.fit(df)

result = model.transform(df)
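
After the transform, the predictions sit in the category column as a list of annotations per row. The following is a minimal sketch for pulling the label strings out of collected rows. The helper name predicted_labels is hypothetical and not part of the library, and the sketch assumes each annotation exposes its label through a result attribute, as collected Spark NLP annotation structs do.

```python
from typing import Any, Iterable, List


def predicted_labels(rows: Iterable[Any], column: str = "category") -> List[List[str]]:
    """Return the predicted label(s) for each collected row.

    Hypothetical helper: assumes every row supports row[column] and that
    each annotation in that column has a `result` attribute holding the label.
    """
    return [[annotation.result for annotation in row[column]] for row in rows]


# Usage against the pipeline output above (requires a running Spark session):
# labels = predicted_labels(result.select("category").collect())
```

For a quick look without collecting, selecting the nested field with result.select("category.result") and showing it should produce a table like the one under Results.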

Results

+-----------+
|result     |
+-----------+
|[Marketing]|
|[Other]    |
|[Other]    |
|[Marketing]|
+-----------+

Model Information

Model Name:	legclf_marketing_bert
Compatibility:	Legal NLP 1.0.0+
License:	Licensed
Edition:	Official
Input Labels:	[sentence_embeddings]
Output Labels:	[class]
Language:	en
Size:	22.1 MB

References

Legal documents, scraped from the Internet and classified in-house.

Benchmarking

       label precision recall  f1-score  support
   Marketing      0.85   0.84      0.84      716
       Other      0.82   0.83      0.83      648
    accuracy         -      -      0.84     1364
   macro-avg      0.84   0.84      0.84     1364
weighted-avg      0.84   0.84      0.84     1364
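
As a sanity check on the table, the weighted-avg row can be recomputed from the two per-class rows and their support counts. This is a small sketch that uses only the rounded figures printed above; the variable and function names are illustrative.

```python
# Per-class rows copied from the benchmark table: (precision, recall, f1, support)
per_class = {
    "Marketing": (0.85, 0.84, 0.84, 716),
    "Other": (0.82, 0.83, 0.83, 648),
}

# Total number of evaluated documents across both classes
total_support = sum(row[3] for row in per_class.values())


def weighted_average(metric_index: int) -> float:
    """Support-weighted mean of one metric column, rounded to two decimals."""
    weighted_sum = sum(row[metric_index] * row[3] for row in per_class.values())
    return round(weighted_sum / total_support, 2)


weighted_avg = {
    "precision": weighted_average(0),
    "recall": weighted_average(1),
    "f1-score": weighted_average(2),
}
print(total_support, weighted_avg)
```

Because the inputs are already rounded to two decimals, the recomputed values can only be expected to agree with the table at that precision.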
