Description
The legclf_agreement_bert model is a BERT Sentence Embeddings document classifier that predicts whether or not a document belongs to the class agreement (binary classification).
Compared with the Longformer model, this model has a shorter inference time.
Predicted Entities
agreement, other
How to use
# Assumes the johnsnowlabs package is installed with a valid license
from johnsnowlabs import nlp, legal

spark = nlp.start()

# Wrap the raw text in a document annotation
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Embed the whole document with BERT sentence embeddings
embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en")\
    .setInputCols("document")\
    .setOutputCol("sentence_embeddings")

# Binary classifier: agreement vs. other
doc_classifier = legal.ClassifierDLModel.pretrained("legclf_agreement_bert", "en", "legal/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("category")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    embeddings,
    doc_classifier])

df = spark.createDataFrame([["YOUR TEXT HERE"]]).toDF("text")
model = nlpPipeline.fit(df)
result = model.transform(df)
Results
+-----------+
|     result|
+-----------+
|[agreement]|
|    [other]|
|    [other]|
|[agreement]|
+-----------+
Model Information
| Model Name:    | legclf_agreement_bert |
| Compatibility: | Legal NLP 1.0.0+      |
| License:       | Licensed              |
| Edition:       | Official              |
| Input Labels:  | [sentence_embeddings] |
| Output Labels: | [class]               |
| Language:      | en                    |
| Size:          | 22.9 MB               |
References
Legal documents scraped from the Internet and classified in-house, plus SEC documents
Benchmarking
label         precision  recall  f1-score  support
agreement     0.75       0.71    0.73      90
other         0.88       0.90    0.89      204
accuracy      -          -       0.84      294
macro-avg     0.81       0.80    0.81      294
weighted-avg  0.84       0.84    0.84      294
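
The averaged rows of the benchmark can be cross-checked from the two per-class rows. The following is a minimal sketch, assuming the reported figures are rounded to two decimals; the names `rows`, `f1`, and the other variables are illustrative and are not part of the model or the library.

```python
# Per-class figures copied from the benchmark: (precision, recall, support).
rows = {
    "agreement": (0.75, 0.71, 90),
    "other": (0.88, 0.90, 204),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = sum(support for _, _, support in rows.values())
per_class_f1 = {label: f1(p, r) for label, (p, r, _) in rows.items()}

# Macro average: unweighted mean over the classes.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted average: mean over the classes, weighted by support.
weighted_f1 = sum(per_class_f1[label] * s for label, (_, _, s) in rows.items()) / total

# Accuracy is the support-weighted recall, since every document has one true label.
accuracy = sum(r * s for _, r, s in rows.values()) / total

for name, value in [
    ("f1 (agreement)", per_class_f1["agreement"]),
    ("f1 (other)", per_class_f1["other"]),
    ("macro-avg f1", macro_f1),
    ("weighted-avg f1", weighted_f1),
    ("accuracy", accuracy),
]:
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the recomputed values agree with the reported f1-score column: 0.73, 0.89, 0.81, 0.84, and 0.84.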