Description
The legclf_indenture_agreement_bert model is a BERT Sentence Embeddings document classifier that predicts whether or not a document belongs to the class indenture (binary classification).
Compared with the Longformer model, this model is lighter and has a shorter inference time.
Predicted Entities
indenture, other
How to use
from johnsnowlabs import nlp, legal

# Start a Spark session (assumes a configured John Snow Labs license)
spark = nlp.start()

# Wrap the raw text into document annotations
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Embed each document with BERT sentence embeddings
embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en")\
    .setInputCols("document")\
    .setOutputCol("sentence_embeddings")

# Binary classifier: indenture vs. other
doc_classifier = legal.ClassifierDLModel.pretrained("legclf_indenture_agreement_bert", "en", "legal/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("category")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    embeddings,
    doc_classifier])

df = spark.createDataFrame([["YOUR TEXT HERE"]]).toDF("text")
model = nlpPipeline.fit(df)
result = model.transform(df)
Results
+-----------+
|result     |
+-----------+
|[indenture]|
|[other]    |
|[other]    |
|[indenture]|
+-----------+
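Each row of the result column is a list holding one predicted label. As a minimal sketch of how such rows might be summarized, the plain-Python helper below tallies the labels. The name count_labels is hypothetical, the rows are copied from the table above, and the Spark call mentioned in the comment is an assumption about how the rows could be collected from the pipeline output.

```python
from collections import Counter


def count_labels(rows):
    """Tally predicted labels from rows shaped like the result column.

    Each row is a list holding one label, e.g. ["indenture"].
    """
    return Counter(label for row in rows for label in row)


# Rows copied from the Results table. With a live Spark session they could
# presumably be gathered with (assumption, not part of the original example):
#   rows = [r.result for r in result.select("category.result").collect()]
rows = [["indenture"], ["other"], ["other"], ["indenture"]]

counts = count_labels(rows)
print(counts["indenture"], counts["other"])  # prints: 2 2
```

Keeping the tally separate from the Spark call means the helper can be checked without starting a session.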
Model Information
| Model Name: | legclf_indenture_agreement_bert |
| Compatibility: | Legal NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sentence_embeddings] |
| Output Labels: | [class] |
| Language: | en |
| Size: | 22.9 MB |
References
Legal documents scraped from the Internet and classified in-house, plus SEC documents
Benchmarking
label          precision  recall  f1-score  support
indenture      0.96       0.93    0.94      97
other          0.97       0.98    0.97      204
accuracy       -          -       0.96      301
macro-avg      0.96       0.95    0.96      301
weighted-avg   0.96       0.96    0.96      301
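The f1-score column is the harmonic mean of precision and recall, and the macro average is the unweighted mean over both classes. The sketch below recomputes those values from the rounded precision and recall figures in the table. Because the inputs are already rounded to two decimals, the results are approximate and need not match every reported figure exactly. The names f1_score and reported are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, copied from the benchmarking table
reported = {
    "indenture": (0.96, 0.93),
    "other": (0.97, 0.98),
}

f1 = {label: f1_score(p, r) for label, (p, r) in reported.items()}

# Macro average: every class counts equally, regardless of support
macro_f1 = sum(f1.values()) / len(f1)

for label, value in f1.items():
    print(f"{label}: {value:.2f}")
print(f"macro-avg: {macro_f1:.2f}")
```

With these inputs the per-class values round to 0.94 and 0.97 and the macro average to 0.96, in line with the table.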