Description
The legclf_nda_agreements model is a BERT Sentence Embeddings document classifier that determines whether a document belongs to the class nda or not (binary classification).
Compared with the Longformer model, this model is lighter and has a shorter inference time.
Predicted Entities
nda, other
How to use
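from johnsnowlabs import nlp, legal

# Assumes an active Spark session named `spark` with a valid Legal NLP license
# (for example, one started with nlp.start()).

# Convert raw text into Spark NLP document annotations
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Embed the whole document with BERT sentence embeddings
embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en") \
    .setInputCols("document") \
    .setOutputCol("sentence_embeddings")

# Binary classifier: nda vs. other
doc_classifier = legal.ClassifierDLModel.pretrained("legclf_nda_agreements", "en", "legal/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("category")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    embeddings,
    doc_classifier])

df = spark.createDataFrame([["YOUR TEXT HERE"]]).toDF("text")

# All stages are already trained, so fit() only assembles the pipeline
model = nlpPipeline.fit(df)
result = model.transform(df)

# Show the predicted label for each document
result.select("category.result").show(truncate=False)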
Results
+-------+
|result |
+-------+
|[nda]  |
|[other]|
|[other]|
|[nda]  |
+-------+
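Each cell in the result column above is a list holding a single label, because the pipeline produces one sentence embedding, and therefore one prediction, per document. The sketch below tallies such predictions in plain Python. It assumes the rows have already been collected into lists of labels, for example with [row.result for row in result.select("category.result").collect()]; the helper name summarize_predictions is hypothetical and not part of Legal NLP.

```python
from collections import Counter


def summarize_predictions(rows):
    """Count predicted labels across documents (hypothetical helper).

    `rows` is assumed to be an iterable of label lists, one per document,
    mirroring the `result` column shown above (e.g. ["nda"]).
    """
    counts = Counter()
    for labels in rows:
        # With one embedding per document, each row should hold exactly one label
        if len(labels) != 1:
            raise ValueError(f"expected one label per document, got {labels!r}")
        counts[labels[0]] += 1
    return counts


# The four rows from the Results table above
rows = [["nda"], ["other"], ["other"], ["nda"]]
print(summarize_predictions(rows))
```

Raising on rows with zero or several labels makes an unexpected pipeline change (for example, splitting documents into sentences before embedding) visible instead of silently miscounting.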
Model Information
Model Name: | legclf_nda_agreements |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence_embeddings] |
Output Labels: | [category] |
Language: | en |
Size: | 22.9 MB |
References
Legal documents scraped from the Internet and classified in-house, plus SEC documents
Benchmarking
label          precision   recall   f1-score   support
nda            0.93        0.96     0.95       135
other          0.97        0.95     0.96       181
accuracy       -           -        0.95       316
macro-avg      0.95        0.95     0.95       316
weighted-avg   0.95        0.95     0.95       316
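The summary rows follow from the per-class rows: the macro average is the unweighted mean over the two classes, the weighted average weights each class by its support, and F1 is the harmonic mean of precision and recall. Accuracy also equals the support-weighted recall, since both reduce to correct predictions divided by the 316 documents. The sketch below recomputes these from the table. Because it starts from two-decimal rounded figures, results agree with the reported values only to within about 0.01 (for example, nda F1 comes out near 0.945 rather than 0.95).

```python
# Per-class rows from the Benchmarking table: label -> (precision, recall, support)
per_class = {
    "nda": (0.93, 0.96, 135),
    "other": (0.97, 0.95, 181),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = sum(s for _, _, s in per_class.values())

# Macro average: unweighted mean over classes
macro_precision = sum(p for p, _, _ in per_class.values()) / len(per_class)
macro_recall = sum(r for _, r, _ in per_class.values()) / len(per_class)

# Weighted average: each class weighted by its support
weighted_precision = sum(p * s for p, _, s in per_class.values()) / total
weighted_recall = sum(r * s for _, r, s in per_class.values()) / total

# Accuracy equals support-weighted recall: sum of true positives / all documents
accuracy = weighted_recall

for label, (p, r, s) in per_class.items():
    print(f"{label}: f1={f1(p, r):.3f}")
print(f"macro P/R: {macro_precision:.3f} / {macro_recall:.3f}")
print(f"weighted P/R: {weighted_precision:.3f} / {weighted_recall:.3f}")
print(f"accuracy: {accuracy:.3f}")
```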