Description
This is a Text Classification model can help you classify if a paragraph talks about an Effective Date, a Renewal Date, a Termination Date or something else. Don’t confuse this model with the NER model (legner_dates_sm
) which allows you to extract the actual dates from the texts.
Predicted Entities
EFFECTIVE_DATE
, RENEWAL_DATE
, TERMINATION_DATE
, other
How to use
documentAssembler = DocumentAssembler() \
.setInputCol("text") \
.setOutputCol("document")
embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en") \
.setInputCols("document") \
.setOutputCol("sentence_embeddings")
docClassifier = legal.ClassifierDLModel.pretrained('legclf_dates_sm', 'en', 'legal/models')\
.setInputCols(["sentence_embeddings"])\
.setOutputCol("label")
nlpPipeline = nlp.Pipeline(stages=[
documentAssembler,
embeddings,
docClassifier])
text = ["""Renewal Date means January 1, 2018."""]
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
res = model.transform(spark.createDataFrame([text]).toDF("text"))
Results
+--------------+
| result|
+--------------+
|[RENEWAL_DATE]|
+--------------+
Model Information
Model Name: | legclf_dates_sm |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence_embeddings] |
Output Labels: | [label] |
Language: | en |
Size: | 22.5 MB |
References
In-house annotations.
Benchmarking
label precision recall f1-score support
EFFECTIVE_DATE 1.00 0.80 0.89 5
RENEWAL_DATE 1.00 1.00 1.00 6
TERMINATION_DATE 0.86 0.75 0.80 8
other 0.91 1.00 0.95 21
accuracy - - 0.93 40
macro-avg 0.94 0.89 0.91 40
weighted-avg 0.93 0.93 0.92 40