Description
This is a NER model intended to be run on applicable_law
clauses, where it extracts entities labeled APPLIC_LAW.
Make sure you run this model only on applicable_law
clauses, after filtering them with the legclf_applicable_law_cuad
classification model (see the sketch below).
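The filtering step itself is not part of this card. For orientation only, here is a minimal sketch of it, assuming legclf_applicable_law_cuad follows the common Legal NLP ClassifierDL pattern; the sentence-embeddings model, the positive label name, and the clauses_df DataFrame below are assumptions, so check that classifier's own card for the exact annotators and labels it expects.
from johnsnowlabs import nlp, legal
import pyspark.sql.functions as F

# Assumes a Spark session started via nlp.start() and a DataFrame
# `clauses_df` (hypothetical name) with one clause per row in a "text" column.
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Assumption: the classifier consumes generic BERT sentence embeddings
sentence_embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")

clause_classifier = legal.ClassifierDLModel.pretrained("legclf_applicable_law_cuad", "en", "legal/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("category")

clf_pipeline = nlp.Pipeline(stages=[document_assembler, sentence_embeddings, clause_classifier])
clf_model = clf_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

# Keep only the clauses the classifier tags as applicable_law
# ("applicable_law" as the positive label is an assumption)
applicable_law_df = clf_model.transform(clauses_df).filter(
    F.array_contains(F.col("category.result"), "applicable_law"))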
Predicted Entities
APPLIC_LAW
How to use
from johnsnowlabs import nlp, legal

# Requires a valid John Snow Labs Legal NLP license
spark = nlp.start()

# Turn raw text into a `document` annotation
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split each document into sentences (multilingual model)
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split sentences into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Legal-domain RoBERTa word embeddings the NER model was trained on
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Pretrained NER model for applicable-law clauses
ner_model = legal.NerModel.pretrained("legner_applicable_law_clause", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge IOB tags into whole entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

# Fit on an empty DataFrame to materialize the pretrained pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

text = ["""ELECTRAMECCANICA VEHICLES CORP., an entity incorporated under the laws of the Province of British Columbia, Canada, with an address of Suite 102 East 1st Avenue, Vancouver, British Columbia, Canada, V5T 1A4 ("EMV")"""]

result = model.transform(spark.createDataFrame([text]).toDF("text"))
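One way to display the extracted chunks with their labels and confidence scores, in the shape of the Results table below, is to explode the ner_chunk annotations with PySpark functions:
import pyspark.sql.functions as F

# Each ner_chunk annotation carries the chunk text in `result`
# and the entity label / confidence in its `metadata` map
result.select(F.explode("ner_chunk").alias("chunk_col")).select(
    F.col("chunk_col.result").alias("chunk"),
    F.col("chunk_col.metadata").getItem("entity").alias("ner_label"),
    F.col("chunk_col.metadata").getItem("confidence").alias("confidence")
).show(truncate=False)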
Results
+----------------------------------------+----------+----------+
|chunk |ner_label |confidence|
+----------------------------------------+----------+----------+
|laws of the Province of British Columbia|APPLIC_LAW|0.95625716|
+----------------------------------------+----------+----------+
Model Information
Model Name: legner_applicable_law_clause
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 1.1 MB
References
In-house dataset
Benchmarking
label         precision  recall  f1-score  support
B-APPLIC_LAW       0.90    0.89      0.90       84
I-APPLIC_LAW       0.98    0.93      0.96      425
micro-avg          0.97    0.93      0.95      509
macro-avg          0.94    0.91      0.93      509
weighted-avg       0.97    0.93      0.95      509