Description
This NER model was trained on a dataset of legal definitions extracted from state and federal laws in the United States. The definitions cover a range of legal topics, including criminal, civil, and commercial law. Each entry includes the term being defined, the source of the definition (e.g., the specific statute or case), and the definition itself.
Predicted Entities
TERM (emitted at the token level as B-Term / I-Term IOB tags; see Benchmarking)
How to use
from johnsnowlabs import nlp, legal

# Convert the raw input text into a document annotation
document = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
# Split the document into sentences
sentence = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split each sentence into tokens
token = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")
# Legal-domain RoBERTa embeddings; sentences longer than 512 tokens are truncated
roberta_embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)
# Licensed legal NER model that tags defined terms
loaded_ner_model = legal.NerModel.pretrained("legner_definitions", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")
# Merge IOB tags into entity chunks; the inputs should match the annotations
# the NER model consumed (sentence-level, not document-level)
converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_span")
# Assemble the full prediction pipeline
ner_prediction_pipeline = nlp.Pipeline(stages=[
    document,
    sentence,
    token,
    roberta_embeddings,
    loaded_ner_model,
    converter
])
# Sample agreement text containing parenthetical defined terms
df = spark.createDataFrame([['''This Amendment No . 2 to Securities Purchase Agreement ( this " Amendment " ) , dated this 5th day of January , 2018 , is made by and among InfoSonics Corporation , a Maryland corporation ( the " Company " ) , and each purchaser identified on the signature pages hereto ( the " Purchasers " ) .''']]).toDF("text")

model = ner_prediction_pipeline.fit(df)
result = model.transform(df)
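To get the (chunk, entity) view shown under Results, the ner_span column can be flattened with a short PySpark query. This is a minimal sketch, assuming the pipeline above has been run and result is the transformed DataFrame; the entity label is stored in each chunk annotation's metadata.

# Explode the chunk annotations and pull out the text and entity label
result.selectExpr("explode(ner_span) as span")\
    .selectExpr("span.result as chunk", "span.metadata['entity'] as entity")\
    .show(truncate=False)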
Results
+----------+------+
|chunk |entity|
+----------+------+
|Amendment |Term |
|Company |Term |
|Purchasers|Term |
+----------+------+
Model Information
| Model Name: | legner_definitions |
| Compatibility: | Legal NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sentence, token, embeddings] |
| Output Labels: | [ner] |
| Language: | en |
| Size: | 16.5 MB |
References
In-house dataset
Benchmarking
| label | precision | recall | f1-score | support |
|:---|---:|---:|---:|---:|
| B-Term | 0.93 | 0.94 | 0.93 | 1591 |
| I-Term | 0.90 | 0.93 | 0.91 | 1881 |
| micro-avg | 0.91 | 0.93 | 0.92 | 3472 |
| macro-avg | 0.92 | 0.93 | 0.92 | 3472 |
| weighted-avg | 0.91 | 0.93 | 0.92 | 3472 |