Description
This is a legal NER model trained on subpoenas. It aims to extract the following entities from a subpoena: ADDRESS, MATTER_VS, APPOINTMENT_HOUR, DOCUMENT_TOPIC, DOCUMENT_PERSON, COURT_ADDRESS, APPOINTMENT_DATE, COUNTY, CASE, SIGNER, COURT, DOCUMENT_DATE_TO, DOCUMENT_TYPE, STATE, DOCUMENT_DATE_FROM, RECEIVER, MATTER, SUBPOENA_DATE, DOCUMENT_DATE_YEAR.
Predicted Entities
ADDRESS, MATTER_VS, APPOINTMENT_HOUR, DOCUMENT_TOPIC, DOCUMENT_PERSON, COURT_ADDRESS, APPOINTMENT_DATE, COUNTY, CASE, SIGNER, COURT, DOCUMENT_DATE_TO, DOCUMENT_TYPE, STATE, DOCUMENT_DATE_FROM, RECEIVER, MATTER, SUBPOENA_DATE, DOCUMENT_DATE_YEAR
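The B-/I- rows in the Benchmarking table suggest that each entity above is predicted at token level with IOB tags. A minimal sketch of the resulting tag inventory, assuming the conventional `O` tag for tokens outside any entity (the `iob_tags` helper is hypothetical and not part of Legal NLP):

```python
# Assumption: IOB tagging (B- opens an entity, I- continues it) plus an "O"
# tag for tokens outside any entity, as suggested by the Benchmarking rows.
ENTITIES = [
    "ADDRESS", "MATTER_VS", "APPOINTMENT_HOUR", "DOCUMENT_TOPIC", "DOCUMENT_PERSON",
    "COURT_ADDRESS", "APPOINTMENT_DATE", "COUNTY", "CASE", "SIGNER", "COURT",
    "DOCUMENT_DATE_TO", "DOCUMENT_TYPE", "STATE", "DOCUMENT_DATE_FROM", "RECEIVER",
    "MATTER", "SUBPOENA_DATE", "DOCUMENT_DATE_YEAR",
]


def iob_tags(entities):
    """Return the O tag followed by a B- and an I- tag for every entity label."""
    tags = ["O"]
    for entity in entities:
        tags += [f"B-{entity}", f"I-{entity}"]
    return tags


tags = iob_tags(ENTITIES)
print(len(tags))  # 39: 19 entities x 2 prefixes + O
```

Each chunk-level entity therefore corresponds to two token tags, which is why the Benchmarking table reports, for example, B-ADDRESS and I-ADDRESS separately.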
How to use
from johnsnowlabs import nlp, legal
from pyspark.sql import functions as F

# Assumes an active Spark session `spark` with a valid Legal NLP license,
# e.g. created with: spark = nlp.start()

document = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

textSplitter = legal.TextSplitter()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

token = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

roberta_embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

loaded_ner_model = legal.NerModel.pretrained("legner_subpoena", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge the token-level IOB tags of each sentence into entity chunks
converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_span")

ner_prediction_pipeline = nlp.Pipeline(stages=[
    document,
    textSplitter,
    token,
    roberta_embeddings,
    loaded_ner_model,
    converter
])

# All stages are pretrained, so fitting on an empty DataFrame only builds the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")
prediction_model = ner_prediction_pipeline.fit(empty_data)

text = """ABC Corporation Case Number : 2023-0456-7890 To Whom It May Concern , Please be advised that on behalf of John Doe , we have issued a subpoena to ABC Corporation for the production of financial records . SUBPOENA REPORT STATE : New York COURT : Supreme Court of New York COUNTY : New York County CASE NO : 2023-456789 To : Jane Doe Address : 456 Park Avenue , New York , NY 10022 You are hereby commanded to appear before the Supreme Court of New York on the date and time specified below to give testimony in the above-mentioned case ."""

sample_data = spark.createDataFrame([[text]]).toDF("text")
preds = prediction_model.transform(sample_data)

# One row per detected chunk with its entity label
preds.select(F.explode("ner_span").alias("span"))\
    .select(F.col("span.result").alias("chunk"),
            F.col("span.metadata")["entity"].alias("entity"))\
    .show(truncate=False)
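The NerConverter stage turns the token-level IOB tags emitted by the NER model into chunk/entity pairs like those shown under Results. The following is a simplified pure-Python sketch of that kind of merging, not NerConverter's actual implementation: it joins tokens with single spaces, whereas a real converter would typically work from token character offsets. The `iob_to_chunks` helper is hypothetical, and the tokens and tags in the usage lines are illustrative, not real model output.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level IOB tags into (chunk_text, entity) pairs.

    Simplification: an I- tag that does not continue the open chunk's label
    starts a new chunk instead of being discarded.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Outside any entity: close the open chunk, if there is one
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "B" or label != current_label:
            # B- always opens a new chunk; so does an I- with a different label
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], label
        current_tokens.append(token)

    # Close a chunk that runs to the end of the sentence
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Illustrative tokens and tags (not actual model output)
tokens = ["To", ":", "Jane", "Doe", "Address", ":", "456", "Park", "Avenue"]
tags = ["O", "O", "B-RECEIVER", "I-RECEIVER", "O", "O", "B-ADDRESS", "I-ADDRESS", "I-ADDRESS"]
print(iob_to_chunks(tokens, tags))
# [('Jane Doe', 'RECEIVER'), ('456 Park Avenue', 'ADDRESS')]
```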
Results
+-------------------------------------+---------------+
|chunk |entity |
+-------------------------------------+---------------+
|ABC Corporation |MATTER_VS |
|2023-0456-7890 |CASE |
|John Doe |DOCUMENT_PERSON|
|ABC Corporation |DOCUMENT_PERSON|
|financial records |DOCUMENT_TYPE |
|New York |STATE |
|Supreme Court |COURT |
|New York |STATE |
|New York County |COUNTY |
|2023-456789 |CASE |
|Jane Doe |RECEIVER |
|456 Park Avenue , New York , NY 10022|ADDRESS |
|Supreme Court |COURT |
|New York |STATE |
|testimony |DOCUMENT_TYPE |
+-------------------------------------+---------------+
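The Results table repeats some chunks: New York is tagged STATE three times and Supreme Court is tagged COURT twice. If a per-label summary is wanted, a small post-processing step can deduplicate them. A sketch, assuming the (chunk, entity) pairs have already been collected to the driver from a DataFrame with `chunk` and `entity` columns like the one above (for example with `[(r.chunk, r.entity) for r in df.collect()]`, where `df` is a hypothetical name); the `group_by_entity` helper is likewise hypothetical:

```python
from typing import Dict, Iterable, List, Tuple


def group_by_entity(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each entity label to its unique chunk texts, in first-seen order."""
    grouped: Dict[str, List[str]] = {}
    for chunk, entity in pairs:
        chunks = grouped.setdefault(entity, [])
        if chunk not in chunks:  # drop repeated mentions of the same text
            chunks.append(chunk)
    return grouped


# (chunk, entity) pairs transcribed from the Results table above
pairs = [
    ("ABC Corporation", "MATTER_VS"),
    ("2023-0456-7890", "CASE"),
    ("John Doe", "DOCUMENT_PERSON"),
    ("ABC Corporation", "DOCUMENT_PERSON"),
    ("financial records", "DOCUMENT_TYPE"),
    ("New York", "STATE"),
    ("Supreme Court", "COURT"),
    ("New York", "STATE"),
    ("New York County", "COUNTY"),
    ("2023-456789", "CASE"),
    ("Jane Doe", "RECEIVER"),
    ("456 Park Avenue , New York , NY 10022", "ADDRESS"),
    ("Supreme Court", "COURT"),
    ("New York", "STATE"),
    ("testimony", "DOCUMENT_TYPE"),
]

for entity, chunks in group_by_entity(pairs).items():
    print(f"{entity}: {chunks}")
```

Note that the same text can legitimately carry different labels (ABC Corporation appears as both MATTER_VS and DOCUMENT_PERSON), so deduplication is done per label rather than globally.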
Model Information
Model Name:      legner_subpoena
Compatibility:   Legal NLP 1.0.0+
License:         Licensed
Edition:         Official
Input Labels:    [sentence, token, embeddings]
Output Labels:   [ner]
Language:        en
Size:            16.4 MB
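Input Labels and Output Labels name the annotation columns legner_subpoena consumes and produces, so every upstream stage must supply `sentence`, `token` and `embeddings`. A hypothetical helper that checks, by column name only, that each stage's inputs are provided by the source DataFrame or an earlier stage; the stage list is transcribed by hand from How to use, and real Spark NLP annotators also validate annotation types, which this sketch ignores:

```python
from typing import List, Sequence, Tuple

# (stage name, input columns, output column); transcribed by hand from "How to use"
Stage = Tuple[str, Sequence[str], str]


def check_wiring(stages: List[Stage], source_cols: Sequence[str] = ("text",)) -> List[str]:
    """Report stage inputs that neither the source DataFrame nor an earlier stage provides.

    Only column names are compared; annotation types are not checked.
    """
    available = set(source_cols)
    problems: List[str] = []
    for name, inputs, output in stages:
        missing = [col for col in inputs if col not in available]
        if missing:
            problems.append(f"{name}: missing {missing}")
        available.add(output)
    return problems


stages: List[Stage] = [
    ("DocumentAssembler", ["text"], "document"),
    ("TextSplitter", ["document"], "sentence"),
    ("Tokenizer", ["sentence"], "token"),
    ("RoBertaEmbeddings", ["sentence", "token"], "embeddings"),
    # Matches the Input Labels / Output Labels listed for legner_subpoena
    ("legner_subpoena", ["sentence", "token", "embeddings"], "ner"),
]

print(check_wiring(stages))  # []
```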
References
In-house annotated dataset
Benchmarking
label                  precision  recall  f1-score  support
B-ADDRESS                   0.92    0.80      0.86       61
B-MATTER_VS                 0.84    0.84      0.84       49
I-ADDRESS                   0.90    0.77      0.83      430
B-APPOINTMENT_HOUR          0.98    1.00      0.99       46
I-DOCUMENT_TOPIC            0.67    0.56      0.61       68
B-DOCUMENT_PERSON           0.86    0.92      0.89      374
B-COURT_ADDRESS             0.63    0.71      0.67       24
I-APPOINTMENT_DATE          1.00    0.90      0.95      149
I-COUNTY                    0.88    0.98      0.93      102
B-CASE                      0.85    1.00      0.92       41
B-SIGNER                    0.84    0.66      0.74       32
I-COURT                     0.97    0.91      0.94       85
B-DOCUMENT_DATE_TO          1.00    1.00      1.00       52
B-DOCUMENT_TYPE             0.95    0.93      0.94      699
I-STATE                     1.00    0.67      0.80       27
I-DOCUMENT_DATE_FROM        0.98    1.00      0.99      196
B-DOCUMENT_TOPIC            0.83    0.84      0.84      230
I-RECEIVER                  0.79    0.93      0.85       82
B-DOCUMENT_DATE_FROM        0.99    1.00      0.99       66
I-APPOINTMENT_HOUR          0.97    1.00      0.98       57
I-SIGNER                    0.90    0.65      0.75       40
B-COUNTY                    0.92    1.00      0.96       47
B-MATTER                    0.89    0.89      0.89       45
I-DOCUMENT_DATE_TO          1.00    1.00      1.00      154
I-SUBPOENA_DATE             0.81    0.91      0.86      148
B-APPOINTMENT_DATE          0.94    0.92      0.93       51
B-COURT                     0.97    0.89      0.93       85
I-DOCUMENT_TYPE             0.96    0.86      0.91      243
B-RECEIVER                  0.78    0.92      0.84       71
I-MATTER                    0.84    0.93      0.88       45
I-MATTER_VS                 0.87    0.67      0.75       30
B-DOCUMENT_DATE_YEAR        0.92    1.00      0.96       34
B-STATE                     0.91    0.86      0.88       83
B-SUBPOENA_DATE             0.81    0.88      0.84       48
I-COURT_ADDRESS             0.60    0.84      0.70      221
I-DOCUMENT_PERSON           0.90    0.91      0.91      300
micro-avg                   0.89    0.89      0.89     4515
macro-avg                   0.89    0.88      0.88     4515
weighted-avg                0.89    0.89      0.89     4515
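The f1-score column is the harmonic mean of precision and recall, and the macro-avg and weighted-avg rows differ only in how per-tag F1 scores are combined: macro-avg weights every tag equally, while weighted-avg weights each tag by its support. A sketch using four rows copied from the table; the helper names are hypothetical, and because the published values are rounded to two decimals, recomputed figures may differ in the last digit.

```python
from typing import List, Tuple

# (label, precision, recall, f1, support), copied from the Benchmarking table
Row = Tuple[str, float, float, float, int]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_avg_f1(rows: List[Row]) -> float:
    """Unweighted mean of per-tag F1: every tag counts equally."""
    return sum(row[3] for row in rows) / len(rows)


def weighted_avg_f1(rows: List[Row]) -> float:
    """Support-weighted mean of per-tag F1: frequent tags count more."""
    total_support = sum(row[4] for row in rows)
    return sum(row[3] * row[4] for row in rows) / total_support


rows: List[Row] = [
    ("B-ADDRESS", 0.92, 0.80, 0.86, 61),
    ("I-ADDRESS", 0.90, 0.77, 0.83, 430),
    ("B-COURT_ADDRESS", 0.63, 0.71, 0.67, 24),
    ("I-COURT_ADDRESS", 0.60, 0.84, 0.70, 221),
]

for label, precision, recall, f1, _ in rows:
    # Recomputed F1 should match the reported value after rounding
    print(label, round(f1_score(precision, recall), 2), f1)

print("macro", round(macro_avg_f1(rows), 3))        # 0.765
print("weighted", round(weighted_avg_f1(rows), 3))  # 0.788
```

Applied to all 36 per-tag rows, the same averaging gives about 0.876 (macro) and 0.889 (weighted), consistent with the reported 0.88 and 0.89. These are tag-level scores over B-/I- labels, not entity-level (whole-chunk) scores.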