Description
This is a Legal NER model for section splitting: it detects the HEADER and SUBHEADER entities in a document so that the text can be divided at those boundaries.
Other models are available to detect other parts of a document, such as Headers/Subheaders, Signers, “Will-do”, etc.
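To make the section-splitting idea concrete, here is a minimal sketch in plain Python, not part of the model or the library. It assumes you have already extracted the detected header chunks as `(begin, end, label)` character spans with inclusive end offsets; the function name `split_sections`, the helper `span`, and the sample text are hypothetical and used only for illustration.

```python
def split_sections(text, header_spans):
    """Split text into sections, one per detected header or subheader.

    header_spans: iterable of (begin, end, label) tuples, where begin/end are
    character offsets into text and end is inclusive (an assumption of this sketch).
    """
    spans = sorted(header_spans)
    sections = []
    for i, (begin, end, label) in enumerate(spans):
        # A section body runs from the end of its title to the next title.
        next_begin = spans[i + 1][0] if i + 1 < len(spans) else len(text)
        body = text[end + 1:next_begin].lstrip(" .").rstrip()
        sections.append({"label": label, "title": text[begin:end + 1], "body": body})
    return sections


# Hypothetical sample text and spans, for illustration only.
text = "1. Definitions. Terms are defined here. 1.1 Scope. This applies to all parties."


def span(title, label):
    begin = text.index(title)
    return (begin, begin + len(title) - 1, label)


sections = split_sections(text, [span("1. Definitions", "HEADER"),
                                 span("1.1 Scope", "SUBHEADER")])
for section in sections:
    print(section["label"], "|", section["title"], "|", section["body"])
```

Sorting the spans first means the input order does not matter, and the last section simply runs to the end of the text.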
Predicted Entities
HEADER, SUBHEADER
How to use
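from johnsnowlabs import nlp, legal

# Start a Spark session (assumes the johnsnowlabs package and a valid license are already set up)
spark = nlp.start()

# Convert raw text into Spark NLP documents
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences
sentenceDetector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split sentences into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Token embeddings consumed by the NER model
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Tag each token with a HEADER / SUBHEADER label in BIO format
ner_model = legal.NerModel.pretrained("legner_headers", "en", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

# Every stage is pretrained, so fitting on an empty DataFrame only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)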
text = ["""
2. Definitions. For purposes of this Agreement, the following terms have the meanings ascribed thereto in this Section 1. 2. Appointment as Reseller.
2.1 Appointment. The Company hereby [***]. Allscripts may also disclose Company's pricing information relating to its Merchant Processing Services and facilitate procurement of Merchant Processing Services on behalf of Sublicensed Customers, including, without limitation by references to such pricing information and Merchant Processing Services in Customer Agreements. 6
2.2 Customer Agreements."""]
res = model.transform(spark.createDataFrame([text]).toDF("text"))
Results
+-----------+-----------+
|      token|  ner_label|
+-----------+-----------+
|          2|   B-HEADER|
|          .|   I-HEADER|
|Definitions|   I-HEADER|
|          .|          O|
|        For|          O|
|   purposes|          O|
|         of|          O|
|       this|          O|
|  Agreement|          O|
|          ,|          O|
|        the|          O|
|  following|          O|
|      terms|          O|
|       have|          O|
|        the|          O|
|   meanings|          O|
|   ascribed|          O|
|    thereto|          O|
|         in|          O|
|       this|          O|
|    Section|          O|
|          1|B-SUBHEADER|
|          .|I-SUBHEADER|
|          2|I-SUBHEADER|
|          .|I-SUBHEADER|
|Appointment|   I-HEADER|
|         as|   I-HEADER|
|   Reseller|   I-HEADER|
|          .|          O|
|        2.1|B-SUBHEADER|
|Appointment|I-SUBHEADER|
|          .|          O|
|        The|          O|
|    Company|          O|
|     hereby|          O|
|      [***]|          O|
|          .|          O|
| Allscripts|          O|
|        may|          O|
|       also|          O|
|   disclose|          O|
|  Company's|          O|
|    pricing|          O|
|information|          O|
|   relating|          O|
|         to|          O|
|        its|          O|
|   Merchant|          O|
| Processing|          O|
|   Services|          O|
|        and|          O|
| facilitate|          O|
|procurement|          O|
|         of|          O|
|   Merchant|          O|
| Processing|          O|
|   Services|          O|
|         on|          O|
|     behalf|          O|
|         of|          O|
|Sublicensed|          O|
|  Customers|          O|
|          ,|          O|
|  including|          O|
|          ,|          O|
|    without|          O|
| limitation|          O|
|         by|          O|
| references|          O|
|         to|          O|
|       such|          O|
|    pricing|          O|
|information|          O|
|        and|          O|
|   Merchant|          O|
| Processing|          O|
|   Services|          O|
|         in|          O|
|   Customer|          O|
| Agreements|          O|
|          .|          O|
|          6|          O|
|        2.2|B-SUBHEADER|
|   Customer|I-SUBHEADER|
| Agreements|I-SUBHEADER|
|          .|          O|
+-----------+-----------+
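The token-level labels above use the BIO scheme: a `B-` tag opens an entity and `I-` tags continue it. The following plain-Python sketch shows one way such labels could be grouped into header chunks. It is a simplified illustration, not the `NerConverter` implementation; in particular, the choice to start a new chunk when an `I-` tag changes entity type (as happens with "Appointment as Reseller" in the table) is an assumption of this sketch, and the function name `bio_to_chunks` is hypothetical.

```python
def bio_to_chunks(tokens, labels):
    """Group BIO-tagged tokens into (entity_type, text) chunks."""
    chunks = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        prefix, _, entity = label.partition("-")
        # Assumption: an I- tag whose type differs from the open chunk starts a new chunk.
        starts_new = prefix == "B" or (prefix == "I" and entity != current_type)
        if label == "O" or starts_new:
            if current_tokens:
                chunks.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = (entity, [token]) if label != "O" else (None, [])
        else:
            current_tokens.append(token)
    if current_tokens:
        chunks.append((current_type, " ".join(current_tokens)))
    return chunks


# Excerpt of the token/label pairs from the results table.
tokens = ["1", ".", "2", ".", "Appointment", "as", "Reseller", ".",
          "2.1", "Appointment", "."]
labels = ["B-SUBHEADER", "I-SUBHEADER", "I-SUBHEADER", "I-SUBHEADER",
          "I-HEADER", "I-HEADER", "I-HEADER", "O",
          "B-SUBHEADER", "I-SUBHEADER", "O"]

chunks = bio_to_chunks(tokens, labels)
for entity_type, chunk_text in chunks:
    print(entity_type, "->", chunk_text)
```

Tokens are joined with single spaces here for simplicity, so the chunk text does not reproduce the original spacing around punctuation.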
Model Information
| Model Name: | legner_headers |
| Type: | legal |
| Compatibility: | Legal NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sentence, token, embeddings] |
| Output Labels: | [ner] |
| Language: | en |
| Size: | 16.3 MB |
References
Manual annotations on the CUAD dataset.
Benchmarking
label          tp    fp   fn  precision   recall      f1
I-HEADER       1486  40   25  0.97378767  0.98345464  0.9785973
B-SUBHEADER    744   16   14  0.97894734  0.98153037  0.9802372
I-SUBHEADER    2382  53   34  0.9782341   0.98592716  0.98206556
B-HEADER       415   4    12  0.9904535   0.97189695  0.9810875
Macro-average  5027  113  85  0.9803556   0.9807023   0.9805289
Micro-average  5027  113  85  0.97801554  0.98337245  0.98068666
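For readers who want to check how the averages relate to the per-label counts, the sketch below recomputes them from the tp/fp/fn columns using the standard definitions of precision, recall and F1. The macro-average F1 in the table appears to be the harmonic mean of the macro precision and macro recall rather than the mean of the per-label F1 scores; the code follows that reading, which is an inference from the numbers and not something the source states.

```python
# (tp, fp, fn) per label, copied from the benchmarking table.
counts = {
    "I-HEADER": (1486, 40, 25),
    "B-SUBHEADER": (744, 16, 14),
    "I-SUBHEADER": (2382, 53, 34),
    "B-HEADER": (415, 4, 12),
}


def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)


def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)


per_label = {label: precision_recall(*c) for label, c in counts.items()}

# Micro-average: pool the counts across labels, then compute the metrics once.
total_tp, total_fp, total_fn = (sum(column) for column in zip(*counts.values()))
micro_p, micro_r = precision_recall(total_tp, total_fp, total_fn)
micro_f1 = f1_score(micro_p, micro_r)

# Macro-average: average the per-label precision and recall with equal weight.
macro_p = sum(p for p, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r in per_label.values()) / len(per_label)
macro_f1 = f1_score(macro_p, macro_r)  # assumption: harmonic mean of macro P and R

print(f"micro: {micro_p:.6f} {micro_r:.6f} {micro_f1:.6f}")
print(f"macro: {macro_p:.6f} {macro_r:.6f} {macro_f1:.6f}")
```

The micro-average weights every token equally, so the more frequent I-SUBHEADER label dominates it, while the macro-average gives each of the four labels the same weight.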