Description
This is a Legal NER model designed to process the last page of an agreement, where information can be found about:
- People signing the document;
- Title of those people in their companies;
- Company (Party) they represent.
Predicted Entities
SIGNING_TITLE, SIGNING_PERSON, PARTY
How to use
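from johnsnowlabs import nlp, finance
from pyspark.ml import Pipeline

# Assumes an active, licensed Spark session named `spark` (for example, the one returned by nlp.start()).

# Wrap the raw text column into a document annotation
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences with the multilingual ("xx") detector
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split sentences into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Token embeddings consumed by the NER model
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Token-level IOB tagger for SIGNING_TITLE, SIGNING_PERSON and PARTY
ner_model = finance.NerModel.pretrained('finner_signers', 'en', 'finance/models')\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge IOB-tagged tokens into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence","token","ner"])\
    .setOutputCol("ner_chunk")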
nlpPipeline = Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

# Every stage is pretrained, so fitting on an empty DataFrame only builds the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
text = """
VENDOR:
VENDINGDATA CORPORATION, a Nevada corporation
By: /s/ Steven J. Blad
Its: Steven J. Blad CEO
DISTRIBUTOR:
TECHNICAL CASINO SUPPLIES LTD, an English company
By: /s/ David K. Heap
Its: David K. Heap Chief Executive Officer
-15-"""
res = model.transform(spark.createDataFrame([[text]]).toDF("text"))
Results
+-----------+----------------+
|      token|       ner_label|
+-----------+----------------+
|     VENDOR|               O|
|          :|               O|
|VENDINGDATA|         B-PARTY|
|CORPORATION|         I-PARTY|
|          ,|         I-PARTY|
|          a|               O|
|     Nevada|               O|
|corporation|               O|
|         By|               O|
|          :|               O|
|        /s/|               O|
|     Steven|B-SIGNING_PERSON|
|          J|I-SIGNING_PERSON|
|          .|I-SIGNING_PERSON|
|       Blad|I-SIGNING_PERSON|
|        Its|               O|
|          :|               O|
|     Steven|B-SIGNING_PERSON|
|          J|I-SIGNING_PERSON|
|          .|I-SIGNING_PERSON|
|       Blad|I-SIGNING_PERSON|
|        CEO| B-SIGNING_TITLE|
|DISTRIBUTOR|               O|
|          :|               O|
|  TECHNICAL|         B-PARTY|
|     CASINO|         I-PARTY|
|   SUPPLIES|         I-PARTY|
|        LTD|         I-PARTY|
|          ,|         I-PARTY|
|         an|               O|
|    English|               O|
|    company|               O|
|         By|               O|
|          :|               O|
|        /s/|               O|
|      David|B-SIGNING_PERSON|
|          K|I-SIGNING_PERSON|
|          .|I-SIGNING_PERSON|
|       Heap|I-SIGNING_PERSON|
|        Its|               O|
|          :|               O|
|      David|B-SIGNING_PERSON|
|          K|I-SIGNING_PERSON|
|          .|I-SIGNING_PERSON|
|       Heap|I-SIGNING_PERSON|
|      Chief| B-SIGNING_TITLE|
|  Executive| I-SIGNING_TITLE|
|    Officer| I-SIGNING_TITLE|
|          -|               O|
|         15|               O|
|          -|               O|
+-----------+----------------+
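The token-level IOB tags in this table can be grouped into entity chunks, which is roughly the information the `ner_chunk` column is meant to carry. The following is a minimal pure-Python sketch of that grouping, not the library's own implementation: `bio_to_chunks` is a hypothetical helper, and the sample pairs are copied from the first rows of the table.

```python
from typing import List, Optional, Tuple


def bio_to_chunks(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (token, IOB label) pairs into (chunk text, entity type) pairs."""
    chunks: List[Tuple[str, str]] = []
    tokens: List[str] = []
    entity: Optional[str] = None
    for token, label in tagged:
        prefix, _, name = label.partition("-")
        # An I- tag of the same type extends the chunk that is currently open.
        if prefix == "I" and name == entity:
            tokens.append(token)
            continue
        # Anything else closes the open chunk, if there is one.
        if entity is not None:
            chunks.append((" ".join(tokens), entity))
        if prefix in ("B", "I"):
            # B- starts a new chunk; a stray I- is treated as a start too.
            tokens, entity = [token], name
        else:
            tokens, entity = [], None
    if entity is not None:
        chunks.append((" ".join(tokens), entity))
    return chunks


# Sample rows copied from the results table (hypothetical variable name).
sample = [
    ("VENDINGDATA", "B-PARTY"), ("CORPORATION", "I-PARTY"), (",", "I-PARTY"),
    ("a", "O"), ("By", "O"), (":", "O"), ("/s/", "O"),
    ("Steven", "B-SIGNING_PERSON"), ("J", "I-SIGNING_PERSON"),
    (".", "I-SIGNING_PERSON"), ("Blad", "I-SIGNING_PERSON"),
    ("CEO", "B-SIGNING_TITLE"),
]
chunks = bio_to_chunks(sample)
for text_span, entity_type in chunks:
    print(f"{entity_type}: {text_span}")
```

Joining tokens with single spaces is a simplification made for this sketch; it does not recover the original spacing around punctuation, so the chunk text will differ slightly from the source string.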
Model Information
Model Name: | finner_signers |
Compatibility: | Finance NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 16.4 MB |
References
Manual annotations on the CUAD dataset, plus data augmentation.
Benchmarking
label             tp   fp  fn  prec        rec         f1
I-PARTY           366  26  39  0.93367344  0.9037037   0.91844416
I-SIGNING_TITLE   41   0   4   1.0         0.9111111   0.95348835
I-SIGNING_PERSON  115  10  13  0.92        0.8984375   0.9090909
B-SIGNING_PERSON  46   3   11  0.93877554  0.80701756  0.8679246
B-PARTY           122  14  28  0.89705884  0.81333333  0.85314685
B-SIGNING_TITLE   26   0   2   1.0         0.9285714   0.9629629
Macro-average     716  53  97  0.9482513   0.8770291   0.91125065
Micro-average     716  53  97  0.9310793   0.8806888   0.9051833
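The precision, recall and F1 columns follow from the tp/fp/fn counts, so the averages can be rechecked by hand. The sketch below recomputes them in plain Python; `prf` and the variable names are hypothetical, and the counts are copied from the table. Treating the macro F1 as the harmonic mean of macro precision and macro recall is an inference from the reported numbers, not something the source states.

```python
from typing import Dict, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one set of counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# (tp, fp, fn) per label, copied from the benchmarking table.
counts: Dict[str, Tuple[int, int, int]] = {
    "I-PARTY": (366, 26, 39),
    "I-SIGNING_TITLE": (41, 0, 4),
    "I-SIGNING_PERSON": (115, 10, 13),
    "B-SIGNING_PERSON": (46, 3, 11),
    "B-PARTY": (122, 14, 28),
    "B-SIGNING_TITLE": (26, 0, 2),
}
per_label = {label: prf(*c) for label, c in counts.items()}

# Micro-average: pool the counts over all labels, then compute the metrics once.
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_p, micro_r, micro_f1 = prf(tp, fp, fn)

# Macro-average: unweighted mean of the per-label precision and recall.
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
# Assumption: macro F1 is the harmonic mean of macro_p and macro_r.
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

print(f"micro  P={micro_p:.4f} R={micro_r:.4f} F1={micro_f1:.4f}")
print(f"macro  P={macro_p:.4f} R={macro_r:.4f} F1={macro_f1:.4f}")
```

Micro-averaging weights every token equally, so the frequent I-PARTY label dominates it, while macro-averaging gives each of the six labels the same weight.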