NER on Capital Calls (Small)

Description

This is a small capital call NER model, trained to extract contact and financial information from capital call notices. The model retrieves the following entities:

Financial information:
FUND: Name of the Fund called
ORG: Organization asking the Fund for the Capital
AMOUNT: Amount called by ORG to FUND
DUE_DATE: Due date of the call
ACCOUNT_NAME: Organization's Bank Account Name
ACCOUNT_NUMBER: Organization's Bank Account Number
ABA: Routing Number (ABA)
BANK_NAME: Name of the Bank
BANK_ADDRESS: Contact address of the Bank

Contact information:
PHONE: Contact Phone
PERSON: Contact Person
BANK_CONTACT: Person to contact in Bank
EMAIL: Contact Email

Other additional information, not directly involved in the call:
OTHER_PERSON: Other people detected (people signing the call, people to whom the notice is addressed, etc.)
OTHER_PERCENTAGE: Percentages mentioned
OTHER_DATE: Other dates mentioned, not the due date
OTHER_AMOUNT: Other amounts mentioned
OTHER_ADDRESS: Other addresses mentioned
OTHER_ORG: Other organizations mentioned

Predicted Entities

FUND, ORG, AMOUNT, DUE_DATE, ACCOUNT_NAME, ACCOUNT_NUMBER, ABA, BANK_NAME, BANK_ADDRESS, PHONE, PERSON, BANK_CONTACT, EMAIL, OTHER_PERSON, OTHER_PERCENTAGE, OTHER_DATE, OTHER_AMOUNT, OTHER_ADDRESS, OTHER_ORG
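
When routing extracted chunks downstream, it can help to keep the three groups from the Description in a lookup table. The following is a minimal sketch and not part of the model's API: `LABEL_GROUPS` and `group_of` are hypothetical names, and placing BANK_NAME in the financial group is an assumption (the label appears in the Results and Benchmarking sections).

```python
# Hypothetical lookup of the model's labels, organized by the groups in the Description.
LABEL_GROUPS = {
    "financial": {
        "FUND", "ORG", "AMOUNT", "DUE_DATE", "ACCOUNT_NAME",
        "ACCOUNT_NUMBER", "ABA", "BANK_ADDRESS",
        "BANK_NAME",  # assumption: grouped here because it describes the receiving bank
    },
    "contact": {"PHONE", "PERSON", "BANK_CONTACT", "EMAIL"},
    "other": {
        "OTHER_PERSON", "OTHER_PERCENTAGE", "OTHER_DATE",
        "OTHER_AMOUNT", "OTHER_ADDRESS", "OTHER_ORG",
    },
}


def group_of(label: str) -> str:
    """Return the group name for a label, or 'unknown' if the label is not listed."""
    for group, labels in LABEL_GROUPS.items():
        if label in labels:
            return group
    return "unknown"


print(group_of("DUE_DATE"))      # financial
print(group_of("OTHER_PERSON"))  # other
```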


How to use

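from johnsnowlabs import nlp, finance
from pyspark.sql import functions as F

# Start a Spark session with the licensed libraries loaded
spark = nlp.start()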

documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentence = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

ner = finance.NerModel.pretrained('finner_capital_calls', 'en', 'finance/models')\
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

converter = finance.NerConverterInternal() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[documentAssembler,
                            sentence,
                            tokenizer,
                            embeddings,
                            ner,
                            converter
                            ])

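# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
df = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(df)

# LightPipeline wraps the fitted model for fast inference on small inputs
lp = nlp.LightPipeline(model)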


text = """Dear Charlotte R. Davis,

We hope this message finds you well. This is to inform you that a capital call for Upfront Ventures has been initiated. The amount requested is 7000 EUR and is due on 01.01.2024.

Kindly wire transfer the funds to the following account:

Account Green Planet Solutions LLC
Account Number 1234567-1XX
Routing Number 51903761
Bank First Republic Bank

If you require any further information, please do not hesitate to reach out to us at 3055 550818 or coxeric@example.com.

Thank you for your prompt attention to this matter.

Best regards,
James Wilson"""

result = model.transform(spark.createDataFrame([[text]]).toDF("text"))

# Explode the chunk annotations into one row per entity: text, label, confidence
result.select(F.explode(F.arrays_zip(result.ner_chunk.result, result.ner_chunk.metadata)).alias("cols")) \
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label"),
              F.expr("cols['1']['confidence']").alias("confidence")).show(truncate=False)

Results

+--------------------------+--------------+----------+
|chunk                     |ner_label     |confidence|
+--------------------------+--------------+----------+
|Charlotte R. Davis        |OTHER_PERSON  |0.971875  |
|Upfront Ventures          |FUND          |1.0       |
|7000 EUR                  |AMOUNT        |1.0       |
|01.01.2024                |DUE_DATE      |1.0       |
|Green Planet Solutions LLC|ACCOUNT_NAME  |0.999875  |
|1234567-1XX               |ACCOUNT_NUMBER|1.0       |
|51903761                  |ABA           |1.0       |
|First Republic Bank       |BANK_NAME     |0.9999333 |
|3055 550818               |PHONE         |1.0       |
|coxeric@example.com       |EMAIL         |1.0       |
|James Wilson              |OTHER_PERSON  |1.0       |
+--------------------------+--------------+----------+
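
Once the chunks are collected to the driver, they usually need to be grouped by label before they can populate a record. The following is a minimal sketch in plain Python, using the rows of the table above as sample data; `group_chunks` is a hypothetical helper and the 0.98 confidence threshold is an arbitrary assumption, not a recommended value.

```python
from collections import defaultdict

# (chunk, ner_label, confidence) rows copied from the Results table.
rows = [
    ("Charlotte R. Davis", "OTHER_PERSON", 0.971875),
    ("Upfront Ventures", "FUND", 1.0),
    ("7000 EUR", "AMOUNT", 1.0),
    ("01.01.2024", "DUE_DATE", 1.0),
    ("Green Planet Solutions LLC", "ACCOUNT_NAME", 0.999875),
    ("1234567-1XX", "ACCOUNT_NUMBER", 1.0),
    ("51903761", "ABA", 1.0),
    ("First Republic Bank", "BANK_NAME", 0.9999333),
    ("3055 550818", "PHONE", 1.0),
    ("coxeric@example.com", "EMAIL", 1.0),
    ("James Wilson", "OTHER_PERSON", 1.0),
]


def group_chunks(rows, min_confidence=0.0):
    """Group chunk texts by label, keeping only chunks at or above min_confidence."""
    grouped = defaultdict(list)
    for chunk, label, confidence in rows:
        if confidence >= min_confidence:
            grouped[label].append(chunk)
    return dict(grouped)


fields = group_chunks(rows, min_confidence=0.98)
print(fields["AMOUNT"])        # ['7000 EUR']
print(fields["OTHER_PERSON"])  # ['James Wilson'] (the 0.971875 chunk is filtered out)
```

A label can map to several chunks (two OTHER_PERSON entries here), which is why each value is a list and not a single string.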

Model Information

Model Name: finner_capital_calls
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.3 MB

References

In-house capital call notices

Benchmarking

Total test loss: 0.3542	Avg test loss: 0.0295
label	 tp	 fp	 fn	 prec	 rec	 f1
B-PERSON	 47	 0	 0	 1.0	 1.0	 1.0
I-OTHER_PERSON	 214	 0	 0	 1.0	 1.0	 1.0
I-AMOUNT	 127	 0	 0	 1.0	 1.0	 1.0
I-OTHER_PERCENTAGE	 37	 0	 0	 1.0	 1.0	 1.0
B-OTHER_DATE	 25	 0	 0	 1.0	 1.0	 1.0
I-BANK_ADDRESS	 121	 0	 0	 1.0	 1.0	 1.0
B-AMOUNT	 170	 0	 0	 1.0	 1.0	 1.0
B-OTHER_AMOUNT	 409	 0	 0	 1.0	 1.0	 1.0
I-ORG	 311	 18	 0	 0.9452888	 1.0	 0.971875
B-PHONE	 79	 0	 0	 1.0	 1.0	 1.0
I-DUE_DATE	 153	 0	 0	 1.0	 1.0	 1.0
B-FUND	 124	 0	 0	 1.0	 1.0	 1.0
B-ABA	 97	 0	 0	 1.0	 1.0	 1.0
I-ACCOUNT_NAME	 223	 0	 0	 1.0	 1.0	 1.0
I-OTHER_DATE	 25	 0	 0	 1.0	 1.0	 1.0
I-PHONE	 119	 0	 0	 1.0	 1.0	 1.0
B-BANK_ADDRESS	 39	 0	 0	 1.0	 1.0	 1.0
B-OTHER_ORG	 139	 0	 6	 1.0	 0.95862067	 0.97887325
I-OTHER_AMOUNT	 307	 0	 0	 1.0	 1.0	 1.0
I-FUND	 131	 0	 0	 1.0	 1.0	 1.0
I-BANK_NAME	 139	 0	 0	 1.0	 1.0	 1.0
B-EMAIL	 73	 0	 0	 1.0	 1.0	 1.0
I-BANK_CONTACT	 52	 0	 0	 1.0	 1.0	 1.0
B-BANK_CONTACT	 30	 0	 0	 1.0	 1.0	 1.0
B-OTHER_PERSON	 116	 0	 0	 1.0	 1.0	 1.0
B-ACCOUNT_NAME	 97	 0	 0	 1.0	 1.0	 1.0
B-DUE_DATE	 127	 0	 0	 1.0	 1.0	 1.0
B-OTHER_ADDRESS	 11	 0	 0	 1.0	 1.0	 1.0
B-ORG	 147	 6	 0	 0.9607843	 1.0	 0.98
B-BANK_NAME	 113	 0	 0	 1.0	 1.0	 1.0
B-OTHER_PERCENTAGE	 74	 0	 0	 1.0	 1.0	 1.0
I-OTHER_ADDRESS	 38	 0	 0	 1.0	 1.0	 1.0
B-ACCOUNT_NUMBER	 97	 0	 0	 1.0	 1.0	 1.0
I-PERSON	 109	 0	 0	 1.0	 1.0	 1.0
I-OTHER_ORG	 283	 0	 18	 1.0	 0.9401993	 0.969178
I-ACCOUNT_NUMBER	 32	 0	 0	 1.0	 1.0	 1.0
Macro-average	 4435	 24	 24	 0.997391	 0.9971894	 0.9972902
Micro-average	 4435	 24	 24	 0.99461764	 0.99461764	 0.99461764
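
The prec, rec, and f1 columns follow from the tp, fp, and fn counts in each row. The following is a minimal sketch of that arithmetic using the standard definitions; `prf` is a hypothetical helper, not a function of the library, and small differences in the last digit against the table are expected from floating-point precision.

```python
def prf(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1) computed from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# I-ORG row: 311 true positives, 18 false positives, 0 false negatives.
p, r, f = prf(311, 18, 0)
print(round(p, 7), r, round(f, 6))  # 0.9452888 1.0 0.971875

# Micro-average row: counts summed over all labels before computing the metrics.
micro_p, micro_r, micro_f = prf(4435, 24, 24)
print(round(micro_p, 5))  # 0.99462
```

Because the total fp and fn counts are equal (24 each), micro-averaged precision, recall, and F1 coincide, which matches the identical values in the Micro-average row.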