Description
This is a Financial NER model (BertForTokenClassification) aimed at extracting entities from Suspicious Activity Reports (SARs), which financial institutions, and those associated with their business, file with the Financial Crimes Enforcement Network (FinCEN).
Predicted Entities
SUSPICIOUS_ITEMS, PERSON_NAME, SUSPICIOUS_ACTION, SUSPICIOUS_KEYWORD
How to use
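from johnsnowlabs import nlp, finance
import pandas as pd
import pyspark.sql.functions as F

# Assumes `spark` is an active session started with a valid Finance NLP license, e.g. spark = nlp.start()

documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

# Pretrained token classifier: writes one B-/I-/O tag per token to the "label" column
tokenClassifier = finance.BertForTokenClassification.pretrained("finner_bert_suspicious_activity_reports", "en", "finance/models")\
    .setInputCols("token", "document")\
    .setOutputCol("label")\
    .setCaseSensitive(True)

# Merges consecutive B-/I- tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["document", "token", "label"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
    documentAssembler,
    tokenizer,
    tokenClassifier,
    ner_converter
])

# Every stage is pretrained, so fitting on an empty DataFrame only assembles the PipelineModel
p_model = pipeline.fit(spark.createDataFrame(pd.DataFrame({'text': ['']})))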
text = """SUSPICIOUS ACTIVITY REPORT
Date: [Today's Date]
To: [Financial Institution's Compliance Department]
Subject: Suspicious Activity Related to Business Loan
Account Holder Information:
Name: [Name of Business]
Address: [Business Address]
Account Number: [Business Account Number]
Description of Activity:
On [Date], [Name of Business] submitted a loan application for a substantial amount of money. The loan officer reviewing the application noticed several indications of possible suspicious activity."""
res = p_model.transform(spark.createDataFrame([[text]]).toDF("text"))
# Zip the parallel token, label and label-metadata arrays, then explode them into one row per token
result_df = res.select(F.explode(F.arrays_zip(res.token.result, res.label.result, res.label.metadata)).alias("cols"))\
    .select(F.expr("cols['0']").alias("token"),
            F.expr("cols['1']").alias("label"),
            F.expr("cols['2']['confidence']").alias("confidence"))
result_df.show(100, truncate=100)
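The select above relies on the token, label and metadata arrays being parallel: position i in each array describes the same token. As a rough plain-Python analogue of F.explode(F.arrays_zip(...)), here is a sketch with a hypothetical helper zip_and_explode and made-up confidence values used only for illustration:

```python
from typing import Dict, List, Tuple


def zip_and_explode(tokens: List[str], labels: List[str],
                    metadata: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Plain-Python analogue of explode(arrays_zip(...)): one row per token position."""
    # zip pairs elements by position, like arrays_zip; building a list of tuples
    # mirrors explode, which emits one output row per zipped element.
    # Note: zip stops at the shortest list, whereas Spark's arrays_zip pads with nulls.
    return [(tok, lab, meta.get("confidence", ""))
            for tok, lab, meta in zip(tokens, labels, metadata)]


# Hypothetical sample input; the confidence strings are made up for illustration.
tokens = ["SUSPICIOUS", "ACTIVITY", "REPORT"]
labels = ["B-SUSPICIOUS_KEYWORD", "O", "O"]
metadata = [{"confidence": "0.9"}, {"confidence": "0.8"}, {"confidence": "0.7"}]

for row in zip_and_explode(tokens, labels, metadata):
    print(row)
```

Each printed tuple corresponds to one row of result_df.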
Results
+-------------+--------------------+
|token        |label               |
+-------------+--------------------+
|SUSPICIOUS   |B-SUSPICIOUS_KEYWORD|
|ACTIVITY     |O                   |
|REPORT       |O                   |
|Date         |O                   |
|:            |O                   |
|[Today's     |O                   |
|Date]        |O                   |
|To           |O                   |
|:            |O                   |
|[Financial   |O                   |
|Institution's|O                   |
|Compliance   |O                   |
|Department]  |O                   |
|Subject      |O                   |
|:            |O                   |
|Suspicious   |B-SUSPICIOUS_KEYWORD|
|Activity     |O                   |
|Related      |O                   |
|to           |O                   |
|Business     |B-SUSPICIOUS_ACTION |
|Loan         |I-SUSPICIOUS_ACTION |
|Account      |O                   |
|Holder       |O                   |
|Information  |O                   |
|:            |O                   |
|Name         |O                   |
|:            |O                   |
|[Name        |O                   |
|of           |O                   |
|Business]    |O                   |
|Address      |O                   |
|:            |O                   |
|[Business    |O                   |
|Address]     |O                   |
|Account      |O                   |
|Number       |O                   |
|:            |O                   |
|[Business    |O                   |
|Account      |O                   |
|Number]      |O                   |
|Description  |O                   |
|of           |O                   |
|Activity     |O                   |
|:            |O                   |
|On           |O                   |
|[Date]       |O                   |
|,            |O                   |
|[Name        |O                   |
|of           |O                   |
|Business]    |O                   |
|submitted    |O                   |
|a            |O                   |
|loan         |B-SUSPICIOUS_ACTION |
|application  |I-SUSPICIOUS_ACTION |
|for          |O                   |
|a            |O                   |
|substantial  |O                   |
|amount       |O                   |
|of           |O                   |
|money        |O                   |
|.            |O                   |
|The          |O                   |
|loan         |O                   |
|officer      |O                   |
|reviewing    |O                   |
|the          |O                   |
|application  |O                   |
|noticed      |O                   |
|several      |O                   |
|indications  |O                   |
|of           |O                   |
|possible     |O                   |
|suspicious   |B-SUSPICIOUS_KEYWORD|
|activity     |O                   |
|.            |O                   |
+-------------+--------------------+
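The table above is token-level: a B- tag opens an entity, I- tags of the same type continue it, and O marks tokens outside any entity. The ner_converter stage in the pipeline merges these tags into ner_chunk annotations. A minimal plain-Python sketch of that grouping, assuming standard BIO semantics; the helper bio_to_chunks is hypothetical, is not NerConverter's actual implementation, and joins tokens with single spaces instead of using character offsets from the original text:

```python
from typing import List, Optional, Tuple


def bio_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (chunk_text, entity_type) pairs."""
    chunks: List[Tuple[str, str]] = []
    words: List[str] = []
    entity: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and entity == tag[2:]:
            # Continuation of the open chunk with the same entity type
            words.append(token)
            continue
        # Any other tag closes the open chunk, if there is one
        if words:
            chunks.append((" ".join(words), entity))
        if tag.startswith(("B-", "I-")):
            # B- starts a new chunk; a stray I- with no matching open chunk is treated like B-
            words, entity = [token], tag[2:]
        else:
            words, entity = [], None
    if words:
        chunks.append((" ".join(words), entity))
    return chunks


# Tokens and tags copied from a few rows of the Results table
tokens = ["Suspicious", "Activity", "Related", "to", "Business", "Loan", "Account"]
tags = ["B-SUSPICIOUS_KEYWORD", "O", "O", "O", "B-SUSPICIOUS_ACTION", "I-SUSPICIOUS_ACTION", "O"]
print(bio_to_chunks(tokens, tags))
# → [('Suspicious', 'SUSPICIOUS_KEYWORD'), ('Business Loan', 'SUSPICIOUS_ACTION')]
```

Treating a stray I- tag as the start of a new chunk is one common convention; other tools may drop such tokens instead.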
Model Information
Model Name: finner_bert_suspicious_activity_reports
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token]
Output Labels: [ner]
Language: en
Size: 404.2 MB
Case sensitive: true
Max sentence length: 128
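The max sentence length bounds how many tokens the model processes per input sequence. For BERT-style models the limit typically counts WordPiece sub-tokens, and every word yields at least one sub-token, so a long report passed as a single document can reach the limit even when its word count looks modest. One common workaround is to split the text into smaller pieces before it reaches the pipeline. A minimal sketch of fixed-size windowing; the helper window_tokens and its 100-token default are hypothetical choices that leave some headroom below 128 but do not guarantee the sub-token count stays under it:

```python
from typing import List


def window_tokens(tokens: List[str], max_len: int = 100) -> List[List[str]]:
    """Split a token list into consecutive, non-overlapping windows of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    # Step through the list in strides of max_len; the last window may be shorter.
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


# Hypothetical 250-token input split into windows of at most 100 tokens
words = [f"w{i}" for i in range(250)]
print([len(w) for w in window_tokens(words)])  # → [100, 100, 50]
```

Cutting at fixed positions can split an entity across two windows; splitting on sentence boundaries instead avoids that.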
References
In-house annotated data
Benchmarking
label                 precision  recall  f1-score  support
B-SUSPICIOUS_ITEMS         0.75    0.84      0.79     1079
B-PERSON_NAME              0.97    0.97      0.97       88
I-PERSON_NAME              0.98    0.99      0.98      171
B-SUSPICIOUS_ACTION        0.91    0.87      0.89      752
I-SUSPICIOUS_ACTION        0.93    0.91      0.92      814
B-SUSPICIOUS_KEYWORD       0.91    0.97      0.94     1528
I-SUSPICIOUS_ITEMS         0.77    0.84      0.81      659
micro-avg                  0.86    0.90      0.88     5091
macro-avg                  0.89    0.91      0.90     5091
weighted-avg               0.86    0.90      0.88     5091
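The averaged rows follow from the per-label rows: macro-avg is the unweighted mean over the seven labels, and weighted-avg weights each label by its support (5091 tokens in total). A short sketch that recomputes both from the rounded values in the table, so agreement is only expected to two decimals:

```python
from typing import Dict, Tuple

# Per-label rows copied from the Benchmarking table: (precision, recall, f1-score, support)
rows: Dict[str, Tuple[float, float, float, int]] = {
    "B-SUSPICIOUS_ITEMS": (0.75, 0.84, 0.79, 1079),
    "B-PERSON_NAME": (0.97, 0.97, 0.97, 88),
    "I-PERSON_NAME": (0.98, 0.99, 0.98, 171),
    "B-SUSPICIOUS_ACTION": (0.91, 0.87, 0.89, 752),
    "I-SUSPICIOUS_ACTION": (0.93, 0.91, 0.92, 814),
    "B-SUSPICIOUS_KEYWORD": (0.91, 0.97, 0.94, 1528),
    "I-SUSPICIOUS_ITEMS": (0.77, 0.84, 0.81, 659),
}


def macro_avg(rows: Dict[str, Tuple[float, float, float, int]], idx: int) -> float:
    """Unweighted mean of column idx over all labels."""
    return sum(r[idx] for r in rows.values()) / len(rows)


def weighted_avg(rows: Dict[str, Tuple[float, float, float, int]], idx: int) -> float:
    """Mean of column idx with each label weighted by its support (column 3)."""
    total = sum(r[3] for r in rows.values())
    return sum(r[idx] * r[3] for r in rows.values()) / total


for name, idx in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name}: macro={macro_avg(rows, idx):.2f} weighted={weighted_avg(rows, idx):.2f}")
# → precision: macro=0.89 weighted=0.86
# → recall: macro=0.91 weighted=0.90
# → f1-score: macro=0.90 weighted=0.88
```

Because each label's recall is TP / support, the support-weighted recall equals the micro-averaged recall, which is why both rows show 0.90. Micro-avg precision cannot be recovered this way, since it depends on how many tokens were predicted for each label, which the table does not report.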