Financial Job Titles NER

Description

This is a Finance NLP NER model, built on finance.BertForTokenClassification, that extracts the job titles and roles of people in companies. It was trained on resumes, Wikipedia articles, and financial and legal documents annotated in-house.

Predicted Entities

ROLE


How to use

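from johnsnowlabs import *
from pyspark.sql import functions as F  # F is used below to flatten the results

# Start a Spark session with the licensed Finance NLP libraries
# (assumes johnsnowlabs and a valid license are already installed)
spark = nlp.start()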

documentAssembler = nlp.DocumentAssembler()\
  .setInputCol("text")\
  .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
  .setInputCols("document")\
  .setOutputCol("token")

tokenClassifier = finance.BertForTokenClassification.pretrained("finner_bert_roles", "en", "finance/models")\
  .setInputCols("token", "document")\
  .setOutputCol("label")\
  .setCaseSensitive(True)

ner_converter = nlp.NerConverter()\
  .setInputCols(["document", "token", "label"])\
  .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
  documentAssembler,
  tokenizer,
  tokenClassifier,
  ner_converter
])

import pandas as pd

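# All stages are pretrained, so fitting on a one-row placeholder DataFrame just assembles the PipelineModel
p_model = pipeline.fit(spark.createDataFrame(pd.DataFrame({'text': ['']})))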

text = 'Jeffrey Preston Bezos is an American entrepreneur, founder and CEO of Amazon'

res = p_model.transform(spark.createDataFrame([[text]]).toDF("text"))

# Flatten the annotations into one row per token with its label and confidence
result_df = res.select(F.explode(F.arrays_zip(res.token.result, res.label.result, res.label.metadata)).alias("cols"))\
  .select(F.expr("cols['0']").alias("token"),
          F.expr("cols['1']").alias("ner_label"),
          F.expr("cols['2']['confidence']").alias("confidence"))

result_df.show(50, truncate=100)

Results

+------------+---------+----------+
|       token|ner_label|confidence|
+------------+---------+----------+
|     Jeffrey|        O|    0.9984|
|     Preston|        O|    0.9878|
|       Bezos|        O|    0.9939|
|          is|        O|     0.999|
|          an|        O|    0.9988|
|    American|   B-ROLE|    0.8294|
|entrepreneur|   I-ROLE|    0.9358|
|           ,|        O|    0.9979|
|     founder|   B-ROLE|    0.8645|
|         and|        O|     0.857|
|         CEO|   B-ROLE|      0.72|
|          of|        O|     0.995|
|      Amazon|        O|    0.9428|
+------------+---------+----------+
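The Results table shows token-level tags in an IOB2-style scheme: B-ROLE opens a role, I-ROLE continues it, and O marks tokens outside any entity. In the pipeline, the ner_converter stage merges these tags into ner_chunk annotations. The sketch below illustrates that grouping in plain Python; the function bio_to_chunks is hypothetical and is not NerConverter's implementation. It also joins tokens with single spaces, whereas a real converter would take the chunk text from the original document.

```python
from typing import List, Optional, Tuple


def bio_to_chunks(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: group IOB2-tagged tokens into (chunk_text, entity) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_entity: Optional[str] = None

    for token, label in zip(tokens, labels):
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_entity
        )
        if starts_new:
            # A B- tag (or an I- tag that does not continue the open entity) starts a new chunk
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_entity))
            current_tokens = [token]
            current_entity = label[2:]
        elif label.startswith("I-"):
            # An I- tag of the same entity type extends the open chunk
            current_tokens.append(token)
        else:
            # "O" closes any open chunk
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_entity))
            current_tokens = []
            current_entity = None

    # Flush a chunk that runs to the end of the sentence
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_entity))
    return chunks


# Tokens and tags copied from the Results table above
tokens = ["Jeffrey", "Preston", "Bezos", "is", "an", "American", "entrepreneur",
          ",", "founder", "and", "CEO", "of", "Amazon"]
labels = ["O", "O", "O", "O", "O", "B-ROLE", "I-ROLE",
          "O", "B-ROLE", "O", "B-ROLE", "O", "O"]

chunks = bio_to_chunks(tokens, labels)
print(chunks)
```

For the example sentence this yields three ROLE chunks: "American entrepreneur", "founder", and "CEO".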

Model Information

Model Name: finner_bert_roles
Type: finance
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token]
Output Labels: [ner]
Language: en
Size: 402.8 MB
Case sensitive: true
Max sentence length: 128

References

In-house annotations on Wikidata, the CUAD dataset, financial 10-K filings, and resumes

Benchmarking

label            tp     fp    fn    precision    recall       f1
B-ROLE           3553   174   262   0.95331365   0.9313237    0.9421904
I-ROLE           4868   250   243   0.9511528    0.95245546   0.9518037
Macro-average    8421   424   505   0.9522332    0.9418896    0.9470331
Micro-average    8421   424   505   0.9520633    0.9434237    0.94772375
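The averaged rows can be recomputed from the per-label tp/fp/fn counts. The sketch below does this in plain Python (no Spark needed); the names counts and prf are illustrative. One observation from the arithmetic: the table's macro-average F1 (0.9470331) appears to be the harmonic mean of the macro precision and macro recall, not the mean of the two per-label F1 scores (about 0.94700), so the sketch follows that convention.

```python
from typing import Dict, Tuple

# Per-label (tp, fp, fn) counts copied from the Benchmarking table
counts: Dict[str, Tuple[int, int, int]] = {
    "B-ROLE": (3553, 174, 262),
    "I-ROLE": (4868, 250, 243),
}


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 (harmonic mean of precision and recall)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_label = {label: prf(*c) for label, c in counts.items()}

# Macro-average: unweighted mean of per-label precision and recall;
# F1 is then taken as the harmonic mean of those two averages
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

# Alternative convention (not the one the table uses): mean of per-label F1
mean_label_f1 = sum(f for _, _, f in per_label.values()) / len(per_label)

# Micro-average: pool tp/fp/fn across labels first, then compute P/R/F1
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_p, micro_r, micro_f1 = prf(tp, fp, fn)

for label, (p, r, f) in per_label.items():
    print(f"{label:<15}{p:.7f}  {r:.7f}  {f:.7f}")
print(f"{'Macro-average':<15}{macro_p:.7f}  {macro_r:.7f}  {macro_f1:.7f}")
print(f"{'Micro-average':<15}{micro_p:.7f}  {micro_r:.7f}  {micro_f1:.7f}")
```

Because B-ROLE and I-ROLE have different support, the micro and macro averages differ slightly; the micro-average weights each token decision equally.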