com.johnsnowlabs.finance.token_classification

ner

package ner

Type Members

class FinanceBertForTokenClassification extends MedicalBertForTokenClassifier
class FinanceNerApproach extends MedicalNerApproach
Trains generic NER models based on Neural Networks.
The neural network architecture is Char CNNs - BiLSTM - CRF, which achieves state-of-the-art results on most datasets.
For instantiated/pretrained models, see FinanceNerModel
The training data should be a labeled Spark Dataset, in the CoNLL 2003 IOB format with Annotation type columns. The data should have columns of type DOCUMENT, TOKEN, WORD_EMBEDDINGS and an additional label column of annotator type NAMED_ENTITY.
Excluding the label, this can be done with, for example, the annotators SentenceDetector, Tokenizer, and WordEmbeddingsModel (any embeddings can be chosen, e.g. BertEmbeddings for BERT based embeddings).
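To make the IOB labelling scheme mentioned above concrete, here is a minimal sketch in plain Scala that groups IOB-tagged tokens into entity spans. The `IobSketch` object, the `Entity` case class, the `decodeIob` helper and the `ORG`/`MONEY` labels are hypothetical names used only for illustration; they are not part of the library, and the sketch assumes `B-`/`I-`/`O` tags aligned one-to-one with tokens.

```scala
// Illustrative only: a hypothetical helper, not part of the library.
object IobSketch {
  final case class Entity(label: String, text: String)

  // Groups tokens into entities based on their IOB tags.
  // "B-X" opens a new entity of type X, "I-X" continues an open entity of type X,
  // and "O" (or any other tag) closes whatever entity is open.
  def decodeIob(tokens: Seq[String], tags: Seq[String]): List[Entity] = {
    require(tokens.length == tags.length, "tokens and tags must align one-to-one")
    val entities = scala.collection.mutable.ListBuffer.empty[Entity]
    var current: Option[(String, Vector[String])] = None

    // Emits the currently open entity, if any.
    def flush(): Unit = {
      current.foreach { case (label, words) => entities += Entity(label, words.mkString(" ")) }
      current = None
    }

    tokens.zip(tags).foreach { case (token, tag) =>
      if (tag.startsWith("B-")) {
        flush()
        current = Some((tag.drop(2), Vector(token)))
      } else if (tag.startsWith("I-")) {
        val label = tag.drop(2)
        if (current.exists(_._1 == label)) {
          current = current.map { case (l, words) => (l, words :+ token) }
        } else {
          // An I- tag with no matching open entity starts a new one.
          flush()
          current = Some((label, Vector(token)))
        }
      } else {
        flush()
      }
    }
    flush()
    entities.toList
  }
}
```

For instance, the tokens `Acme Corp reported revenue of 5 million .` tagged as `B-ORG I-ORG O O O B-MONEY I-MONEY O` would be grouped into one `ORG` entity and one `MONEY` entity.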
For extended examples of usage, see the Spark NLP Workshop.
Notes
Both the DocumentAssembler and SentenceDetector annotators output the DOCUMENT annotation type, so the output of either one can serve as the DOCUMENT input for the annotators that follow in a pipeline.
Example
First extract the prerequisites for the FinanceNerApproach
```
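import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings

// Turn raw text into DOCUMENT annotations
val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
// Split each document into sentences
val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")
// Split each sentence into tokens
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")
// Compute word embeddings for every token
val embeddings = BertEmbeddings.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")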
```
Then define the NER annotator
```
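import com.johnsnowlabs.finance.token_classification.ner.FinanceNerApproach

val nerTagger = new FinanceNerApproach()
  .setInputCols("sentence", "token", "embeddings")
  .setLabelColumn("label")      // column holding the gold NAMED_ENTITY labels
  .setOutputCol("ner")
  .setMaxEpochs(10)
  .setLr(0.005f)                // learning rate
  .setPo(0.005f)                // learning rate decay coefficient
  .setBatchSize(32)
  .setValidationSplit(0.1f)     // fraction of the training data held out for validation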
```
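The `setLr` and `setPo` values above interact: `po` is a decay coefficient applied to the learning rate as training progresses. The following plain-Scala sketch assumes the decay rule `lr / (1 + po * epoch)` documented for the open-source NerDLApproach, and assumes epochs are counted from 0; `LrDecaySketch` and `effectiveLr` are hypothetical names for illustration, not library API.

```scala
// Illustrative only: not part of the library.
object LrDecaySketch {
  // Assumed decay rule: effective rate = lr / (1 + po * epoch)
  def effectiveLr(lr: Float, po: Float, epoch: Int): Float =
    lr / (1.0f + po * epoch)

  def main(args: Array[String]): Unit = {
    val lr = 0.005f
    val po = 0.005f
    // Print the assumed effective learning rate for each of the 10 epochs
    (0 until 10).foreach { epoch =>
      val rate = effectiveLr(lr, po, epoch).toDouble
      println(f"epoch $epoch%2d -> effective lr $rate%.6f")
    }
  }
}
```

Under this assumption, a small `po` such as 0.005 changes the rate only slightly over 10 epochs, while a larger value makes later epochs take noticeably smaller steps.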
Then the training can start
```
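import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.training.CoNLL

val pipeline = new Pipeline().setStages(Array(
  document,
  sentenceDetector,
  tokenizer,
  embeddings,
  nerTagger
))

// `spark` is assumed to be an active SparkSession
val conll = CoNLL()
val trainingData = conll.readDataset(spark, "path/to/train_data.conll")
val pipelineModel = pipeline.fit(trainingData)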
```
class FinanceNerModel extends MedicalNerModel
trait ReadFinanceBertForTokenTensorflowModel extends ReadTensorflowModel
trait ReadZeroShotNerTensorflowModel extends ReadTensorflowModel with ReadOnnxModel with ReadOpenvinoModel
trait ReadablePretrainedFinanceBertForTokenModel extends ParamsAndFeaturesReadable[FinanceBertForTokenClassification] with HasPretrained[FinanceBertForTokenClassification]
trait ReadablePretrainedZeroShotNer extends ParamsAndFeaturesReadable[ZeroShotNerModel] with HasPretrained[ZeroShotNerModel]
class ZeroShotNerModel extends nlp.annotators.ner.ZeroShotNerModel

Value Members

object FinanceBertForTokenClassification extends ReadablePretrainedFinanceBertForTokenModel with ReadFinanceBertForTokenTensorflowModel with Serializable
This is the companion object of FinanceBertForTokenClassification. Please refer to that class for the documentation.
object FinanceNerApproach extends MedicalNerApproach
object FinanceNerModel extends ReadablePretrainedMedicalNer with ReadsMedicalNerGraph with Serializable
object ZeroShotNerModel extends ReadablePretrainedZeroShotNer with ReadZeroShotNerTensorflowModel with Serializable
