package ner

Type Members

  1. class LegalBertForTokenClassification extends MedicalBertForTokenClassifier
  2. class LegalNerApproach extends MedicalNerApproach

    Trains generic NER models based on Neural Networks.

    The neural network architecture is a Char CNN - BiLSTM - CRF, which achieves state-of-the-art results on most datasets.

    For instantiated/pretrained models, see LegalNerModel
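
    As a rough sketch, a pretrained model can be loaded through the LegalNerModel companion object and used in place of training. The model name and remote location below are placeholders; consult the John Snow Labs Models Hub for available legal NER models:

    // Placeholder model name; replace with an actual legal NER model from the Models Hub.
    // "legal/models" is assumed to be the remote location for legal models.
    val legalNer = LegalNerModel.pretrained("legner_placeholder", "en", "legal/models")
      .setInputCols("sentence", "token", "embeddings")
      .setOutputCol("ner")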

    The training data should be a labeled Spark Dataset, in the CoNLL 2003 IOB format with Annotation type columns. The data should have columns of type DOCUMENT, TOKEN, WORD_EMBEDDINGS and an additional label column of annotator type NAMED_ENTITY.

    Excluding the label column, these columns can be produced with, for example, a SentenceDetector, a Tokenizer and a WordEmbeddingsModel (any embeddings can be chosen, e.g. BertEmbeddings for BERT-based embeddings).
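
    For reference, a CoNLL 2003 IOB training file is plain text with one token per line (token, POS tag, chunk tag, IOB entity tag) and a blank line between sentences. The snippet below is purely illustrative and uses generic entity labels rather than legal ones:

    John NNP B-NP B-PER
    Smith NNP I-NP I-PER
    signed VBD B-VP O
    the DT B-NP O
    lease NN I-NP O
    . . O O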

    For extended examples of usage, see the Spark NLP Workshop.

    Notes

    Both DocumentAssembler and SentenceDetector output the DOCUMENT annotation type, so either of them can be used as the first annotator in a pipeline.
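
    For instance, if sentence boundaries are not needed, the SentenceDetector could be dropped and the Tokenizer fed directly from the DocumentAssembler output (a minimal sketch):

    val docLevelTokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")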

    Example

    First set up the prerequisite annotators for the LegalNerApproach

    import org.apache.spark.ml.Pipeline
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import com.johnsnowlabs.nlp.training.CoNLL
    // LegalNerApproach itself is provided by the Spark NLP for Legal (johnsnowlabs) library

    val document = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    val sentenceDetector = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")
    val tokenizer = new Tokenizer()
      .setInputCols("sentence")
      .setOutputCol("token")
    val embeddings = BertEmbeddings.pretrained()
      .setInputCols("sentence", "token")
      .setOutputCol("embeddings")

    Then define the NER annotator

    val nerTagger = new LegalNerApproach()
      .setInputCols("sentence", "token", "embeddings")
      .setLabelColumn("label")
      .setOutputCol("ner")
      .setMaxEpochs(10)
      .setLr(0.005f)
      .setPo(0.005f)
      .setBatchSize(32)
      .setValidationSplit(0.1f)

    Then the training can start

    val pipeline = new Pipeline().setStages(Array(
      document,
      sentenceDetector,
      tokenizer,
      embeddings,
      nerTagger
    ))
    
    val conll = CoNLL()
    val trainingData = conll.readDataset(spark, "path/to/train_data.conll")
    val pipelineModel = pipeline.fit(trainingData)
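
    Once fitted, the pipeline can be applied to new documents with the usual Spark ML transform (a sketch; any DataFrame with a text column will do, and the sample sentence below is made up):

    import spark.implicits._

    val testData = Seq(
      "This Agreement is entered into by and between the Client and the Provider."
    ).toDF("text")

    val result = pipelineModel.transform(testData)
    result.selectExpr("explode(ner.result) as ner_tag").show(truncate = false)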
  3. class LegalNerModel extends MedicalNerModel
  4. trait ReadLegalBertForTokenTensorflowModel extends ReadTensorflowModel
  5. trait ReadZeroShotNerTensorflowModel extends ReadTensorflowModel
  6. trait ReadablePretrainedLegalBertForTokenModel extends ParamsAndFeaturesReadable[LegalBertForTokenClassification] with HasPretrained[LegalBertForTokenClassification]
  7. trait ReadablePretrainedZeroShotNer extends ParamsAndFeaturesReadable[ZeroShotNerModel] with HasPretrained[ZeroShotNerModel]
  8. class ZeroShotNerModel extends nlp.annotators.ner.ZeroShotNerModel

Value Members

  1. object LegalBertForTokenClassification extends ReadablePretrainedLegalBertForTokenModel with ReadLegalBertForTokenTensorflowModel with Serializable

    This is the companion object of LegalBertForTokenClassification. Please refer to that class for the documentation.

  2. object LegalNerApproach extends MedicalNerApproach
  3. object LegalNerModel extends ReadablePretrainedMedicalNer with ReadsMedicalNerGraph with Serializable
  4. object ZeroShotNerModel extends ReadablePretrainedZeroShotNer with ReadZeroShotNerTensorflowModel with Serializable
