com.johnsnowlabs.legal.token_classification.ner

LegalNerApproach

Companion object LegalNerApproach

class LegalNerApproach extends MedicalNerApproach

Trains generic NER models based on Neural Networks.

The neural network uses a Char CNNs - BiLSTM - CRF architecture, which is reported to achieve state-of-the-art results on most datasets.

For instantiated/pretrained models, see LegalNerModel.

The training data should be a labeled Spark Dataset, in the CoNLL 2003 IOB format with Annotation type columns. The data should have columns of type DOCUMENT, TOKEN, WORD_EMBEDDINGS and an additional label column of annotator type NAMED_ENTITY.

Excluding the label, these columns can be produced with, for example, the annotators SentenceDetector, Tokenizer, and WordEmbeddingsModel (any embeddings can be chosen, e.g. BertEmbeddings for BERT-based embeddings).
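To make the expected CoNLL 2003 layout concrete, here is a small standalone Scala sketch (plain collections, no Spark) that groups lines into sentences and pairs each token with its IOB label. It assumes whitespace-separated columns with the NER tag in the last column and blank lines between sentences. `ConllSketch` is a hypothetical helper for illustration only and is not the Spark NLP `CoNLL` reader.

```scala
// Hypothetical helper for illustration only; not the Spark NLP CoNLL() reader.
// Assumes CoNLL 2003-style lines: "token POS chunk NER", blank lines between sentences.
object ConllSketch {
  final case class Tagged(token: String, label: String)

  def parse(lines: Seq[String]): Vector[Vector[Tagged]] = {
    val sentences = Vector.newBuilder[Vector[Tagged]]
    var current = Vector.empty[Tagged]

    // Close the sentence being built, if any.
    def flush(): Unit =
      if (current.nonEmpty) {
        sentences += current
        current = Vector.empty
      }

    for (raw <- lines) {
      val line = raw.trim
      // Blank lines end a sentence; -DOCSTART- lines only mark document boundaries.
      if (line.isEmpty || line.startsWith("-DOCSTART-")) flush()
      else {
        val cols = line.split("\\s+")
        // First column: token; last column: IOB NER label.
        current = current :+ Tagged(cols.head, cols.last)
      }
    }
    flush()
    sentences.result()
  }
}

val sampleLines = Seq(
  "-DOCSTART- -X- -X- O",
  "",
  "EU NNP B-NP B-ORG",
  "rejects VBZ B-VP O",
  "German JJ B-NP B-MISC",
  "call NN I-NP O",
  "",
  "Peter NNP B-NP B-PER",
  "Blackburn NNP I-NP I-PER"
)

// Two sentences: EU/B-ORG rejects/O German/B-MISC call/O and Peter/B-PER Blackburn/I-PER
val sampleSentences = ConllSketch.parse(sampleLines)
```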

For extended examples of usage, see the Spark NLP Workshop.

Notes

Both DocumentAssembler and SentenceDetector output the DOCUMENT annotation type. Thus, the output of either of them can be used as the DOCUMENT input of this annotator.

Example

First, extract the prerequisites for the LegalNerApproach:

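import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import com.johnsnowlabs.legal.token_classification.ner.LegalNerApproach
import org.apache.spark.ml.Pipeline

val document = new DocumentAssembler()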
  .setInputCol("text")
  .setOutputCol("document")
val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")
val embeddings = BertEmbeddings.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")

Then, define the NER annotator:

val nerTagger = new LegalNerApproach()
  .setInputCols("sentence", "token", "embeddings")
  .setLabelColumn("label")
  .setOutputCol("ner")
  .setMaxEpochs(10)
  .setLr(0.005f)
  .setPo(0.005f)
  .setBatchSize(32)
  .setValidationSplit(0.1f)

Then, training can start:

val pipeline = new Pipeline().setStages(Array(
  document,
  sentenceDetector,
  tokenizer,
  embeddings,
  nerTagger
))

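// Read the CoNLL 2003 training file; `spark` is an existing SparkSession
val conll = com.johnsnowlabs.nlp.training.CoNLL()
val trainingData = conll.readDataset(spark, "path/to/train_data.conll")
val pipelineModel = pipeline.fit(trainingData)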

Linear Supertypes

MedicalNerApproach, CheckLicense, EvaluationDLParams, ParamsAndFeaturesWritable, Logging, NerApproach[MedicalNerApproach], MedicalNerParams, HasFeatures, AnnotatorApproach[MedicalNerModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[MedicalNerModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any


Instance Constructors

new LegalNerApproach()
new LegalNerApproach(uid: String)
uid
a unique identifier for the instantiated annotator

Type Members

type AnnotatorType = String

Definition Classes
HasOutputAnnotatorType

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def $[T](param: Param[T]): T

Attributes
protected
Definition Classes
Params
def $$[T](feature: StructFeature[T]): T

Attributes
protected
Definition Classes
HasFeatures
def $$[K, V](feature: MapFeature[K, V]): Map[K, V]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: SetFeature[T]): Set[T]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: ArrayFeature[T]): Array[T]

Attributes
protected
Definition Classes
HasFeatures
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def _fit(dataset: Dataset[_], recursiveStages: Option[PipelineModel]): MedicalNerModel

Attributes
protected
Definition Classes
AnnotatorApproach
final def asInstanceOf[T0]: T0

Definition Classes
Any
val batchSize: IntParam
Batch size, by default 8.

Definition Classes
MedicalNerApproach
def beforeTraining(spark: SparkSession): Unit

Definition Classes
MedicalNerApproach → AnnotatorApproach
def calculateEmbeddingsDim(sentences: Seq[WordpieceEmbeddingsSentence]): Int

Definition Classes
MedicalNerApproach
final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean

Attributes
protected
Definition Classes
HasInputAnnotationCols
def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit

Definition Classes
CheckLicense
def checkValidScope(scope: String): Unit

Definition Classes
CheckLicense
def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit

Definition Classes
CheckLicense
def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit

Definition Classes
CheckLicense
final def clear(param: Param[_]): LegalNerApproach.this.type

Definition Classes
Params
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
val configProtoBytes: IntArrayParam
ConfigProto from tensorflow, serialized into byte array. Get with config_proto.SerializeToString()

Definition Classes
MedicalNerParams
final def copy(extra: ParamMap): Estimator[MedicalNerModel]

Definition Classes
AnnotatorApproach → Estimator → PipelineStage → Params
def copyValues[T <: Params](to: T, extra: ParamMap): T

Attributes
protected
Definition Classes
Params
val datasetInfo: Param[String]
Descriptive information about the dataset being used.

Definition Classes
MedicalNerParams
final def defaultCopy[T <: Params](extra: ParamMap): T

Attributes
protected
Definition Classes
Params
val description: String
Trains Tensorflow based Char-CNN-BLSTM model

Definition Classes
MedicalNerApproach → AnnotatorApproach
val dropout: FloatParam
Dropout coefficient, by default 0.5.
The coefficient of the dropout layer. The value should be between 0.0 and 1.0. Internally, it is used by Tensorflow as: rate = 1.0 - dropout when adding a dropout layer on top of the recurrent layers.

Definition Classes
MedicalNerParams
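Because `dropout` is described as a coefficient that TensorFlow turns into `rate = 1.0 - dropout`, a larger value here means fewer units are dropped. The sketch below only restates that documented conversion; `DropoutSketch.tfRate` is a hypothetical helper, not part of the library.

```scala
// Hypothetical helper restating the documented conversion rate = 1.0 - dropout.
object DropoutSketch {
  def tfRate(dropout: Float): Float = {
    require(dropout >= 0.0f && dropout <= 1.0f, "dropout must be between 0.0 and 1.0")
    1.0f - dropout
  }
}

// The default dropout of 0.5 gives a TensorFlow drop rate of 0.5;
// a dropout of 0.8 gives a drop rate of about 0.2 (fewer units dropped).
val defaultRate = DropoutSketch.tfRate(0.5f)
```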
val earlyStoppingCriterion: FloatParam
If set, this param specifies the criterion to stop training if performance is not improving.
Default value is 0, which means that early stopping is not used.
The criterion is the F1-score if validationSplit is greater than 0.0 (F1-score on the validation set) or testDataset is defined (F1-score on the test set); otherwise it is the model loss. The priority is as follows:
- If testDataset is defined, the criterion is the F1-score on the test set.
- If validationSplit is greater than 0.0, the criterion is the F1-score on the validation set.
- Otherwise, the criterion is the model loss.
Note that while the F1-score ranges from 0.0 to 1.0, the loss ranges from 0.0 to infinity, so the appropriate value for the criterion can be very different depending on which case applies. For example, if validationSplit is 0.1, a criterion of 0.01 means that training stops if the F1-score on the validation set differs from the previous epoch by less than 0.01. If no validation or test set is defined, a criterion of 2.0 means that training stops if the loss difference between the previous epoch and the current one is less than 2.0.

Definition Classes
MedicalNerParams
See also
earlyStoppingPatience.
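The priority rule and the stopping check can be sketched in plain Scala as follows. This is a hypothetical illustration, not the library's code: it assumes the criterion is compared against the improvement since the previous epoch (an F1 increase or a loss decrease), as in the examples above, and it ignores earlyStoppingPatience. None of these names belong to the library.

```scala
// Hypothetical sketch of the documented priority rule; not the library's code.
// Assumes the criterion is compared with the improvement since the previous epoch.
object EarlyStoppingSketch {
  sealed trait Metric
  case object TestF1 extends Metric
  case object ValidationF1 extends Metric
  case object Loss extends Metric

  /** Chooses what the criterion is measured on, in the documented priority order. */
  def chooseMetric(testDatasetDefined: Boolean, validationSplit: Float): Metric =
    if (testDatasetDefined) TestF1
    else if (validationSplit > 0.0f) ValidationF1
    else Loss

  /** True when the improvement is smaller than the criterion; 0 disables early stopping. */
  def shouldStop(metric: Metric, previous: Double, current: Double, criterion: Double): Boolean =
    if (criterion <= 0.0) false
    else {
      // F1 improves upwards, loss improves downwards.
      val improvement = metric match {
        case Loss => previous - current
        case _    => current - previous
      }
      improvement < criterion
    }
}

// validationSplit = 0.1 and no test set: the criterion applies to validation F1.
val chosenMetric = EarlyStoppingSketch.chooseMetric(testDatasetDefined = false, validationSplit = 0.1f)
```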
val earlyStoppingPatience: IntParam
Number of epochs to wait before early stopping if no improvement, by default 5.
Given the earlyStoppingCriterion, if the performance does not improve for the given number of epochs, then training stops. If the value is 0, early stopping occurs as soon as the criterion is met (no patience).

Definition Classes
MedicalNerParams
See also
earlyStoppingCriterion.
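One plausible reading of how patience combines with the criterion is sketched below: training stops once more than `patience` consecutive epochs fail to improve F1 by at least the criterion, so a patience of 0 stops at the first such epoch. The counting convention is an assumption, and `PatienceSketch` is a hypothetical helper rather than library code.

```scala
// Hypothetical sketch; the counting convention is an assumption, not the library's code.
object PatienceSketch {
  /** Returns the 1-based epoch at which training would stop, or None if it never stops early.
    * `f1PerEpoch` holds one validation F1 score per epoch (higher is better). */
  def stopEpoch(f1PerEpoch: Seq[Double], criterion: Double, patience: Int): Option[Int] =
    if (criterion <= 0.0) None // a criterion of 0 disables early stopping
    else {
      var stale = 0 // consecutive epochs without enough improvement
      var result: Option[Int] = None
      var i = 1
      while (result.isEmpty && i < f1PerEpoch.length) {
        val improvement = f1PerEpoch(i) - f1PerEpoch(i - 1)
        if (improvement < criterion) stale += 1 else stale = 0
        if (stale > patience) result = Some(i + 1) // f1PerEpoch(i) belongs to epoch i + 1
        i += 1
      }
      result
    }
}

val f1History = Seq(0.50, 0.60, 0.605, 0.606, 0.607)
// With criterion 0.01: patience 0 stops at epoch 3, patience 1 at epoch 4, patience 5 never.
val stopWithoutPatience = PatienceSketch.stopEpoch(f1History, criterion = 0.01, patience = 0)
```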
val enableMemoryOptimizer: BooleanParam
Whether to optimize for large datasets or not. Enabling this option can slow down training.
In practice, if set to true, training iterates over the Spark DataFrame and retrieves the batches from the DataFrame iterator. This can be slower than the default option because every batch has to be collected again in every epoch, but it can be useful if the dataset is too large to fit in memory.
It controls whether the features are collected and generated all at once and then fed into the network batch by batch (false), or collected and generated one batch at a time and then fed into the network (true).
If the training data fits in memory, it is recommended to keep this option set to false (the default value).

Definition Classes
MedicalNerParams
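The trade-off can be pictured with plain Scala collections standing in for the DataFrame: eager batching materializes all batches once, while lazy batching re-creates an iterator per epoch and holds one batch at a time. This is a conceptual sketch with hypothetical names, not how the library reads Spark data.

```scala
// Conceptual sketch with plain Scala collections (no Spark); names are hypothetical.
object BatchingSketch {
  // Like enableMemoryOptimizer = false: build every batch once and keep them all in memory.
  def eagerBatches[A](rows: Seq[A], batchSize: Int): Vector[Seq[A]] =
    rows.grouped(batchSize).toVector

  // Like enableMemoryOptimizer = true: pull batches lazily from a fresh iterator,
  // so only the current batch is materialized, but the work is redone every epoch.
  def lazyBatches[A](source: () => Iterator[A], batchSize: Int): Iterator[Seq[A]] =
    source().grouped(batchSize)
}

val rowsForBatching = (1 to 10).toVector
// Eager: three batches of sizes 4, 4 and 2, materialized once and reused.
val cachedBatches = BatchingSketch.eagerBatches(rowsForBatching, 4)
// Lazy: a new iterator is created for each of two epochs.
val epochSums = (1 to 2).map(_ =>
  BatchingSketch.lazyBatches(() => rowsForBatching.iterator, 4).map(_.sum).toList)
```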
val enableOutputLogs: BooleanParam

Definition Classes
EvaluationDLParams
val entities: StringArrayParam

Definition Classes
NerApproach
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
val evaluationLogExtended: BooleanParam

Definition Classes
EvaluationDLParams
def explainParam(param: Param[_]): String

Definition Classes
Params
def explainParams(): String

Definition Classes
Params
final def extractParamMap(): ParamMap

Definition Classes
Params
final def extractParamMap(extra: ParamMap): ParamMap

Definition Classes
Params
val features: ArrayBuffer[Feature[_, _, _]]

Definition Classes
HasFeatures
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def fit(dataset: Dataset[_]): MedicalNerModel

Definition Classes
AnnotatorApproach → Estimator
def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[MedicalNerModel]

Definition Classes
Estimator
Annotations
@Since( "2.0.0" )
def fit(dataset: Dataset[_], paramMap: ParamMap): MedicalNerModel

Definition Classes
Estimator
Annotations
@Since( "2.0.0" )
def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): MedicalNerModel

Definition Classes
Estimator
Annotations
@Since( "2.0.0" ) @varargs()
def get[T](feature: StructFeature[T]): Option[T]

Attributes
protected
Definition Classes
HasFeatures
def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: SetFeature[T]): Option[Set[T]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: ArrayFeature[T]): Option[Array[T]]

Attributes
protected
Definition Classes
HasFeatures
final def get[T](param: Param[T]): Option[T]

Definition Classes
Params
def getBatchSize: Int
Batch size

Definition Classes
MedicalNerApproach
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
def getConfigProtoBytes: Option[Array[Byte]]
ConfigProto from tensorflow, serialized into byte array. Get with config_proto.SerializeToString()

Definition Classes
MedicalNerParams
def getDatasetInfo: String
Get descriptive information about the dataset being used.

Definition Classes
MedicalNerParams
final def getDefault[T](param: Param[T]): Option[T]

Definition Classes
Params
def getDropout: Float
Dropout coefficient

Definition Classes
MedicalNerParams
def getEarlyStoppingCriterion: Float
Early stopping criterion

Definition Classes
MedicalNerParams
def getEarlyStoppingPatience: Int
Early stopping patience

Definition Classes
MedicalNerParams
def getEnableMemoryOptimizer: Boolean
Whether to optimize for large datasets or not. Enabling this option can slow down training.

Definition Classes
MedicalNerParams
def getEnableOutputLogs: Boolean

Definition Classes
EvaluationDLParams
def getIncludeAllConfidenceScores: Boolean
Whether to include all confidence scores in annotation metadata or just the score of the predicted tag.

Definition Classes
MedicalNerParams
def getIncludeConfidence: Boolean
Whether to include confidence scores in annotation metadata.

Definition Classes
MedicalNerParams
def getInputCols: Array[String]

Definition Classes
HasInputAnnotationCols
def getLazyAnnotator: Boolean

Definition Classes
CanBeLazy
def getLogName: String

Definition Classes
MedicalNerApproach → Logging
def getLr: Float
Learning Rate

Definition Classes
MedicalNerParams
def getMaxEpochs: Int

Definition Classes
NerApproach
def getMinEpochs: Int

Definition Classes
NerApproach
final def getOrDefault[T](param: Param[T]): T

Definition Classes
Params
final def getOutputCol: String

Definition Classes
HasOutputAnnotationCol
def getOutputLogsPath: String

Definition Classes
EvaluationDLParams
def getOverrideExistingTags: Boolean
Whether to override already learned tags when using a pretrained model to initialize the new model.

Definition Classes
MedicalNerParams
def getParam(paramName: String): Param[Any]

Definition Classes
Params
def getPo: Float
Learning rate decay coefficient. Real learning rate = lr / (1 + po * epoch)

Definition Classes
MedicalNerParams
def getRandomSeed: Int

Definition Classes
NerApproach
def getRandomValidationSplitPerEpoch: Boolean
Checks if a random validation split is done after each epoch or at the beginning of training only.

Definition Classes
MedicalNerParams
def getSentenceTokenIndex: Boolean
Whether to include the token index for each sentence in annotation metadata.

Definition Classes
MedicalNerParams
def getUseBestModel: Boolean
Whether to restore and use the model from the best-performing epoch at the end of training.

Definition Classes
MedicalNerParams
def getUseContrib: Boolean
Whether to use contrib LSTM Cells. Not compatible with Windows. Might slightly improve accuracy.

Definition Classes
MedicalNerParams
def getValidationSplit: Float

Definition Classes
EvaluationDLParams
val graphFile: Param[String]
Path that contains the external graph file.
When specified, the provided file will be used, and no graph search will happen. The path can be a local file path, a distributed file path (HDFS, DBFS), or a cloud storage (S3).

Definition Classes
MedicalNerParams
val graphFolder: Param[String]
Folder path that contains external graph files.
The path can be a local file path, a distributed file path (HDFS, DBFS), or a cloud storage path (S3).
When instantiating the TensorFlow model, this folder is searched for a suitable TensorFlow graph. The search uses the name of the .pb file, which should follow this format: blstm_{ntags}_{embedding_dim}_{lstm_size}_{nchars}.pb.
The search follows these rules:
- The embedding dimension must exactly match the dimension of the embeddings used for training.
- The number of unique tags must be greater than or equal to the number of unique tags in the training data.
- The number of unique chars must be greater than or equal to the number of unique chars in the training data.
The returned file is the first one that satisfies all the conditions.
If the name of the file is ill-formed, errors will occur during training.

Definition Classes
MedicalNerParams
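The search rules above can be sketched in plain Scala. This is a hypothetical illustration, not the library's implementation: unlike the behavior described above, it silently skips ill-formed file names instead of raising errors, and it assumes the four numeric fields appear in the order given in the file-name pattern. `GraphSearchSketch` and its members are made up for this example.

```scala
// Hypothetical sketch of the documented search rules; not the library's implementation.
object GraphSearchSketch {
  final case class GraphSpec(file: String, nTags: Int, embeddingDim: Int, lstmSize: Int, nChars: Int)

  // blstm_{ntags}_{embedding_dim}_{lstm_size}_{nchars}.pb (Regex patterns match the whole name)
  private val GraphName = """blstm_(\d+)_(\d+)_(\d+)_(\d+)\.pb""".r

  /** Parses a file name such as "blstm_10_768_128_100.pb"; None if ill-formed. */
  def parse(file: String): Option[GraphSpec] = file match {
    case GraphName(t, e, l, c) => Some(GraphSpec(file, t.toInt, e.toInt, l.toInt, c.toInt))
    case _                     => None
  }

  /** First graph whose embedding dim matches and whose tag/char capacity is large enough. */
  def findGraph(files: Seq[String], nTags: Int, embeddingDim: Int, nChars: Int): Option[GraphSpec] =
    files.iterator.flatMap(f => parse(f)).find { g =>
      g.embeddingDim == embeddingDim && g.nTags >= nTags && g.nChars >= nChars
    }
}

val candidateGraphs = Seq("blstm_5_768_128_100.pb", "blstm_20_768_128_120.pb", "notes.txt")
// For 10 tags, 768-dimensional embeddings and 90 distinct characters,
// the first compatible file is "blstm_20_768_128_120.pb".
val chosenGraph = GraphSearchSketch.findGraph(candidateGraphs, nTags = 10, embeddingDim = 768, nChars = 90)
```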
final def hasDefault[T](param: Param[T]): Boolean

Definition Classes
Params
def hasParam(paramName: String): Boolean

Definition Classes
Params
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
val includeAllConfidenceScores: BooleanParam
Whether to include confidence scores for all tags in annotation metadata or just the score of the predicted tag, by default False.
Needs the includeConfidence parameter to be set to true.
Enabling this may slow down the inference speed.

Definition Classes
MedicalNerParams
val includeConfidence: BooleanParam
Whether to include confidence scores in annotation metadata, by default False.
Setting this parameter to True will add the confidence score to the metadata of the NAMED_ENTITY annotation. In addition, if includeAllConfidenceScores is set to true, then the confidence scores of all the tags will be added to the metadata, otherwise only for the predicted tag (the one with maximum score).

Definition Classes
MedicalNerParams
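How the two confidence flags combine can be sketched as follows. The metadata key `confidence` and the use of tag names as keys are illustrative assumptions, not the exact layout the annotator produces, and `ConfidenceSketch` is a hypothetical helper.

```scala
// Hypothetical sketch; metadata keys are illustrative, not the annotator's exact output.
object ConfidenceSketch {
  /** `scores` maps each tag to its score for one token; assumed non-empty. */
  def metadata(scores: Map[String, Float],
               includeConfidence: Boolean,
               includeAllConfidenceScores: Boolean): Map[String, String] =
    if (!includeConfidence) Map.empty // no scores at all
    else if (includeAllConfidenceScores) scores.map { case (tag, s) => tag -> s.toString } // every tag
    else Map("confidence" -> scores.values.max.toString) // only the predicted (maximum) score
}

val tokenScores = Map("B-PARTY" -> 0.9f, "O" -> 0.1f)
val predictedOnly =
  ConfidenceSketch.metadata(tokenScores, includeConfidence = true, includeAllConfidenceScores = false)
```

Note that includeAllConfidenceScores has no effect here unless includeConfidence is true, matching the requirement stated above.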
def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes
protected
Definition Classes
Logging
def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes
protected
Definition Classes
Logging
val inputAnnotatorTypes: Array[String]
Input annotator types: DOCUMENT, TOKEN, WORD_EMBEDDINGS

Definition Classes
MedicalNerApproach → HasInputAnnotationCols
final val inputCols: StringArrayParam

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def isDefined(param: Param[_]): Boolean

Definition Classes
Params
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def isSet(param: Param[_]): Boolean

Definition Classes
Params
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
val labelColumn: Param[String]

Definition Classes
NerApproach
val lazyAnnotator: BooleanParam

Definition Classes
CanBeLazy
def log(value: ⇒ String, minLevel: Level): Unit

Attributes
protected
Definition Classes
Logging
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
val logPrefix: Param[String]
A prefix that will be prepended to every log, default value is empty.

Definition Classes
MedicalNerParams
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
val logger: Logger

Attributes
protected
Definition Classes
Logging
val lr: FloatParam
Learning Rate, by default 0.001.

Definition Classes
MedicalNerParams
val maxEpochs: IntParam

Definition Classes
NerApproach
val minEpochs: IntParam

Definition Classes
NerApproach
def msgHelper(schema: StructType): String

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
def onTrained(model: MedicalNerModel, spark: SparkSession): Unit

Definition Classes
AnnotatorApproach
def onWrite(path: String, spark: SparkSession): Unit

Attributes
protected
Definition Classes
ParamsAndFeaturesWritable
val optionalInputAnnotatorTypes: Array[String]

Definition Classes
HasInputAnnotationCols
val outputAnnotatorType: String
Output annotator type: NAMED_ENTITY

Definition Classes
MedicalNerApproach → HasOutputAnnotatorType
final val outputCol: Param[String]

Attributes
protected
Definition Classes
HasOutputAnnotationCol
def outputLog(value: ⇒ String, uuid: String, shouldLog: Boolean, outputLogsPath: String): Unit

Attributes
protected
Definition Classes
Logging
val outputLogsPath: Param[String]

Definition Classes
EvaluationDLParams
val overrideExistingTags: BooleanParam
Controls whether to override already learned tags when using a pretrained model to initialize the new model. A value of true will override existing tags.

Definition Classes
MedicalNerParams
lazy val params: Array[Param[_]]

Definition Classes
Params
val po: FloatParam
Learning rate decay coefficient (time-based).
This is used to calculate the decayed learning rate as lr / (1 + po * epoch), meaning that the learning rate is updated once per epoch. By default 0.005.

Definition Classes
MedicalNerParams
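A worked example of the documented decay formula, using the default lr of 0.001 and po of 0.005. The helper is hypothetical, and it assumes epochs are counted from 0.

```scala
// Hypothetical helper restating the documented formula lr / (1 + po * epoch).
// Whether the library counts epochs from 0 or 1 is an assumption here (0).
object LrDecaySketch {
  def decayed(lr: Double, po: Double, epoch: Int): Double = lr / (1.0 + po * epoch)
}

// Defaults lr = 0.001 and po = 0.005: roughly 0.001, 0.000995 and 0.000990 for epochs 0 to 2.
val lrSchedule = (0 until 3).map(e => LrDecaySketch.decayed(0.001, 0.005, e))
```

Because the decay is hyperbolic in the epoch number, a small po such as 0.005 lowers the learning rate only gradually; after 10 epochs it is still about 95% of the initial value.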
val pretrainedModelPath: Param[String]
Path to an already trained MedicalNerModel.
This pretrained model will be used as a starting point for training the new one. The path can be a local file path, a distributed file path (HDFS, DBFS), or a cloud storage (S3).

Definition Classes
MedicalNerParams
val randomSeed: IntParam

Definition Classes
NerApproach
val randomValidationSplitPerEpoch: BooleanParam
Do a random validation split after each epoch rather than at the beginning of training only.

Definition Classes
MedicalNerParams
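The difference between a single up-front split and a fresh split per epoch can be sketched with `scala.util.Random`. The helper names are hypothetical, and the library's own sampling may differ (for example, in how the validation size is rounded).

```scala
// Hypothetical sketch; the library's own sampling may differ.
import scala.util.Random

object ValidationSplitSketch {
  /** Splits indices 0 until n into (train, validation) using the given fraction. */
  def split(n: Int, fraction: Double, rng: Random): (Vector[Int], Vector[Int]) = {
    val shuffled = rng.shuffle((0 until n).toVector)
    val nVal = math.round(n * fraction).toInt
    (shuffled.drop(nVal), shuffled.take(nVal))
  }

  /** Validation sets per epoch: one split reused, or a new split drawn every epoch. */
  def validationSets(n: Int, fraction: Double, epochs: Int, perEpoch: Boolean, seed: Int): Seq[Vector[Int]] = {
    val rng = new Random(seed)
    if (perEpoch) (1 to epochs).map(_ => split(n, fraction, rng)._2)
    else {
      val fixed = split(n, fraction, rng)._2
      Seq.fill(epochs)(fixed)
    }
  }
}

// Fixed split: the same 10 validation rows every epoch; per-epoch: a new sample each epoch.
val fixedValidation = ValidationSplitSketch.validationSets(100, 0.1, epochs = 3, perEpoch = false, seed = 42)
val reshuffledValidation = ValidationSplitSketch.validationSets(100, 0.1, epochs = 3, perEpoch = true, seed = 42)
```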
def resumeTrainingFromModel(model: LegalNerApproach): LegalNerApproach.this.type
def resumeTrainingFromModel(model: MedicalNerModel): LegalNerApproach.this.type

Definition Classes
MedicalNerApproach
def save(path: String): Unit

Definition Classes
MLWritable
Annotations
@Since( "1.6.0" ) @throws( ... )
val sentenceTokenIndex: BooleanParam
Whether to include the token index for each sentence in annotation metadata, by default false. If the value is true, the process might be slowed down.

Definition Classes
MedicalNerParams
def set[T](feature: StructFeature[T], value: T): LegalNerApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): LegalNerApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: SetFeature[T], value: Set[T]): LegalNerApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: ArrayFeature[T], value: Array[T]): LegalNerApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
final def set(paramPair: ParamPair[_]): LegalNerApproach.this.type

Attributes
protected
Definition Classes
Params
final def set(param: String, value: Any): LegalNerApproach.this.type

Attributes
protected
Definition Classes
Params
final def set[T](param: Param[T], value: T): LegalNerApproach.this.type

Definition Classes
Params
def setBatchSize(batch: Int): LegalNerApproach.this.type
Batch size

Definition Classes
MedicalNerApproach
def setConfigProtoBytes(bytes: Array[Int]): LegalNerApproach.this.type
ConfigProto from tensorflow, serialized into byte array. Get with config_proto.SerializeToString()

Definition Classes
MedicalNerParams
def setDatasetInfo(value: String): LegalNerApproach.this.type
Set descriptive information about the dataset being used.

Definition Classes
MedicalNerParams
def setDefault[T](feature: StructFeature[T], value: () ⇒ T): LegalNerApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): LegalNerApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): LegalNerApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): LegalNerApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
final def setDefault(paramPairs: ParamPair[_]*): LegalNerApproach.this.type

Attributes
protected
Definition Classes
Params
final def setDefault[T](param: Param[T], value: T): LegalNerApproach.this.type

Attributes
protected[org.apache.spark.ml]
Definition Classes
Params
def setDropout(dropout: Float): LegalNerApproach.this.type
Dropout coefficient

Definition Classes
MedicalNerParams
def setEarlyStoppingCriterion(value: Float): LegalNerApproach.this.type

Definition Classes
MedicalNerParams
def setEarlyStoppingPatience(value: Int): LegalNerApproach.this.type

Definition Classes
MedicalNerParams
def setEnableMemoryOptimizer(value: Boolean): LegalNerApproach.this.type

Definition Classes
MedicalNerParams
def setEnableOutputLogs(enableOutputLogs: Boolean): LegalNerApproach.this.type

Definition Classes
EvaluationDLParams
def setEntities(tags: Array[String]): MedicalNerApproach

Definition Classes
NerApproach
def setEvaluationLogExtended(evaluationLogExtended: Boolean): LegalNerApproach.this.type

Definition Classes
EvaluationDLParams
def setGraphFile(path: String): LegalNerApproach.this.type
Path that contains the external graph file.

Definition Classes
MedicalNerParams
def setGraphFolder(path: String): LegalNerApproach.this.type
Folder path that contains external graph files.

Definition Classes
MedicalNerParams
def setIncludeAllConfidenceScores(value: Boolean): LegalNerApproach.this.type
Whether to include confidence scores for all tags rather than just for the predicted one

Definition Classes
MedicalNerParams
def setIncludeConfidence(value: Boolean): LegalNerApproach.this.type
Whether to include confidence scores in annotation metadata

Definition Classes
MedicalNerParams
final def setInputCols(value: String*): LegalNerApproach.this.type

Definition Classes
HasInputAnnotationCols
def setInputCols(value: Array[String]): LegalNerApproach.this.type

Definition Classes
HasInputAnnotationCols
def setLabelColumn(column: String): MedicalNerApproach

Definition Classes
NerApproach
def setLazyAnnotator(value: Boolean): LegalNerApproach.this.type

Definition Classes
CanBeLazy
def setLogPrefix(value: String): LegalNerApproach.this.type
A string prefix to be included in the logs.

Definition Classes
MedicalNerParams
def setLr(lr: Float): LegalNerApproach.this.type
Learning Rate

Definition Classes
MedicalNerParams
def setMaxEpochs(epochs: Int): MedicalNerApproach

Definition Classes
NerApproach
def setMinEpochs(epochs: Int): MedicalNerApproach

Definition Classes
NerApproach
final def setOutputCol(value: String): LegalNerApproach.this.type

Definition Classes
HasOutputAnnotationCol
def setOutputLogsPath(path: String): LegalNerApproach.this.type

Definition Classes
EvaluationDLParams
def setOverrideExistingTags(value: Boolean): LegalNerApproach.this.type
Controls whether to override already learned tags when using a pretrained model to initialize the new model. A value of true will override existing tags.

Definition Classes
MedicalNerParams
def setPo(po: Float): LegalNerApproach.this.type
Learning rate decay coefficient. Real learning rate = lr / (1 + po * epoch)

Definition Classes
MedicalNerParams
def setPretrainedModelPath(path: String): LegalNerApproach.this.type
Set the location of an already trained MedicalNerModel, which is used as a starting point for training the new model.

Definition Classes
MedicalNerParams
def setRandomSeed(seed: Int): MedicalNerApproach

Definition Classes
NerApproach
def setRandomValidationSplitPerEpoch(value: Boolean): LegalNerApproach.this.type
Do a random validation split after each epoch rather than at the beginning of training only.

Definition Classes
MedicalNerParams
def setSentenceTokenIndex(value: Boolean): LegalNerApproach.this.type
Whether to include the token index for each sentence in annotation metadata, by default false. If the value is true, the process might be slowed down.

Definition Classes
MedicalNerParams
def setTagsMapping(mapping: Map[String, String]): LegalNerApproach.this.type
A map specifying how old tags are mapped to new ones. Maps are specified either using a list of comma-separated strings, e.g. ("OLDTAG1,NEWTAG1", "OLDTAG2,NEWTAG2", ...), or by a Map data structure.

Definition Classes
MedicalNerParams
def setTagsMapping(mapping: ArrayList[String]): LegalNerApproach.this.type

Definition Classes
MedicalNerParams
def setTagsMapping(mapping: Array[String]): LegalNerApproach.this.type
A map specifying how old tags are mapped to new ones. Maps are specified either using a list of comma-separated strings, e.g. ("OLDTAG1,NEWTAG1", "OLDTAG2,NEWTAG2", ...), or by a Map data structure. It only works if setOverrideExistingTags is false.

Definition Classes
MedicalNerParams
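The comma-separated form accepted by `setTagsMapping(mapping: Array[String])`, for instance `.setTagsMapping(Array("B-ORG,B-PARTY"))`, can be understood through the plain-Scala sketch below, which turns such entries into a Map and renames labels with it. The helpers and the B-PARTY/I-PARTY tag names are hypothetical and only illustrate the format; the library's own parsing may differ.

```scala
// Hypothetical helpers for illustration; not the library's API.
object TagsMappingSketch {
  /** Parses entries such as "OLDTAG1,NEWTAG1" into an old-tag -> new-tag map. */
  def parse(pairs: Seq[String]): Map[String, String] =
    pairs.map { p =>
      p.split(",", -1) match {
        case Array(oldTag, newTag) => oldTag.trim -> newTag.trim
        case _ => throw new IllegalArgumentException(s"Expected 'OLDTAG,NEWTAG' but got: $p")
      }
    }.toMap

  /** Renames a tag found in the mapping; other tags are left unchanged. */
  def remap(tag: String, mapping: Map[String, String]): String = mapping.getOrElse(tag, tag)
}

val oldToNew = TagsMappingSketch.parse(Seq("B-ORG,B-PARTY", "I-ORG,I-PARTY"))
// B-ORG -> B-PARTY, I-ORG -> I-PARTY, O stays O
val remappedLabels = Seq("B-ORG", "I-ORG", "O").map(TagsMappingSketch.remap(_, oldToNew))
```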
def setTestDataset(er: ExternalResource): LegalNerApproach.this.type

Definition Classes
EvaluationDLParams
def setTestDataset(path: String, readAs: Format, options: Map[String, String]): LegalNerApproach.this.type

Definition Classes
EvaluationDLParams
def setUseBestModel(value: Boolean): LegalNerApproach.this.type

Definition Classes
MedicalNerParams
def setUseContrib(value: Boolean): LegalNerApproach.this.type
Whether to use contrib LSTM Cells. Not compatible with Windows. Might slightly improve accuracy.

Definition Classes
MedicalNerParams
def setValidationSplit(validationSplit: Float): LegalNerApproach.this.type

Definition Classes
EvaluationDLParams
def setVerbose(verbose: Level): LegalNerApproach.this.type

Definition Classes
EvaluationDLParams
def setVerbose(verbose: Int): LegalNerApproach.this.type

Definition Classes
EvaluationDLParams
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
val tagsMapping: MapFeature[String, String]
A map specifying how old tags are mapped to new ones.
It only works if overrideExistingTags is set to false.

Definition Classes
MedicalNerParams
val testDataset: ExternalResourceParam

Definition Classes
EvaluationDLParams
def toString(): String

Definition Classes
Identifiable → AnyRef → Any
def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): LegalNerModel

Definition Classes
LegalNerApproach → MedicalNerApproach → AnnotatorApproach
final def transformSchema(schema: StructType): StructType

Definition Classes
AnnotatorApproach → PipelineStage
def transformSchema(schema: StructType, logging: Boolean): StructType

Attributes
protected
Definition Classes
PipelineStage
Annotations
@DeveloperApi()
val uid: String

Definition Classes
LegalNerApproach → MedicalNerApproach → Identifiable
val useBestModel: BooleanParam
Whether to restore and use the model from the epoch that achieved the best performance, once training has finished.
By default false (the model from the last trained epoch is kept).
The best model is determined by earlyStoppingCriterion, which can be either the F1-score on the test/validation dataset or the loss value.

Definition Classes
MedicalNerParams
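The selection rule described above can be sketched in plain Scala: with useBestModel off, the last epoch is kept; with it on, the epoch with the highest F1 or the lowest loss is kept, depending on the criterion. `EpochMetrics`, `Criterion` and `selectEpoch` are hypothetical names for illustration only, and the sketch does not reflect how the annotator actually stores or restores checkpoints.

```scala
object BestEpochSketch {
  // Hypothetical per-epoch record: validation F1 and loss.
  final case class EpochMetrics(epoch: Int, f1: Double, loss: Double)

  // Hypothetical stand-in for the two earlyStoppingCriterion options described above.
  sealed trait Criterion
  case object F1Score extends Criterion
  case object Loss extends Criterion

  // Returns the epoch whose weights would be kept after training.
  def selectEpoch(history: Seq[EpochMetrics], useBestModel: Boolean, criterion: Criterion): Int = {
    require(history.nonEmpty, "history must contain at least one epoch")
    if (!useBestModel) history.last.epoch // default: keep the last trained epoch
    else criterion match {
      case F1Score => history.maxBy(_.f1).epoch   // higher F1 is better
      case Loss    => history.minBy(_.loss).epoch // lower loss is better
    }
  }

  def main(args: Array[String]): Unit = {
    val history = Seq(
      EpochMetrics(1, f1 = 0.71, loss = 0.90),
      EpochMetrics(2, f1 = 0.78, loss = 0.60),
      EpochMetrics(3, f1 = 0.76, loss = 0.55)
    )
    println(selectEpoch(history, useBestModel = false, F1Score)) // 3
    println(selectEpoch(history, useBestModel = true, F1Score))  // 2
    println(selectEpoch(history, useBestModel = true, Loss))     // 3
  }
}
```

The example numbers are invented; they only show that the two criteria can pick different epochs from the same training run.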
val useContrib: BooleanParam
Whether to use contrib LSTM cells. Not compatible with Windows. Might slightly improve accuracy. By default true.

Definition Classes
MedicalNerParams
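Because useContrib defaults to true but contrib cells are documented as incompatible with Windows, a training script may want to choose the value from the host operating system. The sketch below is one way to do that using the standard `os.name` JVM system property; `contribSupported` is a hypothetical helper, not part of the library.

```scala
import java.util.Locale

object ContribCellsSketch {
  // Hypothetical helper: contrib LSTM cells are documented as incompatible
  // with Windows, so only enable them when the JVM is not running on Windows.
  def contribSupported(osName: String): Boolean =
    !osName.toLowerCase(Locale.ROOT).startsWith("windows")

  def main(args: Array[String]): Unit = {
    val useContrib = contribSupported(System.getProperty("os.name", ""))
    // With the licensed library on the classpath, this value could be passed
    // to the setter documented above: nerTagger.setUseContrib(useContrib)
    println(s"useContrib = $useContrib")
  }
}
```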
def validate(schema: StructType): Boolean

Attributes
protected
Definition Classes
AnnotatorApproach
val validationSplit: FloatParam

Definition Classes
EvaluationDLParams
val verbose: IntParam

Definition Classes
EvaluationDLParams
val verboseLevel: Level

Definition Classes
MedicalNerApproach → Logging
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def write: MLWriter

Definition Classes
ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable

Packages

LegalNerApproach

Companion object LegalNerApproach

class LegalNerApproach extends MedicalNerApproach

Notes

Example

Instance Constructors

Type Members

Value Members

Inherited from MedicalNerApproach

Inherited from CheckLicense

Inherited from EvaluationDLParams

Inherited from ParamsAndFeaturesWritable

Inherited from Logging

Inherited from NerApproach[MedicalNerApproach]

Inherited from MedicalNerParams

Inherited from HasFeatures

Inherited from AnnotatorApproach[MedicalNerModel]

Inherited from CanBeLazy

Inherited from DefaultParamsWritable

Inherited from MLWritable

Inherited from HasOutputAnnotatorType

Inherited from HasOutputAnnotationCol

Inherited from HasInputAnnotationCols

Inherited from Estimator[MedicalNerModel]

Inherited from PipelineStage

Inherited from Logging

Inherited from Params

Inherited from Serializable

Inherited from Serializable

Inherited from Identifiable

Inherited from AnyRef

Inherited from Any

Parameters

Annotator types

Members

Parameter setters

Parameter getters
