com.johnsnowlabs.legal.token_classification.ner
LegalNerApproach
Companion object LegalNerApproach
class LegalNerApproach extends MedicalNerApproach
Trains generic named entity recognition (NER) models based on neural networks.
The neural network architecture is a Char CNN - BiLSTM - CRF, which achieves state-of-the-art results on most datasets.
For instantiated/pretrained models, see LegalNerModel.
The training data should be a labeled Spark Dataset in the CoNLL 2003 IOB format with Annotation type columns. The data should have columns of type DOCUMENT, TOKEN, WORD_EMBEDDINGS and an additional label column of annotator type NAMED_ENTITY.
Excluding the label, these columns can be produced with, for example, the annotators SentenceDetector, Tokenizer, and WordEmbeddingsModel (any embeddings can be chosen, e.g. BertEmbeddings for BERT-based embeddings).
For extended examples of usage, see the Spark NLP Workshop.
Notes
Both DocumentAssembler and SentenceDetector are annotators that output the DOCUMENT annotation type.
Thus, either of them can supply the DOCUMENT input for this annotator in a pipeline.
Example
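First, set up the prerequisite annotators for the LegalNerApproach:
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings

// Raw text -> DOCUMENT annotations
val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// DOCUMENT -> sentence-level DOCUMENT annotations
val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

// Sentences -> TOKEN annotations
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

// Sentences + tokens -> WORD_EMBEDDINGS annotations
val embeddings = BertEmbeddings.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")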
Then define the NER annotator:
import com.johnsnowlabs.legal.token_classification.ner.LegalNerApproach

val nerTagger = new LegalNerApproach()
  .setInputCols("sentence", "token", "embeddings")
  .setLabelColumn("label")
  .setOutputCol("ner")
  .setMaxEpochs(10)
  .setLr(0.005f)
  .setPo(0.005f)
  .setBatchSize(32)
  .setValidationSplit(0.1f)
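Then the training can start:
import com.johnsnowlabs.nlp.training.CoNLL
import org.apache.spark.ml.Pipeline

val pipeline = new Pipeline().setStages(Array(
  document,
  sentenceDetector,
  tokenizer,
  embeddings,
  nerTagger
))

// `spark` is assumed to be an active SparkSession
val trainingData = CoNLL().readDataset(spark, "path/to/train_data.conll")
val pipelineModel = pipeline.fit(trainingData)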
Type hierarchy (linearization):
- LegalNerApproach
- MedicalNerApproach
- CheckLicense
- EvaluationDLParams
- ParamsAndFeaturesWritable
- Logging
- NerApproach
- MedicalNerParams
- HasFeatures
- AnnotatorApproach
- CanBeLazy
- DefaultParamsWritable
- MLWritable
- HasOutputAnnotatorType
- HasOutputAnnotationCol
- HasInputAnnotationCols
- Estimator
- PipelineStage
- Logging
- Params
- Serializable
- Serializable
- Identifiable
- AnyRef
- Any
Instance Constructors
Type Members
-
type
AnnotatorType = String
- Definition Classes
- HasOutputAnnotatorType
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
$[T](param: Param[T]): T
- Attributes
- protected
- Definition Classes
- Params
-
def
$$[T](feature: StructFeature[T]): T
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[K, V](feature: MapFeature[K, V]): Map[K, V]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[T](feature: SetFeature[T]): Set[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[T](feature: ArrayFeature[T]): Array[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
_fit(dataset: Dataset[_], recursiveStages: Option[PipelineModel]): MedicalNerModel
- Attributes
- protected
- Definition Classes
- AnnotatorApproach
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
val
batchSize: IntParam
Batch size, by default 8.
- Definition Classes
- MedicalNerApproach
-
def
beforeTraining(spark: SparkSession): Unit
- Definition Classes
- MedicalNerApproach → AnnotatorApproach
-
def
calculateEmbeddingsDim(sentences: Seq[WordpieceEmbeddingsSentence]): Int
- Definition Classes
- MedicalNerApproach
-
final
def
checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
def
checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScope(scope: String): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
-
final
def
clear(param: Param[_]): LegalNerApproach.this.type
- Definition Classes
- Params
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
val
configProtoBytes: IntArrayParam
ConfigProto from TensorFlow, serialized into a byte array. Get it with config_proto.SerializeToString().
- Definition Classes
- MedicalNerParams
-
final
def
copy(extra: ParamMap): Estimator[MedicalNerModel]
- Definition Classes
- AnnotatorApproach → Estimator → PipelineStage → Params
-
def
copyValues[T <: Params](to: T, extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
val
datasetInfo: Param[String]
Descriptive information about the dataset being used.
- Definition Classes
- MedicalNerParams
-
final
def
defaultCopy[T <: Params](extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
val
description: String
Trains a TensorFlow-based Char-CNN-BLSTM model.
- Definition Classes
- MedicalNerApproach → AnnotatorApproach
-
val
dropout: FloatParam
Dropout coefficient, by default 0.5.
The coefficient of the dropout layer. The value should be between 0.0 and 1.0. Internally, TensorFlow uses it as
rate = 1.0 - dropout
when adding a dropout layer on top of the recurrent layers.
- Definition Classes
- MedicalNerParams
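The conversion from the dropout coefficient to the rate TensorFlow receives can be sketched as follows. This is only an illustration of the formula above; the object and method names are hypothetical and are not part of the LegalNerApproach API.

```scala
// Hypothetical helper illustrating rate = 1.0 - dropout.
// Not part of the LegalNerApproach / MedicalNerApproach API.
object DropoutSketch {
  def tensorflowDropoutRate(dropout: Float): Float = {
    // The documented valid range for the dropout coefficient
    require(dropout >= 0.0f && dropout <= 1.0f, s"dropout must be in [0.0, 1.0], got $dropout")
    1.0f - dropout
  }
}
```

With the default coefficient of 0.5 the rate is also 0.5, so a larger dropout coefficient means fewer units are dropped.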
-
val
earlyStoppingCriterion: FloatParam
If set, this param specifies the criterion to stop training if performance is not improving.
The default value is 0, which means that early stopping is not used.
The criterion is set to F1-score if validationSplit is greater than 0.0 (F1-score on the validation set) or testDataset is defined (F1-score on the test set); otherwise, it is set to model loss. The priority is as follows:
- If testDataset is defined, the criterion is set to F1-score on the test set.
- If validationSplit is greater than 0.0, the criterion is set to F1-score on the validation set.
- Otherwise, the criterion is set to model loss.
Note that while the F1-score ranges from 0.0 to 1.0, the loss ranges from 0.0 to infinity. So, depending on which case applies, suitable values for the criterion can be very different. For example, if validationSplit is 0.1, a criterion of 0.01 means that if the difference between the F1-score on the validation set and that of the last epoch is greater than 0.01, training should stop. However, if neither a validation set nor a test set is defined, a criterion of 2.0 means that if the loss difference between the last epoch and the current one is less than 2.0, training should stop.
- Definition Classes
- MedicalNerParams
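The documented priority for choosing what the early-stopping criterion is measured against can be sketched as a small decision function. The names below (CriterionSourceSketch, criterionSource, earlyStoppingEnabled) are hypothetical and only restate the rules above; they are not the library's internals.

```scala
// Hypothetical sketch of the documented priority rules; not library code.
object CriterionSourceSketch {
  sealed trait CriterionSource
  case object TestF1 extends CriterionSource       // F1-score on the test set
  case object ValidationF1 extends CriterionSource // F1-score on the validation set
  case object Loss extends CriterionSource         // model loss

  // A criterion of 0 (the default) means early stopping is not used.
  def earlyStoppingEnabled(criterion: Float): Boolean = criterion != 0.0f

  def criterionSource(testDatasetDefined: Boolean, validationSplit: Float): CriterionSource =
    if (testDatasetDefined) TestF1                // 1. a test dataset takes precedence
    else if (validationSplit > 0.0f) ValidationF1 // 2. then a validation split
    else Loss                                     // 3. otherwise fall back to loss
}
```

Because F1-scores and losses live on very different scales, knowing which source applies is what determines a sensible criterion value.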
-
val
earlyStoppingPatience: IntParam
Number of epochs to wait before early stopping if there is no improvement, by default 5.
Given the earlyStoppingCriterion, if the performance does not improve for the given number of epochs, training will stop. If the value is 0, early stopping will occur as soon as the criterion is met (no patience).
- Definition Classes
- MedicalNerParams
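One way to read the patience rule is as a counter of consecutive epochs that met the early-stopping condition. The sketch below assumes training stops once that run exceeds the patience value, so a patience of 0 stops at the first such epoch; this interpretation and the names PatienceSketch and stopEpoch are assumptions, not the library's implementation.

```scala
// Hypothetical sketch of a patience counter; the exact library semantics may differ.
object PatienceSketch {
  /** criterionMet(i) is true when epoch i + 1 met the early-stopping condition.
    * Returns the 1-based epoch at which training would stop, or None.
    */
  def stopEpoch(criterionMet: Seq[Boolean], patience: Int): Option[Int] = {
    require(patience >= 0, "patience must be non-negative")
    // Length of the current run of consecutive epochs that met the condition
    val runs = criterionMet.scanLeft(0)((run, met) => if (met) run + 1 else 0).tail
    val idx = runs.indexWhere(_ > patience)
    if (idx >= 0) Some(idx + 1) else None
  }
}
```

For example, with per-epoch flags (false, true, false, true, true), a patience of 0 stops at epoch 2, while a patience of 1 waits until epoch 5.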
-
val
enableMemoryOptimizer: BooleanParam
Whether to optimize for large datasets or not. Enabling this option can slow down training.
In practice, if set to true, training iterates over the Spark DataFrame and retrieves the batches from the DataFrame iterator. This can be slower than the default option because the batches have to be collected again for every epoch, but it can be useful if the dataset is too large to fit in memory.
It controls whether the features are collected and generated all at once and then fed into the network batch by batch (false), or collected and generated batch by batch and then fed into the network (true).
If the training data fits in memory, it is recommended to set this option to false (the default value).
- Definition Classes
- MedicalNerParams
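The difference between the two modes can be illustrated with plain Scala collections standing in for the DataFrame. This is a conceptual sketch only: BatchingSketch, eagerBatches, and lazyBatches are hypothetical names, and the real annotator works on Spark data rather than an in-memory Iterator.

```scala
// Conceptual sketch of eager vs. lazy batching; not the library's implementation.
object BatchingSketch {
  // enableMemoryOptimizer = false: materialize every row up front, then slice into batches.
  def eagerBatches[A](rows: Iterator[A], batchSize: Int): Seq[Seq[A]] = {
    val all = rows.toVector // the whole dataset is held in memory
    all.grouped(batchSize).toVector
  }

  // enableMemoryOptimizer = true: pull one batch at a time from the iterator.
  // In training this would be repeated with a fresh iterator on every epoch.
  def lazyBatches[A](rows: Iterator[A], batchSize: Int): Iterator[Seq[A]] =
    rows.grouped(batchSize) // only the current batch is materialized
}
```

Both produce the same batches; the trade-off is peak memory (eager) against repeated collection work (lazy).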
-
val
enableOutputLogs: BooleanParam
- Definition Classes
- EvaluationDLParams
-
val
entities: StringArrayParam
- Definition Classes
- NerApproach
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
val
evaluationLogExtended: BooleanParam
- Definition Classes
- EvaluationDLParams
-
def
explainParam(param: Param[_]): String
- Definition Classes
- Params
-
def
explainParams(): String
- Definition Classes
- Params
-
final
def
extractParamMap(): ParamMap
- Definition Classes
- Params
-
final
def
extractParamMap(extra: ParamMap): ParamMap
- Definition Classes
- Params
-
val
features: ArrayBuffer[Feature[_, _, _]]
- Definition Classes
- HasFeatures
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
fit(dataset: Dataset[_]): MedicalNerModel
- Definition Classes
- AnnotatorApproach → Estimator
-
def
fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[MedicalNerModel]
- Definition Classes
- Estimator
- Annotations
- @Since( "2.0.0" )
-
def
fit(dataset: Dataset[_], paramMap: ParamMap): MedicalNerModel
- Definition Classes
- Estimator
- Annotations
- @Since( "2.0.0" )
-
def
fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): MedicalNerModel
- Definition Classes
- Estimator
- Annotations
- @Since( "2.0.0" ) @varargs()
-
def
get[T](feature: StructFeature[T]): Option[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[T](feature: SetFeature[T]): Option[Set[T]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[T](feature: ArrayFeature[T]): Option[Array[T]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
get[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
def
getBatchSize: Int
Batch size
- Definition Classes
- MedicalNerApproach
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getConfigProtoBytes: Option[Array[Byte]]
ConfigProto from TensorFlow, serialized into a byte array. Get it with config_proto.SerializeToString().
- Definition Classes
- MedicalNerParams
-
def
getDatasetInfo: String
Gets descriptive information about the dataset being used.
- Definition Classes
- MedicalNerParams
-
final
def
getDefault[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
def
getDropout: Float
Dropout coefficient
- Definition Classes
- MedicalNerParams
-
def
getEarlyStoppingCriterion: Float
Early stopping criterion
- Definition Classes
- MedicalNerParams
-
def
getEarlyStoppingPatience: Int
Early stopping patience
- Definition Classes
- MedicalNerParams
-
def
getEnableMemoryOptimizer: Boolean
Whether to optimize for large datasets or not. Enabling this option can slow down training.
- Definition Classes
- MedicalNerParams
-
def
getEnableOutputLogs: Boolean
- Definition Classes
- EvaluationDLParams
-
def
getIncludeAllConfidenceScores: Boolean
Whether to include all confidence scores in annotation metadata or just the score of the predicted tag.
- Definition Classes
- MedicalNerParams
-
def
getIncludeConfidence: Boolean
Whether to include confidence scores in annotation metadata.
- Definition Classes
- MedicalNerParams
-
def
getInputCols: Array[String]
- Definition Classes
- HasInputAnnotationCols
-
def
getLazyAnnotator: Boolean
- Definition Classes
- CanBeLazy
-
def
getLogName: String
- Definition Classes
- MedicalNerApproach → Logging
-
def
getLr: Float
Learning Rate
- Definition Classes
- MedicalNerParams
-
def
getMaxEpochs: Int
- Definition Classes
- NerApproach
-
def
getMinEpochs: Int
- Definition Classes
- NerApproach
-
final
def
getOrDefault[T](param: Param[T]): T
- Definition Classes
- Params
-
final
def
getOutputCol: String
- Definition Classes
- HasOutputAnnotationCol
-
def
getOutputLogsPath: String
- Definition Classes
- EvaluationDLParams
-
def
getOverrideExistingTags: Boolean
Whether to override already learned tags when using a pretrained model to initialize the new model.
- Definition Classes
- MedicalNerParams
-
def
getParam(paramName: String): Param[Any]
- Definition Classes
- Params
-
def
getPo: Float
Learning rate decay coefficient. Effective learning rate = lr / (1 + po * epoch).
- Definition Classes
- MedicalNerParams
-
def
getRandomSeed: Int
- Definition Classes
- NerApproach
-
def
getRandomValidationSplitPerEpoch: Boolean
Checks if a random validation split is done after each epoch or at the beginning of training only.
- Definition Classes
- MedicalNerParams
-
def
getSentenceTokenIndex: Boolean
Whether to include the token index for each sentence in annotation metadata.
- Definition Classes
- MedicalNerParams
-
def
getUseBestModel: Boolean
Whether to restore and use the model from the epoch with the best performance at the end of training.
- Definition Classes
- MedicalNerParams
-
def
getUseContrib: Boolean
Whether to use contrib LSTM Cells. Not compatible with Windows. Might slightly improve accuracy.
- Definition Classes
- MedicalNerParams
-
def
getValidationSplit: Float
- Definition Classes
- EvaluationDLParams
-
val
graphFile: Param[String]
Path that contains the external graph file.
When specified, the provided file will be used, and no graph search will happen. The path can be a local file path, a distributed file path (HDFS, DBFS), or cloud storage (S3).
- Definition Classes
- MedicalNerParams
-
val
graphFolder: Param[String]
Folder path that contains external graph files.
The path can be a local file path, a distributed file path (HDFS, DBFS), or cloud storage (S3).
When instantiating the TensorFlow model, the annotator searches this folder for a suitable TensorFlow graph. The search uses the name of the .pb file, which should have this format:
blstm_{ntags}_{embedding_dim}_{lstm_size}_{nchars}.pb
The search then follows these rules:
- The embedding dimension must be exactly the same as the one used to train the model.
- The number of unique tags must be greater than or equal to the number of unique tags in the training data.
- The number of unique chars must be greater than or equal to the number of unique chars in the training data.
The first file that satisfies all the conditions is returned.
If a file name is ill-formed, errors will occur during training.
- Definition Classes
- MedicalNerParams
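The search rules can be sketched as a filename parser plus a first-match filter, assuming file names of the form blstm_{ntags}_{embedding_dim}_{lstm_size}_{nchars}.pb. GraphSearchSketch, GraphSpec, parse, and findGraph are hypothetical names. Unlike the documented behavior, where an ill-formed name leads to errors during training, this sketch simply skips names it cannot parse.

```scala
// Hypothetical sketch of the documented graph-selection rules; not library code.
object GraphSearchSketch {
  final case class GraphSpec(nTags: Int, embeddingDim: Int, lstmSize: Int, nChars: Int)

  // Assumed naming pattern: blstm_{ntags}_{embedding_dim}_{lstm_size}_{nchars}.pb
  private val GraphName = """blstm_(\d+)_(\d+)_(\d+)_(\d+)\.pb""".r

  def parse(fileName: String): Option[GraphSpec] = fileName match {
    case GraphName(t, e, l, c) => Some(GraphSpec(t.toInt, e.toInt, l.toInt, c.toInt))
    case _                     => None // ill-formed names are skipped in this sketch
  }

  // Returns the first file whose graph satisfies all three rules.
  def findGraph(fileNames: Seq[String], nTags: Int, embeddingDim: Int, nChars: Int): Option[String] =
    fileNames.find { name =>
      parse(name).exists { g =>
        g.embeddingDim == embeddingDim && // embedding dimension must match exactly
        g.nTags >= nTags &&               // enough tag slots for the training data
        g.nChars >= nChars                // enough char slots for the training data
      }
    }
}
```

For instance, with 12 tags, 100-dimensional embeddings, and 110 unique chars, blstm_20_100_128_120.pb qualifies, while blstm_20_200_128_120.pb is rejected because its embedding dimension differs.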
-
final
def
hasDefault[T](param: Param[T]): Boolean
- Definition Classes
- Params
-
def
hasParam(paramName: String): Boolean
- Definition Classes
- Params
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
val
includeAllConfidenceScores: BooleanParam
Whether to include confidence scores for all tags in annotation metadata or just the score of the predicted tag, by default False.
Needs the includeConfidence parameter to be set to true.
Enabling this may slow down the inference speed.
- Definition Classes
- MedicalNerParams
-
val
includeConfidence: BooleanParam
Whether to include confidence scores in annotation metadata, by default False.
Setting this parameter to True will add the confidence score to the metadata of the NAMED_ENTITY annotation. In addition, if includeAllConfidenceScores is set to true, then the confidence scores of all the tags will be added to the metadata, otherwise only for the predicted tag (the one with maximum score).
- Definition Classes
- MedicalNerParams
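How includeConfidence and includeAllConfidenceScores interact can be sketched as a function that builds per-token metadata from tag scores. The metadata keys used here (tag name mapped to score) and the names ConfidenceMetadataSketch and metadata are hypothetical; the actual keys written by the annotator may differ.

```scala
// Hypothetical sketch of the two confidence flags; metadata keys are illustrative only.
object ConfidenceMetadataSketch {
  def metadata(
      scores: Map[String, Float],
      includeConfidence: Boolean,
      includeAllConfidenceScores: Boolean
  ): Map[String, String] =
    if (!includeConfidence || scores.isEmpty) Map.empty // no scores without includeConfidence
    else if (includeAllConfidenceScores)
      scores.map { case (tag, score) => tag -> score.toString } // every tag's score
    else {
      val (bestTag, bestScore) = scores.maxBy(_._2) // only the predicted (max-score) tag
      Map(bestTag -> bestScore.toString)
    }
}
```

Note that includeAllConfidenceScores has no effect on its own, matching the requirement that includeConfidence be set to true.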
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
val
inputAnnotatorTypes: Array[String]
Input annotator types: DOCUMENT, TOKEN, WORD_EMBEDDINGS
- Definition Classes
- MedicalNerApproach → HasInputAnnotationCols
-
final
val
inputCols: StringArrayParam
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
final
def
isDefined(param: Param[_]): Boolean
- Definition Classes
- Params
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
isSet(param: Param[_]): Boolean
- Definition Classes
- Params
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
val
labelColumn: Param[String]
- Definition Classes
- NerApproach
-
val
lazyAnnotator: BooleanParam
- Definition Classes
- CanBeLazy
-
def
log(value: ⇒ String, minLevel: Level): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
val
logPrefix: Param[String]
A prefix that will be added to every log, by default empty.
- Definition Classes
- MedicalNerParams
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
val
logger: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
val
lr: FloatParam
Learning Rate, by default 0.001.
- Definition Classes
- MedicalNerParams
-
val
maxEpochs: IntParam
- Definition Classes
- NerApproach
-
val
minEpochs: IntParam
- Definition Classes
- NerApproach
-
def
msgHelper(schema: StructType): String
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
onTrained(model: MedicalNerModel, spark: SparkSession): Unit
- Definition Classes
- AnnotatorApproach
-
def
onWrite(path: String, spark: SparkSession): Unit
- Attributes
- protected
- Definition Classes
- ParamsAndFeaturesWritable
-
val
optionalInputAnnotatorTypes: Array[String]
- Definition Classes
- HasInputAnnotationCols
-
val
outputAnnotatorType: String
Output annotator type: NAMED_ENTITY
- Definition Classes
- MedicalNerApproach → HasOutputAnnotatorType
-
final
val
outputCol: Param[String]
- Attributes
- protected
- Definition Classes
- HasOutputAnnotationCol
-
def
outputLog(value: ⇒ String, uuid: String, shouldLog: Boolean, outputLogsPath: String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
val
outputLogsPath: Param[String]
- Definition Classes
- EvaluationDLParams
-
val
overrideExistingTags: BooleanParam
Controls whether to override already learned tags when using a pretrained model to initialize the new model. A value of true will override existing tags.
- Definition Classes
- MedicalNerParams
-
lazy val
params: Array[Param[_]]
- Definition Classes
- Params
-
val
po: FloatParam
Learning rate decay coefficient (time-based).
This is used to calculate the decayed learning rate at each step as: lr = lr / (1 + po * epoch), meaning that the value of the learning rate is updated on each epoch. By default 0.005.
- Definition Classes
- MedicalNerParams
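The time-based decay formula can be checked with a one-line function. LearningRateDecaySketch and decayedLr are hypothetical names; the function only restates lr / (1 + po * epoch) from the description above.

```scala
// Hypothetical sketch of the documented time-based decay: lr / (1 + po * epoch).
object LearningRateDecaySketch {
  def decayedLr(lr: Float, po: Float, epoch: Int): Float = lr / (1.0f + po * epoch)

  // Convenience: the learning rate for each of the first `epochs` epochs.
  def schedule(lr: Float, po: Float, epochs: Int): Seq[Float] =
    (0 until epochs).map(epoch => decayedLr(lr, po, epoch))
}
```

With the defaults (lr = 0.001, po = 0.005), epoch 0 uses 0.001 and epoch 10 uses 0.001 / 1.05, roughly 0.000952, so the rate shrinks slowly and monotonically.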
-
val
pretrainedModelPath: Param[String]
Path to an already trained MedicalNerModel.
This pretrained model will be used as a starting point for training the new one. The path can be a local file path, a distributed file path (HDFS, DBFS), or cloud storage (S3).
- Definition Classes
- MedicalNerParams
-
val
randomSeed: IntParam
- Definition Classes
- NerApproach
-
val
randomValidationSplitPerEpoch: BooleanParam
Do a random validation split after each epoch rather than at the beginning of training only.
- Definition Classes
- MedicalNerParams
- def resumeTrainingFromModel(model: LegalNerApproach): LegalNerApproach.this.type
-
def
resumeTrainingFromModel(model: MedicalNerModel): LegalNerApproach.this.type
- Definition Classes
- MedicalNerApproach
-
def
save(path: String): Unit
- Definition Classes
- MLWritable
- Annotations
- @Since( "1.6.0" ) @throws( ... )
-
val
sentenceTokenIndex: BooleanParam
Whether to include the token index for each sentence in annotation metadata, by default false. If the value is true, processing might be slowed down.
- Definition Classes
- MedicalNerParams
-
def
set[T](feature: StructFeature[T], value: T): LegalNerApproach.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[K, V](feature: MapFeature[K, V], value: Map[K, V]): LegalNerApproach.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[T](feature: SetFeature[T], value: Set[T]): LegalNerApproach.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[T](feature: ArrayFeature[T], value: Array[T]): LegalNerApproach.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
set(paramPair: ParamPair[_]): LegalNerApproach.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set(param: String, value: Any): LegalNerApproach.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set[T](param: Param[T], value: T): LegalNerApproach.this.type
- Definition Classes
- Params
-
def
setBatchSize(batch: Int): LegalNerApproach.this.type
Batch size
- Definition Classes
- MedicalNerApproach
-
def
setConfigProtoBytes(bytes: Array[Int]): LegalNerApproach.this.type
ConfigProto from TensorFlow, serialized into a byte array. Get it with config_proto.SerializeToString().
- Definition Classes
- MedicalNerParams
-
def
setDatasetInfo(value: String): LegalNerApproach.this.type
Sets descriptive information about the dataset being used.
- Definition Classes
- MedicalNerParams
-
def
setDefault[T](feature: StructFeature[T], value: () ⇒ T): LegalNerApproach.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): LegalNerApproach.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): LegalNerApproach.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): LegalNerApproach.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
setDefault(paramPairs: ParamPair[_]*): LegalNerApproach.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
setDefault[T](param: Param[T], value: T): LegalNerApproach.this.type
- Attributes
- protected[org.apache.spark.ml]
- Definition Classes
- Params
-
def
setDropout(dropout: Float): LegalNerApproach.this.type
Dropout coefficient
- Definition Classes
- MedicalNerParams
-
def
setEarlyStoppingCriterion(value: Float): LegalNerApproach.this.type
- Definition Classes
- MedicalNerParams
-
def
setEarlyStoppingPatience(value: Int): LegalNerApproach.this.type
- Definition Classes
- MedicalNerParams
-
def
setEnableMemoryOptimizer(value: Boolean): LegalNerApproach.this.type
- Definition Classes
- MedicalNerParams
-
def
setEnableOutputLogs(enableOutputLogs: Boolean): LegalNerApproach.this.type
- Definition Classes
- EvaluationDLParams
-
def
setEntities(tags: Array[String]): MedicalNerApproach
- Definition Classes
- NerApproach
-
def
setEvaluationLogExtended(evaluationLogExtended: Boolean): LegalNerApproach.this.type
- Definition Classes
- EvaluationDLParams
-
def
setGraphFile(path: String): LegalNerApproach.this.type
Path to the external graph file.
- Definition Classes
- MedicalNerParams
-
def
setGraphFolder(path: String): LegalNerApproach.this.type
Folder path that contains external graph files.
- Definition Classes
- MedicalNerParams
-
def
setIncludeAllConfidenceScores(value: Boolean): LegalNerApproach.this.type
Whether to include confidence scores for all tags in annotation metadata rather than just for the predicted one.
- Definition Classes
- MedicalNerParams
-
def
setIncludeConfidence(value: Boolean): LegalNerApproach.this.type
Whether to include confidence scores in annotation metadata.
- Definition Classes
- MedicalNerParams
-
final
def
setInputCols(value: String*): LegalNerApproach.this.type
- Definition Classes
- HasInputAnnotationCols
-
def
setInputCols(value: Array[String]): LegalNerApproach.this.type
- Definition Classes
- HasInputAnnotationCols
-
def
setLabelColumn(column: String): MedicalNerApproach
- Definition Classes
- NerApproach
-
def
setLazyAnnotator(value: Boolean): LegalNerApproach.this.type
- Definition Classes
- CanBeLazy
-
def
setLogPrefix(value: String): LegalNerApproach.this.type
A string prefix to be included in the logs.
- Definition Classes
- MedicalNerParams
-
def
setLr(lr: Float): LegalNerApproach.this.type
Learning Rate
- Definition Classes
- MedicalNerParams
-
def
setMaxEpochs(epochs: Int): MedicalNerApproach
- Definition Classes
- NerApproach
-
def
setMinEpochs(epochs: Int): MedicalNerApproach
- Definition Classes
- NerApproach
-
final
def
setOutputCol(value: String): LegalNerApproach.this.type
- Definition Classes
- HasOutputAnnotationCol
-
def
setOutputLogsPath(path: String): LegalNerApproach.this.type
- Definition Classes
- EvaluationDLParams
-
def
setOverrideExistingTags(value: Boolean): LegalNerApproach.this.type
Controls whether to override already learned tags when using a pretrained model to initialize the new model. A value of true will override existing tags.
- Definition Classes
- MedicalNerParams
-
def
setPo(po: Float): LegalNerApproach.this.type
Learning rate decay coefficient. Effective learning rate = lr / (1 + po * epoch).
- Definition Classes
- MedicalNerParams
-
def
setPretrainedModelPath(path: String): LegalNerApproach.this.type
Set the location of an already trained MedicalNerModel, which is used as a starting point for training the new model.
- Definition Classes
- MedicalNerParams
-
def
setRandomSeed(seed: Int): MedicalNerApproach
- Definition Classes
- NerApproach
-
def
setRandomValidationSplitPerEpoch(value: Boolean): LegalNerApproach.this.type
Do a random validation split after each epoch rather than at the beginning of training only.
- Definition Classes
- MedicalNerParams
-
def
setSentenceTokenIndex(value: Boolean): LegalNerApproach.this.type
Whether to include the token index for each sentence in annotation metadata, by default false. If the value is true, processing might be slowed down.
- Definition Classes
- MedicalNerParams
-
def
setTagsMapping(mapping: Map[String, String]): LegalNerApproach.this.type
A map specifying how old tags are mapped to new ones. Maps are specified either using a list of comma-separated strings, e.g. ("OLDTAG1,NEWTAG1", "OLDTAG2,NEWTAG2", ...), or by a Map data structure.
- Definition Classes
- MedicalNerParams
-
def
setTagsMapping(mapping: ArrayList[String]): LegalNerApproach.this.type
- Definition Classes
- MedicalNerParams
-
def
setTagsMapping(mapping: Array[String]): LegalNerApproach.this.type
A map specifying how old tags are mapped to new ones. Maps are specified either using a list of comma-separated strings, e.g. ("OLDTAG1,NEWTAG1", "OLDTAG2,NEWTAG2", ...), or by a Map data structure. It only works if setOverrideExistingTags is false.
- Definition Classes
- MedicalNerParams
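The comma-separated "OLDTAG,NEWTAG" form can be sketched as a parser that builds a Map, plus a helper that applies it. TagsMappingSketch, parseMapping, and remap are hypothetical, as are the example tag names; this only illustrates the string format described above, not how the annotator applies the mapping internally.

```scala
// Hypothetical sketch of the "OLDTAG,NEWTAG" mapping format; not library code.
object TagsMappingSketch {
  def parseMapping(pairs: Seq[String]): Map[String, String] =
    pairs.map { pair =>
      pair.split(",", -1).map(_.trim) match {
        case Array(oldTag, newTag) if oldTag.nonEmpty && newTag.nonEmpty => oldTag -> newTag
        case _ => throw new IllegalArgumentException(s"Expected 'OLDTAG,NEWTAG' but got '$pair'")
      }
    }.toMap

  // Renames tags present in the mapping and leaves all others unchanged.
  def remap(tags: Seq[String], mapping: Map[String, String]): Seq[String] =
    tags.map(tag => mapping.getOrElse(tag, tag))
}
```

For example, parseMapping(Seq("B-PER,B-PARTY", "I-PER,I-PARTY")) turns a pretrained model's PER tags into PARTY tags while leaving O untouched.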
-
def
setTestDataset(er: ExternalResource): LegalNerApproach.this.type
- Definition Classes
- EvaluationDLParams
-
def
setTestDataset(path: String, readAs: Format, options: Map[String, String]): LegalNerApproach.this.type
- Definition Classes
- EvaluationDLParams
-
def
setUseBestModel(value: Boolean): LegalNerApproach.this.type
- Definition Classes
- MedicalNerParams
-
def
setUseContrib(value: Boolean): LegalNerApproach.this.type
Whether to use contrib LSTM Cells.
Whether to use contrib LSTM Cells. Not compatible with Windows. Might slightly improve accuracy.
- Definition Classes
- MedicalNerParams
-
def
setValidationSplit(validationSplit: Float): LegalNerApproach.this.type
- Definition Classes
- EvaluationDLParams
-
def
setVerbose(verbose: Level): LegalNerApproach.this.type
- Definition Classes
- EvaluationDLParams
-
def
setVerbose(verbose: Int): LegalNerApproach.this.type
- Definition Classes
- EvaluationDLParams
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
val
tagsMapping: MapFeature[String, String]
A map specifying how old tags are mapped to new ones.
It only works if overrideExistingTags is set to false.
- Definition Classes
- MedicalNerParams
-
val
testDataset: ExternalResourceParam
- Definition Classes
- EvaluationDLParams
-
def
toString(): String
- Definition Classes
- Identifiable → AnyRef → Any
-
def
train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): LegalNerModel
- Definition Classes
- LegalNerApproach → MedicalNerApproach → AnnotatorApproach
-
final
def
transformSchema(schema: StructType): StructType
- Definition Classes
- AnnotatorApproach → PipelineStage
-
def
transformSchema(schema: StructType, logging: Boolean): StructType
- Attributes
- protected
- Definition Classes
- PipelineStage
- Annotations
- @DeveloperApi()
-
val
uid: String
- Definition Classes
- LegalNerApproach → MedicalNerApproach → Identifiable
-
val
useBestModel: BooleanParam
Whether to restore and use the model from the epoch that has achieved the best performance at the end of the training.
By default false (keep the model from the last trained epoch).
The best model depends on the earlyStoppingCriterion, which can be F1-score on test/validation dataset or the value of loss.
- Definition Classes
- MedicalNerParams
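Selecting the best epoch depends on whether the criterion is an F1-score (higher is better) or a loss (lower is better). The sketch below illustrates that choice over a list of per-epoch values; BestEpochSketch and bestEpoch are hypothetical names, and the library's own bookkeeping may differ.

```scala
// Hypothetical sketch of picking the best epoch for useBestModel; not library code.
object BestEpochSketch {
  /** scores(i) is the criterion value after epoch i + 1. Returns the 1-based best epoch. */
  def bestEpoch(scores: Seq[Double], higherIsBetter: Boolean): Int = {
    require(scores.nonEmpty, "need at least one epoch")
    val indexed = scores.zipWithIndex
    // F1-based criteria: maximize. Loss-based criterion: minimize. Ties keep the earliest epoch.
    val (_, idx) = if (higherIsBetter) indexed.maxBy(_._1) else indexed.minBy(_._1)
    idx + 1
  }
}
```

With useBestModel set to false, the model from the last epoch is kept regardless of these values.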
-
val
useContrib: BooleanParam
Whether to use contrib LSTM Cells. Not compatible with Windows. Might slightly improve accuracy. By default true.
- Definition Classes
- MedicalNerParams
-
def
validate(schema: StructType): Boolean
- Attributes
- protected
- Definition Classes
- AnnotatorApproach
-
val
validationSplit: FloatParam
- Definition Classes
- EvaluationDLParams
-
val
verbose: IntParam
- Definition Classes
- EvaluationDLParams
-
val
verboseLevel: Level
- Definition Classes
- MedicalNerApproach → Logging
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
write: MLWriter
- Definition Classes
- ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable