class MedicalNerApproach extends AnnotatorApproach[MedicalNerModel] with NerApproach[MedicalNerApproach] with Logging with ParamsAndFeaturesWritable with EvaluationDLParams with CheckLicense

Trains generic NER models based on Neural Networks.

The architecture of the neural network is a Char CNNs - BiLSTM - CRF that achieves state-of-the-art results on most datasets. For instantiated/pretrained models, see MedicalNerModel.

The training data should be a labeled Spark Dataset, in the CoNLL 2003 IOB format with Annotation type columns. The data should have columns of type DOCUMENT, TOKEN, WORD_EMBEDDINGS and an additional label column of annotator type NAMED_ENTITY.

Excluding the label, this can be done with, for example, the annotators SentenceDetector, Tokenizer, and WordEmbeddingsModel (any embeddings can be chosen, e.g. BertEmbeddings for BERT based embeddings).

For extended examples of usage, see the Spark NLP Workshop.

Notes

Both DocumentAssembler and SentenceDetector output the DOCUMENT annotation type. Thus, either of them can be used as the first annotator in a pipeline.

Example

First extract the prerequisites for the MedicalNerApproach

val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")
val embeddings = BertEmbeddings.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")

Then define the NER annotator

val nerTagger = new MedicalNerApproach()
  .setInputCols("sentence", "token", "embeddings")
  .setLabelColumn("label")
  .setOutputCol("ner")
  .setMaxEpochs(10)
  .setLr(0.005f)
  .setPo(0.005f)
  .setBatchSize(32)
  .setValidationSplit(0.1f)

Then the training can start

val pipeline = new Pipeline().setStages(Array(
  document,
  sentenceDetector,
  tokenizer,
  embeddings,
  nerTagger
))

// CoNLL is the dataset reader from com.johnsnowlabs.nlp.training
val conll = CoNLL()
val trainingData = conll.readDataset(spark, "path/to/train_data.conll")
val pipelineModel = pipeline.fit(trainingData)
Linear Supertypes
CheckLicense, EvaluationDLParams, ParamsAndFeaturesWritable, HasFeatures, Logging, NerApproach[MedicalNerApproach], AnnotatorApproach[MedicalNerModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[MedicalNerModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new MedicalNerApproach()
  2. new MedicalNerApproach(uid: String)

    uid

    a unique identifier for the instantiated AnnotatorModel

Type Members

  1. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. def _fit(dataset: Dataset[_], recursiveStages: Option[PipelineModel]): MedicalNerModel
    Attributes
    protected
    Definition Classes
    AnnotatorApproach
  10. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  11. val batchSize: IntParam

    Batch size, by default 8.

  12. def beforeTraining(spark: SparkSession): Unit
    Definition Classes
    MedicalNerApproach → AnnotatorApproach
  13. def calculateEmbeddingsDim(sentences: Seq[WordpieceEmbeddingsSentence]): Int
  14. final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  15. def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
    Definition Classes
    CheckLicense
  16. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  17. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  18. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  19. final def clear(param: Param[_]): MedicalNerApproach.this.type
    Definition Classes
    Params
  20. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  21. val configProtoBytes: IntArrayParam

    ConfigProto from tensorflow, serialized into byte array. Get with config_proto.SerializeToString()

  22. final def copy(extra: ParamMap): Estimator[MedicalNerModel]
    Definition Classes
    AnnotatorApproach → Estimator → PipelineStage → Params
  23. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  24. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  25. val description: String

    Trains Tensorflow based Char-CNN-BLSTM model

    Definition Classes
    MedicalNerApproach → AnnotatorApproach
  26. val dropout: FloatParam

    Dropout coefficient, by default 0.5.

    The coefficient of the dropout layer. The value should be between 0.0 and 1.0. Internally, it is used by Tensorflow as: rate = 1.0 - dropout when adding a dropout layer on top of the recurrent layers.

  27. val earlyStoppingCriterion: FloatParam

    If set, this param specifies the criterion to stop training if performance is not improving.

    The default value is 0, which means that early stopping is not used.

    The criterion is the F1-score if a test or validation set is available, and the model loss otherwise. The priority is as follows:
    - If testDataset is defined, the criterion is the F1-score on the test set.
    - If validationSplit is greater than 0.0, the criterion is the F1-score on the validation set.
    - Otherwise, the criterion is the model loss.

    Note that while the F1-score ranges from 0.0 to 1.0, the loss ranges from 0.0 to infinity, so a suitable value for the criterion depends on which case applies. For example, if validationSplit is 0.1, a criterion of 0.01 means that training should stop if the F1-score on the validation set improves by less than 0.01 over the last epoch. If neither a validation nor a test set is defined, a criterion of 2.0 means that training should stop if the loss difference between the last epoch and the current one is less than 2.0.

    See also

    earlyStoppingPatience.

  28. val earlyStoppingPatience: IntParam

    Number of epochs to wait before early stopping if no improvement, by default 5.

    Given the earlyStoppingCriterion, if the performance does not improve for the given number of epochs, then the training will stop. If the value is 0, then early stopping will occur as soon as the criterion is met (no patience).

    See also

    earlyStoppingCriterion.
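
    How the criterion and the patience interact can be sketched in plain Scala. This is an illustrative reading of the two parameter descriptions, not the library's implementation: the names EarlyStoppingSketch and stopEpoch are hypothetical, and the sketch assumes that an epoch counts as "not improving" when the change in the monitored metric is below the criterion, and that training stops once the number of consecutive such epochs exceeds the patience.

    ```scala
    object EarlyStoppingSketch {
      /** Returns the 1-based epoch at which training would stop, or None if no stop is triggered.
        *
        * @param improvements per-epoch improvement of the monitored metric
        *                     (F1 gain or loss reduction) over the previous epoch
        * @param criterion    minimum improvement that counts as progress; 0 disables early stopping
        * @param patience     number of non-improving epochs tolerated before stopping (assumption)
        */
      def stopEpoch(improvements: Seq[Float], criterion: Float, patience: Int): Option[Int] =
        if (criterion <= 0f) None
        else {
          // Running count of consecutive epochs whose improvement stays below the criterion.
          val staleCounts = improvements.scanLeft(0) { (stale, delta) =>
            if (delta < criterion) stale + 1 else 0
          }.tail
          val idx = staleCounts.indexWhere(_ > patience)
          if (idx >= 0) Some(idx + 1) else None
        }
    }
    ```

    Under these assumptions, improvements of 0.05, 0.02, 0.005, 0.004 and 0.003 with a criterion of 0.01 stop at epoch 3 when the patience is 0, and at epoch 4 when the patience is 1.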

  29. val enableMemoryOptimizer: BooleanParam

    Whether to optimize for large datasets or not. Enabling this option can slow down training.

    In practice, if set to true, the training will iterate over the Spark DataFrame and retrieve the batches from the DataFrame iterator. This can be slower than the default option, as it has to collect the batches on every batch for every epoch, but it can be useful if the dataset is too large to fit in memory.

    It controls whether the features are collected and generated all at once and then fed into the network batch by batch (false), or collected and generated batch by batch and then fed into the network (true).

    If the training data fits in memory, it is recommended to set this option to false (the default value).

  30. val enableOutputLogs: BooleanParam
    Definition Classes
    EvaluationDLParams
  31. val entities: StringArrayParam
    Definition Classes
    NerApproach
  32. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  33. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  34. val evaluationLogExtended: BooleanParam
    Definition Classes
    EvaluationDLParams
  35. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  36. def explainParams(): String
    Definition Classes
    Params
  37. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  38. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  39. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  40. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  41. final def fit(dataset: Dataset[_]): MedicalNerModel
    Definition Classes
    AnnotatorApproach → Estimator
  42. def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[MedicalNerModel]
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  43. def fit(dataset: Dataset[_], paramMap: ParamMap): MedicalNerModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  44. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): MedicalNerModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  45. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  46. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  47. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  48. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  49. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  50. def getBatchSize: Int

    Batch size

  51. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  52. def getConfigProtoBytes: Option[Array[Byte]]

    ConfigProto from tensorflow, serialized into byte array. Get with config_proto.SerializeToString()

  53. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  54. def getDropout: Float

    Dropout coefficient

  55. def getEarlyStoppingCriterion: Float

    Early stopping criterion

  56. def getEarlyStoppingPatience: Int

    Early stopping patience

  57. def getEnableMemoryOptimizer: Boolean

    Whether to optimize for large datasets or not. Enabling this option can slow down training.

  58. def getEnableOutputLogs: Boolean
    Definition Classes
    EvaluationDLParams
  59. def getIncludeAllConfidenceScores: Boolean

    whether to include all confidence scores in annotation metadata or just the score of the predicted tag

  60. def getIncludeConfidence: Boolean

    whether to include confidence scores in annotation metadata

  61. def getInputCols: Array[String]
    Definition Classes
    HasInputAnnotationCols
  62. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  63. def getLogName: String
    Definition Classes
    MedicalNerApproach → Logging
  64. def getLr: Float

    Learning Rate

  65. def getMaxEpochs: Int
    Definition Classes
    NerApproach
  66. def getMinEpochs: Int
    Definition Classes
    NerApproach
  67. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  68. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  69. def getOutputLogsPath: String
    Definition Classes
    EvaluationDLParams
  70. def getOverrideExistingTags: Boolean

    Whether to override already learned tags when using a pretrained model to initialize the new model.

  71. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  72. def getPo: Float

    Learning rate decay coefficient. Real Learning Rate = lr / (1 + po * epoch)

  73. def getRandomSeed: Int
    Definition Classes
    NerApproach
  74. def getRandomValidationSplitPerEpoch: Boolean

    Checks if a random validation split is done after each epoch or at the beginning of training only.

  75. def getSentenceTokenIndex: Boolean

    whether to include the token index for each sentence in annotation metadata.

  76. def getUseBestModel: Boolean

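    Whether to restore and use the model from the epoch that achieved the best performance at the end of the training.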

  77. def getUseContrib: Boolean

    Whether to use contrib LSTM Cells. Not compatible with Windows. Might slightly improve accuracy.

  78. def getValidationSplit: Float
    Definition Classes
    EvaluationDLParams
  79. val graphFile: Param[String]

    Path that contains the external graph file.

    When specified, the provided file will be used, and no graph search will happen. The path can be a local file path, a distributed file path (HDFS, DBFS), or a cloud storage (S3).

  80. val graphFolder: Param[String]

    Folder path that contains external graph files.

    The path can be a local file path, a distributed file path (HDFS, DBFS), or a cloud storage (S3).

    When instantiating the Tensorflow model, this folder is searched for a suitable Tensorflow graph. The search uses the name of the .pb file, which should have this format: blstm_{ntags}_{embedding_dim}_{lstm_size}_{nchars}.pb.

    The search then follows these rules:
    - The embedding dimension should be exactly the same as the one used to train the model.
    - The number of unique tags should be greater than or equal to the number of unique tags in the training data.
    - The number of unique chars should be greater than or equal to the number of unique chars in the training data.

    The returned file will be the first one that satisfies all the conditions.

    If the name of the file is ill-formed, errors will occur during training.
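
    The search rules can be sketched in plain Scala as follows. This is a hedged illustration, not the library's code: GraphSearchSketch, GraphSpec, parse and findGraph are hypothetical names, the file-name prefix is matched loosely, and ill-formed names are simply skipped here, whereas the text above says they lead to errors during training.

    ```scala
    object GraphSearchSketch {
      // Fields encoded in a graph file name: {prefix}_{ntags}_{embedding_dim}_{lstm_size}_{nchars}.pb
      final case class GraphSpec(fileName: String, nTags: Int, embeddingDim: Int, lstmSize: Int, nChars: Int)

      private val NamePattern = """[A-Za-z]+_(\d+)_(\d+)_(\d+)_(\d+)\.pb""".r

      /** Parses a file name into its fields; returns None for names that do not follow the format. */
      def parse(fileName: String): Option[GraphSpec] = fileName match {
        case NamePattern(tags, dim, lstm, chars) =>
          Some(GraphSpec(fileName, tags.toInt, dim.toInt, lstm.toInt, chars.toInt))
        case _ => None
      }

      /** Returns the first file whose embedding dimension matches exactly and whose
        * tag and char capacities are at least those required by the training data.
        */
      def findGraph(fileNames: Seq[String], nTags: Int, embeddingDim: Int, nChars: Int): Option[String] =
        fileNames
          .flatMap(name => parse(name))
          .find(g => g.embeddingDim == embeddingDim && g.nTags >= nTags && g.nChars >= nChars)
          .map(_.fileName)
    }
    ```

    For training data with 9 tags, 100-dimensional embeddings and 80 distinct characters, a folder containing blstm_5_100_128_100.pb, blstm_10_200_128_100.pb and blstm_12_100_128_120.pb would yield the third file: the first has too few tags and the second has a different embedding dimension.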

  81. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  82. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  83. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  84. val includeAllConfidenceScores: BooleanParam

    Whether to include confidence scores for all tags in annotation metadata or just the score of the predicted tag, by default False.

    Needs the includeConfidence parameter to be set to true.

    Enabling this may slow down the inference speed.

  85. val includeConfidence: BooleanParam

    Whether to include confidence scores in annotation metadata, by default False.

    Setting this parameter to True will add the confidence score to the metadata of the NAMED_ENTITY annotation. In addition, if includeAllConfidenceScores is set to true, then the confidence scores of all the tags will be added to the metadata, otherwise only for the predicted tag (the one with maximum score).
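
    A minimal configuration sketch for the two confidence parameters, using only setters listed on this page. The import path is an assumption and may differ in your distribution; the column names follow the example at the top of the page.

    ```scala
    // Assumed package; adjust to wherever MedicalNerApproach lives in your installation.
    import com.johnsnowlabs.nlp.annotators.ner.MedicalNerApproach

    val nerWithScores = new MedicalNerApproach()
      .setInputCols("sentence", "token", "embeddings")
      .setLabelColumn("label")
      .setOutputCol("ner")
      .setIncludeConfidence(true)          // add the predicted tag's score to the metadata
      .setIncludeAllConfidenceScores(true) // also add the scores of all other tags
    ```

    Since includeAllConfidenceScores requires includeConfidence, both are set together here; leaving the second one at its default keeps only the score of the predicted tag.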

  86. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  87. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  88. val inputAnnotatorTypes: Array[String]

    Input annotator types : DOCUMENT, TOKEN, WORD_EMBEDDINGS

    Definition Classes
    MedicalNerApproach → HasInputAnnotationCols
  89. final val inputCols: StringArrayParam
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  90. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  91. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  92. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  93. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  94. val labelColumn: Param[String]
    Definition Classes
    NerApproach
  95. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  96. def log(value: ⇒ String, minLevel: Level): Unit
    Attributes
    protected
    Definition Classes
    Logging
  97. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  98. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  99. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  100. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  101. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  102. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  103. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  104. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  105. val logPrefix: Param[String]

    A prefix that will be prepended to every log message; the default value is empty.

  106. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  107. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  108. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  109. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  110. val logger: Logger
    Attributes
    protected
    Definition Classes
    Logging
  111. val lr: FloatParam

    Learning Rate, by default 0.001.

  112. val maxEpochs: IntParam
    Definition Classes
    NerApproach
  113. val minEpochs: IntParam
    Definition Classes
    NerApproach
  114. def msgHelper(schema: StructType): String
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  115. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  116. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  117. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  118. def onTrained(model: MedicalNerModel, spark: SparkSession): Unit
    Definition Classes
    AnnotatorApproach
  119. def onWrite(path: String, spark: SparkSession): Unit
    Attributes
    protected
    Definition Classes
    ParamsAndFeaturesWritable
  120. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  121. val outputAnnotatorType: String

    Output annotator type: NAMED_ENTITY

    Definition Classes
    MedicalNerApproach → HasOutputAnnotatorType
  122. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  123. def outputLog(value: ⇒ String, uuid: String, shouldLog: Boolean, outputLogsPath: String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  124. val outputLogsPath: Param[String]
    Definition Classes
    EvaluationDLParams
  125. val overrideExistingTags: BooleanParam

    Controls whether to override already learned tags when using a pretrained model to initialize the new model. A value of true will override existing tags.

  126. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  127. val po: FloatParam

    Learning rate decay coefficient (time-based).

    This is used to calculate the decayed learning rate at each step as: lr = lr / (1 + po * epoch), meaning that the value of the learning rate is updated on each epoch. By default 0.005.
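
    The decay formula can be worked through in a few lines of Scala. This is a sketch under two assumptions that the text does not settle: the division is applied to the initial learning rate rather than compounded, and epochs are counted from 0. LrDecaySketch and decayedLr are hypothetical names.

    ```scala
    object LrDecaySketch {
      /** Time-based decay: the initial learning rate divided by (1 + po * epoch). */
      def decayedLr(initialLr: Float, po: Float, epoch: Int): Float =
        initialLr / (1f + po * epoch)

      /** Learning rates for epochs 0 until maxEpochs. */
      def schedule(initialLr: Float, po: Float, maxEpochs: Int): Seq[Float] =
        (0 until maxEpochs).map(epoch => decayedLr(initialLr, po, epoch))
    }
    ```

    With the defaults lr = 0.001 and po = 0.005, the rate is 0.001 at epoch 0 and 0.001 / 1.05, roughly 0.00095, at epoch 10, so the decay is gentle unless po is raised.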

  128. val pretrainedModelPath: Param[String]

    Path to an already trained MedicalNerModel.

    This pretrained model will be used as a starting point for training the new one. The path can be a local file path, a distributed file path (HDFS, DBFS), or a cloud storage (S3).
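
    A hedged sketch of how training might be resumed from an existing model, using only setters listed on this page. The import path, the model path and the tag names are placeholders, not values taken from the library.

    ```scala
    // Assumed package; adjust to wherever MedicalNerApproach lives in your installation.
    import com.johnsnowlabs.nlp.annotators.ner.MedicalNerApproach

    val resumedNer = new MedicalNerApproach()
      .setInputCols("sentence", "token", "embeddings")
      .setLabelColumn("label")
      .setOutputCol("ner")
      .setPretrainedModelPath("path/to/existing_ner_model") // placeholder path
      .setOverrideExistingTags(false)                       // keep the tags the model already learned
      .setTagsMapping(Array("OLDTAG1,NEWTAG1"))             // placeholder tags; needs overrideExistingTags = false
      .setMaxEpochs(5)
    ```

    The resulting estimator is placed in the pipeline in the same position as a freshly configured MedicalNerApproach.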

  129. val randomSeed: IntParam
    Definition Classes
    NerApproach
  130. val randomValidationSplitPerEpoch: BooleanParam

    Do a random validation split after each epoch rather than at the beginning of training only.

  131. def resumeTrainingFromModel(model: MedicalNerModel): MedicalNerApproach.this.type
  132. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  133. val sentenceTokenIndex: BooleanParam

    whether to include the token index for each sentence in annotation metadata, by default false. If the value is true, the process might be slowed down.

  134. def set[T](feature: StructFeature[T], value: T): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  135. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  136. def set[T](feature: SetFeature[T], value: Set[T]): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  137. def set[T](feature: ArrayFeature[T], value: Array[T]): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  138. final def set(paramPair: ParamPair[_]): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    Params
  139. final def set(param: String, value: Any): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    Params
  140. final def set[T](param: Param[T], value: T): MedicalNerApproach.this.type
    Definition Classes
    Params
  141. def setBatchSize(batch: Int): MedicalNerApproach.this.type

    Batch size

  142. def setConfigProtoBytes(bytes: Array[Int]): MedicalNerApproach.this.type

    ConfigProto from tensorflow, serialized into byte array. Get with config_proto.SerializeToString()

  143. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  144. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  145. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  146. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  147. final def setDefault(paramPairs: ParamPair[_]*): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    Params
  148. final def setDefault[T](param: Param[T], value: T): MedicalNerApproach.this.type
    Attributes
    protected
    Definition Classes
    Params
  149. def setDropout(dropout: Float): MedicalNerApproach.this.type

    Dropout coefficient

  150. def setEarlyStoppingCriterion(value: Float): MedicalNerApproach.this.type

  151. def setEarlyStoppingPatience(value: Int): MedicalNerApproach.this.type

  152. def setEnableMemoryOptimizer(value: Boolean): MedicalNerApproach.this.type
  153. def setEnableOutputLogs(enableOutputLogs: Boolean): MedicalNerApproach.this.type
    Definition Classes
    EvaluationDLParams
  154. def setEntities(tags: Array[String]): MedicalNerApproach
    Definition Classes
    NerApproach
  155. def setEvaluationLogExtended(evaluationLogExtended: Boolean): MedicalNerApproach.this.type
    Definition Classes
    EvaluationDLParams
  156. def setGraphFile(path: String): MedicalNerApproach.this.type

    Path that contains the external graph file

  157. def setGraphFolder(path: String): MedicalNerApproach.this.type

    Folder path that contains external graph files

  158. def setIncludeAllConfidenceScores(value: Boolean): MedicalNerApproach.this.type

    Whether to include confidence scores for all tags rather than just for the predicted one

  159. def setIncludeConfidence(value: Boolean): MedicalNerApproach.this.type

    Whether to include confidence scores in annotation metadata

  160. final def setInputCols(value: String*): MedicalNerApproach.this.type
    Definition Classes
    HasInputAnnotationCols
  161. def setInputCols(value: Array[String]): MedicalNerApproach.this.type
    Definition Classes
    HasInputAnnotationCols
  162. def setLabelColumn(column: String): MedicalNerApproach
    Definition Classes
    NerApproach
  163. def setLazyAnnotator(value: Boolean): MedicalNerApproach.this.type
    Definition Classes
    CanBeLazy
  164. def setLogPrefix(value: String): MedicalNerApproach.this.type

    a string prefix to be included in the logs

  165. def setLr(lr: Float): MedicalNerApproach.this.type

    Learning Rate

  166. def setMaxEpochs(epochs: Int): MedicalNerApproach
    Definition Classes
    NerApproach
  167. def setMinEpochs(epochs: Int): MedicalNerApproach
    Definition Classes
    NerApproach
  168. final def setOutputCol(value: String): MedicalNerApproach.this.type
    Definition Classes
    HasOutputAnnotationCol
  169. def setOutputLogsPath(path: String): MedicalNerApproach.this.type
    Definition Classes
    EvaluationDLParams
  170. def setOverrideExistingTags(value: Boolean): MedicalNerApproach.this.type

    Controls whether to override already learned tags when using a pretrained model to initialize the new model. A value of true will override existing tags.

  171. def setPo(po: Float): MedicalNerApproach.this.type

    Learning rate decay coefficient. Real Learning Rate = lr / (1 + po * epoch)

  172. def setPretrainedModelPath(path: String): MedicalNerApproach.this.type

    Set the location of an already trained MedicalNerModel, which is used as a starting point for training the new model.

  173. def setRandomSeed(seed: Int): MedicalNerApproach
    Definition Classes
    NerApproach
  174. def setRandomValidationSplitPerEpoch(value: Boolean): MedicalNerApproach.this.type

    Do a random validation split after each epoch rather than at the beginning of training only.

  175. def setSentenceTokenIndex(value: Boolean): MedicalNerApproach.this.type

    whether to include the token index for each sentence in annotation metadata, by default false. If the value is true, the process might be slowed down.

  176. def setTagsMapping(mapping: Map[String, String]): MedicalNerApproach.this.type

    A map specifying how old tags are mapped to new ones. Maps are specified either using a list of comma separated strings, e.g. ("OLDTAG1,NEWTAG1", "OLDTAG2,NEWTAG2", ...) or by a Map data structure.

  177. def setTagsMapping(mapping: ArrayList[String]): MedicalNerApproach.this.type
  178. def setTagsMapping(mapping: Array[String]): MedicalNerApproach.this.type

    A map specifying how old tags are mapped to new ones. Maps are specified either using a list of comma separated strings, e.g. ("OLDTAG1,NEWTAG1", "OLDTAG2,NEWTAG2", ...) or by a Map data structure. It only works if setOverrideExistingTags is false.
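
    The relation between the two input forms can be illustrated with a small helper that turns comma-separated entries into a Map. This is only a sketch of the format described above: TagsMappingSketch and parse are hypothetical names, and the trimming and validation rules are assumptions rather than documented library behaviour.

    ```scala
    object TagsMappingSketch {
      /** Turns entries such as "OLDTAG1,NEWTAG1" into Map("OLDTAG1" -> "NEWTAG1"). */
      def parse(entries: Seq[String]): Map[String, String] =
        entries.map { entry =>
          // Keep empty trailing fields so that "OLD," is rejected rather than silently accepted.
          entry.split(",", -1).map(_.trim) match {
            case Array(oldTag, newTag) if oldTag.nonEmpty && newTag.nonEmpty => oldTag -> newTag
            case _ => throw new IllegalArgumentException(s"Expected 'OLD,NEW' but got: $entry")
          }
        }.toMap
    }
    ```

    Under these assumptions, the entries "OLDTAG1,NEWTAG1" and "OLDTAG2,NEWTAG2" describe the same mapping as Map("OLDTAG1" -> "NEWTAG1", "OLDTAG2" -> "NEWTAG2").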

  179. def setTestDataset(er: ExternalResource): MedicalNerApproach.this.type
    Definition Classes
    EvaluationDLParams
  180. def setTestDataset(path: String, readAs: Format, options: Map[String, String]): MedicalNerApproach.this.type
    Definition Classes
    EvaluationDLParams
  181. def setUseBestModel(value: Boolean): MedicalNerApproach.this.type

  182. def setUseContrib(value: Boolean): MedicalNerApproach.this.type

    Whether to use contrib LSTM Cells. Not compatible with Windows. Might slightly improve accuracy.

  183. def setValidationSplit(validationSplit: Float): MedicalNerApproach.this.type
    Definition Classes
    EvaluationDLParams
  184. def setVerbose(verbose: Level): MedicalNerApproach.this.type
    Definition Classes
    EvaluationDLParams
  185. def setVerbose(verbose: Int): MedicalNerApproach.this.type
    Definition Classes
    EvaluationDLParams
  186. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  187. val tagsMapping: MapFeature[String, String]

    A map specifying how old tags are mapped to new ones.

    It only works if overrideExistingTags is set to false.

  188. val testDataset: ExternalResourceParam
    Definition Classes
    EvaluationDLParams
  189. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  190. def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): MedicalNerModel
    Definition Classes
    MedicalNerApproach → AnnotatorApproach
  191. final def transformSchema(schema: StructType): StructType
    Definition Classes
    AnnotatorApproach → PipelineStage
  192. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  193. val uid: String
    Definition Classes
    MedicalNerApproach → Identifiable
  194. val useBestModel: BooleanParam

    Whether to restore and use the model from the epoch that has achieved the best performance at the end of the training.

    By default false (keep the model from the last trained epoch).

    The best model depends on the earlyStoppingCriterion, which can be F1-score on test/validation dataset or the value of loss.
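
    A configuration sketch that combines useBestModel with a validation split and early stopping, using only setters listed on this page. The import path and the specific values are assumptions chosen for illustration.

    ```scala
    // Assumed package; adjust to wherever MedicalNerApproach lives in your installation.
    import com.johnsnowlabs.nlp.annotators.ner.MedicalNerApproach

    val nerWithBestModel = new MedicalNerApproach()
      .setInputCols("sentence", "token", "embeddings")
      .setLabelColumn("label")
      .setOutputCol("ner")
      .setMaxEpochs(30)
      .setValidationSplit(0.1f)         // with a validation split, the criterion is the F1-score
      .setEarlyStoppingCriterion(0.01f) // illustrative threshold on the F1 scale
      .setEarlyStoppingPatience(3)      // illustrative number of epochs to wait
      .setUseBestModel(true)            // restore the best epoch instead of keeping the last one
    ```

    Without a validation split or test dataset, the same criterion would be compared against the loss, so a value on the F1 scale such as 0.01 would usually need to be revisited.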

  195. val useContrib: BooleanParam

    whether to use contrib LSTM Cells. Not compatible with Windows. Might slightly improve accuracy. By default true.

  196. def validate(schema: StructType): Boolean
    Attributes
    protected
    Definition Classes
    AnnotatorApproach
  197. val validationSplit: FloatParam
    Definition Classes
    EvaluationDLParams
  198. val verbose: IntParam
    Definition Classes
    EvaluationDLParams
  199. val verboseLevel: Level
    Definition Classes
    MedicalNerApproach → Logging
  200. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  201. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  202. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  203. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
