ContextualParserApproach

Companion object ContextualParserApproach

class ContextualParserApproach extends AnnotatorApproach[ContextualParserModel] with HandleExceptionParams with CheckLicense

Creates a model that extracts entities from a document based on user-defined rules. Rule matching is based on a RegexMatcher defined in a JSON file, whose path is set with setJsonPath(). The JSON file defines the regex to match along with the information that will be written to the metadata field. Additionally, a dictionary can be provided with setDictionary to map extracted entities to a unified representation. The first column of the dictionary file should be the representation, and the following columns the possible matches.

Example

An example JSON file regex_token.json can look like this:

{
"entity": "Stage",
"ruleScope": "sentence",
"regex": "[cpyrau]?[T][0-9X?][a-z^cpyrau]*",
"matchScope": "token"
}

This rule extracts the stage code at the sentence level. An example pipeline could then be defined like this:

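import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.annotators.context.ContextualParserApproach
import org.apache.spark.ml.Pipeline
// Assumes an active SparkSession named `spark`; required for toDF further down.
import spark.implicits._

val documentAssembler = new DocumentAssembler()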
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

Define the parser (json file needs to be provided)

val data = Seq("A patient has liver metastases pT1bN0M0 and the T5 primary site may be colon or... ").toDF("text")
val contextualParser = new ContextualParserApproach()
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("entity")
  .setJsonPath("/path/to/regex_token.json")
  .setCaseSensitive(true)
val pipeline = new Pipeline().setStages(Array(
    documentAssembler,
    sentenceDetector,
    tokenizer,
    contextualParser
  ))

val result = pipeline.fit(data).transform(data)
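
The pipeline above expects the rule file to exist at the path given to setJsonPath before fit is called. The following is a minimal sketch of writing the example rule to disk from plain Scala. The object name `RuleFileSketch`, the helper `writeRuleFile`, and the use of a temporary file are assumptions made for illustration and are not part of the library.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object RuleFileSketch {
  // Rule content copied from the regex_token.json example above.
  val ruleJson: String =
    """{
      |  "entity": "Stage",
      |  "ruleScope": "sentence",
      |  "regex": "[cpyrau]?[T][0-9X?][a-z^cpyrau]*",
      |  "matchScope": "token"
      |}""".stripMargin

  // Writes the rule to a temporary file and returns its path.
  // The temporary location is an assumption of this sketch.
  def writeRuleFile(): Path = {
    val path = Files.createTempFile("regex_token", ".json")
    Files.write(path, ruleJson.getBytes(StandardCharsets.UTF_8))
    path
  }

  def main(args: Array[String]): Unit = {
    val path = writeRuleFile()
    // The printed path is the value that would be passed to setJsonPath(...).
    println(path.toString)
  }
}
```

In a cluster setting the file would presumably need to be readable from wherever the annotator loads it; that detail is outside the scope of this sketch.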

Show Results

result.selectExpr("explode(entity)").show(5, truncate=false)
+-------------------------------------------------------------------------------------------------------------------------+
|col                                                                                                                      |
+-------------------------------------------------------------------------------------------------------------------------+
|{chunk, 32, 39, pT1bN0M0, {field -> Stage, normalized -> , confidence -> 1.00, sentence -> 0}, []}                 |
|{chunk, 49, 50, T5, {field -> Stage, normalized -> , confidence -> 1.00, sentence -> 0}, []}                       |
|{chunk, 148, 156, cT4bcN2M1, {field -> Stage, normalized -> , confidence -> 1.00, sentence -> 1}, []}              |
|{chunk, 189, 194, T?N3M1, {field -> Stage, normalized -> , confidence -> 1.00, sentence -> 2}, []}                 |
|{chunk, 316, 323, pT1bN0M0, {field -> Stage, normalized -> , confidence -> 1.00, sentence -> 3}, []}               |
+-------------------------------------------------------------------------------------------------------------------------+
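
To see what the regex from regex_token.json matches on its own, it can be tried against individual tokens with the Scala standard library. This is only a sketch of the regex behaviour, not of the annotator itself; `StageRegexSketch` and `firstMatch` are hypothetical names. Note that the raw regex matches only a leading part of a token such as pT1bN0M0, while the results above contain the whole token, which appears consistent with "matchScope": "token" in the rule.

```scala
object StageRegexSketch {
  // Regex copied verbatim from regex_token.json.
  val stageRegex = "[cpyrau]?[T][0-9X?][a-z^cpyrau]*".r

  // Returns the first substring of the token matched by the regex, if any.
  def firstMatch(token: String): Option[String] = stageRegex.findFirstIn(token)

  def main(args: Array[String]): Unit = {
    // Tokens taken from the example sentence and the result table.
    val tokens = Seq("metastases", "pT1bN0M0", "T5", "colon", "T?N3M1")
    tokens.foreach(t => println(s"$t -> ${firstMatch(t)}"))
  }
}
```

Tokens without an uppercase T followed by a digit, X, or ? (for example "metastases" or "colon") produce no match.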

See also: ContextualParserModel for the trained model

Linear Supertypes

CheckLicense, HandleExceptionParams, AnnotatorApproach[ContextualParserModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[ContextualParserModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

new ContextualParserApproach()
new ContextualParserApproach(uid: String)
uid
a unique identifier for the instantiated AnnotatorModel

Type Members

type AnnotatorType = String

Definition Classes
HasOutputAnnotatorType

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def $[T](param: Param[T]): T

Attributes
protected
Definition Classes
Params
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def _fit(dataset: Dataset[_], recursiveStages: Option[PipelineModel]): ContextualParserModel

Attributes
protected
Definition Classes
AnnotatorApproach
final def asInstanceOf[T0]: T0

Definition Classes
Any
def beforeTraining(spark: SparkSession): Unit

Definition Classes
AnnotatorApproach
val caseSensitive: BooleanParam
Whether matching is case sensitive (Default: false)
final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean

Attributes
protected
Definition Classes
HasInputAnnotationCols
def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String], metadata: Option[Map[String, Value]]): Unit

Definition Classes
CheckLicense
def checkValidScope(scope: String): Unit

Definition Classes
CheckLicense
def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean, metadata: Option[Map[String, Value]]): Unit

Definition Classes
CheckLicense
def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean, metadata: Option[Map[String, Value]]): Unit

Definition Classes
CheckLicense
final def clear(param: Param[_]): ContextualParserApproach.this.type

Definition Classes
Params
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
val completeContextMatch: BooleanParam
Whether to do an exact match of prefix and suffix.
final def copy(extra: ParamMap): Estimator[ContextualParserModel]

Definition Classes
AnnotatorApproach → Estimator → PipelineStage → Params
def copyValues[T <: Params](to: T, extra: ParamMap): T

Attributes
protected
Definition Classes
Params
final def defaultCopy[T <: Params](extra: ParamMap): T

Attributes
protected
Definition Classes
Params
val description: String

Definition Classes
ContextualParserApproach → AnnotatorApproach
val dictionary: ExternalResourceParam
Path to a dictionary file in TSV or CSV format. The first column should be the representation, and the following columns the possible matches.
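
The column layout described above can be illustrated with a small plain-Scala sketch that turns dictionary rows into a lookup from each possible match to its representation. The sample rows, the object name `DictionarySketch`, and the helper `toLookup` are invented for illustration; how the annotator reads and applies the file internally is not shown here.

```scala
import java.util.regex.Pattern

object DictionarySketch {
  // Hypothetical dictionary rows: tab-separated, first column is the unified
  // representation, remaining columns are the possible matches.
  val lines: Seq[String] = Seq(
    "stage one\tT1\tpT1\tcT1",
    "stage two\tT2\tpT2"
  )

  // Builds a lookup from each possible match to its unified representation.
  def toLookup(rows: Seq[String], delimiter: String = "\t"): Map[String, String] =
    rows.flatMap { row =>
      // Quote the delimiter so characters such as "|" are not read as regex syntax.
      val cols = row.split(Pattern.quote(delimiter)).map(_.trim).filter(_.nonEmpty).toSeq
      cols.drop(1).map(variant => variant -> cols.head)
    }.toMap

  def main(args: Array[String]): Unit = {
    val lookup = toLookup(lines)
    println(lookup.get("pT1")) // representation for the variant "pT1"
    println(lookup.get("T9"))  // variant not present in the dictionary
  }
}
```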
val doExceptionHandling: BooleanParam
If true, exceptions are handled. If data that causes an exception is passed to the model, an error annotation containing the exception message is emitted and processing continues with the next item. This comes with a performance penalty.
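
The behaviour described above follows a common pattern: wrap the per-item work so that a failure becomes an error record and the remaining items are still processed. The sketch below shows that pattern in plain Scala under stated assumptions; `Result`, `annotate`, and `annotateAll` are hypothetical stand-ins and do not reflect the library's classes or its actual implementation.

```scala
import scala.util.{Failure, Success, Try}

object ExceptionHandlingSketch {
  // Hypothetical stand-in for an annotation; not the library's Annotation class.
  final case class Result(kind: String, value: String)

  // Hypothetical per-item step that fails on empty input.
  def annotate(text: String): Result = {
    require(text.nonEmpty, "empty input")
    Result("chunk", text.toUpperCase)
  }

  // Wraps each item in Try: a failure becomes an "error" result carrying the
  // exception message, and processing continues with the next item.
  def annotateAll(items: Seq[String]): Seq[Result] =
    items.map { item =>
      Try(annotate(item)) match {
        case Success(result) => result
        case Failure(e)      => Result("error", e.getMessage)
      }
    }

  def main(args: Array[String]): Unit =
    annotateAll(Seq("t5", "", "pt1b")).foreach(println)
}
```

Wrapping every item this way adds overhead to each call, which fits the performance penalty mentioned in the parameter description.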

Definition Classes
HandleExceptionParams
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def explainParam(param: Param[_]): String

Definition Classes
Params
def explainParams(): String

Definition Classes
Params
final def extractParamMap(): ParamMap

Definition Classes
Params
final def extractParamMap(extra: ParamMap): ParamMap

Definition Classes
Params
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def fit(dataset: Dataset[_]): ContextualParserModel

Definition Classes
AnnotatorApproach → Estimator
def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[ContextualParserModel]

Definition Classes
Estimator
Annotations
@Since( "2.0.0" )
def fit(dataset: Dataset[_], paramMap: ParamMap): ContextualParserModel

Definition Classes
Estimator
Annotations
@Since( "2.0.0" )
def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): ContextualParserModel

Definition Classes
Estimator
Annotations
@Since( "2.0.0" ) @varargs()
final def get[T](param: Param[T]): Option[T]

Definition Classes
Params
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
final def getDefault[T](param: Param[T]): Option[T]

Definition Classes
Params
def getEntityDefinition: Option[EntityDefinition]
def getInputCols: Array[String]

Definition Classes
HasInputAnnotationCols
def getLazyAnnotator: Boolean

Definition Classes
CanBeLazy
final def getOrDefault[T](param: Param[T]): T

Definition Classes
Params
final def getOutputCol: String

Definition Classes
HasOutputAnnotationCol
def getParam(paramName: String): Param[Any]

Definition Classes
Params
final def hasDefault[T](param: Param[T]): Boolean

Definition Classes
Params
def hasParam(paramName: String): Boolean

Definition Classes
Params
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes
protected
Definition Classes
Logging
def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes
protected
Definition Classes
Logging
val inputAnnotatorTypes: Array[String]
Input annotator types: DOCUMENT, TOKEN

Definition Classes
ContextualParserApproach → HasInputAnnotationCols
final val inputCols: StringArrayParam

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def isDefined(param: Param[_]): Boolean

Definition Classes
Params
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def isSet(param: Param[_]): Boolean

Definition Classes
Params
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
val jsonPath: Param[String]
Path to json file with regex rules
val lazyAnnotator: BooleanParam

Definition Classes
CanBeLazy
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def msgHelper(schema: StructType): String

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
def onTrained(model: ContextualParserModel, spark: SparkSession): Unit

Definition Classes
AnnotatorApproach
val optionalContextRules: BooleanParam
When set to true, regex matches are output regardless of whether the context rules match.
val optionalInputAnnotatorTypes: Array[String]

Definition Classes
HasInputAnnotationCols
val outputAnnotatorType: AnnotatorType
Output annotator types: CHUNK

Definition Classes
ContextualParserApproach → HasOutputAnnotatorType
final val outputCol: Param[String]

Attributes
protected
Definition Classes
HasOutputAnnotationCol
lazy val params: Array[Param[_]]

Definition Classes
Params
val prefixAndSuffixMatch: BooleanParam
Whether to match both prefix and suffix to annotate the hit (Default: false)
def save(path: String): Unit

Definition Classes
MLWritable
Annotations
@Since( "1.6.0" ) @throws( ... )
final def set(paramPair: ParamPair[_]): ContextualParserApproach.this.type

Attributes
protected
Definition Classes
Params
final def set(param: String, value: Any): ContextualParserApproach.this.type

Attributes
protected
Definition Classes
Params
final def set[T](param: Param[T], value: T): ContextualParserApproach.this.type

Definition Classes
Params
def setCaseSensitive(value: Boolean): ContextualParserApproach.this.type
Whether matching is case sensitive (Default: false)
def setCompleteContextMatch(value: Boolean): ContextualParserApproach.this.type
Whether to do an exact match of prefix and suffix.
final def setDefault(paramPairs: ParamPair[_]*): ContextualParserApproach.this.type

Attributes
protected
Definition Classes
Params
final def setDefault[T](param: Param[T], value: T): ContextualParserApproach.this.type

Attributes
protected[org.apache.spark.ml]
Definition Classes
Params
def setDictionary(path: String, readAs: Format = ReadAs.TEXT, options: Map[String, String] = Map("delimiter" -> "\t")): ContextualParserApproach.this.type
Path to a dictionary file in TSV or CSV format. The first column should be the representation, and the following columns the possible matches.
def setDoExceptionHandling(value: Boolean): ContextualParserApproach.this.type
If true, exceptions are handled. If data that causes an exception is passed to the model, an error annotation containing the exception message is emitted and processing continues with the next item. This comes with a performance penalty.

Definition Classes
HandleExceptionParams
final def setInputCols(value: String*): ContextualParserApproach.this.type

Definition Classes
HasInputAnnotationCols
def setInputCols(value: Array[String]): ContextualParserApproach.this.type

Definition Classes
HasInputAnnotationCols
def setJsonPath(value: String): ContextualParserApproach.this.type
Path to json file with regex rules
def setLazyAnnotator(value: Boolean): ContextualParserApproach.this.type

Definition Classes
CanBeLazy
def setOptionalContextRules(value: Boolean): ContextualParserApproach.this.type
When set to true, regex matches are output regardless of whether the context rules match.
final def setOutputCol(value: String): ContextualParserApproach.this.type

Definition Classes
HasOutputAnnotationCol
def setPrefixAndSuffixMatch(value: Boolean): ContextualParserApproach.this.type
Whether to match both prefix and suffix to annotate the hit (Default: false)
def setShortestContextMatch(value: Boolean): ContextualParserApproach.this.type
When set to true, the search for matches stops once prefix/suffix data is found in the text.
val shortestContextMatch: BooleanParam
When set to true, the search for matches stops once prefix/suffix data is found in the text.
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
Identifiable → AnyRef → Any
def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): ContextualParserModel

Definition Classes
ContextualParserApproach → AnnotatorApproach
final def transformSchema(schema: StructType): StructType

Definition Classes
AnnotatorApproach → PipelineStage
def transformSchema(schema: StructType, logging: Boolean): StructType

Attributes
protected
Definition Classes
PipelineStage
Annotations
@DeveloperApi()
val uid: String

Definition Classes
ContextualParserApproach → Identifiable
def validate(schema: StructType): Boolean

Attributes
protected
Definition Classes
AnnotatorApproach
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def write: MLWriter

Definition Classes
DefaultParamsWritable → MLWritable
