RENerChunksFilter

Companion object RENerChunksFilter

class RENerChunksFilter extends AnnotatorModel[RENerChunksFilter] with HasSimpleAnnotate[RENerChunksFilter] with CheckLicense

Filters entities' dependency relations.

The annotator filters the desired relation pairs (defined by the parameter relationPairs) and stores them in the output column. Filtering the possible relations can be useful for additional analysis of a specific use case (e.g., checking adverse drug reactions and drug relations), and the filtered pairs can serve as input for further analysis with a pretrained RelationExtractionDLModel.

For example, the ner_clinical NER model can identify PROBLEM, TEST, and TREATMENT entities. By using this annotator, one can filter (select) the relations between PROBLEM and TREATMENT entities only, removing any relation between the other entities, to further analyze the associations between clinical problems and treatments.
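The selection described above can be pictured with a small, library-free sketch. It is only an illustration of the idea of keeping pairs whose labels match an "ENTITY1-ENTITY2" string; the `Chunk` type and `filterPairs` helper are hypothetical names, not part of the annotator's API, and the real implementation also takes dependency information into account.

```scala
// Hypothetical illustration: none of these names belong to the library.
object PairFilterSketch {
  final case class Chunk(begin: Int, text: String, entity: String)

  /** Keeps only the ordered chunk pairs whose labels form an allowed "ENTITY1-ENTITY2" string. */
  def filterPairs(chunks: Seq[Chunk], allowed: Set[String]): Seq[(Chunk, Chunk)] =
    for {
      a <- chunks
      b <- chunks
      if a != b && allowed.contains(s"${a.entity}-${b.entity}")
    } yield (a, b)
}
```

With chunks labelled PROBLEM, TREATMENT, and TEST and `allowed = Set("PROBLEM-TREATMENT")`, only the PROBLEM/TREATMENT pair survives; every pair involving TEST is dropped.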

Example

Define pipeline stages to extract entities

val documenter = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentencer = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentences")

val tokenizer = new Tokenizer()
  .setInputCols("sentences")
  .setOutputCol("tokens")

val words_embedder = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols("sentences", "tokens")
  .setOutputCol("embeddings")

val pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")
  .setInputCols("sentences", "tokens")
  .setOutputCol("pos_tags")

val dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")
  .setInputCols("sentences", "pos_tags", "tokens")
  .setOutputCol("dependencies")

val clinical_ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical","en","clinical/models")
  .setInputCols("sentences", "tokens", "embeddings")
  .setOutputCol("ner_tags")

val ner_chunker = new NerConverter()
  .setInputCols("sentences", "tokens", "ner_tags")
  .setOutputCol("ner_chunks")

Define the relation pairs and the filter

val relationPairs = Array("direction-external_body_part_or_region",
                      "external_body_part_or_region-direction",
                      "direction-internal_organ_or_component",
                      "internal_organ_or_component-direction")

// Note: only one of the pairs defined above is passed to the filter here.
val re_ner_chunk_filter = new RENerChunksFilter()
  .setInputCols("ner_chunks", "dependencies")
  .setOutputCol("re_ner_chunks")
  .setMaxSyntacticDistance(4)
  .setRelationPairs(Array("internal_organ_or_component-direction"))

val trained_pipeline = new Pipeline().setStages(Array(
  documenter,
  sentencer,
  tokenizer,
  words_embedder,
  pos_tagger,
  clinical_ner_tagger,
  ner_chunker,
  dependency_parser,
  re_ner_chunk_filter
))

val data = Seq("MRI demonstrated infarction in the upper brain stem , left cerebellum and  right basil ganglia").toDF("text")
val result = trained_pipeline.fit(data).transform(data)

Show results

result.selectExpr("explode(re_ner_chunks) as re_chunks")
  .selectExpr("re_chunks.begin", "re_chunks.result", "re_chunks.metadata.entity", "re_chunks.metadata.paired_to")
  .show(6, truncate=false)
+-----+-------------+---------------------------+---------+
|begin|result       |entity                     |paired_to|
+-----+-------------+---------------------------+---------+
|35   |upper        |Direction                  |41       |
|41   |brain stem   |Internal_organ_or_component|35       |
|35   |upper        |Direction                  |59       |
|59   |cerebellum   |Internal_organ_or_component|35       |
|35   |upper        |Direction                  |81       |
|81   |basil ganglia|Internal_organ_or_component|35       |
+-----+-------------+---------------------------+---------+
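In the table above, each chunk's paired_to value is the begin offset of its partner, and the rows appear two at a time. A minimal sketch of regrouping such rows into pairs follows. It assumes the rows have already been collected into plain Scala values and that partners are emitted consecutively, as in this output; the `Row` type and `toPairs` helper are hypothetical.

```scala
// Hypothetical helper for reading the exploded output; not part of the library.
object PairedToSketch {
  final case class Row(begin: Int, result: String, entity: String, pairedTo: Int)

  /** Groups consecutive rows into pairs when each row points at the other's begin offset. */
  def toPairs(rows: Seq[Row]): List[(Row, Row)] =
    rows.grouped(2).collect {
      case Seq(a, b) if a.pairedTo == b.begin && b.pairedTo == a.begin => (a, b)
    }.toList
}
```

Applied to the six rows shown, this yields three pairs, each joining the Direction chunk "upper" with one Internal_organ_or_component chunk.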

See also: RelationExtractionDLModel for BERT-based extraction

Linear Supertypes

CheckLicense, HasSimpleAnnotate[RENerChunksFilter], AnnotatorModel[RENerChunksFilter], CanBeLazy, RawAnnotator[RENerChunksFilter], HasOutputAnnotationCol, HasInputAnnotationCols, HasOutputAnnotatorType, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, Model[RENerChunksFilter], Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Known Subclasses

RENerChunksFilter


Instance Constructors

new RENerChunksFilter()
new RENerChunksFilter(uid: String)
uid
a unique identifier for the instantiated AnnotatorModel

Type Members

type AnnotationContent = Seq[Row]

Attributes
protected
Definition Classes
AnnotatorModel
type AnnotatorType = String

Definition Classes
HasOutputAnnotatorType

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def $[T](param: Param[T]): T

Attributes
protected
Definition Classes
Params
def $$[T](feature: StructFeature[T]): T

Attributes
protected
Definition Classes
HasFeatures
def $$[K, V](feature: MapFeature[K, V]): Map[K, V]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: SetFeature[T]): Set[T]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: ArrayFeature[T]): Array[T]

Attributes
protected
Definition Classes
HasFeatures
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def _transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame

Attributes
protected
Definition Classes
AnnotatorModel
def afterAnnotate(dataset: DataFrame): DataFrame

Attributes
protected
Definition Classes
AnnotatorModel
def annotate(annotations: Seq[Annotation]): Seq[Annotation]

Definition Classes
RENerChunksFilter → HasSimpleAnnotate
final def asInstanceOf[T0]: T0

Definition Classes
Any
def beforeAnnotate(dataset: Dataset[_]): Dataset[_]

Attributes
protected
Definition Classes
AnnotatorModel
final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean

Attributes
protected
Definition Classes
HasInputAnnotationCols
def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit

Definition Classes
CheckLicense
def checkValidScope(scope: String): Unit

Definition Classes
CheckLicense
def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit

Definition Classes
CheckLicense
def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit

Definition Classes
CheckLicense
final def clear(param: Param[_]): RENerChunksFilter.this.type

Definition Classes
Params
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def copy(extra: ParamMap): RENerChunksFilter

Definition Classes
RawAnnotator → Model → Transformer → PipelineStage → Params
def copyValues[T <: Params](to: T, extra: ParamMap): T

Attributes
protected
Definition Classes
Params
final def defaultCopy[T <: Params](extra: ParamMap): T

Attributes
protected
Definition Classes
Params
def dfAnnotate: UserDefinedFunction

Definition Classes
HasSimpleAnnotate
val directionSensitive: BooleanParam
If true, only relations in the form "ENTITY1-ENTITY2" are considered. If false, both "ENTITY1-ENTITY2" and "ENTITY2-ENTITY1" relations are considered.
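The effect of this flag on pair matching can be sketched in plain Scala. This is an illustration of the rule as described, not the library's code; `matches` and its parameters are hypothetical names.

```scala
// Hypothetical sketch of direction-sensitive pair matching.
object DirectionSketch {
  /** Checks whether the ordered labels (first, second) are accepted by the configured pairs. */
  def matches(first: String, second: String,
              pairs: Set[String], directionSensitive: Boolean): Boolean = {
    val forward = pairs.contains(s"$first-$second")   // "ENTITY1-ENTITY2"
    val backward = pairs.contains(s"$second-$first")  // "ENTITY2-ENTITY1"
    if (directionSensitive) forward else forward || backward
  }
}
```

For example, with `pairs = Set("organ-direction")`, the ordered labels ("direction", "organ") are rejected when the flag is true and accepted when it is false.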
var docLevelRelations: BooleanParam
Whether to include relations between entities from different sentences (Default: False).
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def explainParam(param: Param[_]): String

Definition Classes
Params
def explainParams(): String

Definition Classes
Params
def extraValidate(structType: StructType): Boolean

Attributes
protected
Definition Classes
RawAnnotator
def extraValidateMsg: String

Attributes
protected
Definition Classes
RawAnnotator
final def extractParamMap(): ParamMap

Definition Classes
Params
final def extractParamMap(extra: ParamMap): ParamMap

Definition Classes
Params
val features: ArrayBuffer[Feature[_, _, _]]

Definition Classes
HasFeatures
val filterByTokenDistance: IntParam
Filtering criterion based on the number of tokens between entities. The model only finds relations that have fewer than the specified number of tokens between them.
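A rough sketch of such a token-distance check is shown below. It assumes each entity is described by the indices of its first and last token and reads "fewer than" as a strict comparison; the `Span` type and both helpers are hypothetical, and how the annotator treats an unset or zero threshold is not covered here.

```scala
// Hypothetical sketch of a token-distance criterion; not the library's code.
object TokenDistanceSketch {
  final case class Span(firstToken: Int, lastToken: Int)

  /** Number of tokens lying strictly between two non-overlapping entity spans. */
  def tokensBetween(a: Span, b: Span): Int = {
    val (left, right) = if (a.firstToken <= b.firstToken) (a, b) else (b, a)
    math.max(0, right.firstToken - left.lastToken - 1)
  }

  /** Keeps the pair only when fewer than maxTokens tokens separate the entities. */
  def keep(a: Span, b: Span, maxTokens: Int): Boolean =
    tokensBetween(a, b) < maxTokens
}
```

For an entity covering tokens 0 to 1 and another at token 5, three tokens lie in between, so the pair is kept with a threshold of 4 and dropped with a threshold of 3.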
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def get[T](feature: StructFeature[T]): Option[T]

Attributes
protected
Definition Classes
HasFeatures
def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: SetFeature[T]): Option[Set[T]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: ArrayFeature[T]): Option[Array[T]]

Attributes
protected
Definition Classes
HasFeatures
final def get[T](param: Param[T]): Option[T]

Definition Classes
Params
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
final def getDefault[T](param: Param[T]): Option[T]

Definition Classes
Params
def getDirectionSensitive: Boolean
Gets the value of directionSensitive
def getDocLevelRelations: Boolean
Include relations between entities from different sentences (Default: False)
def getInputCols: Array[String]

Definition Classes
HasInputAnnotationCols
def getLazyAnnotator: Boolean

Definition Classes
CanBeLazy
def getMaxSyntacticDistance: Float
Maximal syntactic distance, as threshold (Default: 0)
final def getOrDefault[T](param: Param[T]): T

Definition Classes
Params
final def getOutputCol: String

Definition Classes
HasOutputAnnotationCol
def getParam(paramName: String): Param[Any]

Definition Classes
Params
def getRelationPairs: Array[String]
List of dash-separated pairs of named entities ("ENTITY1-ENTITY2", e.g. "Biomarker-RelativeDay"), which will be processed
def getRelationPairsCaseSensitive: Boolean
Gets the case sensitivity of relation pairs
final def hasDefault[T](param: Param[T]): Boolean

Definition Classes
Params
def hasParam(paramName: String): Boolean

Definition Classes
Params
def hasParent: Boolean

Definition Classes
Model
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes
protected
Definition Classes
Logging
def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes
protected
Definition Classes
Logging
val inputAnnotatorTypes: Array[AnnotatorType]
Input annotator type : CHUNK, DEPENDENCY

Definition Classes
RENerChunksFilter → HasInputAnnotationCols
final val inputCols: StringArrayParam

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def isDefined(param: Param[_]): Boolean

Definition Classes
Params
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def isSet(param: Param[_]): Boolean

Definition Classes
Params
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
val lazyAnnotator: BooleanParam

Definition Classes
CanBeLazy
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
var maxSyntacticDistance: IntParam
Maximum syntactic distance between a pair of named entities to consider them as a relation.
Increase this value if you want to consider relations between entities that are far away, but be careful as it may add false positives (Default: 0).
def msgHelper(schema: StructType): String

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
def onWrite(path: String, spark: SparkSession): Unit

Attributes
protected
Definition Classes
ParamsAndFeaturesWritable
val optionalInputAnnotatorTypes: Array[String]

Definition Classes
HasInputAnnotationCols
val outputAnnotatorType: String
Output annotator type : CHUNK

Definition Classes
RENerChunksFilter → HasOutputAnnotatorType
final val outputCol: Param[String]

Attributes
protected
Definition Classes
HasOutputAnnotationCol
lazy val params: Array[Param[_]]

Definition Classes
Params
var parent: Estimator[RENerChunksFilter]

Definition Classes
Model
def processAnnotations(annotations: Seq[Annotation]): Seq[Annotation]
var relationPairs: Param[String]
List of dash-separated pairs of named entities ("ENTITY1-ENTITY2", e.g. "Biomarker-RelativeDay"), which will be processed. If not set, all relations between the entities will be considered.
var relationPairsCaseSensitive: BooleanParam
Determines whether relation pairs are case sensitive
def save(path: String): Unit

Definition Classes
MLWritable
Annotations
@Since( "1.6.0" ) @throws( ... )
def set[T](feature: StructFeature[T], value: T): RENerChunksFilter.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): RENerChunksFilter.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: SetFeature[T], value: Set[T]): RENerChunksFilter.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: ArrayFeature[T], value: Array[T]): RENerChunksFilter.this.type

Attributes
protected
Definition Classes
HasFeatures
final def set(paramPair: ParamPair[_]): RENerChunksFilter.this.type

Attributes
protected
Definition Classes
Params
final def set(param: String, value: Any): RENerChunksFilter.this.type

Attributes
protected
Definition Classes
Params
final def set[T](param: Param[T], value: T): RENerChunksFilter.this.type

Definition Classes
Params
def setDefault[T](feature: StructFeature[T], value: () ⇒ T): RENerChunksFilter.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): RENerChunksFilter.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): RENerChunksFilter.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): RENerChunksFilter.this.type

Attributes
protected
Definition Classes
HasFeatures
final def setDefault(paramPairs: ParamPair[_]*): RENerChunksFilter.this.type

Attributes
protected
Definition Classes
Params
final def setDefault[T](param: Param[T], value: T): RENerChunksFilter.this.type

Attributes
protected[org.apache.spark.ml]
Definition Classes
Params
def setDirectionSensitive(value: Boolean): RENerChunksFilter.this.type
If true, only relations in the form "ENTITY1-ENTITY2" are considered. If false, both "ENTITY1-ENTITY2" and "ENTITY2-ENTITY1" relations are considered.
def setDocLevelRelations(docLevelRelations: Boolean): RENerChunksFilter.this.type
Include relations between entities from different sentences (Default: False). If set to True, the sentence ids of all entities are set to 0, i.e. the sentences are merged together. The original sentence ids are preserved in the metadata as 'orig_sentence'. The dependency annotations are not affected. Note that the model used to extract relations may not be able to process texts that exceed a given length. It is generally a bad idea to search for relations between entities spanning more than 100 words unless using a model specifically trained on such data.
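The metadata change described here can be pictured with a small sketch over a plain metadata map. It assumes the sentence id is stored under a "sentence" key, which is an assumption not confirmed by this page; only the 'orig_sentence' key comes from the description above. The `mergeSentences` helper is hypothetical.

```scala
// Hypothetical sketch of the described metadata rewrite; not the library's code.
object DocLevelSketch {
  /** Moves the sentence id to "orig_sentence" and resets "sentence" to "0". */
  def mergeSentences(metadata: Map[String, String]): Map[String, String] = {
    // ASSUMPTION: the sentence id lives under the "sentence" key.
    val original = metadata.getOrElse("sentence", "0")
    metadata + ("orig_sentence" -> original) + ("sentence" -> "0")
  }
}
```

An entity from the third sentence, with metadata `Map("entity" -> "Direction", "sentence" -> "2")`, would end up with `"sentence" -> "0"` and `"orig_sentence" -> "2"`, while its other metadata stays untouched.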
def setFilterByTokenDistance(value: Int): RENerChunksFilter.this.type
Filtering criterion based on the number of tokens between entities. The model only finds relations that have fewer than the specified number of tokens between them.
final def setInputCols(value: String*): RENerChunksFilter.this.type

Definition Classes
HasInputAnnotationCols
def setInputCols(value: Array[String]): RENerChunksFilter.this.type

Definition Classes
HasInputAnnotationCols
def setLazyAnnotator(value: Boolean): RENerChunksFilter.this.type

Definition Classes
CanBeLazy
def setMaxSyntacticDistance(maxSyntacticDistance: Int): RENerChunksFilter.this.type
Maximal syntactic distance, as threshold (Default: 0). This constraint may not work well if setDocLevelRelations is set to True, because dependency parsers are poor at detecting syntactic relations across multiple sentences.
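One common way to measure syntactic distance is the number of edges on the path between two tokens in the dependency tree. The sketch below computes that under the assumption that the parse is a tree given as an array of head indices, with -1 as the head of the root. Whether the annotator uses exactly this definition is not stated on this page, so treat it as an illustration; both helper names are hypothetical.

```scala
// Hypothetical sketch: tree distance between tokens from head indices.
object SyntacticDistanceSketch {
  /** Path from a token up to the root, following head indices (-1 is the root's head). */
  def pathToRoot(heads: IndexedSeq[Int], token: Int): List[Int] =
    if (token < 0) Nil else token :: pathToRoot(heads, heads(token))

  /** Number of dependency edges between two tokens, via their lowest common ancestor. */
  def distance(heads: IndexedSeq[Int], a: Int, b: Int): Int = {
    val stepsFromB = pathToRoot(heads, b).zipWithIndex.toMap
    pathToRoot(heads, a).zipWithIndex.collectFirst {
      case (node, stepsFromA) if stepsFromB.contains(node) =>
        stepsFromA + stepsFromB(node)
    }.getOrElse(Int.MaxValue) // no shared ancestor: treat as unreachable
  }
}
```

For the three tokens "upper brain stem" with "stem" as root, `heads = Vector(2, 2, -1)`, the distance between "upper" and "stem" is 1 and between "upper" and "brain" is 2. A threshold such as 4 would then keep both pairs under this definition.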
final def setOutputCol(value: String): RENerChunksFilter.this.type

Definition Classes
HasOutputAnnotationCol
def setParent(parent: Estimator[RENerChunksFilter]): RENerChunksFilter

Definition Classes
Model
def setRelationPairs(relationPairs: Array[String]): RENerChunksFilter.this.type
List of dash-separated pairs of named entities ("ENTITY1-ENTITY2", e.g. "Biomarker-RelativeDay"), which will be processed
def setRelationPairsCaseSensitive(value: Boolean): RENerChunksFilter.this.type
Sets the case sensitivity of relation pairs
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
Identifiable → AnyRef → Any
final def transform(dataset: Dataset[_]): DataFrame

Definition Classes
AnnotatorModel → Transformer
def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame

Definition Classes
Transformer
Annotations
@Since( "2.0.0" )
def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame

Definition Classes
Transformer
Annotations
@Since( "2.0.0" ) @varargs()
final def transformSchema(schema: StructType): StructType

Definition Classes
RawAnnotator → PipelineStage
def transformSchema(schema: StructType, logging: Boolean): StructType

Attributes
protected
Definition Classes
PipelineStage
Annotations
@DeveloperApi()
val uid: String

Definition Classes
RENerChunksFilter → Identifiable
def validate(schema: StructType): Boolean

Attributes
protected
Definition Classes
RawAnnotator
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def wrapColumnMetadata(col: Column): Column

Attributes
protected
Definition Classes
RawAnnotator
def write: MLWriter

Definition Classes
ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
