class RENerChunksFilter extends AnnotatorModel[RENerChunksFilter] with HasSimpleAnnotate[RENerChunksFilter] with CheckLicense
Filters entities' dependency relations.
The annotator filters the desired relation pairs (defined by the parameter relationPairs) and stores them in the output column. Filtering the possible relations can be useful for additional analysis in a specific use case (e.g., checking adverse drug reactions and drug relations), and the filtered chunks can be the input for further analysis using a pretrained RelationExtractionDLModel.
For example, the ner_clinical NER model can identify PROBLEM, TEST, and TREATMENT entities. By using this annotator, one can filter (select) the relations between PROBLEM and TREATMENT entities only, removing any relation between the other entities, to further analyze the associations between clinical problems and treatments.
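The selection rule described above can be pictured with a small, library-independent sketch. Everything in it (the object name PairFilterSketch, the helper keepPair and its arguments) is hypothetical and only mirrors the documented meaning of relationPairs, directionSensitive and relationPairsCaseSensitive; it is not the annotator's actual implementation.

```scala
object PairFilterSketch {
  // Hypothetical helper, not part of the library: decides whether a candidate
  // pair of entity labels (first, second) would survive the filter.
  def keepPair(
      first: String,
      second: String,
      relationPairs: Seq[String],
      directionSensitive: Boolean,
      caseSensitive: Boolean): Boolean = {
    def norm(s: String): String = if (caseSensitive) s else s.toLowerCase

    // An empty list stands for "relationPairs not set": every pair is kept,
    // as stated in the parameter description.
    if (relationPairs.isEmpty) true
    else {
      val allowed = relationPairs.map(norm).toSet
      val forward = norm(s"$first-$second")
      val backward = norm(s"$second-$first")
      // The reversed form is only accepted when direction does not matter.
      allowed.contains(forward) || (!directionSensitive && allowed.contains(backward))
    }
  }
}
```

With the single pair "internal_organ_or_component-direction", calling keepPair("Direction", "Internal_organ_or_component", ...) returns true when both flags are false, and false as soon as either directionSensitive or caseSensitive is set to true.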
Example
Define pipeline stages to extract entities
val documenter = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentencer = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentences")

val tokenizer = new Tokenizer()
  .setInputCols("sentences")
  .setOutputCol("tokens")

val words_embedder = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols("sentences", "tokens")
  .setOutputCol("embeddings")

val pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")
  .setInputCols("sentences", "tokens")
  .setOutputCol("pos_tags")

val dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")
  .setInputCols("sentences", "pos_tags", "tokens")
  .setOutputCol("dependencies")

val clinical_ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical", "en", "clinical/models")
  .setInputCols("sentences", "tokens", "embeddings")
  .setOutputCol("ner_tags")

val ner_chunker = new NerConverter()
  .setInputCols("sentences", "tokens", "ner_tags")
  .setOutputCol("ner_chunks")
Define the relation pairs and the filter
// Candidate pairs; note that only one of them is actually passed to the filter below.
val relationPairs = Array(
  "direction-external_body_part_or_region",
  "external_body_part_or_region-direction",
  "direction-internal_organ_or_component",
  "internal_organ_or_component-direction")

val re_ner_chunk_filter = new RENerChunksFilter()
  .setInputCols("ner_chunks", "dependencies")
  .setOutputCol("re_ner_chunks")
  .setMaxSyntacticDistance(4)
  .setRelationPairs(Array("internal_organ_or_component-direction"))

val trained_pipeline = new Pipeline().setStages(Array(
  documenter,
  sentencer,
  tokenizer,
  words_embedder,
  pos_tagger,
  clinical_ner_tagger,
  ner_chunker,
  dependency_parser,
  re_ner_chunk_filter
))

// toDF requires spark.implicits._ to be in scope.
val data = Seq("MRI demonstrated infarction in the upper brain stem , left cerebellum and right basil ganglia").toDF("text")
val result = trained_pipeline.fit(data).transform(data)
Show results
result.selectExpr("explode(re_ner_chunks) as re_chunks")
  .selectExpr("re_chunks.begin", "re_chunks.result", "re_chunks.metadata.entity", "re_chunks.metadata.paired_to")
  .show(6, truncate=false)

+-----+-------------+---------------------------+---------+
|begin|result       |entity                     |paired_to|
+-----+-------------+---------------------------+---------+
|35   |upper        |Direction                  |41       |
|41   |brain stem   |Internal_organ_or_component|35       |
|35   |upper        |Direction                  |59       |
|59   |cerebellum   |Internal_organ_or_component|35       |
|35   |upper        |Direction                  |81       |
|81   |basil ganglia|Internal_organ_or_component|35       |
+-----+-------------+---------------------------+---------+
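To read the output, it can help to regroup the rows into candidate pairs. The sketch below does this in plain Scala on the six rows shown above. The names PairedChunksSketch, ReChunk and candidatePairs are hypothetical, and the assumption that the two members of a pair are emitted on consecutive rows is inferred from this one table only, not from the library's contract.

```scala
object PairedChunksSketch {
  // Hypothetical row type holding the four columns selected in the example.
  final case class ReChunk(begin: Int, result: String, entity: String, pairedTo: Int)

  // Rows copied from the example output.
  val rows: Seq[ReChunk] = Seq(
    ReChunk(35, "upper", "Direction", 41),
    ReChunk(41, "brain stem", "Internal_organ_or_component", 35),
    ReChunk(35, "upper", "Direction", 59),
    ReChunk(59, "cerebellum", "Internal_organ_or_component", 35),
    ReChunk(35, "upper", "Direction", 81),
    ReChunk(81, "basil ganglia", "Internal_organ_or_component", 35)
  )

  // Assumption: pair members arrive on consecutive rows. A group of two rows is
  // kept only if each row's paired_to equals the other row's begin offset.
  def candidatePairs(chunks: Seq[ReChunk]): Seq[(ReChunk, ReChunk)] =
    chunks.grouped(2).collect {
      case Seq(a, b) if a.pairedTo == b.begin && b.pairedTo == a.begin => (a, b)
    }.toList
}
```

Applied to these rows, candidatePairs yields three pairs, each linking "upper" to one of "brain stem", "cerebellum" and "basil ganglia".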
- See also
RelationExtractionDLModel for BERT based extraction
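The token-distance criterion documented for filterByTokenDistance ("fewer than the specified number of tokens between them") can be illustrated without the library. This is a minimal sketch under stated assumptions: Span, tokensBetween and withinTokenDistance are hypothetical names, token indices are zero-based positions within one sentence, and any special handling the annotator may apply to a zero or unset threshold is not modelled.

```scala
object TokenDistanceSketch {
  // Hypothetical token span: inclusive indices of a chunk's first and last token.
  final case class Span(firstToken: Int, lastToken: Int)

  // Number of tokens lying strictly between two chunks (0 if they touch or overlap).
  def tokensBetween(a: Span, b: Span): Int = {
    val (left, right) = if (a.firstToken <= b.firstToken) (a, b) else (b, a)
    math.max(0, right.firstToken - left.lastToken - 1)
  }

  // Keeps a pair only if fewer than maxTokens tokens separate the two chunks.
  def withinTokenDistance(a: Span, b: Span, maxTokens: Int): Boolean =
    tokensBetween(a, b) < maxTokens
}
```

Assuming whitespace tokenization of the example sentence, "upper" is Span(5, 5) and "cerebellum" is Span(10, 10), so four tokens ("brain", "stem", ",", "left") lie between them; the pair would pass a threshold of 5 and fail a threshold of 4.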
Instance Constructors
Type Members
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
$[T](param: Param[T]): T
- Attributes
- protected
- Definition Classes
- Params
-
def
$$[T](feature: StructFeature[T]): T
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[K, V](feature: MapFeature[K, V]): Map[K, V]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[T](feature: SetFeature[T]): Set[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[T](feature: ArrayFeature[T]): Array[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
_transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame
- Attributes
- protected
- Definition Classes
- AnnotatorModel
-
def
afterAnnotate(dataset: DataFrame): DataFrame
- Attributes
- protected
- Definition Classes
- AnnotatorModel
-
def
annotate(annotations: Seq[Annotation]): Seq[Annotation]
- Definition Classes
- RENerChunksFilter → HasSimpleAnnotate
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
beforeAnnotate(dataset: Dataset[_]): Dataset[_]
- Attributes
- protected
- Definition Classes
- AnnotatorModel
-
final
def
checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
def
checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScope(scope: String): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
-
final
def
clear(param: Param[_]): RENerChunksFilter.this.type
- Definition Classes
- Params
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
copy(extra: ParamMap): RENerChunksFilter
- Definition Classes
- RawAnnotator → Model → Transformer → PipelineStage → Params
-
def
copyValues[T <: Params](to: T, extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
final
def
defaultCopy[T <: Params](extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
def
dfAnnotate: UserDefinedFunction
- Definition Classes
- HasSimpleAnnotate
-
val
directionSensitive: BooleanParam
If true, only relations in the form of "ENTITY1-ENTITY2" will be considered. If false, both "ENTITY1-ENTITY2" and "ENTITY2-ENTITY1" relations will be considered.
-
var
docLevelRelations: BooleanParam
Whether to include relations between entities from different sentences (Default: False).
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
explainParam(param: Param[_]): String
- Definition Classes
- Params
-
def
explainParams(): String
- Definition Classes
- Params
-
def
extraValidate(structType: StructType): Boolean
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
def
extraValidateMsg: String
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
final
def
extractParamMap(): ParamMap
- Definition Classes
- Params
-
final
def
extractParamMap(extra: ParamMap): ParamMap
- Definition Classes
- Params
-
val
features: ArrayBuffer[Feature[_, _, _]]
- Definition Classes
- HasFeatures
-
val
filterByTokenDistance: IntParam
Filtering criterion based on the number of tokens between entities. The model only finds relations that have fewer than the specified number of tokens between them.
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
get[T](feature: StructFeature[T]): Option[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[T](feature: SetFeature[T]): Option[Set[T]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[T](feature: ArrayFeature[T]): Option[Array[T]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
get[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
getDefault[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
def
getDirectionSensitive: Boolean
Gets the value of directionSensitive.
-
def
getDocLevelRelations: Boolean
Include relations between entities from different sentences (Default: False)
-
def
getInputCols: Array[String]
- Definition Classes
- HasInputAnnotationCols
-
def
getLazyAnnotator: Boolean
- Definition Classes
- CanBeLazy
-
def
getMaxSyntacticDistance: Float
Maximal syntactic distance, as threshold (Default: 0)
-
final
def
getOrDefault[T](param: Param[T]): T
- Definition Classes
- Params
-
final
def
getOutputCol: String
- Definition Classes
- HasOutputAnnotationCol
-
def
getParam(paramName: String): Param[Any]
- Definition Classes
- Params
-
def
getRelationPairs: Array[String]
List of dash-separated pairs of named entities ("ENTITY1-ENTITY2", e.g. "Biomarker-RelativeDay"), which will be processed.
-
def
getRelationPairsCaseSensitive: Boolean
Gets the case sensitivity of relation pairs
-
final
def
hasDefault[T](param: Param[T]): Boolean
- Definition Classes
- Params
-
def
hasParam(paramName: String): Boolean
- Definition Classes
- Params
-
def
hasParent: Boolean
- Definition Classes
- Model
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
val
inputAnnotatorTypes: Array[AnnotatorType]
Input annotator types: CHUNK, DEPENDENCY
- Definition Classes
- RENerChunksFilter → HasInputAnnotationCols
-
final
val
inputCols: StringArrayParam
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
final
def
isDefined(param: Param[_]): Boolean
- Definition Classes
- Params
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
isSet(param: Param[_]): Boolean
- Definition Classes
- Params
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
val
lazyAnnotator: BooleanParam
- Definition Classes
- CanBeLazy
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
var
maxSyntacticDistance: IntParam
Maximum syntactic distance between a pair of named entities to consider them as a relation. Increase this value if you want to consider relations between entities that are far apart, but be careful as it may add false positives (Default: 0).
-
def
msgHelper(schema: StructType): String
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
onWrite(path: String, spark: SparkSession): Unit
- Attributes
- protected
- Definition Classes
- ParamsAndFeaturesWritable
-
val
optionalInputAnnotatorTypes: Array[String]
- Definition Classes
- HasInputAnnotationCols
-
val
outputAnnotatorType: String
Output annotator type: CHUNK
- Definition Classes
- RENerChunksFilter → HasOutputAnnotatorType
-
final
val
outputCol: Param[String]
- Attributes
- protected
- Definition Classes
- HasOutputAnnotationCol
-
lazy val
params: Array[Param[_]]
- Definition Classes
- Params
-
var
parent: Estimator[RENerChunksFilter]
- Definition Classes
- Model
-
def
processAnnotations(annotations: Seq[Annotation]): Seq[Annotation]
-
var
relationPairs: Param[String]
List of dash-separated pairs of named entities ("ENTITY1-ENTITY2", e.g. "Biomarker-RelativeDay"), which will be processed. If not set, all relations between the entities will be considered.
-
var
relationPairsCaseSensitive: BooleanParam
Determines whether relation pairs are case sensitive
-
def
save(path: String): Unit
- Definition Classes
- MLWritable
- Annotations
- @Since( "1.6.0" ) @throws( ... )
-
def
set[T](feature: StructFeature[T], value: T): RENerChunksFilter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[K, V](feature: MapFeature[K, V], value: Map[K, V]): RENerChunksFilter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[T](feature: SetFeature[T], value: Set[T]): RENerChunksFilter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[T](feature: ArrayFeature[T], value: Array[T]): RENerChunksFilter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
set(paramPair: ParamPair[_]): RENerChunksFilter.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set(param: String, value: Any): RENerChunksFilter.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set[T](param: Param[T], value: T): RENerChunksFilter.this.type
- Definition Classes
- Params
-
def
setDefault[T](feature: StructFeature[T], value: () ⇒ T): RENerChunksFilter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): RENerChunksFilter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): RENerChunksFilter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): RENerChunksFilter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
setDefault(paramPairs: ParamPair[_]*): RENerChunksFilter.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
setDefault[T](param: Param[T], value: T): RENerChunksFilter.this.type
- Attributes
- protected[org.apache.spark.ml]
- Definition Classes
- Params
-
def
setDirectionSensitive(value: Boolean): RENerChunksFilter.this.type
If true, only relations in the form of "ENTITY1-ENTITY2" will be considered. If false, both "ENTITY1-ENTITY2" and "ENTITY2-ENTITY1" relations will be considered.
-
def
setDocLevelRelations(docLevelRelations: Boolean): RENerChunksFilter.this.type
Include relations between entities from different sentences (Default: False). If set to True, the sentence ids of all entities are set to 0, i.e. the sentences are merged together. The original sentence ids are preserved in the metadata as 'orig_sentence'. The dependency annotations are not affected. Note that the model used to extract relations may not be able to process texts that exceed a given length. It is generally a bad idea to search for relations between entities spanning more than 100 words unless using a model that is specifically trained on such data.
-
def
setFilterByTokenDistance(value: Int): RENerChunksFilter.this.type
Filtering criterion based on the number of tokens between entities. The model only finds relations that have fewer than the specified number of tokens between them.
-
final
def
setInputCols(value: String*): RENerChunksFilter.this.type
- Definition Classes
- HasInputAnnotationCols
-
def
setInputCols(value: Array[String]): RENerChunksFilter.this.type
- Definition Classes
- HasInputAnnotationCols
-
def
setLazyAnnotator(value: Boolean): RENerChunksFilter.this.type
- Definition Classes
- CanBeLazy
-
def
setMaxSyntacticDistance(maxSyntacticDistance: Int): RENerChunksFilter.this.type
Maximal syntactic distance, as threshold (Default: 0). This constraint may not work well if setDocLevelRelations is set to True, because dependency parsers are poor at detecting syntactic relations across multiple sentences.
-
final
def
setOutputCol(value: String): RENerChunksFilter.this.type
- Definition Classes
- HasOutputAnnotationCol
-
def
setParent(parent: Estimator[RENerChunksFilter]): RENerChunksFilter
- Definition Classes
- Model
-
def
setRelationPairs(relationPairs: Array[String]): RENerChunksFilter.this.type
List of dash-separated pairs of named entities ("ENTITY1-ENTITY2", e.g. "Biomarker-RelativeDay"), which will be processed.
-
def
setRelationPairsCaseSensitive(value: Boolean): RENerChunksFilter.this.type
Sets the case sensitivity of relation pairs
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- Identifiable → AnyRef → Any
-
final
def
transform(dataset: Dataset[_]): DataFrame
- Definition Classes
- AnnotatorModel → Transformer
-
def
transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
- Definition Classes
- Transformer
- Annotations
- @Since( "2.0.0" )
-
def
transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
- Definition Classes
- Transformer
- Annotations
- @Since( "2.0.0" ) @varargs()
-
final
def
transformSchema(schema: StructType): StructType
- Definition Classes
- RawAnnotator → PipelineStage
-
def
transformSchema(schema: StructType, logging: Boolean): StructType
- Attributes
- protected
- Definition Classes
- PipelineStage
- Annotations
- @DeveloperApi()
-
val
uid: String
- Definition Classes
- RENerChunksFilter → Identifiable
-
def
validate(schema: StructType): Boolean
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
wrapColumnMetadata(col: Column): Column
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
def
write: MLWriter
- Definition Classes
- ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
Inherited from CheckLicense
Inherited from HasSimpleAnnotate[RENerChunksFilter]
Inherited from AnnotatorModel[RENerChunksFilter]
Inherited from CanBeLazy
Inherited from RawAnnotator[RENerChunksFilter]
Inherited from HasOutputAnnotationCol
Inherited from HasInputAnnotationCols
Inherited from HasOutputAnnotatorType
Inherited from ParamsAndFeaturesWritable
Inherited from HasFeatures
Inherited from DefaultParamsWritable
Inherited from MLWritable
Inherited from Model[RENerChunksFilter]
Inherited from Transformer
Inherited from PipelineStage
Inherited from Logging
Inherited from Params
Inherited from Serializable
Inherited from Serializable
Inherited from Identifiable
Inherited from AnyRef
Inherited from Any
Parameters
Annotator types
Required input and expected output annotator types
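The effect of setDocLevelRelations(true), as described in its entry above (sentence ids reset to 0, original ids kept under 'orig_sentence'), can be sketched on plain metadata maps. The names DocLevelSketch, Chunk and mergeSentences are hypothetical, and the use of a "sentence" metadata key for the sentence id is an assumption; only the 'orig_sentence' key comes from the documentation.

```scala
object DocLevelSketch {
  // Hypothetical, simplified stand-in for a chunk annotation.
  final case class Chunk(result: String, metadata: Map[String, String])

  // Assumption: the sentence id lives under the "sentence" metadata key.
  // Every chunk is moved to sentence 0 and its previous id is preserved
  // under "orig_sentence", as the setter's description states.
  def mergeSentences(chunks: Seq[Chunk]): Seq[Chunk] =
    chunks.map { c =>
      val original = c.metadata.getOrElse("sentence", "0")
      c.copy(metadata = c.metadata + ("orig_sentence" -> original) + ("sentence" -> "0"))
    }
}
```

After this transformation, chunks that originally sat in different sentences share sentence id 0 and can therefore be paired, while the original id remains available for later inspection.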