MultiChunk2Doc

Companion object MultiChunk2Doc

class MultiChunk2Doc extends AnnotatorModel[MultiChunk2Doc] with HasSimpleAnnotate[MultiChunk2Doc] with WhiteAndBlackListParams with CheckLicense

The MultiChunk2Doc annotator merges the given chunks into a single document. During document creation, whitelist and blacklist filters can be applied to the chunks, and the filters can be made case sensitive or case insensitive. Additionally, a prefix and a suffix can be placed before and after the merged chunks in the resulting document, and a separator can be placed between the chunks.

See also

WhiteAndBlackListParams

Example

val document_assembler = new DocumentAssembler()
 .setInputCol("text").setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
 .setInputCols("document").setOutputCol("sentence")

val tokenizer = new Tokenizer()
 .setInputCols("sentence").setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
 .setInputCols(Array("sentence", "token")).setOutputCol("embeddings")

val ner = MedicalNerModel.pretrained("ner_clinical_large_langtest", "en", "clinical/models")
 .setInputCols("sentence", "token", "embeddings").setOutputCol("ner")

val ner_converter = new NerConverterInternal()
 .setInputCols(Array("sentence", "token", "ner")).setOutputCol("ner_chunk")

val multi_chunk2_doc = new MultiChunk2Doc()
 .setInputCols("ner_chunk")
 .setOutputCol("new_doc")
 .setWhiteList(Array("test"))
 .setCaseSensitive(false)
 .setPrefix("<")
 .setSuffix(">")
 .setSeparator("><")

val pipeline = new Pipeline().setStages(Array(
 document_assembler, sentence_detector, tokenizer,
 word_embeddings, ner, ner_converter, multi_chunk2_doc))

import spark.implicits._ // required for .toDF on a Seq

val data = Seq(
"""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM),
| one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis,
| presented with a one-week history of polyuria, poor appetite, and vomiting. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG.
| She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa; significantly,
| her abdominal examination was benign with no tenderness, guarding, or rigidity.""".stripMargin)
.toDF("text")
val result = pipeline.fit(data).transform(data)

Show Results

result.selectExpr("explode(new_doc) as result").show(false)

+----------------------------------------------------------------------------------------------------------+
|result                                                                                                    |
+----------------------------------------------------------------------------------------------------------+
|{document, 0, 48, <Physical examination><her abdominal examination>, {document -> 0, chunk_count -> 2}, []}|
+----------------------------------------------------------------------------------------------------------+
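
The result string above is consistent with a plain prefix/separator/suffix join of the surviving chunk texts. The following is a minimal sketch of that idea in plain Scala, not the annotator's actual implementation; the object name `MergeSketch` and the function `mergeChunks` are hypothetical, and the behavior for an empty chunk list is an assumption.

```scala
object MergeSketch {
  // Hypothetical helper: joins chunk texts the way the example output suggests.
  // Defaults mirror the documented parameter defaults ("" / "," / "").
  def mergeChunks(
      chunks: Seq[String],
      prefix: String = "",
      separator: String = ",",
      suffix: String = ""): String =
    // mkString(start, sep, end) places `start` once, `sep` between items, `end` once.
    // Assumption: with no chunks this returns prefix + suffix; the real annotator may differ.
    chunks.mkString(prefix, separator, suffix)
}
```

With the settings from the example (`"<"`, `"><"`, `">"`) and the two chunks shown, this yields a 49-character string, which agrees with the `0, 48` begin and end offsets in the output if the end offset is inclusive.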

Linear Supertypes

CheckLicense, WhiteAndBlackListParams, HasSimpleAnnotate[MultiChunk2Doc], AnnotatorModel[MultiChunk2Doc], CanBeLazy, RawAnnotator[MultiChunk2Doc], HasOutputAnnotationCol, HasInputAnnotationCols, HasOutputAnnotatorType, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, Model[MultiChunk2Doc], Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

new MultiChunk2Doc()
new MultiChunk2Doc(uid: String)
uid
a unique identifier for the instantiated AnnotatorModel

Type Members

type AnnotationContent = Seq[Row]

Attributes
protected
Definition Classes
AnnotatorModel
type AnnotatorType = String

Definition Classes
HasOutputAnnotatorType

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def $[T](param: Param[T]): T

Attributes
protected
Definition Classes
Params
def $$[T](feature: StructFeature[T]): T

Attributes
protected
Definition Classes
HasFeatures
def $$[K, V](feature: MapFeature[K, V]): Map[K, V]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: SetFeature[T]): Set[T]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: ArrayFeature[T]): Array[T]

Attributes
protected
Definition Classes
HasFeatures
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def _transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame

Attributes
protected
Definition Classes
AnnotatorModel
def afterAnnotate(dataset: DataFrame): DataFrame

Attributes
protected
Definition Classes
AnnotatorModel
def annotate(annotations: Seq[Annotation]): Seq[Annotation]

Definition Classes
MultiChunk2Doc → HasSimpleAnnotate
final def asInstanceOf[T0]: T0

Definition Classes
Any
final def beforeAnnotate(dataset: Dataset[_]): Dataset[_]

Definition Classes
MultiChunk2Doc → AnnotatorModel
val blackList: StringArrayParam
If defined, list of entities to ignore. The rest will be processed. Should not include IOB prefix on labels. Default: Array()

Definition Classes
WhiteAndBlackListParams
val caseSensitive: BooleanParam
Determines whether the definitions of the white listed and black listed entities are case sensitive or not. Default: true

Definition Classes
WhiteAndBlackListParams
final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean

Attributes
protected
Definition Classes
HasInputAnnotationCols
def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String], metadata: Option[Map[String, String]]): Unit

Definition Classes
CheckLicense
def checkValidScope(scope: String): Unit

Definition Classes
CheckLicense
def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean, metadata: Option[Map[String, String]]): Unit

Definition Classes
CheckLicense
def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean, metadata: Option[Map[String, String]]): Unit

Definition Classes
CheckLicense
final def clear(param: Param[_]): MultiChunk2Doc.this.type

Definition Classes
Params
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def copy(extra: ParamMap): MultiChunk2Doc

Definition Classes
RawAnnotator → Model → Transformer → PipelineStage → Params
def copyValues[T <: Params](to: T, extra: ParamMap): T

Attributes
protected
Definition Classes
Params
final def defaultCopy[T <: Params](extra: ParamMap): T

Attributes
protected
Definition Classes
Params
def dfAnnotate: UserDefinedFunction

Definition Classes
HasSimpleAnnotate
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def evaluateFilter(filter: String): Boolean
Filter annotations by blackList and whiteList, taking into account the caseSensitive param.

Attributes
protected
Definition Classes
WhiteAndBlackListParams
def explainParam(param: Param[_]): String

Definition Classes
Params
def explainParams(): String

Definition Classes
Params
def extraValidate(structType: StructType): Boolean

Attributes
protected
Definition Classes
RawAnnotator
def extraValidateMsg: String

Attributes
protected
Definition Classes
RawAnnotator
final def extractParamMap(): ParamMap

Definition Classes
Params
final def extractParamMap(extra: ParamMap): ParamMap

Definition Classes
Params
val features: ArrayBuffer[Feature[_, _, _]]

Definition Classes
HasFeatures
def filterByEntityField(annotation: Annotation): Boolean
Filter annotation by blackList and whiteList, taking into account the caseSensitive param. It filters by annotation.metadata.getOrElse("entity", annotation.metadata.getOrElse("identifier", "")).toString
returns
Boolean

Attributes
protected
Definition Classes
WhiteAndBlackListParams
def filterByEntityField(annotations: Seq[Annotation]): Seq[Annotation]
Filter annotations by blackList and whiteList, taking into account the caseSensitive param. It filters by annotation.metadata.getOrElse("entity", annotation.metadata.getOrElse("identifier", "")).toString

Attributes
protected
Definition Classes
WhiteAndBlackListParams
def filterByWhiteAndBlackList(annotation: Annotation): Boolean
Filter annotation by blackList and whiteList, taking into account the caseSensitive param. It filters by annotation.result
returns
Boolean

Attributes
protected
Definition Classes
WhiteAndBlackListParams
def filterByWhiteAndBlackList(annotations: Seq[Annotation]): Seq[Annotation]
Filter annotations by blackList and whiteList, taking into account the caseSensitive param. It filters by annotation.result

Attributes
protected
Definition Classes
WhiteAndBlackListParams
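
The filter members above share one idea: pick a key from each annotation (the `entity` or `identifier` metadata value, or the annotation's `result`) and test it against the two lists, honoring `caseSensitive`. The sketch below illustrates that idea in plain Scala. It is not the library's code: `ListFilterSketch`, `Chunk`, `passes`, and `filterByEntity` are hypothetical names, and the rule for combining the two lists (keep when the whitelist is empty or contains the key, and the blacklist does not contain it) is an assumption.

```scala
object ListFilterSketch {
  // Hypothetical stand-in for an annotation; only the fields the filter reads.
  final case class Chunk(result: String, metadata: Map[String, String])

  // Lower-case the value when matching should ignore case.
  private def norm(s: String, caseSensitive: Boolean): String =
    if (caseSensitive) s else s.toLowerCase

  // Assumed rule: an empty whitelist accepts everything; the blacklist always rejects.
  def passes(
      value: String,
      whiteList: Seq[String],
      blackList: Seq[String],
      caseSensitive: Boolean): Boolean = {
    val v = norm(value, caseSensitive)
    val white = whiteList.map(norm(_, caseSensitive))
    val black = blackList.map(norm(_, caseSensitive))
    (white.isEmpty || white.contains(v)) && !black.contains(v)
  }

  // Same lookup order as documented for filterByEntityField.
  def entityKey(c: Chunk): String =
    c.metadata.getOrElse("entity", c.metadata.getOrElse("identifier", ""))

  def filterByEntity(
      chunks: Seq[Chunk],
      whiteList: Seq[String],
      blackList: Seq[String],
      caseSensitive: Boolean): Seq[Chunk] =
    chunks.filter(c => passes(entityKey(c), whiteList, blackList, caseSensitive))
}
```

Under these assumptions, a whitelist of `"test"` with `caseSensitive = false` keeps a chunk whose entity label is `TEST`, while the same whitelist with `caseSensitive = true` drops it.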
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def get[T](feature: StructFeature[T]): Option[T]

Attributes
protected
Definition Classes
HasFeatures
def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: SetFeature[T]): Option[Set[T]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: ArrayFeature[T]): Option[Array[T]]

Attributes
protected
Definition Classes
HasFeatures
final def get[T](param: Param[T]): Option[T]

Definition Classes
Params
def getBlackList: Array[String]
Gets blackList param

Definition Classes
WhiteAndBlackListParams
def getCaseSensitive: Boolean
Gets caseSensitive param

Definition Classes
WhiteAndBlackListParams
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
final def getDefault[T](param: Param[T]): Option[T]

Definition Classes
Params
def getInputCols: Array[String]

Definition Classes
HasInputAnnotationCols
def getLazyAnnotator: Boolean

Definition Classes
CanBeLazy
final def getOrDefault[T](param: Param[T]): T

Definition Classes
Params
final def getOutputCol: String

Definition Classes
HasOutputAnnotationCol
def getParam(paramName: String): Param[Any]

Definition Classes
Params
def getPrefix: String
Gets the prefix to add to the result. Default: "".
def getSeparator: String
Gets the separator to add between the results. Default: ",".
def getSuffix: String
Gets the suffix to add to the result. Default: "".
def getWhiteList: Array[String]
Gets whiteList param

Definition Classes
WhiteAndBlackListParams
final def hasDefault[T](param: Param[T]): Boolean

Definition Classes
Params
def hasParam(paramName: String): Boolean

Definition Classes
Params
def hasParent: Boolean

Definition Classes
Model
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes
protected
Definition Classes
Logging
def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes
protected
Definition Classes
Logging
val inputAnnotatorTypes: Array[AnnotatorType]
Input annotator type: CHUNK

Definition Classes
MultiChunk2Doc → HasInputAnnotationCols
final val inputCols: StringArrayParam

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def isDefined(param: Param[_]): Boolean

Definition Classes
Params
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def isSet(param: Param[_]): Boolean

Definition Classes
Params
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
def isValueInList(value: String, list: Array[String]): Boolean

Attributes
protected
Definition Classes
WhiteAndBlackListParams
def isWhiteListAndBlacklistEmpty: Boolean

Attributes
protected
Definition Classes
WhiteAndBlackListParams
val lazyAnnotator: BooleanParam

Definition Classes
CanBeLazy
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def msgHelper(schema: StructType): String

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
def onWrite(path: String, spark: SparkSession): Unit

Attributes
protected
Definition Classes
ParamsAndFeaturesWritable
val optionalInputAnnotatorTypes: Array[String]

Definition Classes
HasInputAnnotationCols
val outputAnnotatorType: AnnotatorType
Output annotator type: DOCUMENT

Definition Classes
MultiChunk2Doc → HasOutputAnnotatorType
final val outputCol: Param[String]

Attributes
protected
Definition Classes
HasOutputAnnotationCol
lazy val params: Array[Param[_]]

Definition Classes
Params
var parent: Estimator[MultiChunk2Doc]

Definition Classes
Model
val prefix: Param[String]
Prefix to add to the result. Default: "".
def save(path: String): Unit

Definition Classes
MLWritable
Annotations
@Since( "1.6.0" ) @throws( ... )
val separator: Param[String]
Separator to add between the chunks. Default: ",".
def set[T](feature: StructFeature[T], value: T): MultiChunk2Doc.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): MultiChunk2Doc.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: SetFeature[T], value: Set[T]): MultiChunk2Doc.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: ArrayFeature[T], value: Array[T]): MultiChunk2Doc.this.type

Attributes
protected
Definition Classes
HasFeatures
final def set(paramPair: ParamPair[_]): MultiChunk2Doc.this.type

Attributes
protected
Definition Classes
Params
final def set(param: String, value: Any): MultiChunk2Doc.this.type

Attributes
protected
Definition Classes
Params
final def set[T](param: Param[T], value: T): MultiChunk2Doc.this.type

Definition Classes
Params
def setAllowList(list: String*): MultiChunk2Doc.this.type

Definition Classes
WhiteAndBlackListParams
def setAllowList(list: Array[String]): MultiChunk2Doc.this.type

Definition Classes
WhiteAndBlackListParams
def setBlackList(list: String*): MultiChunk2Doc.this.type

Definition Classes
WhiteAndBlackListParams
def setBlackList(list: Array[String]): MultiChunk2Doc.this.type
If defined, list of entities to ignore. The rest will be processed. Should not include IOB prefix on labels. Default: Array()

Definition Classes
WhiteAndBlackListParams
def setCaseSensitive(value: Boolean): MultiChunk2Doc.this.type
Determines whether the definitions of the white listed and black listed entities are case sensitive or not. Default: true

Definition Classes
WhiteAndBlackListParams
def setDefault[T](feature: StructFeature[T], value: () ⇒ T): MultiChunk2Doc.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): MultiChunk2Doc.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): MultiChunk2Doc.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): MultiChunk2Doc.this.type

Attributes
protected
Definition Classes
HasFeatures
final def setDefault(paramPairs: ParamPair[_]*): MultiChunk2Doc.this.type

Attributes
protected
Definition Classes
Params
final def setDefault[T](param: Param[T], value: T): MultiChunk2Doc.this.type

Attributes
protected[org.apache.spark.ml]
Definition Classes
Params
def setDenyList(list: String*): MultiChunk2Doc.this.type

Definition Classes
WhiteAndBlackListParams
def setDenyList(list: Array[String]): MultiChunk2Doc.this.type

Definition Classes
WhiteAndBlackListParams
final def setInputCols(value: String*): MultiChunk2Doc.this.type

Definition Classes
HasInputAnnotationCols
def setInputCols(value: Array[String]): MultiChunk2Doc.this.type

Definition Classes
HasInputAnnotationCols
def setLazyAnnotator(value: Boolean): MultiChunk2Doc.this.type

Definition Classes
CanBeLazy
final def setOutputCol(value: String): MultiChunk2Doc.this.type

Definition Classes
HasOutputAnnotationCol
def setParent(parent: Estimator[MultiChunk2Doc]): MultiChunk2Doc

Definition Classes
Model
def setPrefix(value: String): MultiChunk2Doc.this.type
Sets the prefix to add to the result. Default: "".
def setSeparator(value: String): MultiChunk2Doc.this.type
Sets the separator to add between the results. Default: ",".
def setSuffix(value: String): MultiChunk2Doc.this.type
Sets the suffix to add to the result. Default: "".
def setWhiteList(list: String*): MultiChunk2Doc.this.type

Definition Classes
WhiteAndBlackListParams
def setWhiteList(list: Array[String]): MultiChunk2Doc.this.type
Sets the list of entities to process. The rest will be ignored. Should not include IOB prefix on labels. Default: Array()

Definition Classes
WhiteAndBlackListParams
val suffix: Param[String]
Suffix to add to the result. Default: "".
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
Identifiable → AnyRef → Any
final def transform(dataset: Dataset[_]): DataFrame

Definition Classes
AnnotatorModel → Transformer
def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame

Definition Classes
Transformer
Annotations
@Since( "2.0.0" )
def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame

Definition Classes
Transformer
Annotations
@Since( "2.0.0" ) @varargs()
final def transformSchema(schema: StructType): StructType

Definition Classes
RawAnnotator → PipelineStage
def transformSchema(schema: StructType, logging: Boolean): StructType

Attributes
protected
Definition Classes
PipelineStage
Annotations
@DeveloperApi()
val uid: String

Definition Classes
MultiChunk2Doc → Identifiable
def validate(schema: StructType): Boolean

Attributes
protected
Definition Classes
RawAnnotator
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
val whiteList: StringArrayParam
If defined, list of entities to process. The rest will be ignored. Should not include IOB prefix on labels. Default: Array()

Definition Classes
WhiteAndBlackListParams
def wrapColumnMetadata(col: Column): Column

Attributes
protected
Definition Classes
RawAnnotator
def write: MLWriter

Definition Classes
ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
