com.johnsnowlabs.nlp.annotators.parser
StructuredJsonConverter
Companion object StructuredJsonConverter
class StructuredJsonConverter extends Transformer with HasOutputAnnotationCol with ParamsAndFeaturesWritable with CheckLicense
StructuredJsonConverter is a transformer that converts the output of the pipeline into a structured JSON format.
The output can be a string or a struct, depending on the value of the outputAsStr parameter.
The schema of the input columns is defined by the ConverterSchema case class, which outlines the structure of input columns.
The schema includes fields for the document identifier, document text, entities, assertions, resolutions, relations, summaries, deidentifications, and classifications.
The ConverterSchema case class provides methods for parsing the schema from a JSON string and extracting column names from the input schema.
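Because the schema can be parsed from a JSON string, that string can also be assembled programmatically. The following is a minimal sketch in plain Scala under stated assumptions: ConverterSchemaJsonSketch and its render helper are hypothetical names that are not part of the library, the entities, assertions, relations, and summaries fields are assumed to take arrays of column names, and the resolutions, deidentifications, and classifications fields are left empty because their nested schemas are documented separately.

```scala
object ConverterSchemaJsonSketch {
  // Quote a column name as a JSON string, escaping backslashes and double quotes.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Render a list of column names as a JSON array of strings.
  private def array(columns: Seq[String]): String =
    columns.map(quote).mkString("[", ", ", "]")

  // Hypothetical helper: builds a schema string with the nine documented fields.
  // resolutions, deidentifications and classifications are always emitted empty
  // here, since their element structure is defined by separate schema classes.
  def render(
      documentIdentifier: String,
      documentText: String,
      entities: Seq[String] = Nil,
      assertions: Seq[String] = Nil,
      relations: Seq[String] = Nil,
      summaries: Seq[String] = Nil): String =
    Seq(
      "\"document_identifier\": " + quote(documentIdentifier),
      "\"document_text\": " + quote(documentText),
      "\"entities\": " + array(entities),
      "\"assertions\": " + array(assertions),
      "\"resolutions\": []",
      "\"relations\": " + array(relations),
      "\"summaries\": " + array(summaries),
      "\"deidentifications\": []",
      "\"classifications\": []"
    ).mkString("{\n  ", ",\n  ", "\n}")
}
```

A call such as ConverterSchemaJsonSketch.render("id", "document", entities = Seq("ner_chunk")) yields a string with the same field names as the JSON in the setConverterSchemaAsStr example on this page.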
The transformer includes parameters for setting the schema, returning entities in relations, removing spark-nlp annotation columns, and outputting the result as a string or a structured JSON.
The transformer checks the input columns and document identifier column and ensures that the input columns are compatible with the transformer.
The PipelineParser class can be used to extract the schema from a pipeline.
- Note
If the document_identifier field is empty or not found in the input schema, a random UUID will be generated. If the document_identifier field is found in the input schema and it is not a column name, the value of the document_identifier field will be used. If the document_identifier field is found in the input schema and it is a column name, the column must be of type StringType.
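The three cases in the note can be read as a small decision rule. The following is a minimal sketch in plain Scala, not the library's implementation: DocumentIdSketch, the IdSource types, and the resolve and materialize helpers are hypothetical names, and the document_identifier entry of the input schema is modeled as an Option[String].

```scala
import java.util.UUID

object DocumentIdSketch {
  // Where the identifier of a document comes from (hypothetical model).
  sealed trait IdSource
  case object RandomUuid extends IdSource
  final case class Literal(value: String) extends IdSource
  final case class FromColumn(column: String) extends IdSource

  // documentIdentifier: the document_identifier entry of the schema, if present.
  // inputColumns: the column names of the input dataset.
  def resolve(documentIdentifier: Option[String], inputColumns: Set[String]): IdSource =
    documentIdentifier.filter(_.nonEmpty) match {
      case None                                      => RandomUuid       // empty or missing
      case Some(name) if inputColumns.contains(name) => FromColumn(name) // must be a string column
      case Some(value)                               => Literal(value)   // used as-is
    }

  // Produce the identifier for one row, given the row's string columns.
  def materialize(source: IdSource, row: Map[String, String]): String = source match {
    case RandomUuid         => UUID.randomUUID().toString
    case Literal(value)     => value
    case FromColumn(column) => row(column)
  }
}
```

In this model, a schema value that matches an input column name is read from that column for every row, while any other non-empty value is treated as a constant identifier.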
- StructuredJsonConverter
- CheckLicense
- ParamsAndFeaturesWritable
- HasFeatures
- DefaultParamsWritable
- MLWritable
- HasOutputAnnotationCol
- Transformer
- PipelineStage
- Logging
- Params
- Serializable
- Serializable
- Identifiable
- AnyRef
- Any
Instance Constructors
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
$[T](param: Param[T]): T
- Attributes
- protected
- Definition Classes
- Params
-
def
$$[T](feature: StructFeature[T]): T
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[K, V](feature: MapFeature[K, V]): Map[K, V]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[T](feature: SetFeature[T]): Set[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[T](feature: ArrayFeature[T]): Array[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScope(scope: String): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
-
val
cleanAnnotations: BooleanParam
Whether to remove spark-nlp annotation columns. Default: false
-
final
def
clear(param: Param[_]): StructuredJsonConverter.this.type
- Definition Classes
- Params
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
val
converterSchema: StructFeature[ConverterSchema]
Defines the schema for converting the output of the pipeline into a structured JSON format.
The schema is represented by the ConverterSchema case class, which outlines the structure of input columns.
Fields in the schema:
- document_identifier: The identifier of the document. This column must be of type StringType.
- document_text: The text of the document, typically created by the DocumentAssembler annotator.
- entities: Chunk columns generated by various annotators, such as the ChunkMergeModel annotator.
- assertions: Assertion columns produced by annotators like the AssertionDLModel annotator.
- resolutions: The schema for resolutions. See ResolutionSchema for details.
- relations: Relation columns created by annotators such as the RelationExtractionModel annotator.
- summaries: Summary columns generated by annotators like the MedicalSummarizer annotator.
- deidentifications: The schema for deidentifications. See DeIdentificationSchema for details.
- classifications: The schema for classifications. See ClassificationSchema for details.
See ConverterSchema for detailed information about the schema structure.
-
def
copy(extra: ParamMap): StructuredJsonConverter
- Definition Classes
- StructuredJsonConverter → Transformer → PipelineStage → Params
-
def
copyValues[T <: Params](to: T, extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
final
def
defaultCopy[T <: Params](extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
explainParam(param: Param[_]): String
- Definition Classes
- Params
-
def
explainParams(): String
- Definition Classes
- Params
-
final
def
extractParamMap(): ParamMap
- Definition Classes
- Params
-
final
def
extractParamMap(extra: ParamMap): ParamMap
- Definition Classes
- Params
-
val
features: ArrayBuffer[Feature[_, _, _]]
- Definition Classes
- HasFeatures
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
get[T](feature: StructFeature[T]): Option[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[T](feature: SetFeature[T]): Option[Set[T]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[T](feature: ArrayFeature[T]): Option[Array[T]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
get[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getCleanAnnotations: Boolean
Get the value of cleanAnnotations param.
-
def
getConverterSchema: ConverterSchema
Get the value of converterSchema param.
-
final
def
getDefault[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
final
def
getOrDefault[T](param: Param[T]): T
- Definition Classes
- Params
-
def
getOutputAsStr: Boolean
Get the value of outputAsStr param.
-
final
def
getOutputCol: String
- Definition Classes
- HasOutputAnnotationCol
-
def
getParam(paramName: String): Param[Any]
- Definition Classes
- Params
-
def
getReturnRelationEntities: Boolean
Get the value of returnRelationEntities param.
-
final
def
hasDefault[T](param: Param[T]): Boolean
- Definition Classes
- Params
-
def
hasParam(paramName: String): Boolean
- Definition Classes
- Params
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
isDefined(param: Param[_]): Boolean
- Definition Classes
- Params
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
isSet(param: Param[_]): Boolean
- Definition Classes
- Params
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
onWrite(path: String, spark: SparkSession): Unit
- Attributes
- protected
- Definition Classes
- ParamsAndFeaturesWritable
-
val
outputAsStr: BooleanParam
Whether to output the result as a string or as a structured JSON. Default: true.
When set to true, the output column will be a string:
|-- column_name: string (nullable = true)
When set to false, the output column will be a struct with the following schema:
|-- column_name: struct (nullable = true)
|    |-- document_identifier: string (nullable = true)
|    |-- document_text: array (nullable = true)
|    |    |-- element: string (containsNull = true)
|    |-- entities: array (nullable = true)
|    |    |-- element: map (containsNull = true)
|    |    |    |-- key: string
|    |    |    |-- value: string (valueContainsNull = true)
|    |-- assertions: array (nullable = true)
|    |    |-- element: map (containsNull = true)
|    |    |    |-- key: string
|    |    |    |-- value: string (valueContainsNull = true)
|    |-- resolutions: array (nullable = true)
|    |    |-- element: map (containsNull = true)
|    |    |    |-- key: string
|    |    |    |-- value: string (valueContainsNull = true)
|    |-- relations: array (nullable = true)
|    |    |-- element: map (containsNull = true)
|    |    |    |-- key: string
|    |    |    |-- value: string (valueContainsNull = true)
|    |-- summaries: array (nullable = true)
|    |    |-- element: string (containsNull = true)
|    |-- deidentifications: array (nullable = true)
|    |    |-- element: map (containsNull = true)
|    |    |    |-- key: string
|    |    |    |-- value: string (valueContainsNull = true)
|    |-- classifications: array (nullable = true)
|    |    |-- element: map (containsNull = true)
|    |    |    |-- key: string
|    |    |    |-- value: string (valueContainsNull = true)
Use this parameter to control the format of the output based on your specific requirements.
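The struct produced when outputAsStr is false can be mirrored by a plain Scala case class, which may make the shape easier to read than the schema listing. This is only a sketch: StructuredOutputSketch, StructuredOutput, and entityValues are hypothetical names, the field names and element types are copied from the schema listed for this parameter, and the map keys in the sample value are invented for illustration rather than taken from the library.

```scala
object StructuredOutputSketch {
  // One field per entry of the documented struct; arrays become Seq, maps become Map.
  final case class StructuredOutput(
      document_identifier: String,
      document_text: Seq[String],
      entities: Seq[Map[String, String]],
      assertions: Seq[Map[String, String]],
      resolutions: Seq[Map[String, String]],
      relations: Seq[Map[String, String]],
      summaries: Seq[String],
      deidentifications: Seq[Map[String, String]],
      classifications: Seq[Map[String, String]])

  // Illustrative value only: the keys "chunk" and "label" are made up.
  val sample: StructuredOutput = StructuredOutput(
    document_identifier = "doc-1",
    document_text = Seq("The patient was given aspirin."),
    entities = Seq(Map("chunk" -> "aspirin", "label" -> "DRUG")),
    assertions = Seq.empty,
    resolutions = Seq.empty,
    relations = Seq.empty,
    summaries = Seq.empty,
    deidentifications = Seq.empty,
    classifications = Seq.empty)

  // Collect the values stored under one key across all entity maps.
  def entityValues(out: StructuredOutput, key: String): Seq[String] =
    out.entities.flatMap(_.get(key))
}
```

Every field in the documented schema is nullable, so code that reads real output would likely need to guard against null values, which this sketch does not do.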
-
final
val
outputCol: Param[String]
- Attributes
- protected
- Definition Classes
- HasOutputAnnotationCol
-
lazy val
params: Array[Param[_]]
- Definition Classes
- Params
-
val
returnRelationEntities: BooleanParam
Whether to return the entities in the relations or not. Default: false
-
def
save(path: String): Unit
- Definition Classes
- MLWritable
- Annotations
- @Since( "1.6.0" ) @throws( ... )
-
def
set[T](feature: StructFeature[T], value: T): StructuredJsonConverter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[K, V](feature: MapFeature[K, V], value: Map[K, V]): StructuredJsonConverter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[T](feature: SetFeature[T], value: Set[T]): StructuredJsonConverter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[T](feature: ArrayFeature[T], value: Array[T]): StructuredJsonConverter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
set(paramPair: ParamPair[_]): StructuredJsonConverter.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set(param: String, value: Any): StructuredJsonConverter.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set[T](param: Param[T], value: T): StructuredJsonConverter.this.type
- Definition Classes
- Params
-
def
setCleanAnnotations(value: Boolean): StructuredJsonConverter.this.type
-
def
setConverterSchema(value: ConverterSchema): StructuredJsonConverter.this.type
Set the value of converterSchema param.
-
def
setConverterSchemaAsStr(value: String): StructuredJsonConverter.this.type
Set the value of converterSchema param as a string.
Example:
setConverterSchemaAsStr(
  """{
    |  "document_identifier": "id",
    |  "document_text": "document",
    |  "entities": ["ner_chunk"],
    |  "assertions": [],
    |  "resolutions": [],
    |  "relations": [],
    |  "summaries": [],
    |  "deidentifications": [
    |    {
    |      "original": "sentence",
    |      "obfuscated": "obfuscated",
    |      "masked": ""
    |    }],
    |  "classifications": []
    |}""".stripMargin
)
-
def
setDefault[T](feature: StructFeature[T], value: () ⇒ T): StructuredJsonConverter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): StructuredJsonConverter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): StructuredJsonConverter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): StructuredJsonConverter.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
setDefault(paramPairs: ParamPair[_]*): StructuredJsonConverter.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
setDefault[T](param: Param[T], value: T): StructuredJsonConverter.this.type
- Attributes
- protected[org.apache.spark.ml]
- Definition Classes
- Params
-
def
setOutputAsStr(value: Boolean): StructuredJsonConverter.this.type
Set the value of outputAsStr param: whether to output the result as a string or as a structured JSON. Default: true
-
final
def
setOutputCol(value: String): StructuredJsonConverter.this.type
- Definition Classes
- HasOutputAnnotationCol
-
def
setReturnRelationEntities(value: Boolean): StructuredJsonConverter.this.type
Set whether to return the entities in the relations or not. Default: false
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- Identifiable → AnyRef → Any
-
def
transform(dataset: Dataset[_]): DataFrame
- Definition Classes
- StructuredJsonConverter → Transformer
-
def
transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
- Definition Classes
- Transformer
- Annotations
- @Since( "2.0.0" )
-
def
transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
- Definition Classes
- Transformer
- Annotations
- @Since( "2.0.0" ) @varargs()
-
final
def
transformSchema(schema: StructType): StructType
- Definition Classes
- StructuredJsonConverter → PipelineStage
-
def
transformSchema(schema: StructType, logging: Boolean): StructType
- Attributes
- protected
- Definition Classes
- PipelineStage
- Annotations
- @DeveloperApi()
-
val
uid: String
- Definition Classes
- StructuredJsonConverter → Identifiable
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
write: MLWriter
- Definition Classes
- ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable