class StructuredJsonConverter extends Transformer with HasOutputAnnotationCol with ParamsAndFeaturesWritable with CheckLicense

StructuredJsonConverter is a transformer that converts the output of a pipeline into a structured JSON format. The output column is either a string or a struct, depending on the value of the outputAsStr parameter.

The input columns are described by the ConverterSchema case class. The schema has fields for the document identifier, document text, entities, assertions, resolutions, relations, summaries, deidentifications, and classifications. ConverterSchema also provides methods for parsing the schema from a JSON string and for extracting column names from the input schema.

The transformer has parameters for setting the schema, returning the entities in relations, removing Spark NLP annotation columns, and choosing between string and struct output. It checks the input columns and the document identifier column to ensure they are compatible with the transformer. The PipelineParser class can be used to extract the schema from a pipeline.

Note

If the document_identifier field is empty or not found in the input schema, a random UUID is generated. If the document_identifier field is found in the input schema and its value is not a column name, the value itself is used as the identifier. If the document_identifier field is found in the input schema and its value is a column name, that column must be of type StringType.
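
The three cases in the note can be modelled in plain Scala. The sketch below is only an illustration of the rule as worded above, not the library's code: the DocumentId types, the resolveDocumentId function, the use of type names as strings, and the choice to throw IllegalArgumentException for a non-string column are all assumptions made for this example.

```scala
import java.util.UUID

object DocumentIdSketch {
  // Hypothetical result types, one per case in the note.
  sealed trait DocumentId
  final case class GeneratedId(value: String) extends DocumentId // random UUID
  final case class LiteralId(value: String) extends DocumentId   // field value used as-is
  final case class ColumnId(column: String) extends DocumentId   // read from a string column

  // `field` is the document_identifier entry of the converter schema.
  // `columnTypes` maps input column names to a type name such as "string".
  def resolveDocumentId(field: Option[String], columnTypes: Map[String, String]): DocumentId =
    field.map(_.trim).filter(_.nonEmpty) match {
      // Empty or missing field: fall back to a random UUID.
      case None => GeneratedId(UUID.randomUUID().toString)
      case Some(name) =>
        columnTypes.get(name) match {
          // Not a column name: the value itself is the identifier.
          case None => LiteralId(name)
          // A column name: the column has to be a string column.
          case Some("string") => ColumnId(name)
          case Some(other) =>
            throw new IllegalArgumentException(
              s"Column '$name' must be of type StringType, found $other")
        }
    }
}
```

For example, DocumentIdSketch.resolveDocumentId(Some("doc-42"), Map("text" -> "string")) yields LiteralId("doc-42"), because "doc-42" is not one of the columns.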

Linear Supertypes
CheckLicense, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, HasOutputAnnotationCol, Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new StructuredJsonConverter()
  2. new StructuredJsonConverter(uid: String)

    uid

    a unique identifier for the instantiated AnnotatorModel

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  10. def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
    Definition Classes
    CheckLicense
  11. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  12. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  13. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  14. val cleanAnnotations: BooleanParam

    Whether to remove spark-nlp annotation columns. Default: false

  15. final def clear(param: Param[_]): StructuredJsonConverter.this.type
    Definition Classes
    Params
  16. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  17. val converterSchema: StructFeature[ConverterSchema]

    Defines the schema for converting the output of the pipeline into a structured JSON format.

    The schema is represented by the ConverterSchema case class, which outlines the structure of input columns.

    Fields in the schema:

    • document_identifier: The identifier of the document. This column must be of type StringType.
    • document_text: The text of the document, typically created by the DocumentAssembler annotator.
    • entities: Chunk columns generated by various annotators, such as the ChunkMergeModel annotator.
    • assertions: Assertion columns produced by annotators like the AssertionDLModel annotator.
    • resolutions: The schema for resolutions. See ResolutionSchema for details.
    • relations: Relation columns created by annotators such as the RelationExtractionModel annotator.
    • summaries: Summary columns generated by annotators like the MedicalSummarizer annotator.
    • deidentifications: The schema for deidentifications. See DeIdentificationSchema for details.
    • classifications: The schema for classifications. See ClassificationSchema for details.

    See ConverterSchema for detailed information about the schema structure.
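
    A schema with these fields can be written as a JSON string and passed to setConverterSchemaAsStr. The sketch below assembles such a string from column names in plain Scala. It is a rough illustration only: ConverterSchemaJsonSketch, DeidColumns, and build are hypothetical names, the deidentification keys (original, obfuscated, masked) are taken from the setConverterSchemaAsStr example on this page, and resolutions and classifications are left empty because their sub-schemas are not described here.

```scala
object ConverterSchemaJsonSketch {
  // Hypothetical holder for one deidentifications entry.
  final case class DeidColumns(original: String, obfuscated: String, masked: String)

  // Quote a value as a JSON string, escaping backslashes and double quotes.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  private def array(items: Seq[String]): String = items.mkString("[", ", ", "]")

  def build(
      documentIdentifier: String,
      documentText: String,
      entities: Seq[String] = Nil,
      assertions: Seq[String] = Nil,
      relations: Seq[String] = Nil,
      summaries: Seq[String] = Nil,
      deidentifications: Seq[DeidColumns] = Nil): String = {
    val deidObjects = deidentifications.map { d =>
      "{\"original\": " + quote(d.original) +
        ", \"obfuscated\": " + quote(d.obfuscated) +
        ", \"masked\": " + quote(d.masked) + "}"
    }
    // Keys appear in the same order as the field list above.
    Seq(
      "\"document_identifier\": " + quote(documentIdentifier),
      "\"document_text\": " + quote(documentText),
      "\"entities\": " + array(entities.map(quote)),
      "\"assertions\": " + array(assertions.map(quote)),
      "\"resolutions\": []",
      "\"relations\": " + array(relations.map(quote)),
      "\"summaries\": " + array(summaries.map(quote)),
      "\"deidentifications\": " + array(deidObjects),
      "\"classifications\": []"
    ).mkString("{", ", ", "}")
  }
}
```

    Calling ConverterSchemaJsonSketch.build("id", "document", entities = Seq("ner_chunk")) produces a one-line JSON object with the nine keys listed above, with every unspecified list left empty.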

  18. def copy(extra: ParamMap): StructuredJsonConverter
    Definition Classes
    StructuredJsonConverter → Transformer → PipelineStage → Params
  19. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  20. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  21. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  22. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  23. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  24. def explainParams(): String
    Definition Classes
    Params
  25. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  26. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  27. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  28. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  29. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  30. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  31. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  32. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  33. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  34. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  35. def getCleanAnnotations: Boolean

    Get the value of cleanAnnotations param.

  36. def getConverterSchema: ConverterSchema

    Get the value of converterSchema param.

  37. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  38. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  39. def getOutputAsStr: Boolean

    Get the value of outputAsStr param.

  40. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  41. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  42. def getParentSource: String

    Get the value of parentSource param.

  43. def getReturnRelationEntities: Boolean

    Get the value of returnRelationEntities param.

  44. def getSentenceColumn: String
  45. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  46. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  47. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  48. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  49. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  50. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  51. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  52. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  53. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  54. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  55. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  56. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  57. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  58. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  59. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  60. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  61. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  62. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  63. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  64. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  65. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  66. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  67. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  68. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  69. def onWrite(path: String, spark: SparkSession): Unit
    Attributes
    protected
    Definition Classes
    ParamsAndFeaturesWritable
  70. val outputAsStr: BooleanParam

    Whether to output the result as a string or as a structured json. Default: true.

    When set to true, the output column will be a string:

    |-- column_name: string (nullable = true)

    When set to false, the output column will be a struct with the following schema:

    |-- column_name: struct (nullable = true)
         |-- document_identifier: string (nullable = true)
         |-- document_text: array (nullable = true)
         |    |-- element: string (containsNull = true)
         |-- entities: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)
         |-- assertions: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)
         |-- resolutions: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)
         |-- relations: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)
         |-- summaries: array (nullable = true)
         |    |-- element: string (containsNull = true)
         |-- deidentifications: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)
         |-- classifications: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)

    Use this parameter to control the format of the output based on your specific requirements.
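
    The struct layout printed above can also be written down with Spark's type classes, which may be handy when checking the output column in a test. This is a sketch built from the printed tree, not code taken from the library; OutputSchemaSketch and its member names are hypothetical, and "column_name" stands for whatever name was given to setOutputCol.

```scala
import org.apache.spark.sql.types._

object OutputSchemaSketch {
  // array of nullable strings
  private val stringArray = ArrayType(StringType, containsNull = true)
  // array of string-to-string maps
  private val mapArray = ArrayType(
    MapType(StringType, StringType, valueContainsNull = true),
    containsNull = true)

  // The struct stored in the output column when outputAsStr is false.
  val resultStruct: StructType = StructType(Seq(
    StructField("document_identifier", StringType, nullable = true),
    StructField("document_text", stringArray, nullable = true),
    StructField("entities", mapArray, nullable = true),
    StructField("assertions", mapArray, nullable = true),
    StructField("resolutions", mapArray, nullable = true),
    StructField("relations", mapArray, nullable = true),
    StructField("summaries", stringArray, nullable = true),
    StructField("deidentifications", mapArray, nullable = true),
    StructField("classifications", mapArray, nullable = true)
  ))

  // A one-column schema wrapping the struct, mirroring the tree above.
  val outputSchema: StructType =
    StructType(Seq(StructField("column_name", resultStruct, nullable = true)))
}
```

    Printing OutputSchemaSketch.outputSchema.treeString gives a tree that can be compared by eye with the one shown above.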

  71. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  72. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  73. val parentSource: Param[String]

    The parent source of the output. Default: "". Available options: chunk and "".

  74. val returnRelationEntities: BooleanParam

    Whether to return the entities in the relations or not. Default: false

  75. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  76. val sentenceColumn: Param[String]
  77. def set[T](feature: StructFeature[T], value: T): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  78. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  79. def set[T](feature: SetFeature[T], value: Set[T]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  80. def set[T](feature: ArrayFeature[T], value: Array[T]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  81. final def set(paramPair: ParamPair[_]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    Params
  82. final def set(param: String, value: Any): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    Params
  83. final def set[T](param: Param[T], value: T): StructuredJsonConverter.this.type
    Definition Classes
    Params
  84. def setCleanAnnotations(value: Boolean): StructuredJsonConverter.this.type
  85. def setConverterSchema(value: ConverterSchema): StructuredJsonConverter.this.type

    Set the value of converterSchema param.

  86. def setConverterSchemaAsStr(value: String): StructuredJsonConverter.this.type

    Set the value of converterSchema param as a string.

    Example:
    setConverterSchemaAsStr(
      """{
      | "document_identifier": "id",
      | "document_text": "document",
      | "entities": ["ner_chunk"],
      | "assertions": [],
      | "resolutions": [],
      | "relations": [],
      | "summaries": [],
      | "deidentifications": [
      |   {
      |     "original": "sentence",
      |     "obfuscated": "obfuscated",
      |     "masked": ""
      |   }],
      | "classifications": []
      |}""".stripMargin
      )
  87. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  88. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  89. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  90. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  91. final def setDefault(paramPairs: ParamPair[_]*): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    Params
  92. final def setDefault[T](param: Param[T], value: T): StructuredJsonConverter.this.type
    Attributes
    protected[org.apache.spark.ml]
    Definition Classes
    Params
  93. def setOutputAsStr(value: Boolean): StructuredJsonConverter.this.type

    Set the value of outputAsStr param: whether to output the result as a string or as a structured JSON. Default: true

  94. final def setOutputCol(value: String): StructuredJsonConverter.this.type
    Definition Classes
    HasOutputAnnotationCol
  95. def setParentSource(value: String): StructuredJsonConverter.this.type

    Set the value of parentSource param. Default: "". Available options: chunk and "".

  96. def setReturnRelationEntities(value: Boolean): StructuredJsonConverter.this.type

    Set whether to return the entities in the relations or not. Default: false

  97. def setSentenceColumn(value: String): StructuredJsonConverter.this.type
  98. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  99. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  100. def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    StructuredJsonConverter → Transformer
  101. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  102. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  103. final def transformSchema(schema: StructType): StructType
    Definition Classes
    StructuredJsonConverter → PipelineStage
  104. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  105. val uid: String
    Definition Classes
    StructuredJsonConverter → Identifiable
  106. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  107. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  108. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  109. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
