class StructuredJsonConverter extends Transformer with HasOutputAnnotationCol with ParamsAndFeaturesWritable with CheckLicense

StructuredJsonConverter is a transformer that converts the output of a pipeline into a structured JSON format. The output column is either a string or a struct, depending on the value of the outputAsStr parameter.

The input columns are described by the ConverterSchema case class. The schema includes fields for the document identifier, document text, entities, assertions, resolutions, relations, summaries, deidentifications, and classifications. ConverterSchema also provides methods for parsing the schema from a JSON string and for extracting column names from the input schema. The PipelineParser class can be used to extract the schema from a pipeline.

The transformer has parameters for setting the schema, returning the entities in relations, removing spark-nlp annotation columns, and choosing between string and struct output. It checks the input columns and the document identifier column to ensure that they are compatible with the transformer.

Note

If the document_identifier field is empty or not found in the input schema, a random UUID is generated. If the document_identifier field is found in the input schema and is not a column name, the value of the field is used as the identifier. If the document_identifier field is found in the input schema and is a column name, that column must be of type StringType.
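The three cases in the note can be sketched in plain Scala. This is an illustration of the rule only, not the library's implementation: the object, the IdSource types, and the resolve function are hypothetical names, and the error message is invented for the example.

```scala
import java.util.UUID

// Hypothetical sketch of the document_identifier rule described in the note.
object DocumentIdentifierSketch {
  // How the identifier value would be obtained (names are illustrative).
  sealed trait IdSource
  final case class GeneratedUuid(value: String) extends IdSource // random UUID
  final case class Literal(value: String) extends IdSource       // field value used as-is
  final case class FromColumn(name: String) extends IdSource     // read from a StringType column

  // field:         the document_identifier entry of the schema, None when absent
  // stringColumns: names of input columns whose type is StringType
  // otherColumns:  names of input columns of any other type
  def resolve(
      field: Option[String],
      stringColumns: Set[String],
      otherColumns: Set[String]
  ): Either[String, IdSource] =
    field.filter(_.nonEmpty) match {
      // Empty or missing: generate a random UUID.
      case None => Right(GeneratedUuid(UUID.randomUUID().toString))
      // Names a string column: take the identifier from that column.
      case Some(name) if stringColumns.contains(name) => Right(FromColumn(name))
      // Names a column of another type: rejected, the column must be StringType.
      case Some(name) if otherColumns.contains(name) =>
        Left(s"column '$name' must be of type StringType")
      // Not a column name: the value itself is the identifier.
      case Some(value) => Right(Literal(value))
    }
}
```

For example, resolve(Some("doc-42"), Set("id"), Set.empty) yields Literal("doc-42"), because "doc-42" does not name an input column.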

Linear Supertypes
CheckLicense, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, HasOutputAnnotationCol, Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new StructuredJsonConverter()
  2. new StructuredJsonConverter(uid: String)

    uid

    a unique identifier for the instantiated AnnotatorModel

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  10. def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
    Definition Classes
    CheckLicense
  11. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  12. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  13. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  14. val cleanAnnotations: BooleanParam

    Whether to remove spark-nlp annotation columns. Default: false

  15. final def clear(param: Param[_]): StructuredJsonConverter.this.type
    Definition Classes
    Params
  16. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  17. val converterSchema: StructFeature[ConverterSchema]

    Defines the schema for converting the output of the pipeline into a structured JSON format.

    The schema is represented by the ConverterSchema case class, which outlines the structure of input columns.

    Fields in the schema:

    • document_identifier: The identifier of the document. This column must be of type StringType.
    • document_text: The text of the document, typically created by the DocumentAssembler annotator.
    • entities: Chunk columns generated by various annotators, such as the ChunkMergeModel annotator.
    • assertions: Assertion columns produced by annotators like the AssertionDLModel annotator.
    • resolutions: The schema for resolutions. See ResolutionSchema for details.
    • relations: Relation columns created by annotators such as the RelationExtractionModel annotator.
    • summaries: Summary columns generated by annotators like the MedicalSummarizer annotator.
    • deidentifications: The schema for deidentifications. See DeIdentificationSchema for details.
    • classifications: The schema for classifications. See ClassificationSchema for details.

    See ConverterSchema for detailed information about the schema structure.

  18. def copy(extra: ParamMap): StructuredJsonConverter
    Definition Classes
    StructuredJsonConverter → Transformer → PipelineStage → Params
  19. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  20. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  21. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  22. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  23. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  24. def explainParams(): String
    Definition Classes
    Params
  25. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  26. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  27. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  28. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  29. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  30. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  31. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  32. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  33. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  34. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  35. def getCleanAnnotations: Boolean

    Get the value of cleanAnnotations param.

  36. def getConverterSchema: ConverterSchema

    Get the value of converterSchema param.

  37. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  38. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  39. def getOutputAsStr: Boolean

    Get the value of outputAsStr param.

  40. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  41. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  42. def getReturnRelationEntities: Boolean

    Get the value of returnRelationEntities param.

  43. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  44. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  45. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  46. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  47. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  48. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  49. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  50. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  51. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  52. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  53. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  54. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  55. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  56. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  57. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  58. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  59. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  60. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  61. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  62. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  63. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  64. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  65. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  66. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  67. def onWrite(path: String, spark: SparkSession): Unit
    Attributes
    protected
    Definition Classes
    ParamsAndFeaturesWritable
  68. val outputAsStr: BooleanParam

    Whether to output the result as a JSON string or as a struct. Default: true.

    When set to true, the output column will be a string:

    |-- column_name: string (nullable = true)

    When set to false, the output column will be a struct with the following schema:

    |-- column_name: struct (nullable = true)
         |-- document_identifier: string (nullable = true)
         |-- document_text: array (nullable = true)
         |    |-- element: string (containsNull = true)
         |-- entities: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)
         |-- assertions: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)
         |-- resolutions: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)
         |-- relations: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)
         |-- summaries: array (nullable = true)
         |    |-- element: string (containsNull = true)
         |-- deidentifications: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)
         |-- classifications: array (nullable = true)
         |    |-- element: map (containsNull = true)
         |        |-- key: string
         |        |-- value: string (valueContainsNull = true)

    Use this parameter to control the format of the output based on your specific requirements.
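The struct layout printed above can also be written as a Spark StructType, which is convenient when a downstream step needs the schema as a value. This is a sketch that transcribes the printed tree; the object name StructuredOutputSchema is hypothetical, and only standard org.apache.spark.sql.types classes are used.

```scala
import org.apache.spark.sql.types._

// Hypothetical holder for the struct schema documented for outputAsStr = false.
object StructuredOutputSchema {
  // array<map<string,string>>: the shape of most fields in the tree above
  private val mapArray: ArrayType =
    ArrayType(MapType(StringType, StringType, valueContainsNull = true), containsNull = true)

  // array<string>: used by document_text and summaries
  private val stringArray: ArrayType = ArrayType(StringType, containsNull = true)

  val schema: StructType = StructType(Seq(
    StructField("document_identifier", StringType, nullable = true),
    StructField("document_text", stringArray, nullable = true),
    StructField("entities", mapArray, nullable = true),
    StructField("assertions", mapArray, nullable = true),
    StructField("resolutions", mapArray, nullable = true),
    StructField("relations", mapArray, nullable = true),
    StructField("summaries", stringArray, nullable = true),
    StructField("deidentifications", mapArray, nullable = true),
    StructField("classifications", mapArray, nullable = true)
  ))
}
```

If the string produced with outputAsStr = true has the same shape (an assumption this page does not confirm), it could be parsed back with Spark's from_json(col("result"), StructuredOutputSchema.schema), where "result" is a placeholder output column name.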

  69. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  70. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  71. val returnRelationEntities: BooleanParam

    Whether to return the entities in the relations or not. Default: false

  72. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  73. def set[T](feature: StructFeature[T], value: T): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  74. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  75. def set[T](feature: SetFeature[T], value: Set[T]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  76. def set[T](feature: ArrayFeature[T], value: Array[T]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  77. final def set(paramPair: ParamPair[_]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    Params
  78. final def set(param: String, value: Any): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    Params
  79. final def set[T](param: Param[T], value: T): StructuredJsonConverter.this.type
    Definition Classes
    Params
  80. def setCleanAnnotations(value: Boolean): StructuredJsonConverter.this.type
  81. def setConverterSchema(value: ConverterSchema): StructuredJsonConverter.this.type

    Set the value of converterSchema param.

  82. def setConverterSchemaAsStr(value: String): StructuredJsonConverter.this.type

    Set the value of the converterSchema param from a JSON string.

    Example:
    setConverterSchemaAsStr(
      """{
      | "document_identifier": "id",
      | "document_text": "document",
      | "entities": ["ner_chunk"],
      | "assertions": [],
      | "resolutions": [],
      | "relations": [],
      | "summaries": [],
      | "deidentifications": [
      |   {
      |     "original": "sentence",
      |     "obfuscated": "obfuscated",
      |     "masked": ""
      |   }],
      | "classifications": []
      |}""".stripMargin
      )
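When the column names are only known at runtime, the JSON string for setConverterSchemaAsStr can be assembled programmatically instead of being written by hand. The helper below is a hypothetical sketch in plain Scala, not part of the library. It emits the field names used in the example above and leaves resolutions and classifications as empty arrays, because their entry formats (ResolutionSchema, ClassificationSchema) are not described on this page.

```scala
// Hypothetical helper that builds a converter schema JSON string.
object ConverterSchemaJson {
  // Minimal JSON string quoting: escapes backslashes and double quotes only.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  private def array(items: Seq[String]): String = items.mkString("[", ", ", "]")

  // One deidentification entry; the three keys come from the example above.
  final case class Deid(original: String, obfuscated: String, masked: String = "")

  def build(
      documentIdentifier: String,
      documentText: String,
      entities: Seq[String] = Nil,
      assertions: Seq[String] = Nil,
      relations: Seq[String] = Nil,
      summaries: Seq[String] = Nil,
      deidentifications: Seq[Deid] = Nil
  ): String = {
    val deids = deidentifications.map { d =>
      s"""{"original": ${quote(d.original)}, "obfuscated": ${quote(d.obfuscated)}, "masked": ${quote(d.masked)}}"""
    }
    // Field order follows the example; resolutions and classifications stay empty.
    val fields = Seq(
      "document_identifier" -> quote(documentIdentifier),
      "document_text" -> quote(documentText),
      "entities" -> array(entities.map(quote)),
      "assertions" -> array(assertions.map(quote)),
      "resolutions" -> "[]",
      "relations" -> array(relations.map(quote)),
      "summaries" -> array(summaries.map(quote)),
      "deidentifications" -> array(deids),
      "classifications" -> "[]"
    )
    fields.map { case (k, v) => s"${quote(k)}: $v" }.mkString("{", ", ", "}")
  }
}
```

A call such as ConverterSchemaJson.build("id", "document", entities = Seq("ner_chunk")) returns a single-line JSON object that could then be passed to setConverterSchemaAsStr.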
  83. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  84. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  85. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  86. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  87. final def setDefault(paramPairs: ParamPair[_]*): StructuredJsonConverter.this.type
    Attributes
    protected
    Definition Classes
    Params
  88. final def setDefault[T](param: Param[T], value: T): StructuredJsonConverter.this.type
    Attributes
    protected[org.apache.spark.ml]
    Definition Classes
    Params
  89. def setOutputAsStr(value: Boolean): StructuredJsonConverter.this.type

    Set the value of the outputAsStr param: whether to output the result as a JSON string or as a struct. Default: true

  90. final def setOutputCol(value: String): StructuredJsonConverter.this.type
    Definition Classes
    HasOutputAnnotationCol
  91. def setReturnRelationEntities(value: Boolean): StructuredJsonConverter.this.type

    Set whether to return the entities in the relations or not. Default: false

  92. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  93. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  94. def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    StructuredJsonConverter → Transformer
  95. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  96. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  97. final def transformSchema(schema: StructType): StructType
    Definition Classes
    StructuredJsonConverter → PipelineStage
  98. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  99. val uid: String
    Definition Classes
    StructuredJsonConverter → Identifiable
  100. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  101. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  102. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  103. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
