DateShiftFiller

class DateShiftFiller extends Serializable

Utility class to fill missing or empty shift values in a DataFrame column using deterministic, ID-based pseudo-random values.

This is particularly useful in de-identification tasks where date shift values must be:
- Consistent across all rows with the same identifier
- Present even when the original shift value is missing or empty

Logic

For each row:

If another row with the same ID has a known (non-empty) shift value, reuse it.
If not, generate a fallback shift value deterministically using the ID and a seed. The generated value will always fall in the range [1, maxShiftDays].
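
The fallback step can be illustrated with a minimal sketch. It is not the library's implementation: the helper name `fallbackShift` and the choice of MurmurHash3 are assumptions made for illustration. It only shows one way to derive a repeatable value in [1, maxShiftDays] from an ID and a seed.

```scala
import scala.util.hashing.MurmurHash3

object FallbackShiftSketch {
  // Hypothetical helper (not part of DateShiftFiller's documented API):
  // maps (id, seed) to a repeatable value in [1, maxShiftDays].
  def fallbackShift(id: String, seed: Int, maxShiftDays: Int): Int = {
    require(maxShiftDays >= 1, "maxShiftDays must be at least 1")
    // Assumption: MurmurHash3 stands in for whatever hash the class uses.
    val hash = MurmurHash3.stringHash(id, seed)
    // floorMod keeps the remainder non-negative even for negative hashes,
    // so the result is in [0, maxShiftDays - 1] before adding 1.
    Math.floorMod(hash, maxShiftDays) + 1
  }
}
```

Because the value depends only on the ID, the seed, and the upper bound, every row with the same ID receives the same shift without any coordination between rows.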

Example usage

val filler = new DateShiftFiller(spark, seed = 42, maxShiftDays = 60)
val resultDf = filler.fillMissingShifts(df, "note_id", "date_shift", "_filled")

Linear Supertypes

Serializable, Serializable, AnyRef, Any

Instance Constructors

new DateShiftFiller(spark: SparkSession, seed: Int = 42, maxShiftDays: Int = 60)
spark
The active SparkSession
seed
Seed used for deterministic hashing, ensuring repeatable fallback values
maxShiftDays
Maximum number of days used in fallback shift generation (inclusive upper bound)

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def fillMissingShifts(df: DataFrame, idCol: String, shiftCol: String, suffix: String = "_filled", resolvedMode: String = "first"): DataFrame
Fills missing or empty values in a date-shift column using the following logic:
- If other rows with the same ID have a valid value, reuse it.
- If not, generate a deterministic pseudo-random value based on ID and seed.
The result is written to a new column using the given suffix, keeping the original column untouched.
df
Input DataFrame
idCol
ID column name (grouping key)
shiftCol
Column with optional shift values
suffix
Suffix to append to the name of the new output column. Default is "_filled".
resolvedMode
How to resolve conflicts when multiple rows have the same ID. Options: "first" (default) and "all". The "all" option duplicates rows with the same ID.
returns
DataFrame with a new shift column: shiftCol + suffix
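
The reuse-or-generate behaviour described for this method can be sketched on plain Scala collections, without Spark. Everything here is hypothetical: the `Row` case class, the `fill` function, and the MurmurHash3-based fallback are illustrative stand-ins, and the sketch assumes that "first" means the first non-empty value encountered for an ID. The "all" mode is not modelled.

```scala
import scala.util.hashing.MurmurHash3

object FillMissingShiftsSketch {
  // Hypothetical row type standing in for a DataFrame row.
  final case class Row(id: String, shift: Option[String])

  // Assumed fallback: hash of (id, seed) folded into [1, maxShiftDays].
  private def fallback(id: String, seed: Int, maxShiftDays: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(id, seed), maxShiftDays) + 1

  // Returns each input row paired with its filled shift value.
  def fill(rows: Seq[Row], seed: Int = 42, maxShiftDays: Int = 60): Seq[(Row, String)] = {
    // For each ID, keep the first non-empty shift value seen, if any.
    val known: Map[String, String] = rows
      .flatMap(r => r.shift.map(_.trim).filter(_.nonEmpty).map(v => r.id -> v))
      .groupBy(_._1)
      .map { case (id, pairs) => id -> pairs.head._2 }

    // Reuse the known value for the ID, otherwise generate the fallback.
    rows.map { r =>
      (r, known.getOrElse(r.id, fallback(r.id, seed, maxShiftDays).toString))
    }
  }
}
```

Pairing each original row with the filled value mirrors the documented behaviour of writing to a new column while leaving the original column untouched.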
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
