Drops duplicate rows from the input for a given column subset. Rows that have been seen expire after the configured amount of event time. Rows that arrive late by more than the configured amount of event time are always dropped. The input is partitioned by the specified keys, and duplicates are dropped separately for each distinct combination of key column values.
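The behavior described above can be approximated with a small in-memory simulation. This is only a sketch under stated assumptions, not the transform's actual implementation: the function name `drop_duplicates_with_expiry`, the `ts` event-time column (in milliseconds), and the use of the maximum event time seen so far as the watermark are all hypothetical. It also assumes that the expiration timer restarts each time a duplicate is seen, and it ignores the eviction window entirely.

```python
from typing import Any, Dict, Hashable, Iterable, List, Optional, Sequence, Tuple


def drop_duplicates_with_expiry(
    rows: Iterable[Dict[str, Any]],
    subset: Sequence[str],
    expiry_ms: int,
    key_by: Sequence[str] = (),
    time_col: str = "ts",  # hypothetical event-time column, in milliseconds
) -> List[Dict[str, Any]]:
    """Illustrative simulation only; all names here are assumptions."""
    # Last event time at which each (partition key, subset values) pair was seen.
    last_seen: Dict[Tuple[Tuple[Hashable, ...], Tuple[Hashable, ...]], int] = {}
    watermark: Optional[int] = None  # assumption: max event time seen so far
    kept: List[Dict[str, Any]] = []

    for row in rows:
        ts = row[time_col]
        watermark = ts if watermark is None else max(watermark, ts)

        # Rows that are late by more than the expiration time are always dropped.
        if watermark - ts > expiry_ms:
            continue

        state_key = (
            tuple(row[c] for c in key_by),   # partition key
            tuple(row[c] for c in subset),   # columns that define uniqueness
        )
        previous = last_seen.get(state_key)
        is_duplicate = previous is not None and ts - previous <= expiry_ms

        # Assumption: every sighting refreshes the expiration timer.
        last_seen[state_key] = ts if previous is None else max(previous, ts)

        if not is_duplicate:
            kept.append(row)

    return kept
```

With an expiration time of 1000 ms, a row with `id = "a"` at `ts = 500` is dropped as a duplicate of the same `id` at `ts = 0`, while the same `id` at `ts = 2000` is kept because the earlier state has expired by then.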
Transform categories: Other
Declared arguments
Dataset - Dataset whose rows are deduplicated. Table
Key expiration time unit - Unit for the amount of time to wait for data to deduplicate over. Enum<Days, Hours, Milliseconds, Minutes, Seconds, Weeks>
Key expiration time value - Value for the amount of time to wait for data to deduplicate over. Literal<Long>
Column subset (optional) - If any columns are specified, only those are used to determine uniqueness; otherwise, the columns that the stream is keyed by are implicitly used. Set<Column<AnyType>>
Eviction window slide (optional) - Length of the tumbling eviction window, which sets the cadence at which stale state is evicted. State is considered stale when more than the specified timeout has elapsed in event time. Duplicates are dropped for a period in the interval (key_expiry, key_expiry + eviction_slide] after the last duplicate was seen. Changing this value is considered a state break and requires a replay. Tuple<Literal<Long>, Enum<Days, Hours, Milliseconds, Minutes, Seconds, Weeks>>
Key by columns (optional) - Columns by which to partition the input. Duplicates are dropped separately, in parallel, for each distinct key value. Set<Column<AnyType>>
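The interaction between the key expiration time and the eviction window slide can be illustrated with a little arithmetic. The sketch below is an assumption-laden illustration, not the transform's documented algorithm: it assumes that eviction happens at the end of tumbling windows aligned to multiples of the slide (starting from event time 0), and the helper names `to_millis` and `eviction_time_ms` are hypothetical. Under that assumption, a key last seen at some event time is retained for a duration that always falls in (key_expiry, key_expiry + eviction_slide].

```python
# Milliseconds per unit, covering the enum values listed in the arguments above.
MILLIS_PER_UNIT = {
    "Milliseconds": 1,
    "Seconds": 1_000,
    "Minutes": 60_000,
    "Hours": 3_600_000,
    "Days": 86_400_000,
    "Weeks": 604_800_000,
}


def to_millis(value: int, unit: str) -> int:
    """Convert a (value, unit) pair to milliseconds (hypothetical helper)."""
    return value * MILLIS_PER_UNIT[unit]


def eviction_time_ms(last_seen_ms: int, expiry_ms: int, slide_ms: int) -> int:
    """Event time at which state for a key would be evicted.

    Assumption: eviction runs at tumbling-window boundaries that are
    multiples of slide_ms, and state is stale only once strictly more
    than expiry_ms has elapsed since the key was last seen.
    """
    stale_after = last_seen_ms + expiry_ms
    # First window boundary strictly greater than stale_after.
    return (stale_after // slide_ms + 1) * slide_ms


expiry = to_millis(10, "Minutes")
slide = to_millis(1, "Minutes")
last_seen = 90_000  # key last seen 1.5 minutes into the stream

evicted_at = eviction_time_ms(last_seen, expiry, slide)
retained_for = evicted_at - last_seen
# retained_for lies in (expiry, expiry + slide]
```

A smaller slide keeps the effective deduplication period closer to the key expiration time, at the cost of more frequent eviction passes; a larger slide does the opposite.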