Supported in: Streaming
Drops duplicate rows from the input for a given column subset. Rows that have been seen expire after a configured amount of event time. Rows that arrive late by more than the configured amount of event time are always dropped. Partitions by the specified keys: duplicates are dropped separately for each distinct combination of key column values.
Transform categories: Other
Description: The first record at 00:00:00 is emitted and its state is scheduled for eviction at the next tumbling window boundary determined by the eviction window slide (default 1 minute). Although the configured timeout is 10 seconds, the subsequent records at 00:00:09, 00:00:18, and 00:00:28 are all dropped as duplicates because the watermark does not advance far enough for the eviction timer to fire. In general, duplicates can still be dropped for up to one eviction window slide after a key's expiry.
Argument values:
SECONDS
Input:
| row_order | day | temperature | measurement_timestamp |
|---|---|---|---|
| 4 | Monday | 10.4 | 2024-09-30T00:00:28 |
| 3 | Monday | 10.3 | 2024-09-30T00:00:18 |
| 2 | Monday | 10.2 | 2024-09-30T00:00:09 |
| 1 | Monday | 10.1 | 2024-09-30T00:00:00 |
Output:
| day | temperature | measurement_timestamp |
|---|---|---|
| Monday | 10.1 | 2024-09-30T00:00:00 |
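
The eviction timing in this example can be sketched as a small simulation. This is an illustrative model only, not the transform's implementation: the function names `eviction_time` and `drop_duplicates` are hypothetical, the watermark is assumed to equal the largest event time seen so far, the deduplication key is assumed to be the day column, and eviction is assumed to happen at the first slide boundary strictly after the key expiry.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def eviction_time(first_seen, timeout, slide=timedelta(minutes=1)):
    """Assumed rule: first slide boundary strictly after first_seen + timeout."""
    expiry = first_seen + timeout
    windows = (expiry - EPOCH) // slide  # whole slides elapsed since the epoch
    return EPOCH + (windows + 1) * slide


def drop_duplicates(rows, timeout, slide=timedelta(minutes=1)):
    """Hypothetical model of event-time deduplication keyed on the day column."""
    watermark = None
    evict_at = {}  # key -> time at which its state is evicted
    emitted = []
    for row in rows:
        ts = row["measurement_timestamp"]
        # Assumption: the watermark tracks the maximum event time seen.
        watermark = ts if watermark is None else max(watermark, ts)
        # Fire eviction timers that the watermark has reached.
        for key in [k for k, t in evict_at.items() if t <= watermark]:
            del evict_at[key]
        if ts + timeout < watermark:
            continue  # too late: always dropped
        key = row["day"]
        if key in evict_at:
            continue  # state still held, so this row is a duplicate
        evict_at[key] = eviction_time(ts, timeout, slide)
        emitted.append(row)
    return emitted


def at(second):
    return datetime(2024, 9, 30, 0, 0, second)


rows = [
    {"day": "Monday", "temperature": 10.1, "measurement_timestamp": at(0)},
    {"day": "Monday", "temperature": 10.2, "measurement_timestamp": at(9)},
    {"day": "Monday", "temperature": 10.3, "measurement_timestamp": at(18)},
    {"day": "Monday", "temperature": 10.4, "measurement_timestamp": at(28)},
]
emitted = drop_duplicates(rows, timeout=timedelta(seconds=10))
```

Under these assumptions the state for Monday is held until 00:01:00, so only the 00:00:00 row is emitted, matching the output table above.
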
Description: With deduplication partitioned by the day column, each key maintains independent state. The first record for Monday at 00:00:20 is emitted and advances the watermark. The record for Tuesday at 00:00:05 is dropped because it arrives too late: its event time plus the 10 second timeout (00:00:15) is behind the watermark (approximately 00:00:20). This occurs even though Tuesday has no prior records. The record for Wednesday at 00:00:25 is not late and is emitted as the first record for its key.
Argument values:
SECONDS, {day}
Input:
| row_order | day | temperature | measurement_timestamp |
|---|---|---|---|
| 3 | Wednesday | 22.1 | 2024-09-30T00:00:25 |
| 2 | Tuesday | 18.3 | 2024-09-30T00:00:05 |
| 1 | Monday | 20.5 | 2024-09-30T00:00:20 |
Output:
| day | temperature | measurement_timestamp |
|---|---|---|
| Monday | 20.5 | 2024-09-30T00:00:20 |
| Wednesday | 22.1 | 2024-09-30T00:00:25 |
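
The lateness check described in this example can be sketched as follows. This is a simplified model under stated assumptions, not the transform's implementation: the names `is_late` and `drop_late_and_duplicates` are hypothetical, the watermark is assumed to equal the largest event time seen so far (the real watermark is only approximately that), and state eviction is ignored because no key repeats here.

```python
from datetime import datetime, timedelta


def is_late(event_time, watermark, timeout):
    """A row is late when its event time plus the timeout is behind the watermark."""
    return event_time + timeout < watermark


def drop_late_and_duplicates(rows, timeout):
    """Hypothetical model: drop late rows, then keep the first row per key."""
    watermark = None
    seen = set()  # keys with live state (eviction is not modeled here)
    emitted = []
    for day, ts in rows:
        # Assumption: the watermark tracks the maximum event time seen.
        watermark = ts if watermark is None else max(watermark, ts)
        if is_late(ts, watermark, timeout):
            continue  # dropped even if the key has no prior records
        if day in seen:
            continue
        seen.add(day)
        emitted.append(day)
    return emitted


def at(second):
    return datetime(2024, 9, 30, 0, 0, second)


rows = [("Monday", at(20)), ("Tuesday", at(5)), ("Wednesday", at(25))]
emitted_days = drop_late_and_duplicates(rows, timeout=timedelta(seconds=10))
```

With a 10 second timeout, the Tuesday row is compared as 00:00:15 against a watermark of 00:00:20 and is dropped, while Monday and Wednesday are emitted.
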