Supported in: Streaming
Drops duplicate rows from the input, comparing only the given subset of columns. A row that has been seen expires after the configured amount of event time. Rows that arrive late by more than the configured amount of event time are always dropped. The input is partitioned by the specified key columns, and duplicates are dropped separately for each distinct combination of key column values.
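The behavior described above can be sketched as a small event-time loop. This is a minimal, hypothetical model, not the Foundry implementation: the function `time_bounded_drop_duplicates` and its parameter names are invented for illustration. It assumes that the watermark is the highest event time accepted per partition key, that a duplicate which is dropped still refreshes the stored time (as the first example's explanation describes), and that a gap of exactly the configured amount counts as expired (as both examples' outputs suggest).

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable, Iterator, Sequence, Tuple

Row = Dict[str, Any]


def time_bounded_drop_duplicates(
    rows: Iterable[Row],
    dedup_columns: Sequence[str],
    key_columns: Sequence[str],
    time_column: str,
    expiry: timedelta,
) -> Iterator[Row]:
    """Hypothetical sketch: yield rows not already seen within `expiry` of event time."""
    # Per partition key: highest event time accepted so far (assumed watermark).
    watermark: Dict[Tuple, datetime] = {}
    # Per partition key: dedup-column values -> event time that last refreshed them.
    last_seen: Dict[Tuple, Dict[Tuple, datetime]] = {}

    for row in rows:
        key = tuple(row[c] for c in key_columns)
        values = tuple(row[c] for c in dedup_columns)
        t = row[time_column]

        wm = watermark.get(key)
        # Rows more than `expiry` behind the watermark are late: always dropped,
        # and they do not update any state.
        if wm is not None and wm - t > expiry:
            continue
        watermark[key] = t if wm is None else max(wm, t)

        seen = last_seen.setdefault(key, {})
        prev = seen.get(values)
        if prev is not None and t - prev < expiry:
            # Duplicate inside the window: drop it, but refresh the stored time.
            seen[values] = max(prev, t)
            continue

        # First occurrence, or the previous occurrence has expired: emit.
        seen[values] = t
        yield row


if __name__ == "__main__":
    # First example on this page, in stream order (oldest first).
    base = datetime(2024, 9, 30)
    stream = [
        {"day": "Monday", "temperature": 10.1, "measurement_timestamp": base + timedelta(seconds=0)},
        {"day": "Monday", "temperature": 10.2, "measurement_timestamp": base + timedelta(seconds=9)},
        {"day": "Monday", "temperature": 10.3, "measurement_timestamp": base + timedelta(seconds=18)},
        {"day": "Monday", "temperature": 10.4, "measurement_timestamp": base + timedelta(seconds=28)},
    ]
    kept = time_bounded_drop_duplicates(
        stream, ["day"], [], "measurement_timestamp", timedelta(seconds=10)
    )
    print([r["temperature"] for r in kept])  # [10.1, 10.4]
```

The usage assumes deduplication on `day` with no key columns and a 10-second expiry. The printed list is in emission order (oldest first), the reverse of the output tables on this page. State is expired lazily by comparing timestamps; a production implementation would also evict old entries to bound memory.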
Transform categories: Other
In all tables, the most recently streamed rows appear at the top. Each example uses the same time bounded drop duplicates node, configured with a 10-second threshold.
Input
row_order | day | temperature | measurement_timestamp |
---|---|---|---|
4 | Monday | 10.4 | 2024-09-30T00:00:28 |
3 | Monday | 10.3 | 2024-09-30T00:00:18 |
2 | Monday | 10.2 | 2024-09-30T00:00:09 |
1 | Monday | 10.1 | 2024-09-30T00:00:00 |
Output
day | temperature | measurement_timestamp |
---|---|---|
Monday | 10.4 | 2024-09-30T00:00:28 |
Monday | 10.1 | 2024-09-30T00:00:00 |
Explanation: The records with temperatures of 10.2 and 10.3 each arrive within the 10-second threshold of the record before them, so they are not emitted. Because their timestamps increase, each of these dropped records still updates the watermark. This is why the record with a temperature of 10.3 (00:00:18) is not emitted even though it arrives more than 10 seconds after the first record, 10.1 (00:00:00): it is only 9 seconds after the 10.2 record (00:00:09). The last streamed row, with a temperature of 10.4 (00:00:28), arrives 10 seconds after the last row that updated the watermark (the 10.3 row at 00:00:18), so the threshold has elapsed and the row is emitted.
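Written as seconds after 2024-09-30T00:00:00, each comparison in this example is the gap between a row and the row that last updated the watermark. The 10.4 row is emitted at a gap of exactly 10 seconds, which suggests the threshold counts as elapsed once the gap reaches 10 seconds; this boundary reading is inferred from the example rather than stated elsewhere.

$$
\begin{aligned}
9 - 0 &= 9 < 10 &&\Rightarrow \text{10.2 dropped} \\
18 - 9 &= 9 < 10 &&\Rightarrow \text{10.3 dropped} \\
28 - 18 &= 10 \ge 10 &&\Rightarrow \text{10.4 emitted}
\end{aligned}
$$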
Input
row_order | day | temperature | measurement_timestamp |
---|---|---|---|
4 | Monday | 10.4 | 2024-09-30T00:00:30 |
3 | Monday | 10.3 | 2024-09-30T00:00:00 |
2 | Monday | 10.2 | 2024-09-30T00:00:05 |
1 | Monday | 10.1 | 2024-09-30T00:00:20 |
Output
day | temperature | measurement_timestamp |
---|---|---|
Monday | 10.4 | 2024-09-30T00:00:30 |
Monday | 10.1 | 2024-09-30T00:00:20 |
Explanation: The records with temperatures of 10.2 and 10.3 arrive after the record with a temperature of 10.1 but have earlier timestamps (00:00:05 and 00:00:00, versus 00:00:20). They are late by 15 and 20 seconds respectively, more than the 10-second threshold, so they are dropped and do not advance the watermark. The last streamed row, with a temperature of 10.4 (00:00:30), arrives 10 seconds after the last row that updated the watermark (the 10.1 row at 00:00:20), so the threshold has elapsed and the row is emitted.
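The same arithmetic for this example, again in seconds after 2024-09-30T00:00:00, measures the two late rows against the watermark set by the 10.1 row and then the 10.4 row against that same watermark. As in the previous example, treating a gap of exactly 10 seconds as elapsed is inferred from the output table.

$$
\begin{aligned}
20 - 5 &= 15 > 10 &&\Rightarrow \text{10.2 dropped (late)} \\
20 - 0 &= 20 > 10 &&\Rightarrow \text{10.3 dropped (late)} \\
30 - 20 &= 10 \ge 10 &&\Rightarrow \text{10.4 emitted}
\end{aligned}
$$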