Time bounded drop out of order

Supported in: Streaming

Drops rows with the same values for all key columns that are out of order. A row is out of order if it would have come before an already received row with the same key values based on sort columns and directions. Two rows are compared by evaluating the first sort column and direction first, and then moving on to the next sort column and direction if and only if there was a tie, and so on until order is determined or all sort columns are tied in which case the rows are equal. The current maximum for each key is stored until no new rows have been seen for that key for an event time greater than or equal to the expiry. After a key has received no new rows for greater or equal to the expiry time, any new row for that key will be never be dropped, and will always be stored as the new current maximum.

Transform categories: Other

Declared arguments

  • Dataset - Dataset to drop out of order rows.
    Table
  • Key expiration time unit - Unit for amount of time to store the greatest record for a given key. If state is stored for a key, and a different key is processed with a watermark greater than this expiration period, then state is expired for the key and any new records of the same key will not be dropped. For any key, a new record pushes the expiry to this amount of time in the future, whether or not it has the highest order precedence.
    Enum<Days, Hours, Milliseconds, Minutes, Seconds, Weeks>
  • Key expiration time value - Value for amount of time to store the greatest record for a given key. If state is stored for a key, and a different key is processed with a watermark greater than this expiration period, then state is expired for the key and any new records of the same key will not be dropped. For any key, a new record pushes the expiry to this amount of time in the future, whether or not it has the highest order precedence.
    Literal<Long>
  • Sort specification - Specification for how to sort the dataset. This list is a the precedence that each column is used to sort two records. The first column and sort direction is is used to compare two records, and if there is a tie, then the second column and sort direction are used, and so on.
    List<Tuple<Column<ComparableType>, Enum<Ascending, Descending>>>
  • optional Key by columns - Columns used to partition the input by key. Rows sharing the same key column values are processed in the order they are received. The order in which rows with the same key columns are processed may differ from the order defined by the sort spec. A row is considered out of order when it ought to be placed before the state stored highest precedence already processed row with the same key, based on the sort spec. For such out-of-order rows, they are dropped during the process so long as such state for this key exists and has not expired.
    Set<Column<AnyType>>