Time bounded drop out of order

Supported in: Streaming

Drops rows with the same values for all key columns that are out of order. A row is out of order if it would have come before an already received row with the same key values based on sort columns and directions. Two rows are compared by evaluating the first sort column and direction first, and then moving on to the next sort column and direction if and only if there was a tie, and so on until order is determined or all sort columns are tied in which case the rows are equal. The current maximum for each key is stored until no new rows have been seen for that key for an event time greater than or equal to the expiry. After a key has received no new rows for greater or equal to the expiry time, any new row for that key will be never be dropped, and will always be stored as the new current maximum.

Transform categories: Other

Declared arguments

  • Dataset - Dataset to drop out of order rows.
    Table
  • Key expiration time unit - Unit for amount of time to store the greatest record for a given key. If state is stored for a key, and a different key is processed with a watermark greater than this expiration period, then state is expired for the key and any new records of the same key will not be dropped. For any key, a new record pushes the expiry to this amount of time in the future, whether or not it has the highest order precedence.
    Enum<Days, Hours, Milliseconds, Minutes, Seconds, Weeks>
  • Key expiration time value - Value for amount of time to store the greatest record for a given key. If state is stored for a key, and a different key is processed with a watermark greater than this expiration period, then state is expired for the key and any new records of the same key will not be dropped. For any key, a new record pushes the expiry to this amount of time in the future, whether or not it has the highest order precedence.
    Literal<Long>
  • Sort specification - Defines the criteria for comparing rows. This list specifies the order of precedence for columns used in sorting records. The first column and its sort direction are applied initially; if records are identical based on this criterion, subsequent columns and their corresponding sort directions are used to break ties.
    List<Tuple<Column<ComparableType>, Enum<Ascending, Descending>>>
  • optional Key by columns - Columns used to partition the input by key. Rows sharing the same key column values are processed in the order they are received. The order in which rows with the same key columns are processed may differ from the order defined by the sort spec. A row is considered out of order when it ought to be placed before the state stored highest precedence already processed row with the same key, based on the sort spec. For such out-of-order rows, they are dropped during the process so long as such state for this key exists and has not expired.
    Set<Column<AnyType>>