Outer caching join

Supported in: Streaming

Joins left and right dataset inputs together, caching the record with the highest event time from each side for use in subsequent joins. When two records have the same event time, the processing time of the record is used as a tiebreaker. Results are optimistically emitted if there's no value to join against.
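
The behavior described above can be modeled with a small Python sketch. It is not the transform's implementation, only an illustration under stated assumptions: the names `Record` and `OuterCachingJoinSketch` are hypothetical, the later processing time is assumed to win a tie, an incoming record is assumed to be joined against the other side's cached record even when it does not replace its own side's cache entry, and cache eviction is left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class Record:
    key: Any
    value: Any
    event_time: int       # event timestamp, e.g. epoch milliseconds
    processing_time: int  # arrival timestamp, used only to break ties


class OuterCachingJoinSketch:
    """Toy model (hypothetical): one cached record per join key and side."""

    def __init__(self) -> None:
        self.cache: Dict[str, Dict[Any, Record]] = {"left": {}, "right": {}}

    def _maybe_cache(self, side: str, rec: Record) -> None:
        current = self.cache[side].get(rec.key)
        # Keep the record with the highest event time. Assumption: on equal
        # event times, the record with the later processing time wins.
        if current is None or (rec.event_time, rec.processing_time) >= (
            current.event_time,
            current.processing_time,
        ):
            self.cache[side][rec.key] = rec

    def on_record(
        self, side: str, rec: Record
    ) -> Tuple[Any, Optional[Any], Optional[Any]]:
        """Return (key, left_value, right_value) for one incoming record."""
        self._maybe_cache(side, rec)
        other = "right" if side == "left" else "left"
        match = self.cache[other].get(rec.key)
        # Optimistic emission: emit a row even if the other side has no value.
        other_value = match.value if match is not None else None
        if side == "left":
            return (rec.key, rec.value, other_value)
        return (rec.key, other_value, rec.value)


join = OuterCachingJoinSketch()
first = join.on_record("left", Record("k", "L1", event_time=10, processing_time=100))
second = join.on_record("right", Record("k", "R1", event_time=5, processing_time=101))
late = join.on_record("right", Record("k", "R0", event_time=3, processing_time=102))
third = join.on_record("left", Record("k", "L2", event_time=11, processing_time=103))
```

In this sketch, `first` has no right-hand value because nothing was cached yet, and `third` joins against `R1` rather than `R0` because the late record `R0` has a lower event time and never replaced the cached right-hand record.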

Transform categories: Join

Declared arguments

  • Default cache time unit - Unit of the default cache time, which is how long data is cached before eviction in both the lhs and rhs caches.
    Enum<Days, Hours, Milliseconds, Minutes, Seconds, Weeks>
  • Default cache time value - Value of the default cache time, which is how long data is cached before eviction in both the lhs and rhs caches.
    Literal<Long>
  • Join key - A list of columns from left and right input to join on.
    List<Tuple<Column<AnyType>, Column<AnyType>>>
  • Left dataset - Left dataset to use in join.
    Table
  • Right dataset - Right dataset to use in join.
    Table
  • optional Rhs cache time override - Value and unit of time that data from the rhs dataset will be cached for before eviction. If the cache time is set to 0, the rhs is not cached. To leave only the lhs uncached, set the default cache time to 0 and use this parameter to give the rhs a non-zero cache time.
    Tuple<Literal<Long>, Enum<Days, Hours, Milliseconds, Minutes, Seconds, Weeks>>
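
As a rough illustration of how the default cache time and the rhs override combine, the following Python sketch resolves the effective cache duration for each side. The function names `to_timedelta` and `resolve_cache_times` are hypothetical and not part of the product; only the unit names and the override rule come from the argument descriptions above.

```python
from datetime import timedelta
from typing import Optional, Tuple

# Maps the documented enum values to datetime.timedelta keyword arguments.
_UNIT_KWARGS = {
    "Milliseconds": "milliseconds",
    "Seconds": "seconds",
    "Minutes": "minutes",
    "Hours": "hours",
    "Days": "days",
    "Weeks": "weeks",
}


def to_timedelta(value: int, unit: str) -> timedelta:
    """Convert a (value, unit) pair into a timedelta."""
    return timedelta(**{_UNIT_KWARGS[unit]: value})


def resolve_cache_times(
    default_value: int,
    default_unit: str,
    rhs_override: Optional[Tuple[int, str]] = None,
) -> Tuple[timedelta, timedelta]:
    """Return (lhs_cache_time, rhs_cache_time); a zero duration means no caching."""
    default = to_timedelta(default_value, default_unit)
    # The lhs always uses the default; the rhs uses the override when given.
    rhs = to_timedelta(*rhs_override) if rhs_override is not None else default
    return default, rhs


# Both sides cached for one day.
both_sides = resolve_cache_times(1, "Days")

# Uncached lhs: default of 0, with the rhs overridden to 30 minutes.
lhs_uncached = resolve_cache_times(0, "Hours", rhs_override=(30, "Minutes"))
```

The second call mirrors the documented recipe for an uncached lhs: the default is set to 0 and the override restores caching for the rhs only.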