Outer caching join

Supported in: Streaming

Joins the left and right dataset inputs, caching the record with the highest event time from each side for use in subsequent joins. When event times are equal, the processing time of a record is used as a tiebreaker. Results are emitted optimistically if there is no value to join against.
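
The caching behavior described above can be illustrated with a minimal Python sketch. This is not the transform's implementation. The names `Record`, `OuterCachingJoinSketch`, `on_left`, and `on_right` are hypothetical, and several details are assumptions: that the later processing time wins a tie, that a record with a lower event time than the cached one is still joined but does not replace the cache entry, and that eviction can be ignored for the purpose of the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical record shape used only for this sketch."""
    key: Hashable         # value of the join key
    event_time: int       # event time, e.g. epoch milliseconds
    processing_time: int  # processing time, used only to break ties
    value: Any


class OuterCachingJoinSketch:
    """Keeps, per side and per key, the record with the highest event time."""

    def __init__(self) -> None:
        self._left: Dict[Hashable, Record] = {}
        self._right: Dict[Hashable, Record] = {}

    @staticmethod
    def _update(cache: Dict[Hashable, Record], rec: Record) -> None:
        cached = cache.get(rec.key)
        # Tuple comparison: event time first, processing time as tiebreaker.
        # Assumption: the later processing time wins a tie.
        if cached is None or (rec.event_time, rec.processing_time) > (
            cached.event_time,
            cached.processing_time,
        ):
            cache[rec.key] = rec

    def on_left(self, rec: Record) -> Tuple[Any, Optional[Any]]:
        """Handle a left record; emit (left, right), with right=None if absent."""
        self._update(self._left, rec)
        other = self._right.get(rec.key)
        return (rec.value, other.value if other is not None else None)

    def on_right(self, rec: Record) -> Tuple[Optional[Any], Any]:
        """Handle a right record; emit (left, right), with left=None if absent."""
        self._update(self._right, rec)
        other = self._left.get(rec.key)
        return (other.value if other is not None else None, rec.value)
```

In this sketch, the first record for a key is emitted immediately with `None` on the opposite side, which mirrors the optimistic emission described above. Later records from either side are joined against whatever the opposite cache currently holds.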

Transform categories: Join

Declared arguments

  • Default cache time unit - Default unit of the time for which data is cached before eviction, applied to both the lhs and rhs caches.
    Enum<Days, Hours, Milliseconds, Minutes, Seconds, Weeks>
  • Default cache time value - Default amount of time for which data is cached before eviction, applied to both the lhs and rhs caches.
    Literal<Long>
  • Join key - A list of column pairs to join on, each pairing a column from the left input with a column from the right input.
    List<Tuple<Column<AnyType>, Column<AnyType>>>
  • Left dataset - Left dataset to use in join.
    Table
  • Right dataset - Right dataset to use in join.
    Table
  • optional Rhs cache time override - Value and unit of the time for which data from the rhs dataset is cached before eviction.
    Tuple<Literal<Long>, Enum<Days, Hours, Milliseconds, Minutes, Seconds, Weeks>>
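
As a rough illustration of how the cache time arguments above might combine, the following Python sketch resolves an effective cache duration for each side. The helper names `to_timedelta` and `resolve_cache_ttls` are hypothetical, and the sketch assumes the rhs override, when provided, replaces the default for the rhs cache only.

```python
from datetime import timedelta
from typing import Optional, Tuple

# Maps the enum values listed above to datetime.timedelta keyword arguments.
_UNIT_TO_KWARG = {
    "Days": "days",
    "Hours": "hours",
    "Milliseconds": "milliseconds",
    "Minutes": "minutes",
    "Seconds": "seconds",
    "Weeks": "weeks",
}


def to_timedelta(value: int, unit: str) -> timedelta:
    """Convert a (value, unit) pair into a timedelta."""
    return timedelta(**{_UNIT_TO_KWARG[unit]: value})


def resolve_cache_ttls(
    default_value: int,
    default_unit: str,
    rhs_override: Optional[Tuple[int, str]] = None,
) -> Tuple[timedelta, timedelta]:
    """Return (lhs_ttl, rhs_ttl).

    Assumption: the override applies to the rhs cache only, and the
    lhs cache always uses the default.
    """
    default_ttl = to_timedelta(default_value, default_unit)
    rhs_ttl = to_timedelta(*rhs_override) if rhs_override is not None else default_ttl
    return default_ttl, rhs_ttl
```

For example, `resolve_cache_ttls(1, "Hours", (30, "Minutes"))` would give a one-hour lhs cache duration and a 30-minute rhs cache duration under these assumptions.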