Outer join

Supported in: Batch

Outer joins the provided dataset inputs together, keeping all rows from both datasets. Columns have nulls when there is no row satisfying the provided condition.

Transform categories: Join

Declared arguments

  • Condition for columns to select on the left - All columns in the left input schema will be tested to see if they match this condition. If they match, the column will be selected in the output.
    ColumnPredicate
  • Condition for columns to select on the right - All columns in the right input schema will be tested to see if they match this condition. If they match, the column will be selected in the output.
    ColumnPredicate
  • Join condition - Condition on which to join.
    Expression<Boolean>
  • Left dataset - Left dataset to use in join.
    Table
  • Right dataset - Right dataset to use in join.
    Table
  • optional Prefix for columns from right - Prefix to add to all columns on the right hand side.
    Literal<String>

Examples

Example 1: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airport
XB-123LHR
MT-222CPH
KK-452JFK
JR-201IAD

Output:

tail_numberairlinehome_airport
XB-123foundry airLHR
MT-222new airlineCPH
XB-123foundry airlineLHR
MT-222new airCPH
KK-452new airJFK
PA-452new airnull
XB-123foundry airlineLHR
JR-201nullIAD

Example 2: Base case

Description: Simple complex join condition. Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [tail_number, home_airport, factor],
    )
  • Join condition:
    lessThan(
     left: factor,
     right: factor,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: right_

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airportfactor
XB-123LHR2
MT-222CPH1
KK-452JFK10
JR-201IAD4

Output:

tail_numberairlinefactorright_tail_numberright_home_airportright_factor
XB-123foundry air2KK-452JFK10
XB-123foundry air2JR-201IAD4
MT-222new airline5KK-452JFK10
XB-123foundry airline5KK-452JFK10
MT-222new air4KK-452JFK10
KK-452new air1XB-123LHR2
KK-452new air1KK-452JFK10
KK-452new air1JR-201IAD4
PA-452new air2KK-452JFK10
PA-452new air2JR-201IAD4
XB-123foundry airline2KK-452JFK10
XB-123foundry airline2JR-201IAD4
nullnullnullMT-222CPH1

Example 3: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    and(
     conditions: [
    equals(
     left: tail_number,
     right: tail_number,
    ),
    equals(
     left: factor,
     right: factor,
    )],
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airportfactor
XB-123LHR2
MT-222CPH1
KK-452JFK10
JR-201IAD4

Output:

tail_numberairlinefactorhome_airport
XB-123foundry air2LHR
MT-222new airline5null
XB-123foundry airline5null
MT-222new air4null
KK-452new air1null
PA-452new air2null
XB-123foundry airline2LHR
JR-201null4IAD
KK-452null10JFK
MT-222null1CPH

Example 4: Base case

Description: Note, values from the left and right of tail_number are coalesced. Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airport
XB-123LHR
MT-222CPH
KK-452JFK
JR-201IAD

Output:

tail_numberairlinemilesfactorhome_airport
XB-123foundry air1242LHR
MT-222new airline11235CPH
XB-123foundry airline3355LHR
MT-222new air5654CPH
KK-452new air2221JFK
PA-452new air2122null
XB-123foundry airline11342LHR
JR-201nullnullnullIAD