Data connectivity & integrationPipeline Builder TransformsSemi join

Semi join

Supported in: Batch

Semi joins left and right dataset inputs together. This removes all rows that don't match the join condition.

Transform categories: Join

Declared arguments

  • Condition for columns to select on the left - All columns in the left input schema will be tested to see if they match this condition. If they match, the column will be selected in the output.
    ColumnPredicate
  • Join condition - Condition on which to join.
    Expression<Boolean>
  • Left dataset - Left dataset to use in join.
    Table
  • Right dataset - Right dataset to use in join.
    Table

Examples

Example 1: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airport
XB-123LHR
MT-222CPH
KK-452JFK
JR-201IAD

Output:

tail_numberairline
XB-123foundry air
MT-222new airline
XB-123foundry airline
MT-222new air
KK-452new air
XB-123foundry airline

Example 2: Base case

Description: Simple complex join condition. Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • Join condition:
    and(
     conditions: [
    lessThan(
     left: factor,
     right: factor,
    ),
    equals(
     left: tail_number,
     right: tail_number,
    )],
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airportfactor
XB-123LHR2
MT-222CPH1
KK-452JFK10
JR-201IAD4

Output:

tail_numberairlinefactor
KK-452new air1

Example 3: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • Join condition:
    and(
     conditions: [
    equals(
     left: tail_number,
     right: tail_number,
    ),
    equals(
     left: factor,
     right: factor,
    )],
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airportfactor
XB-123LHR2
MT-222CPH1
KK-452JFK10
JR-201IAD4

Output:

tail_numberairlinefactor
XB-123foundry air2
XB-123foundry airline2

Example 4: Base case

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airport
XB-123LHR
MT-222CPH
KK-452JFK
JR-201IAD

Output:

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
XB-123foundry airline11342