Project on condition

Supported in: Batch, Streaming

Transforms the input dataset by selecting columns or by applying an expression to every column that matches a condition.

Transform categories: Popular

Declared arguments

  • Condition for columns to project - All columns in the input schema will be tested to see if they match this condition. If they match, the given expression will be applied to them.
    ColumnPredicate
  • Dataset - Dataset to apply operations to.
    Table
  • Expression to apply - The expression to apply once for each column that matches the condition.
    Expression<AnyType>
  • Keep remaining columns - Keep all columns in the dataset that did not match the condition.
    Literal<Boolean>
  • optional Keep matched columns - Keep the original columns that were matched by the condition. If a projected column has the same name, the original column will be overridden.
    Literal<Boolean>
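
The interaction between these arguments can be sketched in plain Python. This is only an illustrative model, not the transform's implementation: the function name project_on_condition, the dict-of-lists table representation, the default of false for Keep matched columns, and the output column order (projected, then remaining, then kept matched columns) are all assumptions, the last one inferred from the examples on this page.

```python
from typing import Any, Callable, Dict, List, Tuple

Column = List[Any]
Table = Dict[str, Column]  # column name -> values; dicts preserve insertion order


def project_on_condition(
    table: Table,
    condition: Callable[[str, Column], bool],
    expression: Callable[[str, Column], Tuple[str, Column]],
    keep_remaining: bool,
    keep_matched: bool = False,  # assumption: the real default is not documented here
) -> Table:
    """Hypothetical model of the transform; not the actual implementation."""
    # Test every input column against the condition.
    matched = [name for name, values in table.items() if condition(name, values)]

    out: Table = {}
    # Apply the expression once for each matched column.
    for name in matched:
        new_name, new_values = expression(name, table[name])
        out[new_name] = new_values

    if keep_remaining:
        # Columns that did not match the condition pass through unchanged.
        for name, values in table.items():
            if name not in matched:
                out.setdefault(name, values)

    if keep_matched:
        # Original matched columns are kept unless a projected column
        # already uses the same name (the projected column wins).
        for name in matched:
            out.setdefault(name, table[name])

    return out


def is_string(name: str, values: Column) -> bool:
    return all(isinstance(v, str) for v in values)


def cast_and_rename(name: str, values: Column) -> Tuple[str, Column]:
    return name + "_as_integer", [int(v) for v in values]


table = {"id": [1], "distance": ["2000"]}
result = project_on_condition(
    table, is_string, cast_and_rename, keep_remaining=True, keep_matched=True
)
```

With both flags set to true, the sketch keeps the projected column, the unmatched id column, and the original distance column.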

Examples

Example 1: Base case

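Description: Cast the matched String columns to Integer and rename them by applying a regex replacement to the column name (here, str is replaced with int). Argument values: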

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    dynamicAlias(
     expression:
    cast(
     expression: column,
     type: Integer,
    ),
     transformer:
    columnNameRegexReplace(
     input: column,
     pattern: str,
     replace: int,
    ),
    )
  • Keep remaining columns: true
  • Keep matched columns: false

Input:

id | distance_str | factor_str
1 | 2000 | 1265

Output:

distance_int | factor_int | id
2000 | 1265 | 1
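
As a rough analogy, the cast and rename in this example behave like the following Python snippet applied to a single row. It assumes the two *_str columns hold string values and id holds an integer, and it uses re.sub, which replaces every occurrence of the pattern; whether columnNameRegexReplace replaces all occurrences or only the first is not stated here.

```python
import re

# One input row; the value types are an assumption based on the column names.
row = {"id": 1, "distance_str": "2000", "factor_str": "1265"}

# Matched (String) columns: cast the value to an integer and rename the column.
projected = {
    re.sub("str", "int", name): int(value)
    for name, value in row.items()
    if isinstance(value, str)
}

# Remaining columns are kept because "Keep remaining columns" is true.
remaining = {
    name: value for name, value in row.items() if not isinstance(value, str)
}

result = {**projected, **remaining}
print(result)
```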

Example 2: Edge case

Description: You can choose to keep both matched and remaining columns. Argument values:

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    dynamicAlias(
     expression:
    cast(
     expression: column,
     type: Integer,
    ),
     transformer:
    columnNameConcat(
     inputs: [column, _as_integer],
    ),
    )
  • Keep remaining columns: true
  • Keep matched columns: true

Input:

id | distance
1 | 2000

Output:

distance_as_integer | id | distance
2000 | 1 | 2000

Example 3: Edge case

Description: You can choose to keep the columns that the condition matches, in addition to the new columns that are created. Argument values:

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    dynamicAlias(
     expression:
    cast(
     expression: column,
     type: Integer,
    ),
     transformer:
    columnNameConcat(
     inputs: [column, _as_integer],
    ),
    )
  • Keep remaining columns: false
  • Keep matched columns: true

Input:

id | distance
1 | 2000

Output:

distance_as_integer | distance
2000 | 2000

Example 4: Edge case

Description: When matched columns are kept but a projected column has the same name as a matched column, the projected column overrides the original, so the original is not kept. To keep the original column, give the projected column a new name. Argument values:

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    cast(
     expression: column,
     type: Integer,
    )
  • Keep remaining columns: false
  • Keep matched columns: true

Input:

id | distance
1 | 2000

Output:

distance
2000
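
The name clash can be illustrated with a small Python sketch. The helper project and its rename parameter are hypothetical and only model the behavior described above: a projected column takes precedence over a kept matched column with the same name.

```python
# One input row; "distance" is the only String column, so it is the only match.
original = {"id": 1, "distance": "2000"}
matched = {name: v for name, v in original.items() if isinstance(v, str)}


def project(rename):
    """Cast matched columns to int, then keep the originals where names allow."""
    out = {rename(name): int(value) for name, value in matched.items()}
    for name, value in matched.items():
        # setdefault leaves an existing (projected) column untouched.
        out.setdefault(name, value)
    return out


# Same name: the projected integer column overrides the original string column.
clash = project(lambda name: name)

# New name: both the projected column and the original column survive.
renamed = project(lambda name: name + "_as_integer")

print(clash)
print(renamed)
```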

Example 5: Edge case

Description: You can choose to keep only the columns that are projected. Argument values:

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    dynamicAlias(
     expression:
    cast(
     expression: column,
     type: Integer,
    ),
     transformer:
    columnNameConcat(
     inputs: [column, _as_integer],
    ),
    )
  • Keep remaining columns: false
  • Keep matched columns: false

Input:

id | distance
1 | 2000

Output:

distance_as_integer
2000

Example 6: Edge case

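Description: You can choose to keep the projected columns together with the remaining columns that did not match the condition, while dropping the original matched columns. Because the expression does not rename its output, the projected column keeps the name of the column it was derived from. Argument values: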

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    cast(
     expression: column,
     type: Integer,
    )
  • Keep remaining columns: true
  • Keep matched columns: false

Input:

id | distance
1 | 2000

Output:

distance | id
2000 | 1
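
To compare the flag combinations side by side, the sketch below lists the output column names for each setting of Keep remaining columns and Keep matched columns, using the id and distance schema and the _as_integer rename from Examples 2, 3, and 5. The function output_columns is hypothetical, and the column order is inferred from the example outputs rather than documented behavior.

```python
from itertools import product

# Input schema as column name -> type name (id is assumed to be an Integer).
columns = {"id": "Integer", "distance": "String"}


def output_columns(keep_remaining: bool, keep_matched: bool) -> list:
    matched = [name for name, type_name in columns.items() if type_name == "String"]
    # Projected columns come first, renamed with the _as_integer suffix.
    out = [name + "_as_integer" for name in matched]
    if keep_remaining:
        out += [name for name in columns if name not in matched]
    if keep_matched:
        # A matched column is dropped if a projected column reuses its name.
        out += [name for name in matched if name not in out]
    return out


for keep_remaining, keep_matched in product([True, False], repeat=2):
    print(keep_remaining, keep_matched, output_columns(keep_remaining, keep_matched))
```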