Aggregate on condition

Supported in: Batch

Aggregate expressions based on a condition statement.

Transform categories: Aggregate

Declared arguments

  • Condition for columns to aggregate on - All columns in the input schema will be tested to see if they match this condition. If they match, the given expressions will be applied to them.
    ColumnPredicate
  • Dataset - Dataset to apply operations to.
    Table
  • Expressions to aggregate - The aggregate expression to apply once per each column that matches condition.
    List<Expression<AnyType>>
  • optional Group by columns - List of columns to group the dataset by when aggregating. If empty, no group by is applied.
    List<Column<AnyType>>

Examples

Example 1: Edge case

Description: Count non null rows for all columns. Argument values:

  • Condition for columns to aggregate on:
    allColumns(

    )
  • Dataset: ri.foundry.main.dataset.a
  • Expressions to aggregate: [
    dynamicAlias(
     expression:
    rowCount(
     expression: column,
    ),
     transformer:
    columnNameConcat(
     inputs: [column, _non_null],
    ),
    )]
  • Group by columns: null

Input:

idvaluedistance
11002000
2null100
3500300

Output:

id_non_nullvalue_non_nulldistance_non_null
323

Example 2: Edge case

Description: Count non null and mean of integer columns. Argument values:

  • Condition for columns to aggregate on:
    columnHasType(
     type: Integer,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expressions to aggregate: [
    dynamicAlias(
     expression:
    rowCount(
     expression: column,
    ),
     transformer:
    columnNameConcat(
     inputs: [column, _non_null],
    ),
    ),
    dynamicAlias(
     expression:
    mean(
     expression: column,
    ),
     transformer:
    columnNameConcat(
     inputs: [column, _mean],
    ),
    )]
  • Group by columns: null

Input:

idvaluedistance
11002000
2null100
3500300

Output:

id_non_nullid_meanvalue_non_nullvalue_mean
32.02300.0

Example 3: Edge case

Argument values:

  • Condition for columns to aggregate on:
    columnHasType(
     type: Integer,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expressions to aggregate: [
    dynamicAlias(
     expression:
    mean(
     expression: column,
    ),
     transformer:
    columnNameConcat(
     inputs: [column, _mean],
    ),
    )]
  • Group by columns: [id]

Input:

idvaluedistanceairline
11002000new air
12003000new air
25003000foundry air
24001000foundry air

Output:

idid_meanvalue_meandistance_mean
11.0150.02500.0
22.0450.02000.0

Example 4: Edge case

Description: Mean of all integer columns. Argument values:

  • Condition for columns to aggregate on:
    columnHasType(
     type: Integer,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expressions to aggregate: [
    dynamicAlias(
     expression:
    mean(
     expression: column,
    ),
     transformer:
    columnNameConcat(
     inputs: [column, _mean],
    ),
    )]
  • Group by columns: null

Input:

idvaluedistance
11002000
3500300

Output:

id_meanvalue_mean
2.0300.0