Numeric distribution

Supported in: Batch

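Computes the distribution of numeric values in a specified column. Values are grouped into equal-width buckets starting at the minimum value, where each bucket spans (maximum value - minimum value) / bucket count. The output has one row per non-empty bucket, with its index, bounds, value count, and smallest and largest values.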

Transform categories: Numeric

Declared arguments

  • Bucket count - Number of buckets to distribute over.
    Literal<Long>
  • Column - Column to compute distribution for.
    Column<Numeric>
  • Dataset - Dataset to apply distribution to.
    Table
  • Maximum value - Upper bound of the range to distribute over.
    Literal<Double>
  • Minimum value - Lower bound of the range to distribute over.
    Literal<Double>
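The Python sketch below is a hypothetical re-implementation that reproduces the four examples on this page. The function name `numeric_distribution`, the `BucketRow` type, and the rules listed in the docstring are assumptions inferred from those examples, not taken from the transform's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class BucketRow:
    """One output row; field names mirror the output columns in the examples."""
    bucket: int
    min_value: float
    max_value: float
    count: int
    bucket_start: float
    bucket_end: float


def numeric_distribution(values, bucket_count, min_value, max_value):
    """Hypothetical re-implementation of the bucketing shown in the examples.

    Assumptions inferred from the examples (not from the transform's source):
    - values outside [min_value, max_value] are dropped (the examples only
      show values below the minimum being dropped);
    - bucket b covers [min_value + b * width, min_value + (b + 1) * width),
      so a value equal to max_value lands in an extra bucket b == bucket_count;
    - a zero-width range (min_value == max_value) puts kept values in bucket 0;
    - empty buckets produce no row; the input contains no nulls.
    """
    if bucket_count < 1:
        raise ValueError("bucket_count must be at least 1")
    width = (max_value - min_value) / bucket_count

    # Group in-range values by bucket index.
    groups = {}
    for v in values:
        if v < min_value or v > max_value:
            continue
        b = 0 if width == 0 else math.floor((v - min_value) / width)
        groups.setdefault(b, []).append(v)

    # Emit one row per non-empty bucket, ordered by bucket index.
    rows = []
    for b in sorted(groups):
        members = groups[b]
        start = min_value + b * width
        rows.append(BucketRow(b, min(members), max(members), len(members),
                              start, start + width))
    return rows


if __name__ == "__main__":
    # Example 1 from this page.
    for row in numeric_distribution([0.0, 0.0, 1.3, 5.3, 10.5], 10, 0.0, 20.0):
        print(row)
```

Keying a dictionary by bucket index and sorting at the end mirrors the output tables, which list only non-empty buckets in ascending order.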

Examples

Example 1: Base case

Argument values:

  • Bucket count: 10
  • Column: value
  • Dataset: ri.foundry.main.dataset.a
  • Maximum value: 20.0
  • Minimum value: 0.0

Input:

value
0.0
0.0
1.3
5.3
10.5

Output:

bucket  min_value  max_value  count  bucket_start  bucket_end
0       0.0        1.3        3      0.0           2.0
2       5.3        5.3        1      4.0           6.0
5       10.5       10.5       1      10.0          12.0
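For Example 1, the bucket arithmetic works out as follows, assuming each bucket is the half-open interval from bucket_start to bucket_end (an inference from the examples):

```latex
\begin{aligned}
w &= \frac{\text{max} - \text{min}}{\text{bucket count}} = \frac{20.0 - 0.0}{10} = 2.0 \\
b(v) &= \left\lfloor \frac{v - \text{min}}{w} \right\rfloor, \qquad
  [\,\text{min} + b\,w,\ \text{min} + (b + 1)\,w\,) \\
b(0.0) &= \lfloor 0 \rfloor = 0, \quad b(1.3) = \lfloor 0.65 \rfloor = 0 \;\Rightarrow\; [0.0,\ 2.0) \\
b(5.3) &= \lfloor 2.65 \rfloor = 2 \;\Rightarrow\; [4.0,\ 6.0) \\
b(10.5) &= \lfloor 5.25 \rfloor = 5 \;\Rightarrow\; [10.0,\ 12.0)
\end{aligned}
```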

Example 2: Base case

Argument values:

  • Bucket count: 3
  • Column: value
  • Dataset: ri.foundry.main.dataset.a
  • Maximum value: 25.0
  • Minimum value: -5.0

Input:

value
-15
-5
0
15
20

Output:

bucket  min_value  max_value  count  bucket_start  bucket_end
0       -5         0          2      -5.0          5.0
2       15         20         2      15.0          25.0
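In Example 2, -15 does not appear in any bucket. This minimal sketch of the filtering and bucket-index arithmetic reproduces that result, assuming values below the minimum are dropped before bucketing (an inference from the example, not documented behavior):

```python
import math

# Example 2 arguments and input
bucket_count, min_value, max_value = 3, -5.0, 25.0
values = [-15, -5, 0, 15, 20]
width = (max_value - min_value) / bucket_count  # 10.0

# Assumption inferred from the example: values below the minimum are
# dropped before bucketing, which is why -15 is not counted anywhere.
kept = [v for v in values if min_value <= v <= max_value]
indices = [math.floor((v - min_value) / width) for v in kept]
print(kept, indices)  # [-5, 0, 15, 20] [0, 0, 2, 2]
```

Two values fall into bucket 0 and two into bucket 2, which matches the counts in the output table.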

Example 3: Edge case

Argument values:

  • Bucket count: 1
  • Column: value
  • Dataset: ri.foundry.main.dataset.a
  • Maximum value: 20.0
  • Minimum value: 20.0

Input:

value
-15
-5
0
15
20

Output:

bucket  min_value  max_value  count  bucket_start  bucket_end
0       20         20         1      20.0          20.0
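In Example 3 the minimum and maximum are equal, so the bucket width is 0.0 and the usual index formula would divide by zero. The sketch below assumes, consistent with the output, that only values equal to 20.0 are kept and that they all go to bucket 0. `bucket_index` is a hypothetical helper, not part of the transform.

```python
import math


def bucket_index(v, min_value, width):
    """Hypothetical helper: bucket index for an in-range value."""
    if width == 0:
        # Zero-width range (minimum == maximum): floor((v - min) / 0) is
        # undefined, so assume every kept value goes to bucket 0.
        return 0
    return math.floor((v - min_value) / width)


# Example 3 arguments and input
bucket_count, min_value, max_value = 1, 20.0, 20.0
values = [-15, -5, 0, 15, 20]
width = (max_value - min_value) / bucket_count  # 0.0

kept = [v for v in values if min_value <= v <= max_value]
print(kept, [bucket_index(v, min_value, width) for v in kept])  # [20] [0]
```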

Example 4: Edge case

Argument values:

  • Bucket count: 1
  • Column: value
  • Dataset: ri.foundry.main.dataset.a
  • Maximum value: 20.0
  • Minimum value: -5.0

Input:

value
-15
-5
0
15
20

Output:

bucket  min_value  max_value  count  bucket_start  bucket_end
0       -5         15         3      -5.0          20.0
1       20         20         1      20.0          45.0
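Example 4 shows that a value equal to the maximum is kept but placed in bucket 1, whose bounds (20.0 to 45.0) extend past the maximum. That is consistent with half-open buckets, as this short calculation illustrates (an inference from the example, not documented behavior):

```python
import math

# Example 4 arguments
bucket_count, min_value, max_value = 1, -5.0, 20.0
width = (max_value - min_value) / bucket_count  # 25.0

# Here floor((max - min) / width) equals bucket_count, so a value equal to
# the maximum lands one past the last regular bucket.
b = math.floor((max_value - min_value) / width)
start = min_value + b * width
print(b, start, start + width)  # 1 20.0 45.0
```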