Functions index

Pipeline Builder provides expressions that operate at different levels. They can generally be categorized as row-level expressions, aggregations, or generators.

Row-level functions operate on values from a single row. Most expressions fall into this category, for example the 'add' expression.

Aggregations combine multiple row values into one, for example the 'sum' expression.

Generators produce multiple values from a single row, for example the 'explode_array' expression.

Transforms are functions that operate on a whole table or on multiple tables, for example the 'drop' transform.

This document outlines the available expressions and transforms.
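
As an informal illustration of these categories, the following Python sketch mimics each level on a tiny in-memory table. It is not Pipeline Builder code; the row data and the variable names are hypothetical and chosen only for this example.

```python
from itertools import chain

# Hypothetical table: a list of rows, each row a dict of column values.
rows = [
    {"col_a": 0, "col_b": 1, "tags": ["x", "y"]},
    {"col_a": 3, "col_b": -2, "tags": ["z"]},
]

# Row level: one output value per input row (similar in spirit to 'add').
row_level = [r["col_a"] + r["col_b"] for r in rows]

# Aggregation: many row values collapse into one (similar to 'sum').
aggregated = sum(r["col_a"] for r in rows)

# Generator: one row may yield several values (similar to 'explode_array').
generated = list(chain.from_iterable(r["tags"] for r in rows))

# Transform: works on the whole table (similar to 'drop' removing a column).
dropped = [{k: v for k, v in r.items() if k != "tags"} for r in rows]
```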

Row level expressions


Absolute value

Supported in: Batch, Streaming

Returns the absolute value.

Expression categories: Numeric

Type variable bounds: T accepts Numeric

Output type: T

Example

Argument values:

  • Expression: numeric_column
numeric_column | Output
0.0 | 0.0
1.1 | 1.1
-1.1 | 1.1

See more details here.


Add numbers

Supported in: Batch, Streaming

Calculates the sum of all input columns.

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Expressions: [col_a, col_b]
col_a | col_b | Output
0 | 1 | 1
3 | -2 | 1

See more details here.


Add or update struct field

Supported in: Batch, Streaming

Updates a field of a struct or adds a new field.

Expression categories: Struct

Output type: Struct

Example

Argument values:

  • Expression: value
  • Locator: airline.id
  • Struct: struct
struct | value | Output
{ airline: { id: NA } } | 1 | { airline: { id: 1 } }
{ airline: { id: FE } } | 2 | { airline: { id: 2 } }

See more details here.


Add value to date

Supported in: Batch, Streaming

Returns the date that is 'value' days, weeks, months, quarters, or years after 'start'.

Expression categories: Datetime

Output type: Date

Example

Argument values:

  • Date: 2022-02-01
  • Unit: DAYS
  • Value: 2

Output: 2022-02-03

See more details here.
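
The example above can be approximated in Python as follows. This is a minimal sketch covering only the DAYS and WEEKS units; the function name `add_to_date` is hypothetical, and month, quarter, and year arithmetic is deliberately left out because it needs calendar-aware logic.

```python
from datetime import date, timedelta

def add_to_date(start: date, value: int, unit: str) -> date:
    """Return the date that is `value` units after `start` (DAYS or WEEKS only)."""
    if unit == "DAYS":
        return start + timedelta(days=value)
    if unit == "WEEKS":
        return start + timedelta(weeks=value)
    raise ValueError(f"unit not covered by this sketch: {unit}")

result = add_to_date(date(2022, 2, 1), 2, "DAYS")  # matches the documented example
```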


All array elements satisfy

Supported in: Batch, Streaming

Returns true if the expression is true for all elements in the array.

Expression categories: Array

Output type: Boolean

Example

Argument values:

  • Array: miles
  • Boolean condition:
    isNull(
     expression: element,
    )
miles | Output
[ 12300, null ] | false
[ null, null ] | true

See more details here.


And

Supported in: Batch, Streaming

Returns true if all of the specified conditions are true. Nulls are considered false.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Conditions: [left_boolean, right_boolean]
left_boolean | right_boolean | Output
true | true | true
true | false | false
false | true | false
false | false | false

See more details here.


Any array element satisfy

Supported in: Batch, Streaming

Returns true if the expression is true for any element in the array.

Expression categories: Array

Output type: Boolean

Example

Argument values:

  • Array: miles
  • Boolean condition:
    isNull(
     expression: element,
    )
miles | Output
[ 12300, null ] | true
[ 12300, 12000 ] | false

See more details here.


Arccos

Supported in: Batch, Streaming

Inverse cosine function.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: radians
  • Value: 1.0

Output: 0.0

See more details here.


Arcsin

Supported in: Batch, Streaming

Inverse sine function.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: radians
  • Value: 0.0

Output: 0.0

See more details here.


Arctan

Supported in: Batch, Streaming

Inverse tangent function.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Value: angle
angle | Output
-1.0 | -45.0
0.0 | 0.0
1.0 | 45.0

See more details here.


Arctan2

Supported in: Batch, Streaming

Returns the angle θ between the ray from the origin to the point (x, y) and the positive x-axis, confined to −π < θ ≤ π.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • X: x
  • Y: y
y | x | Output
0.0 | 0.0 | 0.0
1.0 | 0.0 | 90.0
0.0 | -1.0 | 180.0
-1.0 | 0.0 | -90.0

See more details here.
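
The same quadrant-aware behavior is available in Python's standard library, which makes it easy to check the example rows. This sketch assumes the degrees angle unit, and `arctan2_degrees` is a hypothetical helper name.

```python
import math

def arctan2_degrees(y: float, x: float) -> float:
    """Angle in degrees between the positive x-axis and the ray to (x, y)."""
    return math.degrees(math.atan2(y, x))

# (y, x) pairs taken from the example table above.
angles = [arctan2_degrees(y, x) for y, x in [(0.0, 0.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0)]]
```

Note the argument order: y comes first, then x, as in the example table.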


Area

Supported in: Batch, Streaming

Calculates area of a geometry in meters squared using a spherical approximation of the globe. For a line string or a point, this equals 0.

Expression categories: Geospatial

Output type: Double

See more details here.


Array add

Supported in: Batch, Streaming

Adds a value to the array at a specified index.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: numbers
  • Index: 1
  • Value: 1
numbers | Output
[ 3, 5 ] | [ 1, 3, 5 ]
[ 2 ] | [ 1, 2 ]
[ ] | [ 1 ]

See more details here.


Array cartesian product

Supported in: Batch, Streaming

Computes the Cartesian product of arrays.

Expression categories: Array

Output type: Array<Struct>

Example

Argument values:

  • Expression: [first, second]
first | second | Output
[ 1, 2 ] | [ 3, 4 ] | [ { first: 1, second: 3 }, { first: 1, second...

See more details here.


Array concat

Supported in: Batch, Streaming

Concatenates the provided arrays into a single array, without de-duplication.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 4, 5 ]]

Output: [ 1, 2, 3, 4, 5 ]

See more details here.


Array contains

Supported in: Batch, Streaming

Returns true if the array contains the value.

Expression categories: Array, Boolean

Output type: Boolean

Example

Argument values:

  • Array: part_ids
  • Value: BRR-123
part_ids | Output
[ AWE-112, BRR-123 ] | true
[ AWE-222, ABC-543 ] | false

See more details here.


Array contains null

Supported in: Batch, Streaming

Returns true if the array contains null.

Expression categories: Array, Boolean

Output type: Boolean

Example

Argument values:

  • Expression: part_ids
part_ids | Output
[ AWE-112, BRR-123, null ] | true
[ AWE-222, ABC-543 ] | false

See more details here.


Array difference

Supported in: Batch, Streaming

Returns all unique elements in the left array that are not in the right array.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Left array: [ 1, 2, 3 ]
  • Right array: [ 2, 3, 4 ]

Output: [ 1 ]

See more details here.


Array distinct

Supported in: Batch, Streaming

Removes duplicates and returns distinct values from the array.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expression: [ 1, 1, 2, 3 ]

Output: [ 1, 2, 3 ]

See more details here.


Array element

Supported in: Batch, Streaming

Returns the element at a given position from the input array. Positions outside of the array will return null.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Array: [ 10, 11, 12 ]
  • Position: 1

Output: 10

See more details here.
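
The example implies that positions are 1-based (position 1 returns the first element, 10). A minimal Python sketch of that behavior follows; the function name is hypothetical, and treating zero or negative positions as "outside of the array" is an assumption of this sketch, not something the text above states.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

def array_element(array: Optional[Sequence[T]], position: int) -> Optional[T]:
    """Return the element at a 1-based position, or None when out of range."""
    if array is None:
        return None
    # Assumption: only positions 1..len(array) are considered inside the array.
    if 1 <= position <= len(array):
        return array[position - 1]
    return None

first = array_element([10, 11, 12], 1)
```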


Array elements are distinct

Supported in: Batch, Streaming

Returns true if the array's elements are distinct, false otherwise. If the array is null, the returned value is false.

Expression categories: Array, Boolean

Output type: Boolean

Example

Argument values:

  • Expression: part_ids
part_ids | Output
[ ABC-123, DCE-123, EFG-123 ] | true
[ ABC-123, ABC-123, EFG-123 ] | false

See more details here.


Array flatten

Supported in: Batch, Streaming

Creates a single array from an input nested array by unioning the elements within the first level of nesting.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expression: array
array | Output
[ [ 1, 2, 3 ], [ 4, 5, 6 ] ] | [ 1, 2, 3, 4, 5, 6 ]

See more details here.


Array intersect

Supported in: Batch, Streaming

Removes duplicates and intersects a list of arrays.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 3, 4 ]]

Output: [ 3 ]

See more details here.


Array maximum

Supported in: Batch, Streaming

Returns the maximum value of an array column.

Expression categories: Array

Type variable bounds: T accepts Numeric

Output type: T

Example

Argument values:

  • Expression: [ 1, 2, 3 ]

Output: 3

See more details here.


Array minimum

Supported in: Batch, Streaming

Returns the minimum value of an array column.

Expression categories: Array

Type variable bounds: T accepts Numeric

Output type: T

Example

Argument values:

  • Expression: [ 1, 2, 3 ]

Output: 1

See more details here.


Array position

Supported in: Batch, Streaming

Returns a position/index of the first occurrence of the 'value' in a given array. Returns null when value is not found or when any of the arguments are null.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Long

Example

Argument values:

  • Array: [ 10, 11, 12 ]
  • Value: 10

Output: 1

See more details here.
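
As with array element, the example suggests 1-based positions (the value 10 is found at position 1). The following Python sketch mirrors the documented null handling; `array_position` is a hypothetical name used for illustration only.

```python
from typing import Optional, Sequence

def array_position(array: Optional[Sequence], value) -> Optional[int]:
    """Return the 1-based position of the first occurrence of value, else None."""
    if array is None or value is None:
        return None  # any null argument yields null
    for position, item in enumerate(array, start=1):
        if item == value:
            return position
    return None  # value not found

found = array_position([10, 11, 12], 10)
```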


Array remove

Supported in: Batch, Streaming

Returns an array after removing all provided 'value' from the given array.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: [ 1, 2, 3 ]
  • Value: 1

Output: [ 2, 3 ]

See more details here.


Array repeat

Supported in: Batch, Streaming

Returns an array with the contents of array concatenated value times.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: [ 1, 2 ]
  • Value: 2

Output: [ 1, 2, 1, 2 ]

See more details here.


Array reverse

Supported in: Batch, Streaming

Reverses the order of elements in 'array'.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expression: [ 1, 2, 3 ]

Output: [ 3, 2, 1 ]

See more details here.


Array sort

Supported in: Batch, Streaming

Returns a sorted array of the given input array. All null values are placed at the end of a descending array and at the front of an ascending array.

Expression categories: Array

Type variable bounds: T accepts ComparableType

Output type: Array<T>

Example

Argument values:

  • Direction: ASCENDING
  • Expression: [ 5, 3, 6 ]

Output: [ 3, 5, 6 ]

See more details here.
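
The null placement rule can be sketched in Python as follows, with `None` standing in for null. The function name and the boolean `ascending` flag are hypothetical stand-ins for the Direction argument.

```python
from typing import List, Optional

def array_sort(values: List[Optional[int]], ascending: bool = True) -> List[Optional[int]]:
    """Sort values; nulls go first when ascending and last when descending."""
    nulls = [v for v in values if v is None]
    present = sorted((v for v in values if v is not None), reverse=not ascending)
    return nulls + present if ascending else present + nulls

ascending_result = array_sort([5, 3, 6])
with_nulls = array_sort([5, None, 3])
descending_with_nulls = array_sort([5, None, 3], ascending=False)
```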


Array sort by struct key

Supported in: Batch, Streaming

Returns a sorted array of the given input array of structs sorted by the values of the given struct keys.

Expression categories: Array

Output type: Array<Struct>

Example

Argument values:

  • Input array: [ {
    age: 20,
    }, {
    age: 10,
    }, {
    age: 30,
    } ]
  • Sort keys: [(age, ASCENDING)]

Output: [ {
age: 10,
}, {
age: 20,
}, {
age: 30,
} ]

See more details here.


Array union

Supported in: Batch, Streaming

Removes duplicates and unions a list of arrays.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 3, 4 ]]

Output: [ 1, 2, 3, 4 ]

See more details here.


Arrays have intersection

Supported in: Batch, Streaming

Checks if given arrays have at least one shared element.

Expression categories: Array, Boolean

Type variable bounds: T accepts AnyType

Output type: Boolean

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 3, 4 ]]

Output: true

See more details here.


Arrays zip

Supported in: Batch, Streaming

Zips a list of given arrays into a merged array of structs in which the n-th struct contains all n-th values of input arrays.

Expression categories: Array

Output type: Array<Struct>

Example

Argument values:

  • Expressions: [first_array, second_array]
first_array | second_array | Output
[ 1, 2, 3 ] | [ 4, 5, 6 ] | [ { first_array: 1, second_array: 4 }, { first_array: 2, ...

See more details here.


Base 64 decode to string

Supported in: Batch, Streaming

Base64 decode the given expression. Uses utf-8 encoding for binary.

Expression categories: Binary, Cast, String

Output type: String

Example

Argument values:

  • Expression: encoded
encoded | Output
Zm9v | foo
YmFy | bar

See more details here.


Base64 decode

Supported in: Batch, Streaming

Base64 decode the given expression.

Expression categories: Binary, Cast

Output type: Binary

Example

Argument values:

  • Expression: city_base64
city_base64 | Output
TG9uZG9u | TG9uZG9u
Q29wZW5oYWdlbg== | Q29wZW5oYWdlbg==
TmV3IFlvcms= | TmV3IFlvcms=

See more details here.


Base64 encode

Supported in: Batch, Streaming

Base64 encode the given expression.

Expression categories: Binary, Cast

Output type: String

Example

Argument values:

  • Expression: city
city | Output
London | TG9uZG9u
Copenhagen | Q29wZW5oYWdlbg==
New York | TmV3IFlvcms=

See more details here.
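
The encoded values in the Base64 examples can be reproduced with Python's standard `base64` module. This sketch assumes string inputs are encoded as UTF-8 before Base64 encoding; the helper names are hypothetical.

```python
import base64

def base64_encode(text: str) -> str:
    """Base64 encode a string, assuming UTF-8 for the underlying bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def base64_decode_to_string(encoded: str) -> str:
    """Base64 decode and interpret the resulting bytes as UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")

encoded_cities = [base64_encode(city) for city in ["London", "Copenhagen", "New York"]]
decoded = [base64_decode_to_string(value) for value in ["Zm9v", "YmFy"]]
```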


Bit shift left

Supported in: Batch, Streaming

Shift the given value a number of bits left.

Expression categories: Binary

Type variable bounds: E accepts Byte | Integer | Long | Short

Output type: E

Example

Argument values:

  • Expression: 1
  • Number of bits: 1

Output: 2

See more details here.


Bit shift right

Supported in: Batch, Streaming

Shift the given value a number of bits right.

Expression categories: Binary

Type variable bounds: E accepts Byte | Integer | Long | Short

Output type: E

Example

Argument values:

  • Expression: 1
  • Number of bits: 1

Output: 0

See more details here.


Buffer H3 indices

Supported in: Batch, Streaming

Creates a buffer of distance k from an array of H3 indices.

Expression categories: Geospatial

Output type: Array<H3 Index>

See more details here.


Calculate destination point

Supported in: Batch, Streaming

Calculates the destination point along a specified path given a starting point, course, and distance.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Course: course
  • Distance: distance
  • Starting point: point_a
  • Calculation method: GREAT_CIRCLE
point_a | course | distance | Output
{ latitude: 48.8567, longitude: 2.3508 } | 225.0 | 32000.0 | { latitude: 48.65279552300661, longitude: 2.0427666779658806 }

See more details here.


Calculate haversine distance

Supported in: Batch, Streaming

Calculates the haversine distance between two latitude and longitude point pairs in meters.

Expression categories: Geospatial

Output type: Double

Example

Argument values:

  • Point a: point_a
  • Point b: point_b
point_a | point_b | Output
{ latitude: 41.507483, longitude: -99.436554 } | { latitude: 38.504048, longitude: -98.315949 } | 347328.82778977347
{ latitude: 22.308919, longitude: 113.914603 } | { latitude: -33.946111, longitude: 151.177222 } | 7393894.00134442

See more details here.
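
For reference, the standard haversine formula can be sketched in Python as below. The Earth radius constant is an assumption of this sketch (a commonly used mean radius); the radius used by the expression itself is not stated here, so results may differ slightly from the documented outputs.

```python
import math

# Assumed mean Earth radius in meters; not taken from the documentation above.
EARTH_RADIUS_M = 6371008.8

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

distance = haversine_m(41.507483, -99.436554, 38.504048, -98.315949)
```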


Case

Supported in: Batch, Streaming

Choose between different branches based on conditions.

Expression categories: Popular

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Default: Yes
  • Branches: [(
    lessThan(
     left: miles,
     right: 15000,
    ), No)]
miles | Output
20053 | Yes
10210 | No
34120 | Yes

See more details here.
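
The branch logic in this example can be sketched in Python as follows: branches are checked in order, the first matching condition wins, and the default applies when none match. The `case` function and its argument shapes are hypothetical and only illustrate the idea.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

def case(row: Row, branches: List[Tuple[Callable[[Row], bool], Any]], default: Any) -> Any:
    """Return the value of the first branch whose condition holds, else the default."""
    for condition, value in branches:
        if condition(row):
            return value
    return default

# One branch: miles < 15000 gives "No"; everything else falls through to "Yes".
branches = [(lambda row: row["miles"] < 15000, "No")]
outputs = [case({"miles": m}, branches, "Yes") for m in (20053, 10210, 34120)]
```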


Cast

Supported in: Batch, Streaming

Cast expression to given type.

Expression categories: Cast, Popular

Type variable bounds: C accepts AnyType

Output type: C

Example

Description: Casting long to string

Argument values:

  • Expression: 1234
  • Type: String

Output: 1234

See more details here.


Ceil

Supported in: Batch, Streaming

Returns ceil of a given fractional value.

Expression categories: Numeric

Output type: Decimal | Long

Example

Argument values:

  • Expression: 10.123

Output: 11

See more details here.


Change timestamp time zone

Supported in: Batch

Changes the time zone of a timestamp.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Output time zone: America/Chicago
  • Timestamp: 2020-04-28T05:09:00Z
  • Input time zone: US/Eastern

Output: 2020-04-28T04:09:00Z

See more details here.


Character-wise translate string

Supported in: Batch, Streaming

Replaces individual characters from the input column that are found in the matching string with the corresponding character in the replacement string. If the matching string is longer than the replacement string, the characters at the end of the matching string that have no replacement are dropped from the output.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: translate
  • Matching string: rnlt
  • Replacement string: 123

Output: 1a2s3ae

See more details here.
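
The example (input "translate", matching string "rnlt", replacement string "123") can be reproduced with Python's `str.translate`. This is a sketch of the described behavior, not the actual implementation, and `translate_chars` is a hypothetical name.

```python
def translate_chars(text: str, matching: str, replacement: str) -> str:
    """Map matching[i] to replacement[i]; delete matching characters with no counterpart."""
    mapped = matching[: len(replacement)]
    dropped = matching[len(replacement):]
    table = str.maketrans(mapped, replacement[: len(mapped)], dropped)
    return text.translate(table)

result = translate_chars("translate", "rnlt", "123")
```

Here 'r', 'n', and 'l' map to '1', '2', and '3', while 't' has no replacement and is removed.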


Chunk string

Supported in: Batch

Chunk string into chunks of a specified size and on specified separators.

Expression categories: String

Output type: Array<String>

Example

Argument values:

  • Expression: string
  • Chunk overlap: null
  • Chunk size: 10
  • Keep separator: null
  • Separators: null
string | Output
hello | [ hello ]
hello world. the quick brown fox jumps over the fence. | [ hello, world., the quick, brown fox, jumps, over the, fence. ]
hello world.
the quick brown fox
jumps over the fence. | [ hello, world., the quick, brown fox, jumps, over the, fence. ]
hello world.
the quick brown fox
jumps over the fence. | [ hello, world., the quick, brown fox, jumps, over the, fence. ]

See more details here.


Cipher decrypt

Supported in: Batch, Streaming

Decrypts expression with cipher.

Expression categories: Other

Output type: String

Example

Argument values:

  • Cipher license rid: ri.bellaso.main.cipher-license.1-decrypt
  • Expression: string
string | Output
CIPHER::ri.bellaso.main.cipher-channel.1::OCRBIW3iHDltOGa6MEHwb7f/Dw==::CIPHER | bar

See more details here.


Cipher encrypt

Supported in: Batch, Streaming

Encrypts expression with cipher.

Expression categories: Other

Output type: Cipher Text

Example

Argument values:

  • Cipher license rid: ri.bellaso.main.cipher-license.1-encrypt
  • Expression: string
string | Output
bar | CIPHER::ri.bellaso.main.cipher-channel.1::OCRBIW3iHDltOGa6MEHwb7f/Dw==::CIPHER

See more details here.


Cipher hash

Supported in: Batch, Streaming

Hashes expression with cipher.

Expression categories: Other

Output type: Cipher Text

Example

Argument values:

  • Cipher license rid: ri.bellaso.main.cipher-license.1-hash
  • Expression: string
string | Output
bar | CIPHER::ri.bellaso.main.cipher-channel.1::c70a14f5cc57c940e3265045a5554d641bd549ee27a571a05cdbc75c77762eb86b1144c12f1bb7811a0bcec08b2f143989c44022e4664f615d6885ad640332cb::CIPHER

See more details here.


Clean string

Supported in: Batch, Streaming

Applies the set of clean actions on the expression.

Expression categories: Data preparation, String

Output type: String

Example

Argument values:

  • Clean actions: {trim}
  • Expression: hello world

Output: hello world

See more details here.


Compact a set of H3 indices

Supported in: Batch, Streaming

Compact H3 indices into a subset of mixed resolutions if possible. Running the inverse operation uncompact is guaranteed to yield the same set of indices that were compacted if the input indices were all the same resolution. If any of the input indices are invalid this transform will return null. Output indices are sorted in ascending order.

Expression categories: Geospatial

Output type: Array<H3 Index>

Example

Argument values:

  • H3 indices: h3_set
h3_set | Output
[ 87754a914ffffff, 87754a916ffffff, 87754a930ffffff, 87754a932ffffff, 87754a933ffffff, 87754a934ffff... | [ 86754e64fffffff, 87754a914ffffff, 87754a916ffffff, 87754a930ffffff, 87754a932ffffff, 87754a933ffff...

See more details here.


Concatenate strings

Supported in: Batch, Streaming

Concatenates a list of strings with the specified separator.

Expression categories: String

Output type: String

Example

Argument values:

  • Expressions: [hello, world]
  • Null output if any input is null: null
  • Separator: _

Output: hello_world

See more details here.


Construct GeoPoint column

Supported in: Batch, Streaming

Constructs a GeoPoint column from a latitude and longitude column. Validates that the latitude parameter is between -90 and 90, inclusive, and that the longitude parameter is between -180 and 180, inclusive; if not, returns a null value.

Expression categories: Geospatial

Output type: GeoPoint

See more details here.


Construct delegated media Gotham identifier (GID)

Supported in: Batch, Streaming

Expression to construct a valid delegated media Gotham identifier (GID) from components. If result is more than 1024 characters, produces a null row.

Expression categories: Other

Output type: Delegated media Gotham identifier (GID)

Example

Argument values:

  • Media locator: locator
  • Media type: mediaType
  • Producer instance: invalidUuid
mediaType | locator | Output
testaudiotype | empty string | null

See more details here.


Convert DMS to GeoPoint

Supported in: Batch, Streaming

Converts a geospatial coordinate string in degrees, minutes, seconds (DMS) format to a GeoPoint according to user-provided formats. The default formats are DDD*°MM*'SS*"H and DDD*MMSSssH. The formats are tried in order, and the result of the first matching format is returned. See the formatting guide for how to write user-defined formats.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Coordinates: coordinates
  • Formats: null
coordinates | Output
078261594N075220923E | { latitude: 78.43776111111112, longitude: 75.36923055555555 }
046115095S069524119W | { latitude: -46.19748611111111, longitude: -69.87810833333333 }
023°45'55"N 069°52'11"W | { latitude: 23.76527777777777, longitude: -69.86972222222222 }
-123°55'55"N 069°53'00"W | { latitude: -123.93194444444445, longitude: -69.88333333333334 }
123456789N23456789E | { latitude: 123.76885833333333, longitude: 23.768858333333334 }

See more details here.
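
To make the arithmetic behind the compact examples concrete, here is a rough Python sketch that parses strings such as 078261594N075220923E. It assumes the layout is degrees (one to three digits), two-digit minutes, two-digit seconds, two-digit hundredths of a second, and a hemisphere letter; that reading of the DDD*MMSSssH format is an assumption, and the function name is hypothetical. It performs no range validation and does not handle the symbol-based format.

```python
import re
from typing import Dict, Optional

# Assumed compact layout: degrees, MM, SS, ss (hundredths), hemisphere letter.
_COMPACT = re.compile(r"(\d{1,3})(\d{2})(\d{2})(\d{2})([NSEW])")

def parse_compact_dms(text: str) -> Optional[Dict[str, float]]:
    """Parse a compact DMS pair into decimal degrees, or return None if it does not fit."""
    parts = _COMPACT.findall(text)
    if len(parts) != 2:
        return None
    point: Dict[str, float] = {}
    for degrees, minutes, seconds, hundredths, hemisphere in parts:
        value = int(degrees) + int(minutes) / 60 + (int(seconds) + int(hundredths) / 100) / 3600
        if hemisphere in "SW":
            value = -value  # southern and western hemispheres are negative
        point["latitude" if hemisphere in "NS" else "longitude"] = value
    return point if len(point) == 2 else None

point = parse_compact_dms("078261594N075220923E")
```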


Convert GeoPoint to Geohash

Supported in: Batch, Streaming

Converts a GeoPoint to a base32-encoded Geohash with specified precision that contains the GeoPoint. For more information on Geohash, see: https://en.wikipedia.org/wiki/Geohash .

Expression categories: Geospatial

Output type: Geohash

See more details here.


Convert GeoPoint to MGRS

Supported in: Batch, Streaming

Converts a GeoPoint following the WGS84 coordinate system (which is EPSG:4326) to a MGRS (military grid reference system) coordinate. The output MGRS will follow a space-delimited format with 5 digits of precision.

Expression categories: Geospatial

Output type: MGRS

Example

Argument values:

  • Expression: geoPoint
geoPoint | Output
{ latitude -> 88.99999659707431, longitude -> 0.9996456505181999 } | Z AF 01937 88990

See more details here.


Convert GeoPoint to geometry

Supported in: Batch, Streaming

Convert GeoPoint to a GeoJSON of type point.

Expression categories: Geospatial

Output type: Geometry

See more details here.


Convert MGRS to GeoPoint

Supported in: Batch, Streaming

Converts a MGRS (military grid reference system) coordinate into a GeoPoint following the WGS84 coordinate system (which is EPSG:4326).

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Expression: mgrs
mgrs | Output
ZAF0193788990 | { latitude: 88.99999659707431, longitude: 0.9996456505181999 }

See more details here.


Convert a string to date

Supported in: Batch, Streaming

Returns the date given a formatted string, according to the Java DateTimeFormatter. The default formats are yyyy-MM-dd and yyyy-MM-dd'T'HH:mm:ss.SSSXXX. The formats are tried in order, and the result of the first matching format is returned.

Expression categories: Cast, Datetime

Output type: Date

Example

Description: Date formats are optional

Argument values:

  • String: 2020-04-28
  • Formats: null

Output: 2020-04-28

See more details here.


Convert a string to timestamp

Supported in: Batch, Streaming

Returns the timestamp given a formatted string, according to the Java DateTimeFormatter. The default formats are yyyy-MM-dd'T'HH:mm:ss.SSSXXX and yyyy-MM-dd. The formats are tried in order, and the result of the first matching format is returned.

Expression categories: Cast, Datetime

Output type: Timestamp

Example

Argument values:

  • String: timestamp
  • Formats: [dd-yyyy-MM HH:mm, yyyy-MM-dd]
  • Time zone: null
timestamp | Output
28-2020-04 10:09:00 | 2020-04-28T10:09:00Z
2020-04-28 | 2020-04-28T00:00:00Z

See more details here.


Convert base

Supported in: Batch

Converts a number (or its string representation) from one base to another.

Expression categories: Binary, Cast, Numeric

Output type: String

Example

Argument values:

  • Expression: 4A801
  • From base: 16
  • To base: 10

Output: 305153

See more details here.
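
The conversion in this example can be checked with a short Python sketch. The function name and the digit alphabet (0-9 followed by A-Z, for bases up to 36) are assumptions made for illustration.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def convert_base(text: str, from_base: int, to_base: int) -> str:
    """Re-express an integer written in from_base as a string in to_base."""
    number = int(text, from_base)  # parse using the source base
    if number == 0:
        return "0"
    sign = "-" if number < 0 else ""
    number = abs(number)
    out = []
    while number:
        number, remainder = divmod(number, to_base)
        out.append(DIGITS[remainder])
    return sign + "".join(reversed(out))

result = convert_base("4A801", 16, 10)
```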


Convert between angle units

Supported in: Batch, Streaming

Expression categories: Geospatial, Numeric

Output type: Double

See more details here.


Convert between distance units

Supported in: Batch, Streaming

Expression categories: Numeric

Output type: Double

See more details here.


Convert between time units

Supported in: Batch, Streaming

Expression categories: Datetime

Output type: Double

See more details here.


Convert between weight units

Supported in: Batch, Streaming

Expression categories: Numeric

Output type: Double

See more details here.


Convert data to JSON

Supported in: Batch, Streaming

Transforms the input into a JSON string.

Expression categories: File, String

Output type: String

Example

Argument values:

  • Input: struct
struct | Output
{ airline: { id: NA } } | {"airline":{"id":"NA"}}

See more details here.


Convert from Ontology GeoPoint

Supported in: Batch, Streaming

Convert an Ontology GeoPoint into a regular GeoPoint. Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180. Regular GeoPoints are structures of the format {"longitude": {long},"latitude": {lat}}.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Expression: geopoint
geopoint | Output
-20.0000000,80.0000000 | { latitude: -20.0, longitude: 80.0 }
38.9031000,-77.0599000 | { latitude: 38.9031, longitude: -77.0599 }
41.9876543,-99.1234568 | { latitude: 41.9876543, longitude: -99.1234568 }

See more details here.


Convert from hexadecimal

Supported in: Batch

Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of the number.

Expression categories: Numeric, String

Output type: Binary

Example

Argument values:

  • Expression: string_hex
string_hex | Output
68656C6C6F | aGVsbG8=
3039 | MDk=
FFFFFFFFFFFFCFC7 | ////////z8c=
4C6F6E646F6E | TG9uZG9u

See more details here.
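
The Output values in this example look like the resulting bytes rendered as Base64; that reading is an assumption, but it is consistent with every row. A Python sketch that reproduces them, using hypothetical helper names:

```python
import base64

def from_hex(hex_text: str) -> bytes:
    """Interpret each pair of hexadecimal characters as one byte."""
    return bytes.fromhex(hex_text)

def as_base64(data: bytes) -> str:
    """Render bytes as Base64 text, as the example table appears to do."""
    return base64.b64encode(data).decode("ascii")

rendered = [as_base64(from_hex(h)) for h in ["68656C6C6F", "3039", "FFFFFFFFFFFFCFC7", "4C6F6E646F6E"]]
```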


Convert from hexadecimal to string

Supported in: Batch, Streaming

Inverse of hex, interprets each pair of characters as a hexadecimal number and converts to the utf-8 string of the byte representation of the number.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: string_hex
string_hex | Output
68656C6C6F | hello
4C6F6E646F6E | London

See more details here.


Convert geocentric coordinates to WGS 84 geodesic coordinates

Supported in: Batch, Streaming

Converts geocentric cartesian coordinates to geodesic polar coordinates. Altitude is defined as height-above-ellipsoid. If any coordinates are null, the output will be null.

Expression categories: Geospatial

Output type: GeoPoint with altitude

Example

Argument values:

  • X coordinate: x_coordinate
  • Y coordinate: y_coordinate
  • Z coordinate: z_coordinate
x_coordinate | y_coordinate | z_coordinate | Output
0.0 | 6378137.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> 90.0 } }
0.0 | -6378137.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> -90.0 } }
-6378137.0 | 0.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> 180.0 } }
-6378137.0 | -0.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> -180.0 } }
0.0 | 0.0 | 6356752.314245179 | { altitude -> 0.0, geoPoint -> { latitude -> 90.0, longitude -> 0.0 } }
0.0 | 0.0 | -6356752.314245179 | { altitude -> 0.0, geoPoint -> { latitude -> -90.0, longitude -> 0.0 } }

See more details here.


Convert legacy OffsetDateTime

Supported in: Batch

Converts a legacy OffsetDateTime column to a timestamp that can be used in all Foundry pipelines. The timestamp is returned in UTC.

Expression categories: Datetime

Output type: Timestamp

See more details here.


Convert linestring to polygon

Supported in: Batch, Streaming

Convert a linestring geometry to a polygon geometry. This expression assumes the linestring geometry is closed. If not, the expression will return null.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: polygon_points
polygon_points | Output
{"type":"LineString","coordinates":[[-77.49,38.01],[-77.47,38.15],[-77.19,38.14],[-77.49,38.01]]} | {"type":"Polygon","coordinates":[[[-77.49,38.01],[-77.47,38.15],[-77.19,38.14],[-77.49,38.01]]]}

See more details here.


Convert timestamp from UTC

Supported in: Batch, Streaming

Converts a timestamp from UTC to a given time zone.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Time zone: EST
  • Timestamp: 2020-04-28T10:09:00Z

Output: 2020-04-28T05:09:00Z

See more details here.


Convert timestamp to UTC

Supported in: Batch, Streaming

Converts a timestamp to UTC based on a given time zone.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Time zone: EST
  • Timestamp: 2020-04-28T10:09:00Z

Output: 2020-04-28T15:09:00Z

See more details here.
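
The two UTC conversion examples are mirror images of each other, which a small Python sketch can show. It treats EST as a fixed offset of UTC-5 with no daylight saving, which is an assumption consistent with the examples; the function names are hypothetical and the timestamps are naive wall-clock values.

```python
from datetime import datetime, timedelta

# Assumption: EST is a fixed UTC-5 offset, as the examples suggest.
EST_OFFSET = timedelta(hours=-5)

def convert_from_utc(timestamp: datetime, offset: timedelta) -> datetime:
    """Shift a UTC wall-clock time to the wall-clock time in the given zone."""
    return timestamp + offset

def convert_to_utc(timestamp: datetime, offset: timedelta) -> datetime:
    """Shift a wall-clock time in the given zone back to UTC."""
    return timestamp - offset

source = datetime(2020, 4, 28, 10, 9, 0)
local_time = convert_from_utc(source, EST_OFFSET)
utc_time = convert_to_utc(source, EST_OFFSET)
```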


Convert to Ontology GeoPoint

Supported in: Batch, Streaming

Convert a GeoPoint into a string that the Ontology will accept for a geo-indexed column (a geohash type column). Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180.

Expression categories: Geospatial

Output type: Ontology GeoPoint

Example

Argument values:

  • Expression: point
point | Output
{ latitude: -20.0, longitude: 80.0 } | -20.0000000,80.0000000
{ latitude: 38.9031, longitude: -77.0599 } | 38.9031000,-77.0599000
{ latitude: 41.987654321, longitude: -99.123456789 } | 41.9876543,-99.1234568
null | null

See more details here.
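
Judging by the example outputs, coordinates are written with seven decimal places. A Python sketch of that formatting follows; the seven-digit precision is inferred from the examples rather than stated, the function name is hypothetical, and no range validation is performed.

```python
from typing import Dict, Optional

def to_ontology_geopoint(point: Optional[Dict[str, float]]) -> Optional[str]:
    """Format a GeoPoint as '{lat},{lon}' with seven decimal places; null stays null."""
    if point is None:
        return None
    return f"{point['latitude']:.7f},{point['longitude']:.7f}"

formatted = to_ontology_geopoint({"latitude": 41.987654321, "longitude": -99.123456789})
```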


Convert to hexadecimal

Supported in: Batch, Streaming

Computes hex value of given expression.

Expression categories: Numeric, String

Output type: String

Example

Argument values:

  • Expression: city_hex
city_hex | Output
TG9uZG9u | 4C6F6E646F6E

See more details here.


Convert to octal

Supported in: Batch, Streaming

Computes octal value of given expression.

Expression categories: Numeric

Output type: String

Example

Argument values:

  • Expression: 12345

Output: 30071

See more details here.


Cosine

Supported in: Batch, Streaming

Takes the cosine of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle | Output
0.0 | 1.0
90.0 | 0.0
180.0 | -1.0

See more details here.


Create GeoPoint from coordinate system

Supported in: Batch, Streaming

Takes a pair of coordinates from a source coordinate system and transforms them into WGS 84 latitude/longitude values. Coordinate systems (also known as coordinate reference systems or spatial reference systems) represent different systems for identifying the location of a point on the globe and are often identified by key in standardized databases such as EPSG. If the given projection is not supported or either coordinate is null, returns null.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Source coordinate system: EPSG:32618
  • X coordinate: x_coordinate
  • Y coordinate: y_coordinate
x_coordinate | y_coordinate | Output
322190.2233952965 | 4306505.703879281 | { latitude -> 38.88944258, longitude -> -77.05014581 }
323243.1361536059 | 4318298.06539618 | { latitude -> 38.99585379643137, longitude -> -77.04105678275415 }
407063.63465300016 | 4764873.719585404 | { latitude -> 43.03086518778498, longitude -> -76.14077251822197 }

See more details here.


Create an empty array

Supported in: Batch, Streaming

Returns an empty array of the given type.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Type: String

Output: [ ]

See more details here.


Create array

Supported in: Batch, Streaming

Creates an array from the columns provided.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [1, 2, 3]

Output: [ 1, 2, 3 ]

See more details here.


Create ellipse geometry

Supported in: Batch, Streaming

Approximates an ellipse as a polygon centered at the given geo coordinate. The distance between points is computed along the surface of the WGS84 ellipsoid approximating the surface of the earth.

Expression categories: Geospatial

Output type: Geometry

See more details here.


Create geodesic line string

Supported in: Batch, Streaming

Creates a geodesic line between two points.

Expression categories: Geospatial

Output type: Geometry

See more details here.


Create linestring geometry

Supported in: Batch, Streaming

Creates a GeoJSON linestring geometry from the given points.

Expression categories: Geospatial

Type variable bounds: T accepts Struct<longitude, latitude>

Output type: Geometry

Example

Argument values:

  • Points: points
points | Output
[ { latitude: 10.0, longitude: 0.0 }, { latitude: 10.0, longitude: 10.0 } ] | {"type":"LineString","coordinates":[[0.0,10.0],[10.0,10.0]]}
[ { latitude: 10.0, longitude: 10.0 }, { latitude: 20.0,<... | {"type":"LineString","coordinates":[[10.0,10.0],[20.0,20.0],[30.0,30.0]]}
[ { latitude: 0.0, longitude: 179.0 }, { latitude: 0.0, longitude: 181.0 } ] | {"type":"MultiLineString","coordinates":[[[179.0,0.0],[180.0,0.0]],[[-180.0,0.0],[-179.0,0.0]]]}
[ { latitude: 0.0, longitude: -179.0 }, { latitude: 0.0, longitude: -181.0 } ] | {"type":"MultiLineString","coordinates":[[[180.0,0.0],[179.0,0.0]],[[-179.0,0.0],[-180.0,0.0]]]}

See more details here.


Create map from arrays

Supported in: Batch, Streaming

Returns a map using key-value pairs from the zipped arrays. Null values are not allowed as keys and will cause a runtime error.

Expression categories: Array, Map

Type variable bounds: K accepts AnyType, V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Array of keys: [ 1, 2, 3 ]
  • Array of values: [ 4, 5, 6 ]

Output: {
 1 -> 4,
 2 -> 5,
 3 -> 6,
}

See more details here.
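
The zipping behavior described above can be sketched in Python as follows. The function name `map_from_arrays` is hypothetical. Rejecting arrays of unequal length is an assumption of this sketch, since the source does not say how that case is handled; the null-key error mirrors the runtime error mentioned in the description.

```python
def map_from_arrays(keys, values):
    """Pair each key with the value at the same position and return a dict."""
    if len(keys) != len(values):
        # Assumption: the source does not specify behavior for unequal lengths.
        raise ValueError("keys and values must have the same length")
    if any(key is None for key in keys):
        raise ValueError("null values are not allowed as keys")
    return dict(zip(keys, values))


print(map_from_arrays([1, 2, 3], [4, 5, 6]))  # {1: 4, 2: 5, 3: 6}
```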


Create null value

Supported in: Batch, Streaming

Returns a null value of the given type.

Expression categories: Data preparation

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Type: String

Output: null

See more details here.


Create range fan geometry

Supported in: Batch, Streaming

Approximates a range fan as a polygon, specifying the region of all points whose haversine distance to the origin point is between the minimum and maximum radii, and to which the bearing from the origin is contained within the angular range centered around the specified bearing parameter. The left and right sides of the range fan are drawn as geodesic lines computed along the surface of the WGS84 ellipsoid approximating the surface of the earth. Returns null if the range spans more than 180 degrees while also crossing the anti-meridian, or if the maximum radius spans more than half of the circumference of the earth.

Expression categories: Geospatial

Output type: Geometry

See more details here.


Create struct column

Supported in: Batch, Streaming

Combines multiple columns into a single structured column.

Expression categories: Struct

Output type: Struct

Example

Argument values:

  • Struct elements: [tail_number, id]
tail_number | id | Output
MT-112 | 1 | { id: 1, tail_number: MT-112 }
XB-123 | 2 | { id: 2, tail_number: XB-123 }
PA-654 | 3 | { id: 3, tail_number: PA-654 }

See more details here.


Create time series reference values

Supported in: Batch, Streaming

Creates time series reference values.

Expression categories: String

Output type: String

Example

Argument values:

  • Series identifier: seriesId
  • Time series sync RID: ri.time-series-catalog.main.sync.11111111
seriesId | Output
seriesOne | {"seriesId":"seriesOne","syncRid":"ri.time-series-catalog.main.sync.11111111"}

See more details here.


Current date

Supported in: Batch, Streaming

Returns the date on which the computation started.

Expression categories: Datetime

Output type: Date

See more details here.


Current timestamp

Supported in: Batch, Streaming

Returns the timestamp at which the computation started.

Expression categories: Datetime

Output type: Timestamp

See more details here.


Date sequence

Supported in: Batch

Creates an array with dates in range from start to end.

Expression categories: Datetime

Output type: Array<Date>

Example

Argument values:

  • End date: last_planned_flight
  • Start date: first_planned_flight
  • Step unit: DAYS
  • Step size: null
first_planned_flight | last_planned_flight | Output
2023-01-01 | 2023-01-03 | [ 2023-01-01, 2023-01-02, 2023-01-03 ]
2023-01-31 | 2023-02-02 | [ 2023-01-31, 2023-02-01, 2023-02-02 ]
2023-02-28 | 2023-03-01 | [ 2023-02-28, 2023-03-01 ]

See more details here.
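
A minimal Python sketch of a day-based sequence is shown below. The function name `date_sequence` is hypothetical, only the DAYS unit is covered, and treating a null step size as a step of 1 is an assumption that is consistent with the example rows. The end date is included, as in the example output.

```python
from datetime import date, timedelta


def date_sequence(start, end, step_size=None):
    """Return all dates from start to end (inclusive), stepping by whole days."""
    # Assumption: a null step size behaves like a step of one day.
    step = timedelta(days=1 if step_size is None else step_size)
    if step <= timedelta(0):
        raise ValueError("step size must be positive in this sketch")
    result = []
    current = start
    while current <= end:
        result.append(current)
        current += step
    return result


for day in date_sequence(date(2023, 2, 28), date(2023, 3, 1)):
    print(day.isoformat())
```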


Decode Geobuf as GeoJSON

Supported in: Batch, Streaming

Decode Geobuf geometry as GeoJSON.

Expression categories: Geospatial

Output type: Geometry

See more details here.


Divide numbers

Supported in: Batch, Streaming

Divide one number by another number.

Expression categories: Numeric

Output type: Decimal | Double

Example

Argument values:

  • Left: col_a
  • Right: col_b
col_a | col_b | Output
4 | 2 | 2.0
11 | 2 | 5.5

See more details here.


Encode GeoJSON as Geobuf

Supported in: Batch, Streaming

Encodes GeoJSON geometry as Geobuf.

Expression categories: Geospatial

Output type: Geobuf

See more details here.


Ends with

Supported in: Batch, Streaming

Expression categories: Boolean, String

Output type: Boolean

Example

Argument values:

  • Expression: Hello World
  • Ignore case: true
  • Value: world

Output: true

See more details here.
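
Going by the example, the expression checks whether a string ends with the given value, optionally ignoring case. A hedged Python sketch follows; the function name `ends_with` is hypothetical, and returning null for null input is an assumption.

```python
def ends_with(expression, value, ignore_case=False):
    """Return True if expression ends with value, optionally ignoring case."""
    if expression is None or value is None:
        # Assumption: null input yields null; the source does not state this.
        return None
    if ignore_case:
        expression, value = expression.casefold(), value.casefold()
    return expression.endswith(value)


print(ends_with("Hello World", "world", ignore_case=True))  # True
```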


Epoch milliseconds to date

Supported in: Batch, Streaming

Converts from epoch milliseconds to date, UTC.

Expression categories: Cast, Datetime

Output type: Date

Example

Description: You can convert epoch timestamps in milliseconds to the date type.

Argument values:

  • Expression: 1673964111000

Output: 2023-01-17

See more details here.


Epoch milliseconds to timestamp

Supported in: Batch, Streaming

Converts from epoch milliseconds to timestamp in UTC.

Expression categories: Cast, Datetime

Output type: Timestamp

Example

Description: You can convert epoch timestamps in milliseconds to the timestamp type.

Argument values:

  • Expression: 1673964111000

Output: 2023-01-17T14:01:51Z

See more details here.
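
The conversion in the example can be verified with Python's standard library. The function name `epoch_millis_to_timestamp` is hypothetical; the sketch adds the millisecond count to the Unix epoch in UTC.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_timestamp(millis):
    """Convert milliseconds since 1970-01-01T00:00:00Z to a UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=millis)


ts = epoch_millis_to_timestamp(1673964111000)
print(ts.strftime("%Y-%m-%dT%H:%M:%SZ"))  # 2023-01-17T14:01:51Z
print(ts.date().isoformat())  # 2023-01-17
```

For epoch seconds, the same approach applies with `timedelta(seconds=...)`.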


Epoch seconds to date

Supported in: Batch, Streaming

Converts from epoch seconds to date in UTC.

Expression categories: Cast, Datetime

Output type: Date

Example

Description: You can convert epoch timestamps to the date type.

Argument values:

  • Expression: 1673964111

Output: 2023-01-17

See more details here.


Epoch seconds to timestamp

Supported in: Batch, Streaming

Converts from epoch seconds to timestamp in UTC.

Expression categories: Cast, Datetime

Output type: Timestamp

Example

Description: You can convert epoch timestamps to the timestamp type.

Argument values:

  • Expression: 1673964111

Output: 2023-01-17T14:01:51Z

See more details here.


Equals

Supported in: Batch, Streaming

Returns true if left and right are equal.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: a
  • Right: b
a | b | Output
1 | 1 | true
1 | 0 | false

See more details here.


Exponential

Supported in: Batch, Streaming

Calculates the exponential, e^x, of a column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: 2.0

Output: 7.38905609893

See more details here.


Extract all regex matches

Supported in: Batch

Extract all instances of a regex match into an array.

Expression categories: Regex, String

Output type: Array<String>

Example

Description: Extract the first two initials from each code.

Argument values:

  • Expression: MT-112, XB-967
  • Group: 1
  • Pattern: (\w\w)(-)

Output: [ MT, XB ]

See more details here.
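
The role of the Group argument can be illustrated with Python's `re` module. This is a sketch only: the function name `extract_all_matches` is hypothetical, and Python's regex dialect may differ in places from the one Pipeline Builder uses.

```python
import re


def extract_all_matches(expression, pattern, group=0):
    """Return the chosen capture group from every non-overlapping match."""
    return [match.group(group) for match in re.finditer(pattern, expression)]


# Group 1 is the two word characters; group 2 is the hyphen that follows them.
print(extract_all_matches("MT-112, XB-967", r"(\w\w)(-)", group=1))  # ['MT', 'XB']
```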


Extract date part

Supported in: Batch, Streaming

Extracts a part of a date like year or day of week.

Expression categories: Datetime

Output type: Integer

See more details here.


Extract document metadata

Supported in: Batch

Extract metadata fields from a document.

Expression categories: Media

Output type: Struct

Example

Argument values:

  • Document metadata information to include: [Document Author, Page Count, Document Title]
  • Media reference: Media Reference
Media Reference | Output
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | { author: Jane Doe, page_count: 23, title: Document Title }

See more details here.


Extract imagery metadata

Supported in: Batch

Extract metadata fields from an image.

Expression categories: Media

Output type: Struct

Example

Argument values:

  • Imagery metadata information to include: [Attributes, Bands, Bytes, Dimensions, Format, Geographic Metadata, ICC Profile]
  • Media reference: Media Reference
Media Reference | Output
{"mimeType":"image/tiff","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | {
attributes: {
 outer_key1 -> {
 inner_key1 -> inner_value1,
},
...

See more details here.


Extract map keys

Supported in: Batch, Streaming

Return map keys as an array. Note the order of array elements is not deterministic.

Expression categories: Map

Type variable bounds: K accepts AnyType

Output type: Array<K>

Example

Argument values:

  • Map: flight_number
flight_number | Output
{ MT-111 -> 2, XB-134 -> 1 } | [ XB-134, MT-111 ]

See more details here.


Extract map values

Supported in: Batch, Streaming

Return map values as an array. Note the order of array elements is not deterministic.

Expression categories: Map

Type variable bounds: V accepts AnyType

Output type: Array<V>

Example

Argument values:

  • Map: flight_number
flight_number | Output
{ MT-111 -> 2, XB-134 -> 1 } | [ 1, 2 ]

See more details here.


Extract text from PDF

Supported in: Batch

Extract raw text from pages in PDF files.

Expression categories: Media

Output type: Array<String>

Example

Argument values:

  • Media reference: Media Reference
  • End page: End Page
  • Start page: Start Page
Media Reference | Start Page | End Page | Output
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | 1 | 2 | [ first page, second page ]

See more details here.


Extract text from PDF (using OCR)

Supported in: Batch

Run OCR on PDF files in a media set to extract text.

Expression categories: Media

Output type: Array<String>

See more details here.


Extract text from images (using OCR)

Supported in: Batch

Run OCR on image files in a media set to extract text.

Expression categories: Media

Output type: String

See more details here.


Extract timestamp part

Supported in: Batch, Streaming

Extracts a part of a timestamp like year or day of week.

Expression categories: Datetime

Output type: Integer

See more details here.


Filter array elements

Supported in: Batch, Streaming

Filters an array based on the filter expression. Note, array index starts at 1.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: array
  • Expression to filter:
    isNotNull(
     expression: element,
    )
array | Output
[ 2, 5, null, 11 ] | [ 2, 5, 11 ]

See more details here.
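
A Python sketch of the filtering behavior is given below. The function name `filter_array` is hypothetical. Passing a 1-based index to the filter alongside each element is an assumption drawn from the note that array indexing starts at 1.

```python
def filter_array(array, predicate):
    """Keep the elements for which predicate(element, index) is true; index starts at 1."""
    if array is None:
        return None
    return [
        element
        for index, element in enumerate(array, start=1)
        if predicate(element, index)
    ]


# Equivalent of the isNotNull filter in the example.
print(filter_array([2, 5, None, 11], lambda element, index: element is not None))  # [2, 5, 11]
```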


Filter by geometry type

Supported in: Batch, Streaming

Nulls any values in the geometry column that are not of the provided geometry types.

Expression categories: Geospatial

Output type: Geometry

See more details here.


First non null value (coalesce)

Supported in: Batch, Streaming

Picks the first non-null value of the inputs. Known as coalesce in SQL.

Expression categories: Data preparation

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Expressions: [tail_number, airline]
  • Treat empty strings as null: null
tail_number | airline | Output
XB-123 | null | XB-123
null | MT | MT

See more details here.
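
The selection logic can be sketched in Python as follows. The function name `coalesce` and the keyword `treat_empty_strings_as_null` are illustrative names modeled on the argument labels above; they are not an actual API.

```python
def coalesce(*values, treat_empty_strings_as_null=False):
    """Return the first value that is not null, or None if every value is null."""
    for value in values:
        if value is None:
            continue
        if treat_empty_strings_as_null and value == "":
            continue
        return value
    return None


print(coalesce("XB-123", None))  # XB-123
print(coalesce(None, "MT"))  # MT
```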


Floor

Supported in: Batch, Streaming

Returns the floor of the given fractional value.

Expression categories: Numeric

Output type: Decimal | Long

Example

Argument values:

  • Expression: 10.123

Output: 10

See more details here.


Format date as string

Supported in: Batch, Streaming

Returns the date as a string formatted according to the Java DateTimeFormatter pattern syntax. The default format is ISO 8601.

Expression categories: Cast, String

Output type: String

Example

Argument values:

  • Date: 2022-12-20
  • Format: yy-MM-dd

Output: 22-12-20

See more details here.


Format number

Supported in: Batch

Formats a number to a specific number of decimal places.

Expression categories: Cast, Numeric, String

Output type: String

Example

Description: Formats a number to 2 decimal places.

Argument values:

  • Decimal places: 2
  • Number: 1234.5678

Output: 1,234.57

See more details here.
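
The example output includes a thousands separator as well as rounding. A Python sketch that reproduces it is shown below; the function name `format_number` is hypothetical, and the comma grouping is inferred from the example output rather than stated in the description.

```python
def format_number(number, decimal_places):
    """Format a number with comma grouping and a fixed number of decimal places."""
    return format(number, f",.{decimal_places}f")


print(format_number(1234.5678, 2))  # 1,234.57
```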


Format string

Supported in: Batch, Streaming

Formats string printf style.

Expression categories: String

Output type: String

Example

Argument values:

  • Format arguments: [argument1, argument2]
  • Format string: Hello %s, my name is %s
argument1 | argument2 | Output
Alice | Bob | Hello Alice, my name is Bob
Jane | John | Hello Jane, my name is John

See more details here.
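
Python's `%` operator follows the same printf-style convention, so the example can be sketched as below. The function name `format_string` is hypothetical, and only the `%s` placeholder used in the example is exercised here.

```python
def format_string(template, arguments):
    """Substitute the arguments, in order, into the printf-style placeholders."""
    return template % tuple(arguments)


print(format_string("Hello %s, my name is %s", ["Alice", "Bob"]))
```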


Format timestamp as string

Supported in: Batch, Streaming

Returns the timestamp as ISO8601 formatted string.

Expression categories: Cast, Datetime, String

Output type: String

Example

Argument values:

  • Timestamp: 2022-10-01T09:00:00Z
  • Format: yyyy-MM-dd
  • Time zone: null

Output: 2022-10-01

See more details here.


Geometries have intersection

Supported in: Batch, Streaming

Determines if two geometries intersect.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a | geometry_b | Output
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... | {"coordinates":[[[-103.78627755867336,33.162750522563925],[-103.78627755867336,28.29724741894266],[-... | true
{"coordinates":[[[0.3651446504365481,15.159518507965103],[0.3651446504365481,13.427462911044273],[3.... | {"coordinates":[[[5.656394524666183,13.405417496831944],[5.656394524666183,11.29869961209053],[8.551... | false

See more details here.


Geometry 3d affine transformation

Supported in: Batch, Streaming

Applies a three dimensional affine transformation to the input geometry. This transformation occurs in the user-provided projected coordinate system, and the result is projected back to WGS84. Two dimensional geometries will have their z-coordinates set to 0 before the affine transformation is applied. The returned geometry is three dimensional and for each coordinate [x,y,z] represents the matrix multiplication [[x0, x1, x2, x-offset], [y0, y1, y2, y-offset], [z0, z1, z2, z-offset], [0, 0, 0, 1]] * [x, y, z, 1], where the first three ordinates of the result are returned.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry column: geometry
  • Projected coordinate system: EPSG:4326
  • X offset: 0.0
  • X0: 0.0
  • X1: -1.0
  • X2: 0.0
  • Y offset: 0.0
  • Y0: 1.0
  • Y1: 0.0
  • Y2: 0.0
  • Z offset: 0.0
  • Z0: 0.0
  • Z1: 0.0
  • Z2: 0.0
geometry | Output
{"type":"Polygon","coordinates":[[[0.0, 0.0],[1.0, 0.0],[1.0, 1.0],[0.0, 1.0],[0.0, 0.0]]]} | {"type":"Polygon","coordinates":[[[0.0, 0.0, 0.0],[0.0, 1.0, 0.0],[-1.0, 1.0, 0.0],[-1.0, 0.0, 0.0],[0.0, 0.0, 0.0]]]}

See more details here.
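
The matrix multiplication in the description can be worked through per coordinate. The Python sketch below applies the example's coefficients to the polygon ring; the function name `affine_3d` is hypothetical, and the projection to and from the projected coordinate system is left out, so coordinates are used as given.

```python
def affine_3d(point, matrix, offsets):
    """Apply a 3x3 matrix plus per-axis offsets to one [x, y] or [x, y, z] coordinate."""
    x, y = point[0], point[1]
    z = point[2] if len(point) > 2 else 0.0  # 2D input gets z = 0 first
    return [
        row[0] * x + row[1] * y + row[2] * z + offset
        for row, offset in zip(matrix, offsets)
    ]


# Rows are [x0, x1, x2], [y0, y1, y2], [z0, z1, z2] from the example arguments.
matrix = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
offsets = [0.0, 0.0, 0.0]
ring = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]

for vertex in ring:
    print(vertex, "->", affine_3d(vertex, matrix, offsets))
```

With these coefficients each vertex [x, y] maps to [-y, x, 0], which is a quarter turn about the origin.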


Geometry array (unary) union

Supported in: Batch, Streaming

Given an array of geometries, combine these into a single geometry, merging without overlap.

Expression categories: Geospatial

Type variable bounds: T accepts Geometry

Output type: T

Example

Argument values:

  • Expression: geometries
geometries | Output
[ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}, {"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]} ] | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]}
[ ] | null
null | null

See more details here.


Geometry array line dissolve

Supported in: Batch, Streaming

Given an array of geometries, combine these into a linear geometry. Dissolve simplifies an input set of line-strings by removing unnecessary nodes and concatenating line-strings that can be combined. Z-coordinates will be ignored for the purpose of the dissolve operation, but the vertices in the resultant geometry will have the same z-coordinate as the corresponding points in the input.

Expression categories: Geospatial

Type variable bounds: T accepts Geometry

Output type: T

Example

Argument values:

  • Expression: geometries
geometries | Output
[ {"type":"LineString","coordinates":[[0,0],[0,1],[1,1]]}, {"type":"LineString","coordinates":[[1,1]... | {"type":"MultiLineString","coordinates":[[[5.0, 5.0],[4.0, 4.0],[3.0, 3.0],[2.0, 2.0],[1.0, 1.0],[0.0, 1.0],[0.0, 0.0]],[[7.0, 7.0], [6.0, 7.0], [6.0, 6.0]]]}
[ {"type":"LineString","coordinates":[[0,0,1],[0,1,1],[1,1,1]]}, {"type":"LineString","coordinates":[[1,1,1],[2,2,2]]}, {"type":"LineString","coordinates":[[1,1,2],[2,2,2],[3,3,3]]} ] | {"type":"LineString","coordinates":[[0.0, 0.0, 1.0],[0.0, 1.0, 1.0],[1.0, 1.0, 1.0],[2.0, 2.0, 2.0],[3.0, 3.0, 3.0]]}

See more details here.


Geometry buffer

Supported in: Batch, Streaming

Computes the buffer of a geometry for both positive and negative buffer distances. Returns an approximate representation of all points within a given distance of this geometric object (or, for negative buffers, all points minus those within the buffer distance of the boundary). Buffer drops any z-coordinates, and zero or negative distance buffers of lines and points return null.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Buffer distance: distance
  • Geometry column: geometry
  • Projected coordinate system: EPSG:32618
  • Buffer cap style: ROUND
  • Buffer join style: ROUND
  • Line segments per quadrant: 8
  • Single or double sided: DOUBLE_SIDED
geometry | distance | Output
{"type":"Point","coordinates":[-77.07368071728229,38.83040844313318]} | 10.0 | {"type":"Polygon","coordinates":[[[-77.07356558299462, 38.83041048767274],[-77.07356728534256, 38.83...
{"type":"LineString","coordinates":[[-77.07368071728229,38.83040844313318, 1],[-77.0725293738795,38.83042888342659, 1]]} | 10.0 | {"type":"Polygon","coordinates":[[[-77.07253198637027, 38.83051894052714],[-77.07250947453703, 38.83...
{"type":"Polygon","coordinates":[[[-77.07368071728229,38.83040844313318, 1],[-77.0725293738795,38.83... | 10.0 | {"type":"Polygon","coordinates":[[[-77.07379585155829, 38.83040639848026],[-77.07382199292853, 38.83...

See more details here.


Geometry centroid

Supported in: Batch, Streaming

Return the centroid, or "center of mass", of the geometry using a spherical approximation of the globe. If the geometry is a collection of mixed dimensions, only the elements of the highest dimension will contribute to the centroid (e.g. in a collection of points, lines and polygons, points and lines are ignored).

Expression categories: Geospatial

Output type: GeoPoint

See more details here.


Geometry contains

Supported in: Batch, Streaming

Determines if geometry a contains geometry b. Points or lines lying on the boundary of a polygon are not contained within another geometry.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a | geometry_b | Output
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... | {"type":"Point","coordinates":[-100.0,32.0]} | true
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... | {"type":"LineString","coordinates":[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323]]} | false
{"type":"LineString","coordinates":[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323]]} | {"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]} | false
{"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]} | {"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]} | true
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... | {"coordinates":[[[-111.94377956164206,33.81725414459382],[-111.94377956164206,31.006795384733323], [... | true

See more details here.


Geometry difference

Supported in: Batch, Streaming

Calculates the portion of geometry a that is not intersecting geometry b.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a | geometry_b | Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.5,0.0],[0.5,1.0],[0.0,1.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"LineString","coordinates":[[0.0,0.0],[0.0,1.0]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}

See more details here.


Geometry explode to array

Supported in: Batch, Streaming

Converts a geometry to an array of its constituent simple geometries.

Expression categories: Geospatial

Output type: Array<Geometry>

Example

Argument values:

  • Expression: geometry
geometry | Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | [ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} ]
{"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]} | [ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}, {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} ]

See more details here.


Geometry intersection

Supported in: Batch, Streaming

Calculates the portion of geometry a that is intersecting geometry b.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a | geometry_b | Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]} | {"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} | {"type":"Polygon","coordinates":[[]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]} | {"type":"LineString","coordinates":[[1.0,1.0],[1.0,0.0]]}
{"type":"Point","coordinates":[0.0,0.0]} | {"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} | {"type":"Point","coordinates":[0.0,0.0]}
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} | {"type":"Polygon","coordinates":[[[2.0,0.0],[2.0,1.0],[3.0,1.0],[3.0,0.0],[2.0,0.0]]]} | {"type":"LineString","coordinates":[]}

See more details here.


Geometry length

Supported in: Batch, Streaming

Get the length of the line strings and multi line strings in the geometry in meters. Uses a spherical approximation of the globe. Non-linear geometries (polygons and points) count as 0.

Expression categories: Geospatial

Output type: Double

Example

Argument values:

  • Expression: geometry
geometry | Output
{"type":"LineString","coordinates":[[-73.778128,40.641195],[-118.408535,33.941563]]} | 3974344.7433354934
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0],[1.0,1.0],[1.0,2.0]]} | 333585.2407005987
{"type":"MultiLineString","coordinates":[[[0.0,0.0],[1.0,0.0],[1.0,1.0]], [[1.0,2.0],[2.0,2.0]]]} | 333517.50194413937

See more details here.
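
A spherical length calculation of this kind can be sketched with the haversine formula. The function names are hypothetical, and the radius of 6,371,008.8 m is an assumption: it is a commonly used mean earth radius, chosen here because it reproduces the second example row closely. The source does not state which radius or formula is actually used.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def line_length_m(coordinates):
    """Sum the segment lengths of a GeoJSON LineString coordinate list ([lon, lat] order)."""
    return sum(
        haversine_m(a[0], a[1], b[0], b[1])
        for a, b in zip(coordinates, coordinates[1:])
    )


print(line_length_m([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]))
```

For a MultiLineString, the same function would be applied to each member line and the results summed; polygons and points contribute 0, as described above.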


Geometry rotate 2d

Supported in: Streaming

Applies a two dimensional clockwise rotation centered at the provided GeoPoint to the supplied geometry. This rotation occurs in the provided coordinate reference system and is then projected back to WGS84.

Expression categories: Geospatial

Output type: Geometry

See more details here.


Geometry set z-coordinate

Supported in: Batch, Streaming

Sets the z-coordinate of a geometry. If the geometry has an existing z-coordinate it will be overwritten.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry: geometry
  • Z coordinate: zCoordinate
geometry | zCoordinate | Output
{"type":"Point","coordinates":[1.0, 2.0]} | 1.0 | {"type":"Point","coordinates":[1.0, 2.0, 1.0]}
{"type":"Point","coordinates":[1.0, 2.0, 3.0]} | 1.0 | {"type":"Point","coordinates":[1.0, 2.0, 1.0]}

See more details here.


Geometry shortest distance

Supported in: Batch, Streaming

Given two valid geometries, calculates the shortest (great circle) distance in meters between them. Uses a spherical approximation of the globe. Overlapping geometries have a distance of zero.

Expression categories: Geospatial

Output type: Double

See more details here.


Geometry standardize

Supported in: Batch, Streaming

Given a valid geometry, standardizes it by enforcing the right-hand rule on the input, which is the convention for GeoJSON. This enables equality comparisons between equivalent geometries. This expression may reverse linestrings.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry
geometry | Output
{"type":"Polygon","coordinates":[[[32.26868,-26.53253],[32.26465,-26.45873],[32.25262,-26.38563],[32.26868,-26.53253]]]} | {"type":"Polygon","coordinates":[[[32.25262, -26.38563],[32.26868, -26.53253],[32.26465, -26.45873],[32.25262, -26.38563]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.25,0.5]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]], [[0.25,0.25],[0.25,0.5],[0.5,0.25],[0.25,0.25]]]}
{"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"}
{"coordinates": [[[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]]], "type":"MultiPolygon"} | {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"}
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | {"coordinates": [[5.0, 5.0],[-1.0, -1.0]], "type":"LineString"}
{"coordinates": [[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 10.0, 0.0], [10.0, 10.0, 10.0], [0.0, 10.0, 10.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]]], "type": "Polygon"} | {"coordinates": [[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 10.0, 0.0], [10.0, 10.0, 10.0], [0.0, 10.0, 10.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]]], "type": "Polygon"}

See more details here.


Geometry symmetric difference

Supported in: Batch, Streaming

Calculates the portion that is in either geometry, but not in their intersection.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a | geometry_b | Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[2.0,1.0],[2.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[3.0,1.0],[3.0,0.0],[1.0,0.0]]]} | {"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[2.0,0.0],[2.0,1.0],[3.0,1.0],[3.0,0.0],[2.0,0.0]]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.5,0.0],[0.5,1.0],[0.0,1.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} | {"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]}

See more details here.


Geometry translate expression

Supported in: Batch, Streaming

Applies a translation to a geometry. Two dimensional geometries are only converted to three dimensional geometries if a z offset is supplied.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry column: geometry
  • Projected coordinate system: EPSG:4326
  • X offset: 1.0
  • Y offset: -1.0
  • Z offset: null
geometry | Output
{"type":"Point","coordinates":[0.0, 0.0]} | {"type":"Point","coordinates":[1.0, -1.0]}
{"type":"LineString","coordinates":[[0.0, 0.0], [1.0, 1.0]]} | {"type":"LineString","coordinates":[[1.0, -1.0], [2.0, 0.0]]}
{"type":"Polygon","coordinates":[[[0.0, 0.0],[1.0, 0.0],[1.0, 1.0],[0.0, 1.0], [0.0, 0.0]]]} | {"type":"Polygon","coordinates":[[[1.0, -1.0],[2.0, -1.0],[2.0, 0.0],[1.0, 0.0],[1.0, -1.0]]]}

See more details here.


Geometry union

Supported in: Batch, Streaming

Combines the two geometries to create a single geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a | geometry_b | Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} | {"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} | {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]} | {"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]},{"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]}]}

See more details here.


Get H3 index

Supported in: Batch, Streaming

Converts a GeoPoint to an H3 index at the given resolution. Returns null if the resolution is less than 0 or greater than 15.

Expression categories: Geospatial

Output type: H3 Index

See more details here.


Get H3 indices covering a geometry

Supported in: Batch, Streaming

Converts a geometry to H3 indices at a given resolution. The resolution must be between 0 and 15, inclusive. For a polygon, three conversions are supported: a) H3 indices that fully cover the polygon, b) H3 indices that are fully contained by the polygon, c) H3 indices whose centroids are contained in the polygon. Returns null when the expected number of H3 indices exceeds 7 million.

Expression categories: Geospatial

Output type: Array<H3 Index>

See more details here.


Get XZ curve index of an envelope

Supported in: Batch, Streaming

Encodes the envelope in an XZ curve.

Expression categories: Geospatial

Output type: Long

Example

Argument values:

  • Curve preset: LON_LAT_10KM
  • Envelope: envelope
envelope | Output
{maxLat -> 2.0, maxLon -> 3.0, minLat -> 0.0, minLon -> 1.0} | 16777222
{maxLat -> 2.0, maxLon -> 3.0, minLat -> null, minLon -> 1.0} | null

See more details here.


Get bearing from start point to end point

Supported in: Batch, Streaming

Calculates the absolute true bearing (clockwise angle relative to geographical north) from the first point to the second point in degrees using a spherical approximation of the earth.

Expression categories: Geospatial

Output type: Double

Example

Argument values:

  • Ending point: end_point
  • Starting point: start_point
start_point | end_point | Output
{latitude: 40.69325025929194, longitude: -74.00522662934995} | {latitude: 51.4988509390695, longitude: -0.1238396067697046} | 51.20964213763489

See more details here.
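
For readers who want to check the numbers, here is a minimal Python sketch of the standard initial-bearing formula on a sphere. It is not necessarily how the expression is implemented; the function name and argument order are illustrative assumptions.

```python
import math


def initial_bearing(start_lat: float, start_lon: float,
                    end_lat: float, end_lon: float) -> float:
    """Clockwise angle from true north, in degrees in [0, 360), on a sphere."""
    phi1 = math.radians(start_lat)
    phi2 = math.radians(end_lat)
    dlon = math.radians(end_lon - start_lon)
    # Components of the direction to the end point, seen from the start point.
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    # atan2 returns (-180, 180]; the modulo maps it to a compass bearing.
    return math.degrees(math.atan2(y, x)) % 360.0


bearing = initial_bearing(40.69325025929194, -74.00522662934995,
                          51.4988509390695, -0.1238396067697046)
print(round(bearing, 2))
```

For the start and end points in the example row, this formula gives a bearing of roughly 51.2 degrees, in line with the documented output.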


Get geometry envelope

Supported in: Batch, Streaming

Given a valid geometry or array of geometries, return a geometry representing the envelope of the input. The envelope is the smallest axis-aligned rectangular region containing the minimum and maximum x and y values of the geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry
geometry | Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}

See more details here.


Get lat/long bounding box struct

Supported in: Batch, Streaming

Given a valid geometry or array of geometries, return a struct containing the bounds of the geometry or geometries.

Expression categories: Geospatial

Output type: LatLonBoundingBox

Example

Argument values:

  • Expression: geometry
geometry | Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0]]]} | {maxLat -> 1.0, maxLon -> 1.0, minLat -> 0.0, minLon -> 0.0}

See more details here.
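
The bounds can be thought of as the minimum and maximum over every position in the geometry. The sketch below illustrates that idea in Python for a single GeoJSON geometry; the helper names are hypothetical, and it assumes positions are stored as [longitude, latitude], as in GeoJSON.

```python
from typing import Any, Dict, Iterator, List


def iter_positions(coords: Any) -> Iterator[List[float]]:
    """Yield every [lon, lat, ...] position in a nested coordinates array."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for child in coords:
            yield from iter_positions(child)


def lat_lon_bounds(geometry: Dict[str, Any]) -> Dict[str, float]:
    """Hypothetical helper: bounding box of one geometry with at least one position."""
    positions = list(iter_positions(geometry["coordinates"]))
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return {
        "maxLat": max(lats),
        "maxLon": max(lons),
        "minLat": min(lats),
        "minLon": min(lons),
    }


triangle = {"type": "Polygon", "coordinates": [[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]}
print(lat_lon_bounds(triangle))
```

The same four values also define the corners of the rectangle returned by the "Get geometry envelope" expression.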


Get neighbors of an H3 index

Supported in: Batch, Streaming

Get all neighbors of an H3 index.

Expression categories: Geospatial

Output type: Array<H3 Index>

See more details here.


Get struct field

Supported in: Batch, Streaming

Extracts a field from a struct.

Expression categories: Struct

Output type: AnyType

Example

Argument values:

  • Locator: airline.id
  • Struct: struct
struct | Output
{airline: {id: NA}} | NA
{airline: {id: FE}} | FE

See more details here.


Get the convex hull of a geometry

Supported in: Batch, Streaming

Given a valid GeoJSON input string, return a GeoJSON string that is the convex hull for the geometry. The convex hull is the smallest convex polygon containing the geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry
geometry | Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[2.0,0.0],[2.0,1.0],[1.0,1.0],[1.0,2.0],[0.0,2.0],[0.0,0.0]]]} | {"type":"Polygon", "coordinates":[[[0.0, 0.0], [0.0, 2.0], [1.0, 2.0], [2.0, 1.0], [2.0, 0.0], [0.0, 0.0]]]}
null | null

See more details here.


Greater than

Supported in: Batch, Streaming

Returns true if left is greater than right.

Expression categories: Numeric

Output type: Boolean

Example

Argument values:

  • Left: a
  • Right: b
a | b | Output
1 | 0 | true
1 | 1 | false
0 | 1 | false

See more details here.


Greater than or equals

Supported in: Batch, Streaming

Returns true if left is greater than or equal to right.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: a
  • Right: b
a | b | Output
1 | 0 | true
1 | 1 | true
0 | 1 | false

See more details here.


Greatest

Supported in: Batch, Streaming

Computes the greatest value amongst all input columns, skipping null values.

Expression categories: Numeric

Type variable bounds: T accepts ComparableType

Output type: T

Example

Argument values:

  • Expressions: [a, b, c]
a | b | c | Output
1 | 2 | 3 | 3
1 | 3 | 2 | 3
3 | 2 | 1 | 3

See more details here.


Gzip decompress

Supported in: Batch, Streaming

Decompresses gzip-compressed binary into a string.

Expression categories: File

Output type: String

Example

Argument values:

  • Expression: gzip
gzip | Output
H4sIAAAAAAAA//NIzcnJ11Eozy/KSVEEAObG5usNAAAA | Hello, world!

See more details here.
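
The example input can be checked outside Pipeline Builder with a few lines of Python. This sketch assumes the binary value in the table is displayed as Base64 and that the decompressed bytes are UTF-8 text; the function name is hypothetical.

```python
import base64
import gzip


def gunzip_to_text(b64_payload: str, encoding: str = "utf-8") -> str:
    """Decode a Base64 string, gunzip the bytes, and decode them as text."""
    raw = base64.b64decode(b64_payload)
    return gzip.decompress(raw).decode(encoding)


# Value copied from the example table (assumed to be Base64-encoded gzip data).
sample = "H4sIAAAAAAAA//NIzcnJ11Eozy/KSVEEAObG5usNAAAA"
print(gunzip_to_text(sample))
```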


H3 cell to children

Supported in: Batch, Streaming

Gets the children of an H3 index at the given child resolution. Returns null if the resolution is less than 0 or greater than 15, or if the child resolution is lower than the resolution of the given H3 index.

Expression categories: Geospatial

Output type: Array<H3 Index>

See more details here.


H3 cell to parent

Supported in: Batch, Streaming

Gets the parent of an H3 index at the given parent resolution. Returns null if the resolution is less than 0 or greater than 15, or if the parent resolution is higher than the resolution of the given H3 index.

Expression categories: Geospatial

Output type: H3 Index

See more details here.


H3 to geometry

Supported in: Batch, Streaming

Convert H3 index to polygon.

Expression categories: Geospatial

Output type: Geometry

See more details here.


Hash sha256

Supported in: Batch, Streaming

Hashes the input using the SHA-256 hashing algorithm.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello World!

Output: 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069

See more details here.
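
The example output can be reproduced with Python's standard library. The sketch assumes the input string is hashed as UTF-8 bytes and the digest is rendered as lowercase hexadecimal; the function name is hypothetical.

```python
import hashlib


def sha256_hex(text: str, encoding: str = "utf-8") -> str:
    """Return the SHA-256 digest of a string as a lowercase hex string."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


print(sha256_hex("Hello World!"))
```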


Interpolate geo point along linestring

Supported in: Batch, Streaming

Returns a point interpolated along a line. Implementation interprets lines as the shortest path, using a spherical approximation of the globe.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Fraction: fraction
  • Linestring: linestring
linestring | fraction | Output
{"type":"LineString","coordinates":[[0.0,2.0],[30.0,0.0]]} | 0.5 | {latitude: 1.0352686301676643, longitude: 15.004677545504547}
{"type":"LineString","coordinates":[[30.0,2.0],[50.0,3.0]]} | 0.8 | {latitude: 2.8256098405656185, longitude: 45.99752305664789}
{"type":"LineString","coordinates":[[45.0,9.0],[90.0,4.0]]} | 0.2 | {latitude: 8.363732883448177, longitude: 54.073497456494955}

See more details here.
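
To make the "shortest path on a sphere" interpretation concrete, the sketch below interpolates along a single great-circle segment using spherical linear interpolation. It only handles a two-point line, is not necessarily how the expression is implemented, and the function names are hypothetical. Points are [longitude, latitude] pairs, as in GeoJSON.

```python
import math
from typing import Dict, Sequence, Tuple


def _to_unit_vector(lon_deg: float, lat_deg: float) -> Tuple[float, float, float]:
    """Convert a longitude/latitude pair to a 3D unit vector."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def interpolate_segment(start: Sequence[float], end: Sequence[float],
                        fraction: float) -> Dict[str, float]:
    """Point at `fraction` of the great-circle arc from start to end."""
    a = _to_unit_vector(start[0], start[1])
    b = _to_unit_vector(end[0], end[1])
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)  # angular distance between the two points
    if omega == 0.0:
        return {"latitude": start[1], "longitude": start[0]}
    # Slerp weights; antipodal points (omega == pi) are not handled here.
    wa = math.sin((1.0 - fraction) * omega) / math.sin(omega)
    wb = math.sin(fraction * omega) / math.sin(omega)
    x, y, z = (wa * p + wb * q for p, q in zip(a, b))
    return {
        "latitude": math.degrees(math.atan2(z, math.hypot(x, y))),
        "longitude": math.degrees(math.atan2(y, x)),
    }


print(interpolate_segment([0.0, 2.0], [30.0, 0.0], 0.5))
```

For the first example row, the midpoint lands slightly north and east of the naive average (latitude 1.0, longitude 15.0), because the great-circle path bends toward the pole.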


Is NaN

Supported in: Batch, Streaming

Returns true if the input is NaN, false otherwise.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: NaN

Output: true

See more details here.


Is empty struct

Supported in: Batch, Streaming

Returns true if the input is an empty struct, with recursive checking of inner arrays and structs.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: struct
struct | Output
{airline: {id: null, name: null}, tail_no: null} | true
{airline: {id: NA, name: null}, tail_no: null} | false

See more details here.


Is in

Supported in: Batch, Streaming

Returns true if the list contains the value.

Expression categories: Boolean

Type variable bounds: T accepts ComparableType

Output type: Boolean

Example

Description: You can check if the list contains the value.

Argument values:

  • Contains: [AWE-112, BRR-123]
  • Value: value
value | Output
BRR-123 | true
ABC-543 | false

See more details here.


Is not null

Supported in: Batch, Streaming

Returns true if the input is not null. Can optionally treat empty strings as null.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: hello
  • Treat empty strings as null: null

Output: true

See more details here.


Is null

Supported in: Batch, Streaming

Returns true if the input is null. Can optionally treat empty strings as null.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: null
  • Treat empty strings as null: null

Output: true

See more details here.


Is valid GeoJSON

Supported in: Batch, Streaming

Returns true if the input is a valid GeoJSON input string. Not all GeoJSON strings are indexable by the Ontology; use the "normalize geometry" expression to prepare geometry prior to Ontology use.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: geoJson
geoJson | Output
{"type":"Point","coordinates":[3.0, 5.0, 2.0]} | true
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]} | true
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} | true
not a GeoJSON string | false

See more details here.


Is valid Geohash

Supported in: Batch, Streaming

Returns true if the input is a valid Geohash input string.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: geohash
geohash | Output
sk4d | true
dt9zy9cg36j7 | true
not a Geohash string | false
null | false

See more details here.


Is valid H3 index

Supported in: Batch, Streaming

Returns true if the input is a valid H3 index string.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: h3
h3 | Output
862a1072fffffff | true
not an h3 value | false

See more details here.


Is valid MGRS

Supported in: Batch, Streaming

Returns true if the input is a valid MGRS (military grid reference system) string.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: mgrs
mgrs | Output
4Q FJ 1 6 | true
4Q FJ 12345 67890 | true

See more details here.


Is valid MIME type

Supported in: Batch, Streaming

Returns true if the input is a valid MIME type.

Expression categories: Boolean, Other

Output type: Boolean

See more details here.


Is valid Ontology GeoPoint

Supported in: Batch, Streaming

Returns true if the input is a valid Ontology GeoPoint. Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: geopoint
geopoint | Output
-35.307428203,149.122686883 | true
149.122686883,-35.307428203 | false
10.0, 20.0 | true
10.0, 20.0 | true
not a GeoPoint | false
null | false
(10.0,20.0) | false

See more details here.
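
The format rule above can be approximated in a few lines of Python. This is only a sketch of the stated '{lat},{lon}' rule and its range checks, with a hypothetical function name; it relies on Python's float parsing, which tolerates surrounding whitespace and may accept numeric forms (such as exponents) that the actual expression treats differently.

```python
from typing import Optional


def is_valid_ontology_geopoint(value: Optional[str]) -> bool:
    """Check the '{lat},{lon}' shape and the latitude/longitude ranges."""
    if value is None:
        return False
    parts = value.split(",")
    if len(parts) != 2:
        return False
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        # Covers inputs such as "(10.0,20.0)" where a part is not a number.
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


for candidate in ["-35.307428203,149.122686883", "149.122686883,-35.307428203", "(10.0,20.0)"]:
    print(candidate, is_valid_ontology_geopoint(candidate))
```

Swapping latitude and longitude, as in the second example row, fails because 149.12 is outside the latitude range.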


Is valid delegated media gid

Supported in: Batch, Streaming

Returns true if the input is a valid Gotham delegated media gid. See Gotham's delegated media documentation for more details.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: ri.gotham-delegated-media.12345678-1234-1234-1234-123456789012.testaudiotype.testlocator

Output: true

See more details here.


Is valid media reference

Supported in: Batch, Streaming

Returns true if the input is a valid Foundry media reference.

Expression categories: Boolean

Output type: Boolean

See more details here.


Is valid rid

Supported in: Batch, Streaming

Returns true if the input is a valid Foundry resource identifier.

Expression categories: Boolean

Output type: Boolean

See more details here.


Is valid uuid

Supported in: Batch, Streaming

Returns true if the input is a valid UUID.

Expression categories: Boolean

Output type: Boolean

See more details here.


Join array

Supported in: Batch, Streaming

Joins array with specified separator.

Expression categories: Array

Output type: String

Example

Argument values:

  • Array to join: [ hello, world ]
  • Separator: -

Output: hello-world

See more details here.


Last day of the week/month/quarter/year

Supported in: Batch

Returns the last day of the week/month/quarter/year.

Expression categories: Datetime

Output type: Date

See more details here.


Least

Supported in: Batch, Streaming

Computes the least value amongst all input columns, skipping null values.

Expression categories: Boolean, Numeric

Type variable bounds: T accepts ComparableType

Output type: T

Example

Argument values:

  • Expressions: [a, b, c]
a | b | c | Output
1 | 2 | 3 | 1
1 | 3 | 2 | 1
3 | 2 | 1 | 1

See more details here.


Left of string

Supported in: Batch, Streaming

Extracts the leftmost characters of a string, up to the given length.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 5

Output: Hello

See more details here.


Left pad string

Supported in: Batch, Streaming

Left-pads the string column to the given length using the pad string.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 15
  • Pad: *

Output: ***Hello world!

See more details here.


Length

Supported in: Batch, Streaming

Returns the length of each value in a string column or an array column.

Expression categories: Array, Numeric

Output type: Integer

Example

Argument values:

  • Expression: string
string | Output
hello | 5
bye | 3

See more details here.


Less than

Supported in: Batch, Streaming

Returns true if left is less than right.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: left
  • Right: right
left | right | Output
1.0 | 10 | true
10.0 | 1 | false

See more details here.


Less than or equals

Supported in: Batch, Streaming

Returns true if left is less than or equal to right.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: a
  • Right: b
a | b | Output
1 | 0 | false
1 | 1 | true
0 | 1 | true

See more details here.


Levenshtein distance

Supported in: Batch, Streaming

Computes the Levenshtein distance between two strings.

Expression categories: Distance measurement, String

Output type: Integer

Example

Argument values:

  • Ignore case: false
  • Left: left
  • Right: right
left | right | Output
hello | hello | 0
hallo | hello | 1
hello | hEllO | 2
hello | hello, world! | 8
hello | farewell | 6

See more details here.
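
The Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. A minimal dynamic-programming sketch in Python (the function name and the `ignore_case` handling are illustrative, not the expression's implementation) looks like this:

```python
def levenshtein(left: str, right: str, ignore_case: bool = False) -> int:
    """Edit distance using insertions, deletions, and substitutions."""
    if ignore_case:
        left, right = left.lower(), right.lower()
    # previous[j] = distance between the processed prefix of left and right[:j].
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]
        for j, right_char in enumerate(right, start=1):
            substitution_cost = 0 if left_char == right_char else 1
            current.append(min(
                previous[j] + 1,                      # delete from left
                current[j - 1] + 1,                   # insert into left
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("hello", "farewell"))
```

With case sensitivity on, "hello" and "hEllO" differ by two substitutions, which is why that row of the table reports 2.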


Logarithm

Supported in: Batch, Streaming

Calculates the natural logarithm, ln(x), of a column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: 10.123

Output: 2.3148100626166146

See more details here.


Logarithm with base

Supported in: Batch, Streaming

Calculates logarithm with a given base.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Base: 2.0
  • Expression: 8

Output: 3.0

See more details here.


Logical type cast

Supported in: Batch, Streaming

Cast expression to given logical type. Unlike the regular cast expression, this expression will not change the underlying base representation of the data, but rather enforce the constraints associated with the specified logical type, so that the output can be used as the input to downstream expressions which specifically demand an instance of that logical type.

Expression categories: Cast

Type variable bounds: C accepts AnyType

Output type: C

Example

Description: Successful cast to natural number.

Argument values:

  • Expression: 1234
  • Logical type: Natural number
  • Default value: null

Output: 1234

See more details here.


Lowercase

Supported in: Batch, Streaming

Converts all characters in string to lowercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello World

Output: hello world

See more details here.


Map values

Supported in: Batch, Streaming

Map a set of values in a column to new values.

Expression categories: Data preparation

Type variable bounds: T1 accepts ComparableType; T2 accepts AnyType

Output type: T2

Example

Argument values:

  • Default: null
  • Input: country
  • Value map: {
     Denmark -> DNK,
     United Kingdom -> UK,
    }
country | Output
United Kingdom | UK
Denmark | DNK
United States of America | null

See more details here.


Modulo

Supported in: Batch, Streaming

Returns modulus of an expression.

Expression categories: Numeric

Output type: DefiniteNumeric

Example

Argument values:

  • Denominator: 4
  • Numerator: 10.123

Output: 2.123

See more details here.


Multiply numbers

Supported in: Batch, Streaming

Calculates the product of all input columns.

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Expressions: [col_a, col_b, col_c]
col_a | col_b | col_c | Output
10 | 2 | 3 | 60

See more details here.


Negate

Supported in: Batch, Streaming

Expression categories: Numeric

Output type: Numeric

See more details here.


Normal random number

Supported in: Batch, Streaming

Returns a column of normally distributed random numbers with zero mean and unit variance. This is not deterministic and will not produce the same result on repeated builds, even when using a seed.

Expression categories: Numeric

Output type: Double

See more details here.


Not

Supported in: Batch, Streaming

Returns the negated boolean value of a boolean expression.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: boolean
boolean | Output
true | false
false | true

See more details here.


Nth chain in polygon

Supported in: Batch, Streaming

Returns the nth ring in a single polygon in the geometry. Indexing is 1-based, and an index of 0 is out-of-bounds. An index equal to 1 returns an external ring. An index greater than 1 returns an internal ring. Returns null for any of the following conditions: geometry isn't a single polygon, a feature collection or geometry collection is provided, index is out-of-bounds, or at least one argument is null.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • N: n
  • Polygon: polygon
polygon | n | Output
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 1 | {"coordinates": [[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]], "type": "LineString"}
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 2 | null
{"coordinates":[[[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]],[[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]]],"type":"Polygon"} | 1 | {"coordinates": [[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]], "type": "LineString"}
{"coordinates":[[[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]],[[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]]],"type":"Polygon"} | 2 | {"coordinates": [[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]], "type": "LineString"}

See more details here.


Nth point in linestring

Supported in: Batch, Streaming

Returns the nth point in a single linestring in the geometry. Indexing is 1-based, and an index of 0 is out-of-bounds. A negative index is counted backwards from the end of the linestring, so that -1 is the last point. Returns null for any of the following conditions: geometry isn't a single linestring, a feature collection or geometry collection is provided, index is out-of-bounds, or at least one argument is null.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Linestring: linestring
  • N: n
linestring | n | Output
{"type":"LineString","coordinates":[[30.0,2.0],[35.0,0.0],[50.0,3.0]]} | 1 | {latitude: 2.0, longitude: 30.0}
{"type":"LineString","coordinates":[[30.0,2.0],[35.0,0.0],[50.0,3.0]]} | 3 | {latitude: 3.0, longitude: 50.0}
{"type":"LineString","coordinates":[[45.0,9.0],[90.0,4.0],[40.0,0.0]]} | -1 | {latitude: 0.0, longitude: 40.0}

See more details here.
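
The indexing rules above (1-based, 0 out of bounds, negative values counted from the end) can be sketched in Python as follows. The function name is hypothetical, and the sketch only covers a plain GeoJSON LineString held as a dictionary.

```python
from typing import Any, Dict, Optional


def nth_point(geometry: Optional[Dict[str, Any]], n: Optional[int]) -> Optional[Dict[str, float]]:
    """Return the nth point of a LineString, or None when the rules are not met."""
    if geometry is None or n is None or geometry.get("type") != "LineString":
        return None
    coords = geometry["coordinates"]
    # Index 0 and anything past either end of the line are out of bounds.
    if n == 0 or abs(n) > len(coords):
        return None
    # Positive n is 1-based; negative n already matches Python's indexing.
    position = coords[n - 1] if n > 0 else coords[n]
    return {"latitude": position[1], "longitude": position[0]}


line = {"type": "LineString", "coordinates": [[45.0, 9.0], [90.0, 4.0], [40.0, 0.0]]}
print(nth_point(line, -1))
```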


Nullify empty string

Supported in: Batch, Streaming

Convert empty strings to null.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: empty string

Output: null

See more details here.


Or

Supported in: Batch, Streaming

Returns true if any of the specified conditions are true. Nulls are considered false.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Conditions: [left_boolean, right_boolean]
left_boolean | right_boolean | Output
true | true | true
true | false | true
false | true | true
false | false | false

See more details here.


PDF table of contents

Supported in: Batch

Expression categories: Media

Output type: Array<Struct<level, title, page>>

Example

Argument values:

  • Media reference: Media Reference
Media ReferenceOutput
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}[ {
level: 0,
page: 2,
title: Chapter 1,
}, {
 **l...

See more details here.


Parse GeoJSON from a non-WGS 84 coordinate system

Supported in: Batch, Streaming

Convert GeoJSON string from a non-WGS 84 coordinate system to WGS 84 geometry. For GeoJSON already in WGS 84 (longitude, latitude), the "logical type cast" expression can convert directly with less overhead. Returns null for strings that fail during parsing or conversion.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • GeoJSON string: geojson_string
  • Source coordinate system: EPSG:32618
geojson_string | Output
{"type":"Point","coordinates":[320000.0,4300000.0]} | {"type":"Point","coordinates":[-77.07368071728229,38.83040844313318]}
{"type":"LineString","coordinates":[[320000.0,4300000.0],[320100.0,4300000.0]]} | {"type":"LineString","coordinates":[[-77.07368071728229,38.83040844313318],[-77.0725293738795,38.83042888342659]]}
{"type":"Polygon","coordinates":[[[320000.0,4300000.0],[320100.0,4300000.0],[320000.0,4300100.0],[320000.0,4300000.0]]]} | {"type":"Polygon","coordinates":[[[-77.07368071728229,38.83040844313318],[-77.0725293738795,38.83042888342659],[-77.07370685720375,38.83130901341597],[-77.07368071728229,38.83040844313318]]]}

See more details here.


Parse XML as schema

Supported in: Batch, Streaming

Parses XML strings following the given schema definition, ignoring any fields not in the schema.

Expression categories: File, Struct

Output type: Struct

Example

Argument values:

  • Schema: Struct<id, airport<id, miles>>
  • Xml: xml
  • Attribute prefix: null
  • Ignore namespace: null
  • Value tag: null
xml | Output
<airline><id>XB-112</id><airport><id>JFK</id><miles>2000</miles></airport></airline> | {airport: {id: JFK, miles: 2000}, id: XB-112}

See more details here.


Parse classification string

Supported in: Batch

Returns the markings parsed from a given classification string. This output is formatted as a struct, where the first element of the struct is the list of strings of relevant markings. This list is null if the classification string is invalid. The second element of the struct is the string of error message(s). This string is null if there are no such messages (if the classification string is valid). Returns null if the classification string is null.

Expression categories: Other

Output type: Struct<groupNames<String>, errors>

See more details here.


Parse duration

Supported in: Batch

Parses an ISO 8601 duration string, anchored at a start time, and returns its length in the specified time unit.

Expression categories: Datetime, String

Output type: Long

Example

Argument values:

  • Duration: PT1M30.5S
  • Start time: 2022-10-01T09:00:00Z
  • Unit: SECONDS

Output: 90

See more details here.


Parse json as struct

Supported in: Batch, Streaming

Parses JSON strings following the given schema definition, ignoring any fields not in the schema.

Expression categories: Data preparation, File, Popular, Struct

Output type: Array<AnyType> | Map<String, String> | Struct

Example

Argument values:

  • Json: json
  • Schema: Struct<airline, airport<id, miles>>
json | Output
{"airline": "XB-112", "airport": {"id": "JFK", "miles": 2000}} | {airline: XB-112, airport: {id: JFK, miles: 2000}}

See more details here.


Parse phone number

Supported in: Batch, Streaming

Normalizes phone numbers to a common format, parsing them from various regions and formats. Phone numbers containing the + sign followed by the region code will be parsed correctly even if the region is not set. All other number formats require a region to be selected from the options provided in order for them to be correctly parsed. Phone numbers that cannot be parsed will result in nulls.

Expression categories: String

Output type: Phone Number

Example

Description: Return formatted US phone number.

Argument values:

  • Expression: phoneNumber
  • Format: E164
  • Region: US
phoneNumber | Output
(234) 235-5678 | +12342355678
+1 415 5552671 | +14155552671
(415) 5552671 | +14155552671
Whatsapp@14155552671 | +14155552671

See more details here.


Parse well known binary as geometry

Supported in: Batch, Streaming

Converts well-known binary (WKB) to geometry logical type. Invalid WKB input will be returned as null. Optionally supply a source coordinate system identifier to convert from the source coordinate system to WGS 84 if the WKB is not in WGS 84 already.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: wkb
  • Source coordinate system: null
wkb | Output
AAAAAAFACAAAAAAAAEAUAAAAAAAA | {"type":"Point","coordinates":[3.0, 5.0]}
AIAAAAFACAAAAAAAAEAUAAAAAAAAQAAAAAAAAAA= | {"type":"Point","coordinates":[3.0, 5.0, 2.0]}
AAAAAAMAAAABAAAABAAAAAAAAAAAAAAAAAAAAAA/8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= | {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]}
AAAAAAIAAAACAAAAAAAAAAAAAAAAAAAAAD/wAAAAAAAAAAAAAAAAAAA= | {"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]}

See more details here.


Parse well known text as geometry

Supported in: Batch, Streaming

Converts well-known text (WKT) string to geometry logical type. Invalid WKT input will be returned as null. Optionally supply a source coordinate system identifier to convert from the source coordinate system to WGS 84 if the WKT is not in WGS 84 already.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: wkt
  • Source coordinate system: null
wkt | Output
POINT (3.0 5.0 2.0) | {"type":"Point","coordinates":[3.0, 5.0, 2.0]}
POLYGON ((0.0 0.0, 1.0 0.0, 0.0 1.0, 0.0 0.0)) | {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]}
LINESTRING (0.0 0.0, 1.0 0.0) | {"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]}

See more details here.


Perimeter

Supported in: Batch, Streaming

Calculates perimeter of a geometry in meters using a spherical approximation of the globe. For a line string or a point, this equals 0.

Expression categories: Geospatial

Output type: Double

See more details here.


Positive modulo

Supported in: Batch

Returns positive modulus of an expression.

Expression categories: Numeric

Type variable bounds: T1 accepts Byte | Integer | Long | Short; T2 accepts Byte | Integer | Long | Short

Output type: T1

Example

Argument values:

  • Denominator: 3
  • Numerator: 10

Output: 1

See more details here.
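
The difference between a plain modulo and a positive modulo only shows up for negative numerators. The Python sketch below illustrates one common convention; it is an assumption, not something this page states, that "Modulo" keeps the sign of the numerator (as a truncated-division remainder does) while "Positive modulo" always returns a non-negative result for a positive denominator. Both function names are hypothetical.

```python
import math


def modulo(numerator: float, denominator: float) -> float:
    """Remainder with the sign of the numerator (assumed convention)."""
    return math.fmod(numerator, denominator)


def positive_modulo(numerator: int, denominator: int) -> int:
    """Non-negative remainder for a positive denominator."""
    # Python's % already follows the sign of the denominator, so for a
    # positive denominator the result is always in [0, denominator).
    return numerator % denominator


print(positive_modulo(10, 3))    # 1, as in the example above
print(modulo(-10, 3))            # -1.0 under the assumed convention
print(positive_modulo(-10, 3))   # 2
```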


Power of

Supported in: Batch, Streaming

Raises the expression to the power of the exponent. Returns null if either value is null.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Exponent: 3
  • Expression: 10

Output: 1000.0

See more details here.


Prepare geometry

Supported in: Batch, Streaming

Prepares a geometry for downstream use, for example indexing to the ontology, by converting a geometry string into valid GeoJSON. Polygons will be closed and deduplicated. Geometries which cross the anti-meridian (as indicated by width > 180 degrees) will be split into multiple features on each side of the anti-meridian. Outputs null if the input string cannot be read as GeoJSON or if the geometry contains out-of-bounds coordinates.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry
geometry | Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0], [0.0,1.0], [0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0,1.0],[1.0,0.0,1.0], [0.0,1.0,1.0],[0.0,0.0,1.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0,1.0],[0.0,1.0,1.0],[1.0,0.0,1.0],[0.0,0.0,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0], [0.0,1.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0], [1.0,0.0], [0.0,1.0], [0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[179.0,-30.0],[-179.0,-30.0],[-179.0,30.0],[179.0,30.0],[179.0,-30]]]} | {"type":"MultiPolygon","coordinates":[[[[-180.0,-30.0],[-180.0,30.0],[-179.0,30.0],[-179.0,-30.0],[-180.0,-30.0]]],[[[180.0,30.0],[180.0,-30.0],[179.0,-30.0],[179.0,30.0],[180.0,30.0]]]]}
{"type":"LineString","coordinates":[[179.0,30.0],[-179.0,30.0]]} | {"type":"MultiLineString","coordinates":[[[179.0,30.0],[180.0,30.0]],[[-180.0,30.0],[-179.0,30.0]]]}
{"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[40.0,10.0],[0.0,1.0]... | {"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[40.0,10.0],[0.0,1.0]...
{"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[1.0,0.0]},{"type":"LineString","coordinates":[[179.0,30.0],[-179.0,30.0]]}]} | {"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[1.0,0.0]},{"type":"MultiLineString","coordinates":[[[179.0,30.0],[180.0,30.0]],[[-180.0,30.0],[-179.0,30.0]]]}]}

See more details here.


Reduce array elements

Supported in: Batch, Streaming

Reduces array elements using an expression.

Expression categories: Array

Type variable bounds: T accepts Array<Boolean | Byte | Date | Double | Float | Integer | Long | Map<AnyType, AnyType> | Short | String | Timestamp> | Boolean | Byte | Date | Double | Float | Integer | Long | Map<AnyType, AnyType> | Short | String | Timestamp

Output type: T

Example

Argument values:

  • Array: miles
  • Expression to reduce:
    add(
     expressions: [accumulator, element],
    )
  • Initial value: 0
miles | Output
[ 12300, 12342 ] | 24642

See more details here.
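
The reduction in this example is a left fold: the accumulator starts at the initial value and is combined with each element in turn. The same idea can be sketched in Python with `functools.reduce`; the wrapper name `reduce_array` is hypothetical.

```python
from functools import reduce
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
E = TypeVar("E")


def reduce_array(values: Iterable[E], initial: T, step: Callable[[T, E], T]) -> T:
    """Fold `values` from left to right, starting from `initial`."""
    return reduce(step, values, initial)


# Mirrors add(expressions: [accumulator, element]) with an initial value of 0.
total = reduce_array([12300, 12342], 0, lambda accumulator, element: accumulator + element)
print(total)  # 24642
```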


Regex extract

Supported in: Batch, Streaming

Extracts the specified group from a regex. Returns empty string when no match is found.

Expression categories: Regex, String

Output type: String

Example

Description: Extract the first two initials from the first match.

Argument values:

  • Expression: MT-112, XB-967
  • Group: 1
  • Pattern: (\w\w)(-)

Output: MT

See more details here.
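
A rough Python equivalent of the behaviour described above is shown below. It uses Python's `re` module, whose syntax may differ in places from the regex engine behind Pipeline Builder, and the function name is hypothetical.

```python
import re


def regex_extract(text: str, pattern: str, group: int) -> str:
    """Return the requested group of the first match, or "" if nothing matches."""
    match = re.search(pattern, text)
    return match.group(group) if match else ""


print(regex_extract("MT-112, XB-967", r"(\w\w)(-)", 1))  # MT
print(repr(regex_extract("no tail numbers", r"(\w\w)(-)", 1)))  # ''
```

Only the first match is considered, so "XB" from the second tail number is never returned.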


Regex find

Supported in: Batch, Streaming

Matches an expression against a regular expression. Regular expression can match any part of the string.

Expression categories: Regex, String

Output type: Boolean

Example

Description: You can find regex patterns.

Argument values:

  • Expression: abcdefg
  • Regex: abc?d

Output: true

See more details here.


Regex index

Supported in: Batch

Returns an array of indices at which the regular expression pattern is found in the given expression.

Expression categories: Regex, String

Output type: Array<Integer>

Example

Description: You can find regex patterns and their indices.

Argument values:

  • Expression: ababab
  • Regex: ab

Output: [ 0, 2, 4 ]

See more details here.
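
A minimal Python sketch of the same idea, assuming matches are non-overlapping and indices are 0-based as in the example above. The name `regex_indices` is hypothetical.

```python
import re


def regex_indices(text, pattern):
    """Return the start index of every non-overlapping match of `pattern`."""
    return [match.start() for match in re.finditer(pattern, text)]


print(regex_indices("ababab", "ab"))  # [0, 2, 4]
```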


Regex match

Supported in: Batch, Streaming

Matches an expression against a regular expression. Regular expression must match the whole string.

Expression categories: Regex, String

Output type: Boolean

Example

Description: You can match regex patterns.

Argument values:

  • Expression: abcdefg
  • Regex: abc?d.+

Output: true

See more details here.
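
The difference between "Regex find" (the pattern may match any part of the string) and "Regex match" (the pattern must match the whole string) corresponds roughly to `re.search` versus `re.fullmatch` in Python. This is an illustrative sketch; the function names are hypothetical.

```python
import re


def regex_find(text, pattern):
    """True if the pattern matches anywhere in the string."""
    return re.search(pattern, text) is not None


def regex_match(text, pattern):
    """True only if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None


print(regex_find("abcdefg", r"abc?d"))     # True
print(regex_match("abcdefg", r"abc?d.+"))  # True
print(regex_match("abcdefg", r"abc?d"))    # False
```

The last call shows why the "Regex match" example needs the trailing `.+`: without it, the pattern covers only `abcd` and the whole-string match fails.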


Regex replace

Supported in: Batch, Streaming

Replace a string using a regex pattern.

Expression categories: Regex, String

Output type: String

Example

Argument values:

  • Expression: tail_number
  • Pattern: (\w\w)(-)
  • Replace: **-
tail_number | Output
MT-123 | **-123
XB-434 | **-434
MT-123, XB-434 | **-123, **-434

See more details here.


Rename struct field

Supported in: Batch, Streaming

Rename fields within a struct.

Expression categories: Data preparation, Struct

Output type: Struct

Example

Argument values:

  • Expression: struct
  • Renames: [(airline.id, identifier)]
struct | Output
{ airline: { id: NA } } | { airline: { identifier: NA } }
{ airline: { id: FE } } | { airline: { identifier: FE } }

See more details here.


Right of string

Supported in: Batch, Streaming

Extract right hand side of a string based on index.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 6

Output: world!

See more details here.


Right pad string

Supported in: Batch, Streaming

Right-pads the string to the given length using the pad string. If the string is longer than the given length, it is trimmed to that length.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 15
  • Pad: *

Output: Hello world!***

See more details here.
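
The pad-or-trim behavior described above can be sketched as follows. The name `right_pad` is hypothetical, and the handling of an empty pad string (trim only) is an assumption that the source does not specify.

```python
def right_pad(text, length, pad):
    """Pad `text` on the right with `pad` up to `length`, trimming if too long."""
    if len(text) >= length or not pad:
        # Assumption: with an empty pad string there is nothing to add.
        return text[:length]
    repeats = (length - len(text)) // len(pad) + 1
    return (text + pad * repeats)[:length]


print(right_pad("Hello world!", 15, "*"))  # Hello world!***
print(right_pad("Hello world!", 5, "*"))   # Hello
```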


Round number

Supported in: Batch, Streaming

Round number to 'scale' decimal places.

Expression categories: Numeric

Output type: Decimal | Double | Float

Example

Argument values:

  • Column: 10.123
  • Scale: 2

Output: 10.12

See more details here.


Secant

Supported in: Batch, Streaming

Takes the secant of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle | Output
0.0 | 1.0
90.0 | 1.633123935319537E16
180.0 | -1.0

See more details here.
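
The very large value at 90 degrees is what floating-point arithmetic typically produces: π/2 cannot be represented exactly, so the cosine is a tiny non-zero number and its reciprocal is huge but finite. The sketch below illustrates this in Python, assuming the secant is computed as 1/cos on an angle converted to radians; the function name is hypothetical.

```python
import math


def secant_degrees(angle):
    """Secant of an angle given in degrees, computed as 1 / cos."""
    return 1.0 / math.cos(math.radians(angle))


for angle in (0.0, 90.0, 180.0):
    # At 90 degrees the cosine is tiny but not exactly zero,
    # so the result is a very large finite number.
    print(angle, secant_degrees(angle))
```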


Sentence case

Supported in: Batch, Streaming

Converts the first character of the first word to be uppercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: hello world

Output: Hello world

See more details here.


Sequence

Supported in: Batch, Streaming

Creates an array with numbers in range from start to end.

Expression categories: Array

Type variable bounds: T accepts Byte | Integer | Long | Short

Output type: Array<T>

Example

Description: Sequences increase by 1 unless otherwise specified.

Argument values:

  • End: 10
  • Start: 0
  • Step size: null

Output: [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]

See more details here.
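
Note that the end value is included in the output, unlike Python's built-in `range`. A small sketch of an inclusive sequence, assuming a null step size means a step of 1 (the name `sequence` and the handling of negative steps are illustrative):

```python
def sequence(start, end, step=None):
    """Inclusive integer range from `start` to `end`."""
    if step is None:
        step = 1  # Assumed default, matching the example above.
    if step == 0:
        raise ValueError("step must not be zero")
    # range() excludes its stop value, so move it one past `end`.
    stop = end + 1 if step > 0 else end - 1
    return list(range(start, stop, step))


print(sequence(0, 10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```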


Simplify geometry

Supported in: Batch, Streaming

This expression simplifies GeoJSON geometry by removing points within the given tolerance distance using a spherical model of the globe. Loops smaller than the tolerance may be removed entirely.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry: Geometry
  • Tolerance: Tolerance
  • Coordinate precision: null
Geometry | Tolerance | Output
{"type":"LineString","coordinates":[[30.0,0.0],[35.0,0.0],[40.0,0.0]]} | 1000 | {"type":"LineString","coordinates":[[30.0,0.0],[40.0,0.0]]}
{"type":"Polygon","coordinates":[[[-1.0,-1.0],[1.0,-1.0],[1.0,1.0],[0.0,1.0],[-1.0,1.0],[-1.0,-1.0]]]} | 12000 | {"type":"Polygon","coordinates":[[[-1.0,1.0],[1.0,1.0],[1.0,-1.0],[-1.0,-1.0],[-1.0,1.0]]]}
{"type":"MultiLineString","coordinates":[[[0.0,0.0],[5.0,0.1],[10.0,0.0]], [[0.0,-5.0],[5.0,0.1],[10.0,5.0]]]} | 12000 | {"type":"MultiLineString","coordinates":[[[0.0,0.0],[10.0,0.0]],[[0.0,-5.0],[10.0,5.0]]]}
{"type":"MultiPolygon","coordinates":[[[[-2.0,-2.0],[2.0,-2.0],[2.0,2.0],[0.0,2.1],[-2.0,2.0],[-2.0,... | 12000 | {"type":"MultiPolygon","coordinates":[[[[-2.0,2.0],[2.0,2.0],[2.0,-2.0],[-2.0,-2.0],[-2.0,2.0]], [[1...

See more details here.


Sine

Supported in: Batch, Streaming

Takes the sine of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle | Output
0.0 | 0.0
90.0 | 1.0
180.0 | 0.0

See more details here.


Skip bytes

Supported in: Batch, Streaming

Skip a given number of bytes in a binary column.

Expression categories: Binary

Output type: Binary

Example

Argument values:

  • Bytes: aGk=
  • Number of bytes to skip: 1

Output: aQ==

See more details here.
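
The example values appear to be Base64 renderings of the binary data: `aGk=` decodes to the two bytes `hi`, and `aQ==` encodes the single byte `i`. A sketch that reproduces this in Python (the helper name `skip_bytes` is hypothetical):

```python
import base64


def skip_bytes(data, count):
    """Drop the first `count` bytes of a binary value."""
    return data[count:]


original = base64.b64decode("aGk=")   # b"hi"
remaining = skip_bytes(original, 1)   # b"i"
print(base64.b64encode(remaining).decode("ascii"))  # aQ==
```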


Slice array

Supported in: Batch, Streaming

Returns the slice of the array from the first position to the second position. The first position must be 1 or higher. If the second position is beyond the end of the array, the rest of the array is returned.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

See more details here.


Soundex

Supported in: Batch

Compute the soundex encoding (a phonetic representation) for a word.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: input_string
input_string | Output
cat | C300
caat | C300
two | T000
too | T000
to | T000
four | F600
for | F600
fore | F600
fur | F600
meow | M000
me ow | M000

See more details here.


Split string

Supported in: Batch, Streaming

Split string on specified regex pattern.

Expression categories: String

Output type: Array<String>

Example

Argument values:

  • Expression: string
  • Pattern:
  • Limit: 2
string | Output
hello | [ hello ]
hello world | [ hello, world ]
hello there world | [ hello, there world ]

See more details here.
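
The effect of the limit can be sketched in Python. The pattern value is not visible in the example above, so this sketch assumes a single space; it also assumes that a positive limit caps the number of output elements. The name `split_string` is hypothetical.

```python
import re


def split_string(text, pattern, limit):
    """Split on a regex, returning at most `limit` parts when limit is positive."""
    if limit == 1:
        return [text]
    # re.split counts splits rather than parts, and 0 means "no limit".
    max_splits = limit - 1 if limit > 1 else 0
    return re.split(pattern, text, maxsplit=max_splits)


for value in ("hello", "hello world", "hello there world"):
    print(split_string(value, " ", 2))
```

With a limit of 2, the third input is split only once, so the remainder `there world` stays in one piece.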


Square root

Supported in: Batch, Streaming

Calculates the square root of a column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: 9.0

Output: 3.0

See more details here.


Starts with

Supported in: Batch, Streaming

Expression categories: Boolean, String

Output type: Boolean

Example

Argument values:

  • Expression: Hello world
  • Ignore case: true
  • Value: hello

Output: true

See more details here.


String after delimiter

Supported in: Batch, Streaming

Extract the string after the first delimiter. Return full string if no matches are found.

Expression categories: String

Output type: String

Example

Argument values:

  • Delimiter: hello
  • Expression: ... Hello world
  • Ignore case: true

Output: world

See more details here.


String before delimiter

Supported in: Batch, Streaming

Extract the string before the first delimiter. Return the full string if no matches are found.

Expression categories: String

Output type: String

Example

Argument values:

  • Delimiter: hello
  • Expression: ... Hello world
  • Ignore case: true

Output: ...

See more details here.


String contains

Supported in: Batch, Streaming

Expression categories: Boolean, String

Output type: Boolean

Example

Argument values:

  • Expression: ... Hello world
  • Ignore case: true
  • Value: hello

Output: true

See more details here.


Substring

Supported in: Batch, Streaming

Extract substring.

Expression categories: Numeric

Output type: String

Example

Argument values:

  • Expression: string
  • Start: start
  • Length: length
string | start | length | Output
hello, world | 1 | 5 | hello
hello, world | 8 | 5 | world
hello, world | -5 | 5 | world

See more details here.
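
The rows above suggest that positions are 1-based and that a negative start counts back from the end of the string. A sketch under those assumptions (the name `substring` is hypothetical, and behavior for a start of 0 or a start before the beginning of the string is a guess):

```python
def substring(text, start, length):
    """1-based substring; a negative start counts from the end of the string."""
    if start > 0:
        begin = start - 1
    elif start < 0:
        begin = max(len(text) + start, 0)
    else:
        begin = 0  # Assumption: a start of 0 is treated like 1.
    return text[begin:begin + length]


print(substring("hello, world", 1, 5))   # hello
print(substring("hello, world", 8, 5))   # world
print(substring("hello, world", -5, 5))  # world
```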


Subtract multiple expressions

Supported in: Batch, Streaming

Calculates the difference between a number and all input columns.

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Expressions list: [col_b, col_c]
  • Value to be subtracted: col_a
col_a | col_b | col_c | Output
5 | 3 | 2 | 0
2 | 4 | 0 | -2
-2 | -4 | -2 | 4

See more details here.


Subtract numbers

Supported in: Batch, Streaming

Subtract one number from another number.

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Left: col_a
  • Right: col_b
col_a | col_b | Output
32 | 4 | 28
-5 | -3 | -2

See more details here.


Subtract timestamp/date

Supported in: Batch, Streaming

Returns the difference in the given time unit.

Expression categories: Datetime

Output type: Long

Example

Argument values:

  • End: 2022-10-01T10:00:00Z
  • Start: 2022-10-01T09:00:00Z
  • Unit: HOURS

Output: 1

See more details here.


Subtract value from date

Supported in: Batch, Streaming

Returns the date that is 'value' days/weeks/months/quarters/years before 'start'.

Expression categories: Datetime

Output type: Date

Example

Argument values:

  • Date: 2022-04-05
  • Unit: DAYS
  • Value: 2

Output: 2022-04-03

See more details here.


Sum of array elements

Supported in: Batch, Streaming

Sums the elements contained within the array.

Expression categories: Array

Type variable bounds: T accepts DefiniteNumeric

Output type: T

Example

Argument values:

  • Expression: [ 1, 2, 3 ]
  • Treat null as zero.: true

Output: 6

See more details here.


Tangent

Supported in: Batch, Streaming

Takes the tangent of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle | Output
0.0 | 0.0
90.0 | 1.633123935319537E16
180.0 | 0.0

See more details here.


Text segmentation

Supported in: Batch, Streaming

Extract a series of text segments using sliding window segmentation.

Expression categories: String

Output type: Array<String>

See more details here.


Text to embeddings

Supported in: Batch

Converts text into embeddings.

Expression categories: String

Output type: Embedded vector

Example

Description: Example embeddings for the word 'palantir'.

Argument values:

  • Model:
    ada002Embedding(

    )
  • Text column: text
  • Output mode: null
text | Output
palantir | [ -0.019182289, -0.02127992, 0.009529043, -0.008066221, -0.0014429842, 0.019154688, -0.023556953, -0...

See more details here.


Timestamp add

Supported in: Batch, Streaming

Add value to timestamp in specified unit.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Timestamp: 2022-02-01T00:00:00Z
  • Unit: MILLISECONDS
  • Value to add: 2

Output: 2022-02-01T00:00:00.002Z

See more details here.


Timestamp sequence

Supported in: Batch

Creates an array with timestamps in range from start to end.

Expression categories: Datetime

Output type: Array<Timestamp>

Example

Argument values:

  • End time: end_time
  • Start time: start_time
  • Step unit: DAYS
  • Step size: 1.0
start_time | end_time | Output
2023-01-01T00:00:00Z | 2023-01-03T00:00:00Z | [ 2023-01-01T00:00:00Z, 2023-01-02T00:00:00Z, 2023-01-03T00:00:00Z ]
2023-01-01T01:50:00Z | 2023-01-03T00:00:00Z | [ 2023-01-01T01:50:00Z, 2023-01-02T01:50:00Z ]

See more details here.
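
The second row above shows that the sequence stops at the last step that does not pass the end time. A Python sketch of that stepping logic, assuming an inclusive end and fixed-length steps (calendar units such as months would need different handling); `timestamp_sequence` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def timestamp_sequence(start, end, step):
    """Timestamps from `start`, advancing by `step`, while not past `end`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    result = []
    current = start
    while current <= end:
        result.append(current)
        current += step
    return result


start = datetime(2023, 1, 1, 1, 50, tzinfo=timezone.utc)
end = datetime(2023, 1, 3, tzinfo=timezone.utc)
stamps = timestamp_sequence(start, end, timedelta(days=1))
print([stamp.isoformat() for stamp in stamps])
```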


Timestamp subtract

Supported in: Batch, Streaming

Subtract value from timestamp in specified unit.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Timestamp: 2022-02-02T00:00:00Z
  • Unit: MILLISECONDS
  • Value to subtract: 2

Output: 2022-02-01T23:59:59.998Z

See more details here.


Timestamp to epoch millis

Supported in: Batch, Streaming

Converts from timestamp in UTC to epoch milliseconds.

Expression categories: Cast, Datetime

Output type: Long

Example

Argument values:

  • Timestamp: 2022-10-01T09:00:00Z

Output: 1664614800000

See more details here.


Timestamp to epoch seconds

Supported in: Batch, Streaming

Converts from timestamp in UTC to epoch seconds.

Expression categories: Cast, Datetime

Output type: Long

Example

Argument values:

  • Timestamp: 2022-10-01T09:01:13.47Z

Output: 1664614873

See more details here.
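
Both epoch conversions can be checked with integer arithmetic in Python. The sketch below reproduces the two documented outputs; the function names are hypothetical, and dropping the fractional part (rather than rounding) is inferred from the seconds example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(timestamp):
    """Whole milliseconds since 1970-01-01T00:00:00Z."""
    return (timestamp - EPOCH) // timedelta(milliseconds=1)


def to_epoch_seconds(timestamp):
    """Whole seconds since the epoch; the fractional part is dropped."""
    return (timestamp - EPOCH) // timedelta(seconds=1)


print(to_epoch_millis(datetime(2022, 10, 1, 9, 0, 0, tzinfo=timezone.utc)))  # 1664614800000
print(to_epoch_seconds(datetime(2022, 10, 1, 9, 1, 13, 470000, tzinfo=timezone.utc)))  # 1664614873
```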


Title case

Supported in: Batch, Streaming

Converts the first character of each word to be uppercase and the rest lowercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: hello world

Output: Hello World

See more details here.


Transcribe audio into json using cpu

Supported in: Batch

Transcribe audio files into json using cpu.

Expression categories: Media

Output type: String

See more details here.


Transcribe audio into json using gpu

Supported in: Batch

Transcribe audio files into json using gpu.

Expression categories: Media

Output type: String

See more details here.


Transcribe audio into text

Supported in: Batch

Transcribe audio files into text.

Expression categories: Media

Output type: String | Struct<ok, error>

See more details here.


Transform array element

Supported in: Batch, Streaming

Maps each element of an array using an expression. Note, array index starts at 1.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: flight_number
  • Expression to apply.:
    stringBeforeDelimiter(
     delimiter: -,
     expression: element,
     ignoreCase: false,
    )
flight_number | Output
[ XB-134, MT-111 ] | [ XB, MT ]

See more details here.
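
Conceptually this is a map over the array. A rough Python equivalent of the example, where `string_before_delimiter` and `transform_array` are hypothetical stand-ins for the expressions named above:

```python
def string_before_delimiter(text, delimiter):
    """Text before the first delimiter, or the full text if it is absent."""
    return text.split(delimiter, 1)[0]


def transform_array(array, expression):
    """Apply `expression` to every element and collect the results."""
    return [expression(element) for element in array]


flight_number = ["XB-134", "MT-111"]
prefixes = transform_array(
    flight_number, lambda element: string_before_delimiter(element, "-")
)
print(prefixes)  # ['XB', 'MT']
```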


Transform map keys

Supported in: Batch, Streaming

Transforms keys of a map by applying an expression to every key-value pair.

Expression categories: Map

Type variable bounds: K accepts AnyType, V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Expression to apply.:
    stringBeforeDelimiter(
     delimiter: -,
     expression: key,
     ignoreCase: false,
    )
  • Map: flight_number
flight_number | Output
{ MT-111 -> 2, XB-134 -> 1 } | { MT -> 2, XB -> 1 }

See more details here.


Transform map values

Supported in: Batch

Transforms values of a map by applying an expression to every key-value pair.

Expression categories: Map

Type variable bounds: K accepts AnyType, V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Expression to apply.:
    stringBeforeDelimiter(
     delimiter: -,
     expression: value,
     ignoreCase: false,
    )
  • Map: flight_number
flight_number | Output
{ 1 -> XB-134, 2 -> MT-111 } | { 1 -> XB, 2 -> MT }

See more details here.


Trim whitespace

Supported in: Batch, Streaming

Trims whitespace at beginning and end of string. Whitespace is defined as characters in any of: 1) Unicode's \p{whitespace} set, 2) Java's String#trim() method, or 3) Java's Character#isWhitespace() method.

Expression categories: Data preparation, String

Output type: String

Example

Argument values:

  • Expression: hello world

Output: hello world

See more details here.


Truncate date

Supported in: Batch

Returns the date rounded down to the nearest day/week/month/quarter/year.

Expression categories: Datetime

Output type: Date

See more details here.


Truncate timestamp

Supported in: Batch

Returns the timestamp truncated to the specified unit.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Start: 2022-02-01T10:10:10.0022Z
  • Unit: MILLISECONDS

Output: 2022-02-01T10:10:10.002Z

See more details here.


Uncompact a set of H3 indices

Supported in: Batch, Streaming

Uncompact H3 indices to the specified resolution. All input indices must be at a resolution less than or equal to the requested resolution or this transform will return null. If any of the input indices are invalid this transform will return null. Output indices are sorted in ascending order.

Expression categories: Geospatial

Output type: Array<H3 Index>

See more details here.


Unicode normalize

Supported in: Batch, Streaming

Perform unicode normalization as per Unicode Standard Annex #15.

Expression categories: Data preparation, String

Output type: String

Example

Argument values:

  • Expression: string
  • Normalization form: nfkc
string | Output
123 | 123
イナゴ | イナゴ

See more details here.
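
Python's standard library implements the same Unicode Standard Annex #15 normalization forms, which makes it easy to see what a form such as NFKC does. The input below (fullwidth digits) is a hypothetical illustration chosen for this sketch, not a value taken from the example above.

```python
import unicodedata


def unicode_normalize(text, form="NFKC"):
    """Normalize `text` using one of the forms NFC, NFD, NFKC, or NFKD."""
    return unicodedata.normalize(form, text)


fullwidth_digits = "\uff11\uff12\uff13"  # Hypothetical input: fullwidth 1, 2, 3.
print(unicode_normalize(fullwidth_digits))  # 123
```

Compatibility forms (NFKC, NFKD) fold such presentation variants into their plain equivalents, while the canonical forms (NFC, NFD) leave them unchanged.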


Uniform random number

Supported in: Batch, Streaming

Returns a column of uniform random numbers drawn between 0 and 1. This is not deterministic and will not produce the same result on repeated builds, even when using a seed.

Expression categories: Numeric

Output type: Double

See more details here.


Universally unique identifier (uuid) (unstable)

Supported in: Batch, Streaming

Returns a column of uuids. This is not deterministic and will not produce the same result on repeated builds. This is not the preferred way to build an id column and users should look into sha256 or others that are deterministic.

Expression categories: String

Output type: String

See more details here.


Uppercase

Supported in: Batch, Streaming

Converts all characters in string to uppercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: hello World

Output: HELLO WORLD

See more details here.


Url decode

Supported in: Batch, Streaming

Decodes a percent-encoded string to plain text.

Expression categories: Cast, String

Output type: String

Example

Argument values:

  • Expression: string
string | Output
raw_string_with_no_special_characters | raw_string_with_no_special_characters
test%2Fapi%3Fstring%3D3 | test/api?string=3

See more details here.


Url encode

Supported in: Batch, Streaming

Percent-encodes a string to be sent in a url.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: string
string | Output
raw_string_with_no_special_characters | raw_string_with_no_special_characters
test/api?string=3 | test%2Fapi%3Fstring%3D3

See more details here.
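
The encode and decode examples can be reproduced with Python's `urllib.parse`. This is only a sketch: it assumes that reserved characters such as `/` are encoded too (hence `safe=""`), which matches the example output, and the treatment of spaces is not covered by the source.

```python
from urllib.parse import quote, unquote


def url_encode(text):
    """Percent-encode every character that is not unreserved."""
    return quote(text, safe="")


def url_decode(text):
    """Decode percent-encoded sequences back to plain text."""
    return unquote(text)


print(url_encode("test/api?string=3"))        # test%2Fapi%3Fstring%3D3
print(url_decode("test%2Fapi%3Fstring%3D3"))  # test/api?string=3
```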


Use LLM

Supported in: Batch

Call an LLM with a configurable prompt.

Expression categories: String

Output type: Array<AnyType> | Boolean | Date | Decimal | Double | Float | Integer | Long | Short | String | Struct | Struct<ok<AnyType> | Boolean | Date | Decimal | Double | Float | Integer | Long | Short | String | Struct | Timestamp, error> | Timestamp

Example

Argument values:

  • Model:
    gpt4ChatModel(
     temperature: 0.0,
    )
  • Prompt: prompt
  • Output mode: null
  • Output type: null
  • System prompt: In the context of a food delivery app, your job is to rate reviews given in the following user promp...
prompt | Output
The food was great! | 5

See more details here.


Value from map

Supported in: Batch, Streaming

Get a value from a map using a key.

Expression categories: Map

Type variable bounds: K accepts ComparableType, V accepts AnyType

Output type: V

Example

Argument values:

  • Key: Foo
  • Map: {
     Bar -> World,
     Foo -> Hello,
    }

Output: Hello

See more details here.


Aggregate expressions


All of

Supported in: Batch

Calculate the boolean 'and' of an aggregate. Nulls are considered false.

Expression categories: Aggregate

Output type: Boolean

Example

Argument values:

  • Expression: values

Given input table:

values
true
false
true

Outputs: false

See more details here.


Any of

Supported in: Batch

Calculate the boolean 'or' of an aggregate. Nulls are considered false.

Expression categories: Aggregate

Output type: Boolean

Example

Argument values:

  • Expression: values

Given input table:

values
true
false
true

Outputs: true

See more details here.


Approximate median

Supported in: Batch

Computes approximate median of values in the column.

Expression categories: Aggregate

Output type: Numeric

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3

See more details here.


Approximate percentile

Supported in: Batch

Returns the approximate percentile of the expression: the smallest value in the ordered expression values (sorted from least to greatest) such that no more than the given percentage of values is less than or equal to it.

Expression categories: Aggregate

Output type: Array<Numeric> | Byte | Decimal | Double | Float | Integer | Long | Short

Example

Argument values:

  • Expression: values
  • Percentiles: [0.5]
  • Accuracy: null

Given input table:

values
2
4
3

Outputs: 3

See more details here.


Collect array

Supported in: Batch, Streaming

Collects an array of values within each group. Null values are ignored.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expression: factor

Given input table:

factor
2
2
3

Outputs: [ 2, 2, 3 ]

See more details here.


Collect distinct array

Supported in: Batch, Streaming

Collects an array of deduplicated values within each group. Null values are ignored.

Expression categories: Aggregate

Type variable bounds: T accepts ComparableType

Output type: Array<T>

Example

Argument values:

  • Expression: factor

Given input table:

factor
2
2
3

Outputs: [ 2, 3 ]

See more details here.


Covariance

Supported in: Batch, Streaming

Calculate the population covariance of values in two columns.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Left: left
  • Right: right

Given input table:

left | right
1 | 5
2 | 4
3 | 3
4 | 2
5 | 1

Outputs: -2.0

See more details here.
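
For reference, the population covariance divides the summed products of deviations by n, while the sample covariance (a separate expression in this index) divides by n - 1. A small Python sketch using the input table above; the function name is hypothetical.

```python
def covariance(left, right, sample=False):
    """Population covariance by default; sample covariance when sample=True."""
    n = len(left)
    if n != len(right) or n == 0 or (sample and n < 2):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_left = sum(left) / n
    mean_right = sum(right) / n
    total = sum((x - mean_left) * (y - mean_right) for x, y in zip(left, right))
    return total / (n - 1 if sample else n)


left = [1, 2, 3, 4, 5]
right = [5, 4, 3, 2, 1]
print(covariance(left, right))               # -2.0
print(covariance(left, right, sample=True))  # -2.5
```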


Create simple geometries from ordered rows of GeoPoints

Supported in: Batch

Given a column of GeoPoints and an ordering, return either a polygon or a line string by connecting the GeoPoints in the specified order. This function assumes that the data is tabular, with a single row representing an individual GeoPoint in a line string or in the shell of a polygon, along with a column specifying the order of those points. For a polygon this ordering should identify the points as you move counter-clockwise around the shell. Given an ordering of these points and a partition (grouping), the function constructs the required geometry for that partition by joining the GeoPoints in ascending order of the order-by column.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • GeoPoint: geo_point
  • Order by (ascending): order
  • Output geometry type: LINE_STRING

Given input table:

geo_point | order
{ latitude -> 0.0, longitude -> 0.0 } | 0
{ latitude -> 1.0, longitude -> 0.0 } | 1
{ latitude -> 1.0, longitude -> 1.0 } | 2

Outputs: {"type":"LineString","coordinates": [[0.0,0.0],[0.0, 1.0],[1.0,1.0]]}

See more details here.
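
The output lists each position as [longitude, latitude], which is the GeoJSON coordinate order, even though the input GeoPoints are written latitude first. A sketch of the line string case in Python, with hypothetical row and function names:

```python
import json


def line_string_from_points(rows):
    """Connect GeoPoints in ascending `order` into a GeoJSON LineString."""
    ordered = sorted(rows, key=lambda row: row["order"])
    coordinates = [
        # GeoJSON positions are [longitude, latitude].
        [row["geo_point"]["longitude"], row["geo_point"]["latitude"]]
        for row in ordered
    ]
    return {"type": "LineString", "coordinates": coordinates}


rows = [
    {"geo_point": {"latitude": 1.0, "longitude": 1.0}, "order": 2},
    {"geo_point": {"latitude": 0.0, "longitude": 0.0}, "order": 0},
    {"geo_point": {"latitude": 1.0, "longitude": 0.0}, "order": 1},
]
line = line_string_from_points(rows)
print(json.dumps(line))
```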


Dense rank

Supported in: Batch

Returns the rank of rows within a window partition, without any gaps. In case of ties the rows get same rank. The difference between rank and dense_rank is that dense_rank leaves no gaps in ranking sequence when there are ties.

Expression categories: Aggregate

Output type: Integer

See more details here.


Distinct count

Supported in: Batch, Streaming

Calculate distinct number of values in column.

Expression categories: Aggregate

Output type: Long

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3

See more details here.


First

Supported in: Batch

First item in the group. Note, if used within an aggregate or unordered window, the row selected will be non-deterministic.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Expression: values
  • Ignore nulls: false

Given input table:

values
null
2
4
3

Outputs: null

See more details here.


Grouped geometry envelope

Supported in: Batch

Returns the envelope of all valid geometries in the given column. Invalid geometries are treated as null and ignored.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry

Given input table:

geometry
{"type":"LineString","coordinates":[[1,0],[0,8.4]]}
{"type":"Point","coordinates":[125.6, -92.3]}
{"type":"Polygon","coordinates":[[[0,0],[1,6.3],[-6,1],[0,0]]]}

Outputs: {"type":"Polygon","coordinates":[[[-6.0,-92.3],[-6.0,8.4],[125.6,8.4],[125.6,-92.3],[-6.0,-92.3]]]}

See more details here.


Grouped geometry union

Supported in: Batch

Combines the grouped geometries to create a single geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry

Given input table:

geometry
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]}

Outputs: {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]}

See more details here.


Grouped latitude/longitude bounding box

Supported in: Batch

Returns a struct containing the entire bounding box of all valid geometries in the given column. Invalid geometries are treated as null and ignored.

Expression categories: Geospatial

Output type: LatLonBoundingBox

Example

Argument values:

  • Expression: geometry

Given input table:

geometry
{"type":"LineString","coordinates":[[1,0],[0,8.4]]}
{"type":"Point","coordinates":[125.6, -92.3]}
{"type":"Polygon","coordinates":[[[0,0],[1,6.3],[-6,1],[0,0]]]}

Outputs: {
 maxLat -> 8.4,
 maxLon -> 125.6,
 minLat -> -92.3,
 minLon -> -6.0,
}

See more details here.
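
The bounding box in this example is the minimum and maximum of every longitude and latitude across the group. The sketch below reproduces the numbers above; it performs no geometry validation and does not handle GeometryCollection inputs, and all names are hypothetical.

```python
def iter_positions(coordinates):
    """Yield every [lon, lat, ...] position in nested GeoJSON coordinates."""
    if coordinates and isinstance(coordinates[0], (int, float)):
        yield coordinates
    else:
        for child in coordinates:
            yield from iter_positions(child)


def bounding_box(geometries):
    """Min/max latitude and longitude over all geometries, or None if empty."""
    lons, lats = [], []
    for geometry in geometries:
        for position in iter_positions(geometry["coordinates"]):
            lons.append(position[0])
            lats.append(position[1])
    if not lons:
        return None
    return {"maxLat": max(lats), "maxLon": max(lons),
            "minLat": min(lats), "minLon": min(lons)}


geometries = [
    {"type": "LineString", "coordinates": [[1, 0], [0, 8.4]]},
    {"type": "Point", "coordinates": [125.6, -92.3]},
    {"type": "Polygon", "coordinates": [[[0, 0], [1, 6.3], [-6, 1], [0, 0]]]},
]
box = bounding_box(geometries)
print(box)
```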


Lag

Supported in: Batch

Returns the value of the input at 'lag' before the current row in the window.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

See more details here.


Last

Supported in: Batch, Streaming

Last item in the group. Note, if used within an aggregate or unordered window, the row selected will be non-deterministic.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Expression: values
  • Ignore nulls: false

Given input table:

values
2
4
3
null

Outputs: null

See more details here.


Lead

Supported in: Batch

Returns the value of the input at 'lead' after the current row in the window.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

See more details here.


Linear regression gradient

Supported in: Batch

Calculate the linear regression gradient of the right-hand side (output variable) and the left-hand side (input variable).

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Left: left
  • Right: right

Given input table:

left | right
1 | 5
2 | 4
3 | 3
4 | 2
5 | 1

Outputs: -1.0

See more details here.
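
Assuming the gradient is the ordinary least-squares slope with the left column as the input variable and the right column as the output variable, it equals the covariance of the two columns divided by the variance of the input. A Python sketch (the function name is hypothetical):

```python
def regression_gradient(inputs, outputs):
    """Least-squares slope of `outputs` regressed on `inputs`."""
    n = len(inputs)
    if n != len(outputs) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    sxx = sum((x - mean_x) ** 2 for x in inputs)
    if sxx == 0:
        raise ValueError("input values must not all be equal")
    return sxy / sxx


print(regression_gradient([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```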


Max

Supported in: Batch, Streaming

Calculate maximum value in column.

Expression categories: Numeric

Output type: ComparableType

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 4

See more details here.


Max by

Supported in: Streaming

This expression computes a max row according to the max column expression after applying the provided filter specification. If there is no maximum row, null will be returned.

Expression categories: Aggregate

Output type: AnyType

Example

Argument values:

  • Expression: salary
  • Output projection expression: salary
  • Filter condition:
    lessThan(
     left: salary,
     right: 5000,
    )

Given input table:

dep_name | salary
develop | 9900
develop | 4000
develop | 3000

Outputs: 4000

See more details here.
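
The order of operations matters here: the filter is applied first, then the maximum row is chosen, and finally the output projection is evaluated on that row. A Python sketch of that sequence with hypothetical names:

```python
def max_by(rows, key, projection, condition):
    """Filter rows, pick the row with the largest key, then project it."""
    candidates = [row for row in rows if condition(row)]
    if not candidates:
        return None  # No maximum row exists.
    return projection(max(candidates, key=key))


rows = [
    {"dep_name": "develop", "salary": 9900},
    {"dep_name": "develop", "salary": 4000},
    {"dep_name": "develop", "salary": 3000},
]
result = max_by(
    rows,
    key=lambda row: row["salary"],
    projection=lambda row: row["salary"],
    condition=lambda row: row["salary"] < 5000,
)
print(result)  # 4000
```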


Mean

Supported in: Batch, Streaming

Calculate mean of values in column.

Expression categories: Numeric

Output type: Decimal | Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3.0

See more details here.


Min

Supported in: Batch, Streaming

Calculate minimum value in column.

Expression categories: Numeric

Output type: ComparableType

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 2

See more details here.


Min by

Supported in: Streaming

This expression computes a min row according to the min column expression after applying the provided filter specification. If there is no minimum row, null will be returned.

Expression categories: Aggregate

Output type: AnyType

Example

Argument values:

  • Expression: salary
  • Output projection expression: salary
  • Filter condition:
    greaterThan(
     left: salary,
     right: 0,
    )

Given input table:

dep_name | salary
develop | -999
develop | 4000
develop | 3000

Outputs: 3000

See more details here.


Mode

Supported in: Batch

Calculate mode of values in column.

Expression categories: Aggregate

Type variable bounds: String accepts String

Output type: String

Example

Argument values:

  • Expression: values

Given input table:

values
a
b
b
b
c
c
d

Outputs: b

See more details here.


Percent rank

Supported in: Batch

Returns the percentile rank of rows within a window partition. Tied rows are assigned the same percent rank.

Expression categories: Aggregate

Output type: Double

See more details here.


Pivot

Supported in: Streaming

Apply an aggregate expression in a pivot context. The aggregation will run as a set of separate aggregations scoped to each distinct value of the pivot expression. The output is a map from pivot value to aggregate expression value.

Expression categories: Aggregate

Type variable bounds: K accepts ComparableType, V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Aggregate expression:
    sum(
     expression: value,
    )
  • Pivot expression: pivot

Given input table:

pivot | value
a | 1
b | 2
a | 3

Outputs: {
 a -> 4,
 b -> 2,
}

See more details here.
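
In other words, the rows are grouped by the pivot value and the aggregate runs once per group. A minimal Python sketch of that grouping, with hypothetical names:

```python
from collections import defaultdict


def pivot(rows, pivot_key, aggregate):
    """Map each distinct pivot value to the aggregate of its rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[pivot_key]].append(row)
    return {key: aggregate(group) for key, group in groups.items()}


rows = [
    {"pivot": "a", "value": 1},
    {"pivot": "b", "value": 2},
    {"pivot": "a", "value": 3},
]
result = pivot(rows, "pivot", lambda group: sum(row["value"] for row in group))
print(result)  # {'a': 4, 'b': 2}
```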


Product

Supported in: Batch

Calculates the product of all values in the column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: factor

Given input table:

factor
2
4
3

Outputs: 24.0

See more details here.


Rank

Supported in: Batch

Returns the rank of rows within a window partition. In case of ties the rows get same rank. The difference between rank and dense_rank is that rank leaves gaps in ranking sequence when there are ties.

Expression categories: Aggregate

Output type: Integer

See more details here.
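
The difference between rank and dense rank is easiest to see on a partition with ties. The sketch below assumes the values are already sorted in window order; the function name is hypothetical.

```python
def rank_and_dense_rank(sorted_values):
    """Return (rank, dense_rank) lists for values already in window order."""
    ranks, dense_ranks = [], []
    for index, value in enumerate(sorted_values):
        if index > 0 and value == sorted_values[index - 1]:
            # A tie shares the previous row's rank in both schemes.
            ranks.append(ranks[-1])
            dense_ranks.append(dense_ranks[-1])
        else:
            ranks.append(index + 1)  # Skips numbers consumed by ties.
            dense_ranks.append(dense_ranks[-1] + 1 if dense_ranks else 1)
    return ranks, dense_ranks


ranks, dense_ranks = rank_and_dense_rank([10, 20, 20, 30])
print(ranks)        # [1, 2, 2, 4]
print(dense_ranks)  # [1, 2, 2, 3]
```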


Row count

Supported in: Batch, Streaming

Counts the number of non-null rows in a group.

Expression categories: Aggregate

Output type: Long

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3

See more details here.


Row number

Supported in: Batch, Streaming

Returns a sequential number starting at 1 inside each partition.

Expression categories: Aggregate

Output type: Integer

See more details here.


Sample covariance

Supported in: Batch, Streaming

Calculate the sample covariance of values in two columns.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Left: left
  • Right: right

Given input table:

left | right
1 | 5
2 | 4
3 | 3
4 | 2
5 | 1

Outputs: -2.5

See more details here.


Sample variance

Supported in: Batch, Streaming

Calculate the sample variance of values in column.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
2
3

Outputs: 0.33333333333

See more details here.


Standard deviation

Supported in: Batch

Calculate standard deviation of the values in column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 0.81649658092773

See more details here.


Sum

Supported in: Batch, Streaming

Sums the specified expression.

Expression categories: Numeric

Output type: Decimal | Double | Long

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 9

See more details here.


Variance

Supported in: Batch, Streaming

Calculate population variance of values in column.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 0.66666666667

See more details here.
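
The documented outputs for "Variance" and "Standard deviation" are consistent with the population formulas (divide by n), while "Sample variance" divides by n - 1. Python's `statistics` module can be used to check the three example values; this is a cross-check, not a description of how Pipeline Builder computes them.

```python
import statistics

values = [2, 4, 3]

population_variance = statistics.pvariance(values)   # about 0.6667
population_std_dev = statistics.pstdev(values)       # about 0.8165
sample_variance = statistics.variance([2, 2, 3])     # about 0.3333

print(population_variance, population_std_dev, sample_variance)
```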


Generator expressions


Explode array

Supported in: Batch, Streaming

Explode array into a row per value.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: T

See more details here.


Explode array with position

Supported in: Batch, Streaming

Explode array into a row per value as a struct containing the element's relative position in the array and the element itself.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Struct<Optional[position], Optional[element]>

See more details here.
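The following Python sketch illustrates the idea of exploding an array into one row per element, with each output value holding the position and the element. The function name, the output column suffix `_exploded`, and the 0-based position are assumptions; this index does not state whether positions start at 0 or 1.

```python
def explode_with_position(rows, array_column):
    """Emit one row per array element, carrying a {position, element} struct."""
    exploded = []
    for row in rows:
        for position, element in enumerate(row[array_column]):  # 0-based: an assumption
            exploded.append(
                {**row, array_column + "_exploded": {"position": position, "element": element}}
            )
    return exploded


# Hypothetical input row with an array column.
airline_rows = [{"airline": "foundry airways", "airports": ["JFK", "SFO"]}]
exploded_rows = explode_with_position(airline_rows, "airports")
```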


Explode map

Supported in: Batch, Streaming

Explode map into a row per key-value pair.

Expression categories: Map

Type variable bounds: TKey accepts AnyType, TValue accepts AnyType

Output type: Struct<Optional[key], Optional[value]>

See more details here.


Transforms


Aggregate

Supported in: Batch

Performs the specified aggregations on the input dataset grouped by a set of columns.

Transform categories: Aggregate, Popular

Example

Argument values:

  • Aggregations: [
    alias(
     alias: factor,
     expression:
    sum(
     expression: factor,
    ),
    )]
  • Dataset: ri.foundry.main.dataset.aggregate
  • Group by columns: [tail_number]

Input:

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
XB-123 | foundry airline | 335 | 5
MT-222 | new air | 565 | 4
KK-452 | new air | 222 | 1
XB-123 | foundry airline | 1134 | 3

Output:

tail_number | factor
XB-123 | 10
MT-222 | 9
KK-452 | 1

See more details here.
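The grouping in this example can be sketched in plain Python. The helper `aggregate_sum` is hypothetical and only mirrors the example's single `sum` aggregation grouped by `tail_number`; it is not Pipeline Builder's implementation.

```python
def aggregate_sum(rows, group_by, value_column):
    """Sum value_column per distinct group_by value, keeping first-seen group order."""
    totals = {}
    for row in rows:
        key = row[group_by]
        totals[key] = totals.get(key, 0) + row[value_column]
    return [{group_by: key, value_column: total} for key, total in totals.items()]


# Rows from the example input (only the columns used by the aggregation).
aggregate_input = [
    {"tail_number": "XB-123", "factor": 2},
    {"tail_number": "MT-222", "factor": 5},
    {"tail_number": "XB-123", "factor": 5},
    {"tail_number": "MT-222", "factor": 4},
    {"tail_number": "KK-452", "factor": 1},
    {"tail_number": "XB-123", "factor": 3},
]
aggregate_output = aggregate_sum(aggregate_input, "tail_number", "factor")
```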


Aggregate on condition

Supported in: Batch

Aggregate expressions based on a condition statement.

Transform categories: Aggregate

See more details here.


Aggregate over window

Supported in: Streaming

Performs the specified aggregations on the data within a window, emitting outputs as specified by the provided trigger.

Transform categories: Aggregate

See more details here.


Anti join

Supported in: Batch

Anti joins left and right dataset inputs together, removing all rows that match the provided condition.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs:

ri.foundry.main.dataset.left

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
XB-123 | foundry airline | 335 | 5
MT-222 | new air | 565 | 4
KK-452 | new air | 222 | 1
PA-452 | new air | 212 | 2
XB-123 | foundry airline | 1134 | 2

ri.foundry.main.dataset.right

tail_number | home_airport
XB-123 | LHR
MT-222 | CPH
KK-452 | JFK
JR-201 | IAD

Output:

tail_number | airline
PA-452 | new air

See more details here.
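A minimal Python sketch of the anti join in this example follows. It assumes an equality condition on a single key column, as in the example; the helper name `anti_join` and its parameters are illustrative only.

```python
def anti_join(left_rows, right_rows, key, keep_columns):
    """Keep left rows whose key has no match on the right, projected to keep_columns."""
    right_keys = {row[key] for row in right_rows}
    return [
        {column: row[column] for column in keep_columns}
        for row in left_rows
        if row[key] not in right_keys
    ]


# Rows from the example inputs (miles and factor omitted, since they are not selected).
anti_left = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
    {"tail_number": "MT-222", "airline": "new air"},
    {"tail_number": "KK-452", "airline": "new air"},
    {"tail_number": "PA-452", "airline": "new air"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
]
anti_right = [
    {"tail_number": "XB-123", "home_airport": "LHR"},
    {"tail_number": "MT-222", "home_airport": "CPH"},
    {"tail_number": "KK-452", "home_airport": "JFK"},
    {"tail_number": "JR-201", "home_airport": "IAD"},
]
anti_output = anti_join(anti_left, anti_right, "tail_number", ["tail_number", "airline"])
```

Only PA-452 survives because it is the only left-side tail number that is absent from the right dataset.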


Apply expression

Supported in: Batch, Streaming

Transforms input dataset by applying a single expression.

Transform categories: Other

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Expression:
    alias(
     alias: kilometers,
     expression:
    convertDistance(
     amount: miles,
     currentUnit: mile,
     targetUnit: kilometer,
    ),
    )

Input:

airline | miles
foundry airways | 2500
new air | 3000

Output:

kilometers | airline | miles
4023.36 | foundry airways | 2500
4828.03 | new air | 3000

See more details here.
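The conversion in this example can be checked with a few lines of Python. The constant 1.609344 is the number of kilometers in one international mile; rounding to two decimals is an assumption made here only to match the precision shown in the example output.

```python
KM_PER_MILE = 1.609344  # kilometers per international mile


def add_kilometers(rows):
    """Add a kilometers column derived from miles, leaving the other columns intact."""
    return [
        {"kilometers": round(row["miles"] * KM_PER_MILE, 2), **row}
        for row in rows
    ]


# Rows from the example input.
mileage_rows = [
    {"airline": "foundry airways", "miles": 2500},
    {"airline": "new air", "miles": 3000},
]
with_kilometers = add_kilometers(mileage_rows)
```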


Array elements to columns

Supported in: Batch

Extracts elements from an array into columns.

Transform categories: Array

Example

Argument values:

  • Array: stats
  • Columns to extract: [miles, id]
  • Dataset: ri.foundry.main.dataset.a

Input:

stats
[ 1000, 2 ]

Output:

miles | id | stats
1000 | 2 | [ 1000, 2 ]

See more details here.


Assign timestamps and watermarks

Supported in: Streaming

Assigns timestamps and watermarks to the input, filtering out records where the timestamp is null.

Transform categories: Other

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Timestamp expression: timestamp
  • Emit watermark on every record: null

Input:

timestamp | temperature | sensor_id
1969-12-31T23:59:50Z | 28 | sensor_1
1969-12-31T23:59:40Z | 30 | sensor_2
1969-12-31T23:59:35Z | 29 | sensor_1

Output:

timestamp | temperature | sensor_id
1969-12-31T23:59:50Z | 28 | sensor_1
1969-12-31T23:59:40Z | 30 | sensor_2
1969-12-31T23:59:35Z | 29 | sensor_1

See more details here.


Coalesce data

Supported in: Batch

Reduces the number of partitions. For example, if you have 1000 partitions and you coalesce to 100, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions. If a larger number of partitions is requested, the dataset stays at the current number of partitions.

Transform categories: Other

See more details here.
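The partition arithmetic described above can be sketched in Python. This is a simplified model: it merges contiguous runs of existing partitions, which is an assumption chosen for illustration, since the index does not say which existing partitions each new partition claims.

```python
def coalesce(partitions, target_count):
    """Merge whole partitions into target_count groups without redistributing rows."""
    current_count = len(partitions)
    if target_count >= current_count:
        # Requesting more partitions than exist leaves the partitioning unchanged.
        return partitions
    merged = []
    for index in range(target_count):
        start = index * current_count // target_count
        end = (index + 1) * current_count // target_count
        merged.append([row for part in partitions[start:end] for row in part])
    return merged


# 1000 hypothetical single-row partitions.
partitions = [[n] for n in range(1000)]
coalesced = coalesce(partitions, 100)
unchanged = coalesce(partitions, 2000)
```

Because every new partition is built from whole existing partitions, no individual row has to be redistributed, which is the sense in which a coalesce avoids a shuffle.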


Compute if expression absent

Supported in: Batch

Computes the expression for new rows only. The value for a given key is only ever computed once, even across builds.

Transform categories: Other

See more details here.
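The compute-once behavior can be pictured as a cache keyed by the row key that persists between builds. The sketch below is a conceptual model only; the `cache` dictionary, the `computed` column name, and the helper names are hypothetical and say nothing about how Pipeline Builder stores previously computed values.

```python
def compute_if_absent(cache, rows, key_column, expression):
    """Evaluate expression only for keys that are not already in cache."""
    for row in rows:
        key = row[key_column]
        if key not in cache:
            cache[key] = expression(row)
    return [{**row, "computed": cache[row[key_column]]} for row in rows]


evaluated_keys = []


def expensive_expression(row):
    """Stand-in for a costly expression; records each key it is evaluated for."""
    evaluated_keys.append(row["id"])
    return row["id"].upper()


cache = {}  # stands in for state kept across builds (an assumption)
first_build = compute_if_absent(cache, [{"id": "a"}, {"id": "b"}], "id", expensive_expression)
second_build = compute_if_absent(
    cache, [{"id": "a"}, {"id": "b"}, {"id": "c"}], "id", expensive_expression
)
```

In the second build only the new key `c` triggers an evaluation; keys `a` and `b` reuse the values computed in the first build.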


Convert media set to table rows

Supported in: Batch

Produces a dataset containing media references and basic metadata for media items in a media set. Use this transform first, before applying other media transforms.

Transform categories: File, Media

See more details here.


Cross join

Supported in: Batch

Cross joins left and right dataset inputs together, matching all rows from each side against all rows from the other. The output is the cartesian product of the two datasets.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
PA-452 | new air | 212 | 2

ri.foundry.main.dataset.right

tail_number | home_airport
XB-123 | LHR
MT-222 | CPH
KK-452 | JFK
JR-201 | IAD

Output:

tail_number | airline | home_airport
XB-123 | foundry air | LHR
XB-123 | foundry air | CPH
XB-123 | foundry air | JFK
XB-123 | foundry air | IAD
MT-222 | new airline | LHR
MT-222 | new airline | CPH
MT-222 | new airline | JFK
MT-222 | new airline | IAD
PA-452 | new air | LHR
PA-452 | new air | CPH
PA-452 | new air | JFK
PA-452 | new air | IAD

See more details here.
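The cartesian product in this example can be reproduced with `itertools.product` from the Python standard library. The variable names are illustrative; only the selected columns from the example are modeled.

```python
from itertools import product

# Selected left columns (tail_number, airline) and right column (home_airport).
cross_left = [
    ("XB-123", "foundry air"),
    ("MT-222", "new airline"),
    ("PA-452", "new air"),
]
cross_right = ["LHR", "CPH", "JFK", "IAD"]

# Every left row is paired with every right row: 3 x 4 = 12 output rows.
cross_output = [
    (tail_number, airline, home_airport)
    for (tail_number, airline), home_airport in product(cross_left, cross_right)
]
```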


Date distribution

Supported in: Batch

Computes the distribution of dates/timestamps in a specified column.

Transform categories: Datetime

See more details here.


Drop columns

Supported in: Batch, Streaming

Transforms input dataset by dropping the specified columns.

Transform categories: Popular

Example

Argument values:

  • Columns to drop: {miles}
  • Dataset: ri.foundry.main.dataset.a

Input:

airline | miles | airports
foundry airways | 3000 | [ JFK, SFO ]

Output:

airline | airports
foundry airways | [ JFK, SFO ]

See more details here.


Drop duplicates

Supported in: Batch

Drops duplicate rows from the input.

Transform categories: Other

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.aggregate
  • Column subset: {tail_number}

Input:

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
XB-123 | foundry airline | 335 | 5
MT-222 | new air | 565 | 4
KK-452 | new air | 222 | 1
XB-123 | foundry airline | 1134 | 3

Output:

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
KK-452 | new air | 222 | 1

See more details here.
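The example output can be reproduced by keeping the first row seen for each value of the column subset, as in the Python sketch below. Keeping the first occurrence is an assumption inferred from the example output; this index does not state which duplicate is retained in general.

```python
def drop_duplicates(rows, subset):
    """Keep the first row seen for each distinct combination of the subset columns."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[column] for column in subset)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Rows from the example input.
duplicate_input = [
    {"tail_number": "XB-123", "airline": "foundry air", "miles": 124, "factor": 2},
    {"tail_number": "MT-222", "airline": "new airline", "miles": 1123, "factor": 5},
    {"tail_number": "XB-123", "airline": "foundry airline", "miles": 335, "factor": 5},
    {"tail_number": "MT-222", "airline": "new air", "miles": 565, "factor": 4},
    {"tail_number": "KK-452", "airline": "new air", "miles": 222, "factor": 1},
    {"tail_number": "XB-123", "airline": "foundry airline", "miles": 1134, "factor": 3},
]
deduplicated = drop_duplicates(duplicate_input, ["tail_number"])
```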


Empty file

Supported in: Batch

Creates an empty file.

Transform categories: Other

See more details here.


Empty media set file

Supported in: Batch, Streaming

Creates an empty media set file with the given schema and snapshot read mode.

Transform categories: Other

See more details here.


Empty table

Supported in: Batch, Streaming

Creates an empty table with the given schema and read mode.

Transform categories: Other

Example

Argument values:

  • Schema: Struct<flight_code, flight_number, airline>

Inputs: none

Output:

flight_code | flight_number | airline

See more details here.


Extract file metadata from dataset as rows

Supported in: Batch

Reads file metadata as rows from a dataset of files.

Transform categories: File

See more details here.


Extract many struct fields

Supported in: Batch

Extracts many fields from a struct. Original struct will be dropped.

Transform categories: Struct

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Locators: [(airline.name, airline), (tail_no, tail_number)]
  • Struct: raw

Input:

raw
{
airline: {
id: NA,
name: new air,
},
tail_no: NA-123,
}
{
airline: {
id: FA,
name: foundry airways,
},
tail_no: FA-123,
}

Output:

airline | tail_number
new air | NA-123
foundry airways | FA-123

See more details here.


Extract rows from a CSV file

Supported in: Batch

Reads a dataset of files and parses each CSV file into rows.

Transform categories: File

See more details here.


Extract rows from a GeoJSON file

Supported in: Batch

Reads a dataset of files and parses each GeoJSON file into rows. The output dataset will have a geometry column, and a column for each property listed by the user, apart from the _error and _file columns. If the user provides no properties to extract, the entire properties struct will be extracted into a properties column as a string. Each file must take one of two forms: (a) multiline FeatureCollection, where the entire file is one GeoJSON of type FeatureCollection; or (b) single-line Feature, where every line of the file is a fully valid GeoJSON of type Feature.

Transform categories: File, Geospatial

See more details here.


Extract rows from a JSON file

Supported in: Batch

Reads a dataset of files and parses each JSON file into rows.

Transform categories: File, String, Struct

See more details here.


Extract rows from a dataset of email files

Supported in: Batch

Reads a dataset of email files and parses each file into a row. Supported file extensions: .eml, .emltpl, and .msg.

Transform categories: File, Media

See more details here.


Extract rows from a dataset of text files

Supported in: Batch

Reads a dataset of text files and parses each file into a row.

Transform categories: File, String

See more details here.


Extract rows from an XML file

Supported in: Batch

Reads a dataset of files and parses each XML file into rows.

Transform categories: File

See more details here.


Extract rows from shapefile

Supported in: Batch

Reads a dataset of files and parses each shapefile into rows. All files except .shp, .shx and .dbf files will be ignored. This shapefile parser only supports point, polyline, polygon and multipoint geometry types. The output dataset will have a geometry column, and a column for each property listed by the user, apart from the _error and _file columns. If the user provides no properties to extract, the entire properties struct will be extracted into a properties column as a string.

Transform categories: File, Geospatial

See more details here.


Filter

Supported in: Batch, Streaming

Filters the input dataset based on the specified filter condition.

Transform categories: Popular

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Filter condition: recently_serviced

Input:

recently_serviced | tail_number
true | KK-150
false | XB-120
true | MT-190

Output:

recently_serviced | tail_number
true | KK-150
true | MT-190

See more details here.


First union by name

Supported in: Batch

Unions a set of datasets together on columns from the first dataset, adding nulls when columns are missing. Columns that are not present in the first dataset are removed.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs:

ri.foundry.main.dataset.a

recently_serviced | tail_number | airline_code
true | KK-150 | KK
false | XB-120 | XB
true | MT-190 | MT

ri.foundry.main.dataset.b

recently_serviced | tail_number | home_country
true | AA-200 | US
true | BN-435 | UK
true | BN-111 | UK

Output:

recently_serviced | tail_number | airline_code
true | KK-150 | KK
false | XB-120 | XB
true | MT-190 | MT
true | AA-200 | null
true | BN-435 | null
true | BN-111 | null

See more details here.
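The column handling in this example can be sketched in Python: the first dataset fixes the output columns, missing columns become null (`None` here), and extra columns are dropped. The helper `first_union_by_name` is illustrative, and taking the schema from the first row of the first dataset is a simplification that assumes the dataset is non-empty.

```python
def first_union_by_name(datasets):
    """Union rows by column name, using the first dataset's columns as the schema."""
    columns = list(datasets[0][0].keys())  # simplification: schema read from the first row
    return [
        {column: row.get(column) for column in columns}
        for dataset in datasets
        for row in dataset
    ]


# Rows from the example inputs.
dataset_a = [
    {"recently_serviced": True, "tail_number": "KK-150", "airline_code": "KK"},
    {"recently_serviced": False, "tail_number": "XB-120", "airline_code": "XB"},
    {"recently_serviced": True, "tail_number": "MT-190", "airline_code": "MT"},
]
dataset_b = [
    {"recently_serviced": True, "tail_number": "AA-200", "home_country": "US"},
    {"recently_serviced": True, "tail_number": "BN-435", "home_country": "UK"},
    {"recently_serviced": True, "tail_number": "BN-111", "home_country": "UK"},
]
union_output = first_union_by_name([dataset_a, dataset_b])
```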


Flatten struct

Supported in: Batch, Streaming

Take all fields in a struct and turn them into columns in the output dataset.

Transform categories: Struct

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Expression: raw
  • Max depth: 2
  • Column prefix: new_
  • Separator: null

Input:

raw
{
airline: {
id: NA,
name: new air,
},
tail_no: NA-123,
}
{
airline: {
id: FA,
name: foundry airways,
},
tail_no: FA-123,
}

Output:

new_airline_name | new_airline_id | new_tail_no | raw
new air | NA | NA-123 | { airline: { id: NA, name: new air, }, tail_no: NA-123, }
foundry airways | FA | FA-123 | { airline: { id: FA, name: foundry airways, }, tail_no: FA-123, }

See more details here.
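The column naming in this example can be sketched with a small recursive function in Python. The `_` separator used when the separator argument is null is inferred from the example output, and the interpretation of max depth (the number of struct levels that are expanded) is an assumption; the helper name `flatten_struct` is illustrative.

```python
def flatten_struct(struct, prefix="", separator="_", max_depth=2):
    """Turn nested struct fields into flat column names, up to max_depth levels."""
    flat = {}

    def walk(value, path, depth):
        for name, child in value.items():
            child_path = path + [name]
            if isinstance(child, dict) and depth < max_depth:
                walk(child, child_path, depth + 1)
            else:
                flat[prefix + separator.join(child_path)] = child

    walk(struct, [], 1)
    return flat


# First struct from the example input.
raw_struct = {"airline": {"id": "NA", "name": "new air"}, "tail_no": "NA-123"}
flattened = flatten_struct(raw_struct, prefix="new_")
```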


Frequent pattern growth

Supported in: Batch

Frequent pattern (fp) growth finds frequent patterns in your dataset.

Transform categories: Aggregate, Other

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Items column: customer_attributes
  • Minimum support: 0.6

Input:

customer_attributes
[ age_group: 20-30, country: Germany, gender: Female ]
[ age_group: 20-30, country: Germany, gender: Male ]

Output:

pattern | pattern_occurrence | total_count
[ country: Germany, age_group: 20-30 ] | 2 | 2
[ age_group: 20-30 ] | 2 | 2
[ country: Germany ] | 2 | 2

See more details here.
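The meaning of minimum support in this example can be illustrated with a brute-force Python sketch. It enumerates every item combination and counts occurrences, which is not the FP-growth algorithm itself (FP-growth avoids this enumeration) but yields the same frequent patterns on small inputs. The function name is illustrative.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(transactions, min_support):
    """Count every item combination and keep those meeting the support threshold."""
    counts = Counter()
    for items in transactions:
        unique_items = sorted(set(items))
        for size in range(1, len(unique_items) + 1):
            for pattern in combinations(unique_items, size):
                counts[pattern] += 1
    total = len(transactions)
    return {pattern: count for pattern, count in counts.items() if count / total >= min_support}


# Rows from the example input.
customer_attributes = [
    ["age_group: 20-30", "country: Germany", "gender: Female"],
    ["age_group: 20-30", "country: Germany", "gender: Male"],
]
patterns = frequent_patterns(customer_attributes, min_support=0.6)
```

Each gender value appears in only one of the two rows (support 0.5), so every pattern containing a gender falls below the 0.6 threshold.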


Geo distance inner join

Supported in: Batch

Inner joins left and right datasets together based on the distance between input geometries. Internally converts geometries into the given projected coordinate reference system prior to the join and back to WGS84.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, arrayCol],
    )
  • Distance: 10.0
  • Join key: (geometryColLhs, geometryCol)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: EPSG:4326
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

geometryColLhs | lhs-1
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0
{"coordinates": [55.0, 5.0], "type":"Point"} | 43.0
{"coordinates": [[25.0, 0.0], [0.0, 25.0]], "type":"LineString"} | 44.0

ri.foundry.main.dataset.right

geometryCol | col1 | arrayCol
{"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | rhsVal1 | [ 0.0, 1.0 ]
{"coordinates": [[[21.0, 21.0], [27.0, 21.0], [27.0, 27.0], [21.0, 27.0], [21.0, 21.0]]], "type": "Polygon"} | rhsVal2 | [ 0.0, 1.0 ]

Output:

geometryColLhs | lhs-1 | rhs_geometryCol | rhs_arrayCol
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | [ 0.0, 1.0 ]
{"coordinates": [[25.0, 0.0], [0.0, 25.0]], "type":"LineString"} | 44.0 | {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | [ 0.0, 1.0 ]

See more details here.


Geo distance left join

Supported in: Batch

Left joins datasets together if the distance between input geometries is less than or equal to the specified distance. Internally converts geometries into the given projected coordinate reference system prior to the join and back to WGS84.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryColRhs, rhs-1],
    )
  • Distance: 1640.42
  • Join key: (geometryColLhs, geometryColRhs)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: epsg:2868
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

geometryColLhs | lhs-1
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0
null | 43.0

ri.foundry.main.dataset.right

geometryColRhs | rhs-1
{"coordinates": [-112.14560508728029,33.44082430962016], "type":"Point"} | rhsVal1
{"coordinates": [-112.11796760559083,33.440895931474124], "type":"Point"} | rhsVal2

Output:

geometryColLhs | lhs-1 | geometryColRhs | rhs-1
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | {"coordinates": [-112.14560508728029,33.44082430962016], "type":"Point"} | rhsVal1
null | 43.0 | null | null

See more details here.


Geo intersection inner join

Supported in: Batch, Streaming

Inner joins left and right datasets together based on whether input geometries overlap. Geometries that only touch are included in the results.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Condition for columns to select on the right:
    allColumns(

    )
  • Join key: (geometryColLhs, geometryColRhs)
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

geometryColLhs | col1Lhs
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0

ri.foundry.main.dataset.right

geometryColRhs | col1Rhs
{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1
{"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} | rhsVal2
{"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3
{"coordinates": [15.0, 15.0], "type":"Point"} | rhsVal4
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal6
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal8
{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9
{"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal10

Output:

geometryColLhs | col1Lhs | geometryColRhs | col1Rhs
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9

See more details here.


Geo intersection left join

Supported in: Batch

Left joins input datasets based on whether input geometries overlap. Geometries that only touch are included in the results.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Condition for columns to select on the right:
    allColumns(

    )
  • Join key: (geometryColLhs, geometryColRhs)
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

geometryColLhs | col1Lhs
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0
{"coordinates": [55.0, 5.0], "type":"Point"} | 43.0

ri.foundry.main.dataset.right

geometryColRhs | col1Rhs
{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1
{"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} | rhsVal2
{"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3
{"coordinates": [15.0, 15.0], "type":"Point"} | rhsVal4
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal6
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal8
{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9
{"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal10

Output:

geometryColLhs | col1Lhs | geometryColRhs | col1Rhs
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9
{"coordinates": [55.0, 5.0], "type":"Point"} | 43.0 | null | null

See more details here.


GeoPoint-to-GeoPoint 3d distance inner join

Supported in: Batch

Inner joins left and right datasets together based on the distance between point geometries. The geometries must represent points, and may optionally include a z-coordinate. Internally converts geometries into the given projected coordinate reference system prior to the join and back to WGS84. Non-point geometries are ignored, and the entire right dataset must be able to fit into driver and executor memory. A 3 GB executor should be able to handle up to 4 million points in the neighbors dataset.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, arrayCol],
    )
  • Distance: 2.5
  • Join key: (geometryColLhs, geometryCol)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: EPSG:4326
  • Right dataset: ri.foundry.main.dataset.right
  • Use z-coordinate: false
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

geometryColLhs | lhs-1
{"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0
{"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0
{"coordinates": [0.0, 0.0], "type":"Point"} | 44.0

ri.foundry.main.dataset.right

geometryCol | col1 | arrayCol
{"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | rhsVal1 | [ 0.0, 1.0 ]
{"coordinates": [0.0, 1.0], "type":"Point"} | rhsVal2 | [ 0.0, 1.0 ]

Output:

geometryColLhs | lhs-1 | rhs_geometryCol | rhs_arrayCol
{"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0 | {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | [ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0 | {"coordinates": [0.0, 1.0], "type":"Point"} | [ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 | {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | [ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 | {"coordinates": [0.0, 1.0], "type":"Point"} | [ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0], "type":"Point"} | 44.0 | {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | [ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0], "type":"Point"} | 44.0 | {"coordinates": [0.0, 1.0], "type":"Point"} | [ 0.0, 1.0 ]

See more details here.


Geometry intersection join

Supported in: Batch

Inner joins left and right datasets together based on whether input geometries overlap. Returns a row containing all of the columns from both datasets if the join key column pair has geometries which intersect. Currently does not support joining on multiple join keys. Silently filters null join key geometry values. Left and right datasets must not have the same column names. Silently nullifies invalid GeoJSON in join columns.

Transform categories: Geospatial, Join

Example

Argument values:

  • Join key: [(geometryColLhs, geometryColRhs)]
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs:

ri.foundry.main.dataset.left

geometryColLhs | lhs-1
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0

ri.foundry.main.dataset.right

geometryColRhs | rhs-1
{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1
{"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} | rhsVal2
{"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3
{"coordinates": [15.0, 15.0], "type":"Point"} | rhsVal4
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal6
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal8
{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9
{"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal10

Output:

geometryColLhs | lhs-1 | geometryColRhs | rhs-1
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9

See more details here.


Geometry knn inner join

Supported in: Batch

Selects the k closest points from the neighbors dataset for each valid input geometry from the base dataset. Internally converts the input datasets to the given coordinate reference system, and back to WGS84. The entire neighbors dataset must be able to fit into driver and executor memory. A 3 GB executor should be able to handle up to 1 million points in the neighbors dataset.

Transform categories: Geospatial, Join

Example

Argument values:

  • Base dataset: ri.foundry.main.dataset.left
  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryCol, lhsCol],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, col],
    )
  • Join key: (geometryCol, geometryCol)
  • K: 2
  • Neighbors dataset: ri.foundry.main.dataset.right
  • Projected coordinate system: epsg:2868
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

geometryCol | lhsCol
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0

ri.foundry.main.dataset.right

geometryCol | col
{ latitude: 33.440609443703586, longitude: -112.14843750000001, } | rhsVal1
{ latitude: 33.44082430962016, longitude: -112.14560508728029, } | rhsVal2
{ latitude: 33.440895931474124, longitude: -112.11796760559083, } | rhsVal3

Output:

geometryCol | lhsCol | rhs_geometryCol | rhs_col
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | { latitude: 33.440609443703586, longitude: -112.14843750000001, } | rhsVal1
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | { latitude: 33.44082430962016, longitude: -112.14560508728029, } | rhsVal2

See more details here.
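The selection of the k closest neighbors in this example can be sketched in Python with `heapq.nsmallest`. The sketch compares raw longitude/latitude values with a planar distance and skips the projection step entirely, which is a simplification that happens to preserve the ordering for these three nearby points; the helper name `k_nearest` is illustrative.

```python
import heapq
import math


def k_nearest(base_point, neighbors, k):
    """Return the k neighbors closest to base_point using planar distance."""
    return heapq.nsmallest(
        k, neighbors, key=lambda neighbor: math.dist(base_point, neighbor["point"])
    )


# (longitude, latitude) pairs from the example; no projection is applied (simplification).
base_point = (-112.14843750000001, 33.440609443703586)
neighbors = [
    {"point": (-112.14843750000001, 33.440609443703586), "col": "rhsVal1"},
    {"point": (-112.14560508728029, 33.44082430962016), "col": "rhsVal2"},
    {"point": (-112.11796760559083, 33.440895931474124), "col": "rhsVal3"},
]
nearest_two = k_nearest(base_point, neighbors, k=2)
```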


Geometry knn left join

Supported in: Batch

Selects the k closest points from the neighbors dataset for each valid input geometry from the base dataset. Internally converts the input datasets to the given coordinate reference system, and back to WGS84. The entire neighbors dataset must be able to fit into driver and executor memory. A 3 GB executor should be able to handle up to 1 million points in the neighbors dataset.

Transform categories: Geospatial, Join

Example

Argument values:

  • Base dataset: ri.foundry.main.dataset.left
  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryCol, lhsCol],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, col],
    )
  • Join key: (geometryCol, geometryCol)
  • K: 2
  • Neighbors dataset: ri.foundry.main.dataset.right
  • Projected coordinate system: epsg:2868
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

geometryCol | lhsCol
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0

ri.foundry.main.dataset.right

geometryCol | col
{ latitude: 33.440609443703586, longitude: -112.14843750000001, } | rhsVal1
{ latitude: 33.44082430962016, longitude: -112.14560508728029, } | rhsVal2
{ latitude: 33.440895931474124, longitude: -112.11796760559083, } | rhsVal3

Output:

geometryCol | lhsCol | rhs_geometryCol | rhs_col
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | { latitude: 33.440609443703586, longitude: -112.14843750000001, } | rhsVal1
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | { latitude: 33.44082430962016, longitude: -112.14560508728029, } | rhsVal2

See more details here.


Get media references (datasets)

Supported in: Batch

Produces a dataset containing media references and basic metadata for files in a dataset.

Transform categories: File

See more details here.


Heartbeat detection

Supported in: Streaming

Detects when a record hasn't been seen for a configurable amount of time for a set of keys.

Transform categories: Other

See more details here.


Inner join

Supported in: Batch

Joins two datasets together, keeping only rows that satisfy the provided condition from each table.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
XB-123 | foundry airline | 335 | 5
MT-222 | new air | 565 | 4
KK-452 | new air | 222 | 1
PA-452 | new air | 212 | 2
XB-123 | foundry airline | 1134 | 2

ri.foundry.main.dataset.right

tail_number | home_airport
XB-123 | LHR
MT-222 | CPH
KK-452 | JFK
JR-201 | IAD

Output:

tail_number | airline | home_airport
XB-123 | foundry air | LHR
MT-222 | new airline | CPH
XB-123 | foundry airline | LHR
MT-222 | new air | CPH
KK-452 | new air | JFK
XB-123 | foundry airline | LHR
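
The same result can be reproduced with a minimal Python sketch. The helper `inner_join` is a hypothetical name used only for illustration, and only the selected left columns (tail_number and airline) are listed in the sample rows.

```python
from typing import Dict, List

left = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
    {"tail_number": "MT-222", "airline": "new air"},
    {"tail_number": "KK-452", "airline": "new air"},
    {"tail_number": "PA-452", "airline": "new air"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
]
right = [
    {"tail_number": "XB-123", "home_airport": "LHR"},
    {"tail_number": "MT-222", "home_airport": "CPH"},
    {"tail_number": "KK-452", "home_airport": "JFK"},
    {"tail_number": "JR-201", "home_airport": "IAD"},
]


def inner_join(left_rows: List[Dict[str, str]], right_rows: List[Dict[str, str]],
               key: str) -> List[Dict[str, str]]:
    """Emit one merged row for every (left, right) pair whose key values are equal."""
    return [
        {**left_row, **{c: v for c, v in right_row.items() if c != key}}
        for left_row in left_rows
        for right_row in right_rows
        if left_row[key] == right_row[key]
    ]


joined = inner_join(left, right, "tail_number")
```

PA-452 has no match on the right and JR-201 has no match on the left, so neither appears in `joined`.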

See more details here.


Join

Supported in: Batch, Streaming

Joins left and right dataset inputs together.

Transform categories: Join

See more details here.


K-means clustering

Supported in: Batch

K-means clustering is an unsupervised machine learning algorithm. It groups dataset vectors into k clusters. The k value is determined by computing the best silhouette score of the specified range between minimum k and maximum k. Number of k values defines how many k values should be tried within this range, inclusive of the boundaries.

Transform categories: Other

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Maximum k: 12
  • Minimum k: 3
  • Number of k values: 4
  • Vector column: feature_column

Input:

feature_column
[ 0.05, 3.1, 2.3 ]
[ 1.0, 3.1, 2.3 ]
[ 1.0, 3.5, 2.3 ]
[ 19.0, 12.3, -1.4 ]

Output:

feature_column | cluster_id
[ 1.0, 3.1, 2.3 ] | 0
[ 1.0, 3.5, 2.3 ] | 0
[ 19.0, 12.3, -1.4 ] | 1
[ 0.05, 3.1, 2.3 ] | 2
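
The two ingredients of the k selection can be sketched in Python. This is not a k-means implementation and not necessarily how Pipeline Builder computes its result: `candidate_ks` assumes the k values are spread evenly across the range, and `silhouette_score` is the textbook mean silhouette coefficient, scoring points that are alone in their cluster as 0. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def candidate_ks(min_k: int, max_k: int, count: int) -> List[int]:
    """k values to try, boundaries included (even spacing is an assumption)."""
    if count == 1:
        return [min_k]
    step = (max_k - min_k) / (count - 1)
    return sorted({round(min_k + i * step) for i in range(count)})


def silhouette_score(points: Sequence[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient; requires at least two clusters."""
    scores = []
    for i, point in enumerate(points):
        same = [math.dist(point, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(math.dist(point, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


ks = candidate_ks(min_k=3, max_k=12, count=4)
vectors = [[0.05, 3.1, 2.3], [1.0, 3.1, 2.3], [1.0, 3.5, 2.3], [19.0, 12.3, -1.4]]
score = silhouette_score(vectors, labels=[2, 0, 0, 1])
```

Under the even-spacing assumption, the example arguments yield the candidates 3, 6, 9 and 12. The candidate whose clustering has the highest silhouette score is the one that is kept.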

See more details here.


KNN join

Supported in: Batch

Returns the K nearest rows from the right dataset for each row in the left dataset, based on the distance measure.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [fuzzy_airline, home_airport],
    )
  • Distance measure expression:
    alias(
     alias: distance,
     expression:
    levenshteinDistance(
     ignoreCase: true,
     left: airline,
     right: fuzzy_airline,
    ),
    )
  • K nearest: 2
  • Left dataset: ri.foundry.main.dataset.left
  • Rank column name: rank
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
PA-452 | new air | 212 | 2

ri.foundry.main.dataset.right

fuzzy_airline | home_airport
air | LHR
new airline | CPH
new plane | JFK
old air | IAD

Output:

rank | distance | tail_number | airline | fuzzy_airline | home_airport
1 | 3 | PA-452 | new air | old air | IAD
2 | 4 | PA-452 | new air | air | LHR
2 | 4 | PA-452 | new air | new airline | CPH
2 | 4 | PA-452 | new air | new plane | JFK
1 | 0 | MT-222 | new airline | new airline | CPH
2 | 4 | MT-222 | new airline | new plane | JFK
1 | 5 | XB-123 | foundry air | old air | IAD
2 | 8 | XB-123 | foundry air | air | LHR
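
The ranking in this example can be approximated with a short Python sketch. The `levenshtein` function is the standard dynamic-programming edit distance. The tie handling in the hypothetical `knn_join` helper (tied distances share a rank, and every row ranked K or better is kept) is an assumption inferred from the three rank-2 rows for PA-452 above.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str, ignore_case: bool = True) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if ignore_case:
        a, b = a.lower(), b.lower()
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def knn_join(left: List[str], right: List[str], k: int) -> List[Tuple[int, int, str, str]]:
    """Return (rank, distance, left value, right value) for every pair ranked k or better."""
    result = []
    for left_value in left:
        scored = sorted((levenshtein(left_value, r), r) for r in right)
        distinct = sorted({distance for distance, _ in scored})
        for distance, right_value in scored:
            rank = distinct.index(distance) + 1  # assumption: ties share a rank
            if rank <= k:
                result.append((rank, distance, left_value, right_value))
    return result


airlines = ["foundry air", "new airline", "new air"]
fuzzy_airlines = ["air", "new airline", "new plane", "old air"]
neighbors = knn_join(airlines, fuzzy_airlines, k=2)
```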

See more details here.


Keeps duplicates

Supported in: Batch

Keeps only the rows from the input that have duplicates, as determined by the specified column subset.

Transform categories: Other

Example

Argument values:

  • Column subset: {tail_number}
  • Dataset: ri.foundry.main.dataset.aggregate

Input:

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
XB-123 | foundry airline | 335 | 5
MT-222 | new air | 565 | 4
KK-452 | new air | 222 | 1
XB-123 | foundry airline | 1134 | 3

Output:

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
XB-123 | foundry airline | 335 | 5
MT-222 | new air | 565 | 4
XB-123 | foundry airline | 1134 | 3
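
As a rough Python equivalent, the rows can be counted by their values in the column subset and kept when the count exceeds one. The helper name `keep_duplicates` is hypothetical, and the miles and factor columns are omitted from the sample rows for brevity.

```python
from collections import Counter
from typing import Dict, List


def keep_duplicates(rows: List[Dict[str, str]], subset: List[str]) -> List[Dict[str, str]]:
    """Keep every row whose values in `subset` occur in more than one row."""
    counts = Counter(tuple(row[c] for c in subset) for row in rows)
    return [row for row in rows if counts[tuple(row[c] for c in subset)] > 1]


flights = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
    {"tail_number": "MT-222", "airline": "new air"},
    {"tail_number": "KK-452", "airline": "new air"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
]
duplicated = keep_duplicates(flights, ["tail_number"])
```

KK-452 occurs only once in the tail_number column, so it is the only row removed.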

See more details here.


Key by

Supported in: Streaming

Keys the input by the provided key-by columns. This does not re-sort the data; per-key ordering is maintained only from the point at which the keys are set. Re-keying data can be unsafe: if the newly keyed data depends on a specific ordering, that ordering is guaranteed only if it was already maintained by the previous keying. If CDC (change data capture) mode is enabled, this transform also sets the primary key. The primary key defines the columns that indicate which rows are updates or deletes, and the ordering used when the data is read as a current view.

Transform categories: Other

See more details here.


Left join

Supported in: Batch

Joins two datasets together, keeping all rows from the left table and only rows which satisfy the provided condition from the right table.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
XB-123 | foundry airline | 335 | 5
MT-222 | new air | 565 | 4
KK-452 | new air | 222 | 1
PA-452 | new air | 212 | 2
XB-123 | foundry airline | 1134 | 2

ri.foundry.main.dataset.right

tail_number | home_airport
XB-123 | LHR
MT-222 | CPH
KK-452 | JFK
JR-201 | IAD

Output:

tail_number | airline | home_airport
XB-123 | foundry air | LHR
MT-222 | new airline | CPH
XB-123 | foundry airline | LHR
MT-222 | new air | CPH
KK-452 | new air | JFK
PA-452 | new air | null
XB-123 | foundry airline | LHR

See more details here.


Left lookup join

Supported in: Streaming

Joins two datasets together, keeping all rows from the left table and only matching rows from the right table.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition: [(tail_number, tail_number)]
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
XB-123 | foundry airline | 335 | 5
MT-222 | new air | 565 | 4
KK-452 | new air | 222 | 1
PA-452 | new air | 212 | 2
XB-123 | foundry airline | 1134 | 2

ri.foundry.main.dataset.right

tail_number | home_airport
XB-123 | LHR
MT-222 | CPH
KK-452 | JFK
JR-201 | IAD

Output:

tail_number | airline | home_airport
XB-123 | foundry air | LHR
MT-222 | new airline | CPH
XB-123 | foundry airline | LHR
MT-222 | new air | CPH
KK-452 | new air | JFK
PA-452 | new air | null
XB-123 | foundry airline | LHR

See more details here.


Manually entered table

Supported in: Batch, Streaming

Uses manually entered table data to create an output.

Transform categories: Other

Example

Argument values:

  • Rows: [{
    airline: foundry airlines,
    flight_code: 112,
    flight_number: XB-123,
    }, {
    airline: foundry airlines,
    flight_code: 533,
    flight_number: MT-444,
    }, {
    airline: new air,
    flight_code: 934,
    flight_number: KK-123,
    }]
  • Schema: Struct<flight_code, flight_number, airline>

Output:

flight_code | flight_number | airline
112 | XB-123 | foundry airlines
533 | MT-444 | foundry airlines
934 | KK-123 | new air

See more details here.


Mapping join

Supported in: Batch

Replaces values from the target columns in the source dataset with values in the mapping dataset.

Transform categories: Join

Type variable bounds: T1 accepts AnyType, T2 accepts AnyType

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.input
  • Key column for mapping values: flight_code
  • Mapping dataset: ri.foundry.main.dataset.mapping
  • Target columns: [flight_no, next_flight]
  • Values to use for mapping: flight_number
  • Assume unique mappings: null
  • Default value: unknown

Inputs: ri.foundry.main.dataset.input

flight_no | next_flight | departure_time
533 | 112 | 2022-01-20T10:45:00Z
934 | 533 | 2022-01-20T11:20:00Z
222 | 934 | 2022-01-20T11:20:00Z

ri.foundry.main.dataset.mapping

flight_code | flight_number | airline
112 | XB-123 | foundry airlines
533 | MT-444 | foundry airlines
934 | KK-123 | new air

Output:

flight_no | next_flight | departure_time
MT-444 | XB-123 | 2022-01-20T10:45:00Z
KK-123 | MT-444 | 2022-01-20T11:20:00Z
unknown | KK-123 | 2022-01-20T11:20:00Z
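
Conceptually this is a dictionary lookup with a default, as the following Python sketch shows. The helper name `mapping_join` and the choice to represent flight codes as strings are assumptions made for the illustration.

```python
from typing import Dict, List

mapping_rows = [
    {"flight_code": "112", "flight_number": "XB-123", "airline": "foundry airlines"},
    {"flight_code": "533", "flight_number": "MT-444", "airline": "foundry airlines"},
    {"flight_code": "934", "flight_number": "KK-123", "airline": "new air"},
]
input_rows = [
    {"flight_no": "533", "next_flight": "112", "departure_time": "2022-01-20T10:45:00Z"},
    {"flight_no": "934", "next_flight": "533", "departure_time": "2022-01-20T11:20:00Z"},
    {"flight_no": "222", "next_flight": "934", "departure_time": "2022-01-20T11:20:00Z"},
]


def mapping_join(rows: List[Dict[str, str]], mapping: List[Dict[str, str]], key_col: str,
                 value_col: str, targets: List[str], default: str) -> List[Dict[str, str]]:
    """Replace values in the target columns using a key -> value lookup table."""
    lookup = {m[key_col]: m[value_col] for m in mapping}
    return [
        {col: (lookup.get(val, default) if col in targets else val) for col, val in row.items()}
        for row in rows
    ]


mapped = mapping_join(input_rows, mapping_rows, "flight_code", "flight_number",
                      ["flight_no", "next_flight"], default="unknown")
```

Flight code 222 has no entry in the mapping dataset, so it is replaced by the default value.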

See more details here.


Narrow union by name

Supported in: Batch

Unions a set of datasets together on the intersection of their column names. Columns that are not present in all input datasets are removed.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs: ri.foundry.main.dataset.a

recently_serviced | tail_number
true | KK-150
false | XB-120
true | MT-190

ri.foundry.main.dataset.b

recently_serviced | tail_number | airline_code
true | AA-200 | AA
true | BN-435 | BN
true | BN-111 | BN

Output:

recently_serviced | tail_number
true | KK-150
false | XB-120
true | MT-190
true | AA-200
true | BN-435
true | BN-111
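
A minimal Python sketch of the same behavior follows. The helper `narrow_union` is a hypothetical name, and it assumes every dataset is non-empty so that the schema can be read from its first row.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def narrow_union(*datasets: List[Row]) -> List[Row]:
    """Union rows, keeping only the columns present in every dataset."""
    first_schema = list(datasets[0][0])
    common = [c for c in first_schema if all(c in d[0] for d in datasets[1:])]
    return [{c: row[c] for c in common} for d in datasets for row in d]


dataset_a = [
    {"recently_serviced": True, "tail_number": "KK-150"},
    {"recently_serviced": False, "tail_number": "XB-120"},
    {"recently_serviced": True, "tail_number": "MT-190"},
]
dataset_b = [
    {"recently_serviced": True, "tail_number": "AA-200", "airline_code": "AA"},
    {"recently_serviced": True, "tail_number": "BN-435", "airline_code": "BN"},
    {"recently_serviced": True, "tail_number": "BN-111", "airline_code": "BN"},
]
narrow = narrow_union(dataset_a, dataset_b)
```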

See more details here.


Normalize column names

Supported in: Batch, Streaming

Normalizes column names to use lower_snake_case.

Transform categories: Data preparation

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Remove special characters: null

Input:

recentlyServicedtailNumber_airlineCode
trueKK-150KK
falseXB-120XB
trueMT-190MT

Output:

recently_serviced | tail_number | airline_code
true | KK-150 | KK
false | XB-120 | XB
true | MT-190 | MT
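
A normalization of this kind can be approximated with a few regular expressions. The exact rules Pipeline Builder applies are not specified here, so the steps below (split camelCase boundaries, collapse repeated underscores, trim leading and trailing underscores, lowercase) are assumptions, and `to_lower_snake_case` is a hypothetical helper.

```python
import re


def to_lower_snake_case(name: str) -> str:
    """Convert a camelCase or mixed column name to lower_snake_case."""
    # Insert an underscore wherever a lowercase letter or digit meets an uppercase letter.
    with_boundaries = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse runs of underscores, then trim them from both ends.
    collapsed = re.sub(r"_+", "_", with_boundaries).strip("_")
    return collapsed.lower()


normalized = [to_lower_snake_case(c) for c in ["recentlyServiced", "tailNumber", "airlineCode"]]
```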

See more details here.


Numeric distribution

Supported in: Batch

Computes the distribution of numeric values in a specified column.

Transform categories: Numeric

See more details here.


Outer caching join

Supported in: Streaming

Emits rows from the left and right inputs that meet all of the match conditions and are within the caching window, along with unmatched rows from both inputs.

Transform categories: Join

See more details here.


Outer caching join

Supported in: Streaming

Joins left and right dataset inputs together, caching the record with the highest event time from each side for use in subsequent joins. Processing time of a record is used as a tiebreaker. In the case of a time results are optimistically emitted if there's no value to join against.

Transform categories: Join

See more details here.


Outer join

Supported in: Batch

Outer joins the provided dataset inputs together, keeping all rows from both datasets. Columns have nulls when there is no row satisfying the provided condition.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
XB-123 | foundry airline | 335 | 5
MT-222 | new air | 565 | 4
KK-452 | new air | 222 | 1
PA-452 | new air | 212 | 2
XB-123 | foundry airline | 1134 | 2

ri.foundry.main.dataset.right

tail_number | home_airport
XB-123 | LHR
MT-222 | CPH
KK-452 | JFK
JR-201 | IAD

Output:

tail_number | airline | home_airport
XB-123 | foundry air | LHR
MT-222 | new airline | CPH
XB-123 | foundry airline | LHR
MT-222 | new air | CPH
KK-452 | new air | JFK
PA-452 | new air | null
XB-123 | foundry airline | LHR
JR-201 | null | IAD
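
The null-filling behavior can be sketched in Python as follows. The helper `outer_join` is hypothetical; it takes the join key from whichever side is present, which matches the JR-201 row above, and it assumes both inputs are non-empty.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def outer_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Full outer join on `key`; columns from the missing side are filled with None."""
    left_cols = [c for c in left[0] if c != key]
    right_cols = [c for c in right[0] if c != key]
    matched = set()
    joined: List[Row] = []
    for left_row in left:
        hits = [i for i, r in enumerate(right) if r[key] == left_row[key]]
        matched.update(hits)
        if hits:
            joined.extend({**left_row, **{c: right[i][c] for c in right_cols}} for i in hits)
        else:
            joined.append({**left_row, **{c: None for c in right_cols}})
    for i, right_row in enumerate(right):
        if i not in matched:  # right rows that no left row matched
            joined.append({key: right_row[key],
                           **{c: None for c in left_cols},
                           **{c: right_row[c] for c in right_cols}})
    return joined


left = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
    {"tail_number": "MT-222", "airline": "new air"},
    {"tail_number": "KK-452", "airline": "new air"},
    {"tail_number": "PA-452", "airline": "new air"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
]
right = [
    {"tail_number": "XB-123", "home_airport": "LHR"},
    {"tail_number": "MT-222", "home_airport": "CPH"},
    {"tail_number": "KK-452", "home_airport": "JFK"},
    {"tail_number": "JR-201", "home_airport": "IAD"},
]
outer = outer_join(left, right, "tail_number")
```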

See more details here.


Pivot

Supported in: Batch

Performs the specified aggregations on the input dataset grouped by a set of columns. Unique values to pivot on must be provided such that the output schema is known ahead of runtime. This improves runtime stability over time.

Transform categories: Aggregate, Popular

Type variable bounds: T accepts Boolean | Byte | Integer | Long | Short | String

Example

Argument values:

  • Aggregations: [
    alias(
     alias: miles,
     expression:
    mean(
     expression: miles,
    ),
    )]
  • Dataset: ri.foundry.main.dataset.a
  • Group by columns: [airline]
  • Pivot by column: airport
  • Pivot by values: [(JFK, new_york), (LHR, london)]
  • Prefix or suffix alias: null

Input:

airline | airport | miles
foundry airways | JFK | 1002345
foundry airways | LHR | 2221324
new air | SFO | 21356673
new air | JFK | 12323456
foundry airways | LHR | 12542352
new air | JFK | 12232355

Output:

airline | new_york_miles | london_miles
foundry airways | 1002345.0 | 7381838.0
new air | 1.22779055E7 | null
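
The example can be reproduced with a small Python sketch that groups by airline, buckets miles by the listed pivot values, and averages each bucket. The helper `pivot_mean` is a hypothetical name. Because the pivot values are supplied up front, the output columns are known before any data is read, and rows with other airports (SFO here) contribute no column.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

flights = [
    ("foundry airways", "JFK", 1002345),
    ("foundry airways", "LHR", 2221324),
    ("new air", "SFO", 21356673),
    ("new air", "JFK", 12323456),
    ("foundry airways", "LHR", 12542352),
    ("new air", "JFK", 12232355),
]
pivot_values = {"JFK": "new_york", "LHR": "london"}  # pivot value -> column prefix


def pivot_mean(rows: List[Tuple[str, str, int]], values: Dict[str, str],
               alias: str = "miles") -> List[Dict[str, Optional[object]]]:
    """Group by airline and emit one mean-miles column per listed pivot value."""
    groups: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for airline, airport, miles in rows:
        bucket = groups[airline]  # creates the group even if no pivot value matches
        if airport in values:
            bucket[airport].append(miles)
    result = []
    for airline, by_airport in groups.items():
        out: Dict[str, Optional[object]] = {"airline": airline}
        for value, prefix in values.items():
            miles_list = by_airport.get(value)
            out[f"{prefix}_{alias}"] = sum(miles_list) / len(miles_list) if miles_list else None
        result.append(out)
    return result


pivoted = pivot_mean(flights, pivot_values)
```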

See more details here.


Project

Supported in: Batch, Streaming

Transforms input dataset either by selecting columns or applying functions to columns.

Transform categories: Other

Example

Argument values:

  • Columns: [
    alias(
     alias: airline,
     expression: airlin,
    )]
  • Dataset: ri.foundry.main.dataset.a
  • Keep remaining columns: false

Input:

airlin | miles
foundry airways | 2500
new air | 3000

Output:

airline
foundry airways
new air

See more details here.


Project on condition

Supported in: Batch, Streaming

Transforms input dataset either by selecting columns or applying functions to columns.

Transform categories: Popular

See more details here.


Project over window

Supported in: Batch, Streaming

Performs the specified aggregations on the data within the window. Emits one row each time a new row is received.

Transform categories: Aggregate

See more details here.


Rename columns

Supported in: Batch, Streaming

Renames a set of columns.

Transform categories: Data preparation, Popular

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Renames: [(recently_serviced, does_not_require_service)]

Input:

recently_serviced | tail_number | airline_code
true | KK-150 | KK
false | XB-120 | XB
true | MT-190 | MT

Output:

does_not_require_service | tail_number | airline_code
true | KK-150 | KK
false | XB-120 | XB
true | MT-190 | MT

See more details here.


Repartition data

Supported in: Batch

Forces a shuffle of the data based on optionally provided partitioning columns and a resulting number of partitions. If these are not provided, the partitioning will be determined automatically.

Transform categories: Other

See more details here.


Rollup

Supported in: Batch

Performs the specified aggregations on the input dataset at different levels of granularity, providing both intermediate and super aggregates.

Transform categories: Aggregate

Example

Argument values:

  • Aggregations: [
    alias(
     alias: mean_price,
     expression:
    mean(
     expression: price,
    ),
    )]
  • Dataset: ri.foundry.main.dataset.rollupBaseCase
  • Rollup columns: [city, model]

Input:

city | model | price | store
London | new phone | 900.0 | MegaMart
London | new phone | 850.75 | AA
London | new phone | 870.75 | ABC Zone
San Francisco | new phone | 1000.0 | Prescos
San Francisco | new phone | 950.25 | XZY Force
San Francisco | new phone | 1105.7 | Phone Mart
London | forestX 20 | 750.1 | MegaMart
London | forestX 20 | 690.0 | AA
London | forestX 20 | 730.0 | ABC Zone
San Francisco | forestX 20 | 890.4 | Prescos
San Francisco | forestX 20 | 900.1 | XZY Force
San Francisco | forestX 20 | 1050.75 | Phone Mart

Output:

city | model | mean_price
London | new phone | 873.8333333333334
London | forestX 20 | 723.3666666666667
London | null | 798.6
San Francisco | new phone | 1018.65
San Francisco | forestX 20 | 947.0833333333334
San Francisco | null | 982.8666666666667
null | null | 890.7333333333335
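
The levels of granularity can be made concrete with a short Python sketch: each input row contributes to the (city, model) group, to the per-city subtotal (city, null), and to the grand total (null, null). The helper `rollup_mean` is a hypothetical name, and the store column is omitted because it plays no part in the aggregation.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Key = Tuple[Optional[str], ...]

sales = [
    ("London", "new phone", 900.0), ("London", "new phone", 850.75),
    ("London", "new phone", 870.75), ("San Francisco", "new phone", 1000.0),
    ("San Francisco", "new phone", 950.25), ("San Francisco", "new phone", 1105.7),
    ("London", "forestX 20", 750.1), ("London", "forestX 20", 690.0),
    ("London", "forestX 20", 730.0), ("San Francisco", "forestX 20", 890.4),
    ("San Francisco", "forestX 20", 900.1), ("San Francisco", "forestX 20", 1050.75),
]


def rollup_mean(rows: List[Tuple[str, str, float]]) -> Dict[Key, float]:
    """Mean price for every prefix of the rollup columns, with None for rolled-up columns."""
    buckets: Dict[Key, List[float]] = defaultdict(list)
    for *dims, price in rows:
        # depth 2 -> (city, model), depth 1 -> (city, None), depth 0 -> (None, None)
        for depth in range(len(dims), -1, -1):
            key = tuple(dims[:depth]) + (None,) * (len(dims) - depth)
            buckets[key].append(price)
    return {key: sum(prices) / len(prices) for key, prices in buckets.items()}


means = rollup_mean(sales)
```

With two rollup columns and two distinct values in each, this yields the seven groups shown in the output table: four detailed groups, two per-city subtotals, and one grand total.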

See more details here.


Row size

Supported in: Batch

Estimates the size of a single row in the JVM.

Transform categories: Other

See more details here.


Select columns

Supported in: Batch, Streaming

Selects a set of columns from the input dataset.

Transform categories: Popular

See more details here.


Semi join

Supported in: Batch

Semi joins left and right dataset inputs together. This removes all rows that don't match the join condition.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs: ri.foundry.main.dataset.left

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
XB-123 | foundry airline | 335 | 5
MT-222 | new air | 565 | 4
KK-452 | new air | 222 | 1
PA-452 | new air | 212 | 2
XB-123 | foundry airline | 1134 | 2

ri.foundry.main.dataset.right

tail_number | home_airport
XB-123 | LHR
MT-222 | CPH
KK-452 | JFK
JR-201 | IAD

Output:

tail_number | airline | miles | factor
XB-123 | foundry air | 124 | 2
MT-222 | new airline | 1123 | 5
XB-123 | foundry airline | 335 | 5
MT-222 | new air | 565 | 4
KK-452 | new air | 222 | 1
XB-123 | foundry airline | 1134 | 2

See more details here.


Sort

Supported in: Batch

Sorts the input dataset according to the provided sort specification.

Transform categories: Other

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Sort specification: [(b, DESCENDING)]

Input:

a | b
1 | 2
3 | 4
5 | 6

Output:

a | b
5 | 6
3 | 4
1 | 2

See more details here.


Text block

Supported in: Batch, Streaming

Insert a text description between your transformations. This does not transform the input data in any way.

Transform categories: Other

See more details here.


Time bounded drop duplicates

Supported in: Streaming

Drops duplicate rows from the input for the given column subset. Rows that have been seen expire after the configured amount of event time. Rows that arrive late by more than the configured amount of event time are always dropped. The input is partitioned by the specified keys, and duplicates are dropped separately for each distinct combination of key column values.

Transform categories: Other
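
The interplay of expiry and lateness can be sketched in Python. This is only one plausible reading of the description, not Pipeline Builder's implementation: the sketch assumes a watermark equal to the highest event time observed so far, expires a remembered row once the watermark is more than the expiry past the time it was first seen, and uses the hypothetical name `time_bounded_drop_duplicates`.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, str, str]  # (event time, key, value checked for duplicates)


def time_bounded_drop_duplicates(events: List[Event], expiry: float) -> List[Event]:
    """Emit the first occurrence of each (key, value) within the expiry window."""
    first_seen: Dict[Tuple[str, str], float] = {}
    watermark = float("-inf")  # assumption: highest event time observed so far
    emitted: List[Event] = []
    for event in events:
        event_time, key, value = event
        watermark = max(watermark, event_time)
        if watermark - event_time > expiry:
            continue  # arrived too late: always dropped
        # Forget rows that were first seen longer ago than the expiry.
        first_seen = {k: t for k, t in first_seen.items() if watermark - t <= expiry}
        if (key, value) in first_seen:
            continue  # duplicate within the expiry window
        first_seen[(key, value)] = event_time
        emitted.append(event)
    return emitted


events = [(0.0, "k", "x"), (5.0, "k", "x"), (20.0, "k", "x"), (3.0, "k", "y")]
kept = time_bounded_drop_duplicates(events, expiry=10.0)
```

In this hypothetical stream, the row at t=5 is dropped as a duplicate, the row at t=20 is emitted again because the earlier occurrence has expired, and the row at t=3 is dropped for arriving 17 time units behind the watermark.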

See more details here.


Time bounded drop out of order

Supported in: Streaming

Drops rows with the same values for all key columns that are out of order. A row is out of order if it would have come before an already received row with the same key values, based on the sort columns and directions. Two rows are compared by evaluating the first sort column and direction; the next sort column and direction is evaluated only if there is a tie, and so on until the order is determined or all sort columns are tied, in which case the rows are equal. The current maximum for each key is stored until no new rows have been seen for that key for an event time greater than or equal to the expiry. After a key has received no new rows for an event time greater than or equal to the expiry, any new row for that key will never be dropped and will always be stored as the new current maximum.

Transform categories: Other
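
The comparison and expiry rules can be sketched in Python under a few stated assumptions: all sort directions are ascending (so Python's lexicographic tuple comparison applies directly), the elapsed time is measured between the event times of consecutive rows for a key, and a dropped row still counts as a row seen for that key. The name `drop_out_of_order` is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[float, str, Tuple[int, ...]]  # (event time, key, sort column values)


def drop_out_of_order(rows: List[Row], expiry: float) -> List[Row]:
    """Keep a row unless it sorts before the current maximum stored for its key."""
    state: Dict[str, Tuple[Tuple[int, ...], float]] = {}  # key -> (current max, last seen)
    kept: List[Row] = []
    for row in rows:
        event_time, key, sort_values = row
        entry = state.get(key)
        expired = entry is not None and event_time - entry[1] >= expiry
        # Tuples compare column by column, moving on only when there is a tie.
        if entry is None or expired or sort_values >= entry[0]:
            kept.append(row)
            state[key] = (sort_values, event_time)
        else:
            # Out of order: drop the row but record that the key was active.
            state[key] = (entry[0], event_time)
    return kept


rows = [
    (0.0, "a", (1, 0)),
    (1.0, "a", (3, 0)),
    (2.0, "a", (2, 9)),   # sorts before (3, 0): dropped
    (3.0, "a", (3, 1)),   # ties on the first column, wins on the second: kept
    (20.0, "a", (1, 0)),  # key idle for longer than the expiry: kept
]
in_order = drop_out_of_order(rows, expiry=10.0)
```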

See more details here.


Time bounded event time sort

Supported in: Streaming

Emits rows by key in ascending event time order, allowing for late arriving records up until at least the allowed lateness. Records arriving after the allowed lateness plus some small buffer interval will be dropped.

Transform categories: Other

See more details here.


Top rows

Supported in: Batch

Picks the top rows in each sorted partition.

Transform categories: Aggregate

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Partition by columns: {airline}
  • Sort specification: [(airport, DESCENDING), (miles, ASCENDING)]
  • Number of rows: null

Input:

airline | airport | miles
foundry airways | JFK | 1002345
foundry airways | LHR | 2221324
new air | SFO | 21356673
new air | JFK | 12323456
foundry airways | LHR | 12542352
new air | JFK | 12232355

Output:

airline | airport | miles
foundry airways | LHR | 2221324
new air | SFO | 21356673
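
A Python sketch of the same selection: partition by airline, sort each partition by airport descending and then miles ascending, and take the first rows. The helper `top_rows` is hypothetical, and the default of one row per partition is an assumption inferred from the example output, where Number of rows is null.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]

flights: List[Row] = [
    {"airline": "foundry airways", "airport": "JFK", "miles": 1002345},
    {"airline": "foundry airways", "airport": "LHR", "miles": 2221324},
    {"airline": "new air", "airport": "SFO", "miles": 21356673},
    {"airline": "new air", "airport": "JFK", "miles": 12323456},
    {"airline": "foundry airways", "airport": "LHR", "miles": 12542352},
    {"airline": "new air", "airport": "JFK", "miles": 12232355},
]


def top_rows(rows: List[Row], n: int = 1) -> List[Row]:
    """Return the first n rows of each airline partition after sorting."""
    partitions: Dict[str, List[Row]] = {}
    for row in rows:
        partitions.setdefault(str(row["airline"]), []).append(row)
    top: List[Row] = []
    for partition in partitions.values():
        # Python's sort is stable, so sort by the secondary key first.
        ordered = sorted(partition, key=lambda r: r["miles"])
        ordered = sorted(ordered, key=lambda r: r["airport"], reverse=True)
        top.extend(ordered[:n])
    return top


best = top_rows(flights)
```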

See more details here.


Union by name

Supported in: Batch, Streaming

Unions a set of datasets together on matching column names.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs: ri.foundry.main.dataset.a

recently_serviced | tail_number | airline_code
true | KK-150 | KK
false | XB-120 | XB
true | MT-190 | MT

ri.foundry.main.dataset.b

recently_serviced | tail_number | airline_code
true | AA-200 | AA
true | BN-435 | BN
true | BN-111 | BN

Output:

recently_serviced | tail_number | airline_code
true | KK-150 | KK
false | XB-120 | XB
true | MT-190 | MT
true | AA-200 | AA
true | BN-435 | BN
true | BN-111 | BN

See more details here.


Unpivot

Supported in: Batch, Streaming

Unpivot is the opposite operation of pivot. This converts multiple columns into rows, transforming data from a wide format to a long format. To do so it creates two new columns: one containing the original column names as values, and another containing the corresponding data values. All other columns that are not unpivoted are kept as is.

Transform categories: Aggregate, Popular

Type variable bounds: T accepts AnyType

Example

Argument values:

  • Columns to unpivot: [new_york_miles, london_miles]
  • Dataset: ri.foundry.main.dataset.a
  • Output unpivoted column name: city
  • Unpivoted values output column name: miles

Input:

airline | new_york_miles | london_miles
foundry airways | 1000 | 6000
new air | null | 8000

Output:

city | miles | airline
new_york_miles | 1000 | foundry airways
london_miles | 6000 | foundry airways
new_york_miles | null | new air
london_miles | 8000 | new air
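
The wide-to-long conversion can be sketched in Python as follows. The helper `unpivot` is a hypothetical name; each input row yields one output row per unpivoted column, and the remaining columns are copied unchanged.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def unpivot(rows: List[Row], columns: List[str], name_col: str, value_col: str) -> List[Row]:
    """Turn each listed column into its own row of (column name, value)."""
    result = []
    for row in rows:
        kept = {c: v for c, v in row.items() if c not in columns}
        for column in columns:
            result.append({name_col: column, value_col: row[column], **kept})
    return result


wide = [
    {"airline": "foundry airways", "new_york_miles": 1000, "london_miles": 6000},
    {"airline": "new air", "new_york_miles": None, "london_miles": 8000},
]
long_rows = unpivot(wide, ["new_york_miles", "london_miles"], name_col="city", value_col="miles")
```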

See more details here.


Wide union by name

Supported in: Batch, Streaming

Unions a set of datasets together on the superset of their column names, adding nulls when columns are missing.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs: ri.foundry.main.dataset.a

recently_serviced | tail_number
true | KK-150
false | XB-120
true | MT-190

ri.foundry.main.dataset.b

recently_serviced | tail_number | airline_code
true | AA-200 | AA
true | BN-435 | BN
true | BN-111 | BN

Output:

recently_serviced | tail_number | airline_code
true | KK-150 | null
false | XB-120 | null
true | MT-190 | null
true | AA-200 | AA
true | BN-435 | BN
true | BN-111 | BN
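
In Python terms, the output schema is the ordered union of all input schemas and missing cells become None. The helper `wide_union` is a hypothetical name, and it assumes every dataset is non-empty so that the schema can be read from its first row.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def wide_union(*datasets: List[Row]) -> List[Row]:
    """Union rows over the superset of column names, filling gaps with None."""
    columns: List[str] = []
    for dataset in datasets:
        for column in dataset[0]:
            if column not in columns:
                columns.append(column)
    return [{c: row.get(c) for c in columns} for dataset in datasets for row in dataset]


dataset_a = [
    {"recently_serviced": True, "tail_number": "KK-150"},
    {"recently_serviced": False, "tail_number": "XB-120"},
    {"recently_serviced": True, "tail_number": "MT-190"},
]
dataset_b = [
    {"recently_serviced": True, "tail_number": "AA-200", "airline_code": "AA"},
    {"recently_serviced": True, "tail_number": "BN-435", "airline_code": "BN"},
    {"recently_serviced": True, "tail_number": "BN-111", "airline_code": "BN"},
]
wide = wide_union(dataset_a, dataset_b)
```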

See more details here.


Window

Supported in: Batch

Performs the specified aggregations on the input dataset grouped by a set of columns.

Transform categories: Aggregate, Popular

See more details here.