Functions index

Pipeline Builder provides expressions that operate at different levels. They can generally be categorized as row-level expressions, aggregations, or generators.

Row-level functions operate on values from a single row. Most expressions fall into this category, for example 'add'.

Aggregations combine multiple row values into one, for example the 'sum' expression.

Generators produce multiple values from a single row, for example the 'explode_array' expression.

Transforms are functions that operate on a whole table or multiple tables, for example the 'drop' transform. This document outlines the available expressions and transforms.
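
The four levels can be sketched in plain Python over a small list of rows. This is only an illustration of the concepts: the function names and the sample rows below are hypothetical and are not Pipeline Builder APIs.

```python
# Illustrative only: a row is a dict and a table is a list of rows.
rows = [
    {"a": 1, "b": 2, "tags": ["x", "y"]},
    {"a": 3, "b": 4, "tags": ["z"]},
]

def add_row_level(row):
    # Row level: uses values from a single row.
    return row["a"] + row["b"]

def sum_aggregation(table, column):
    # Aggregation: combines values from many rows into one value.
    return sum(row[column] for row in table)

def explode_generator(row, column):
    # Generator: produces several values from a single row.
    return list(row[column])

def drop_transform(table, column):
    # Transform: operates on the whole table.
    return [{k: v for k, v in row.items() if k != column} for row in table]

print([add_row_level(r) for r in rows])
print(sum_aggregation(rows, "a"))
print(explode_generator(rows[0], "tags"))
print(drop_transform(rows, "tags"))
```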

Row level expressions


Absolute value

Supported in: Batch, Streaming

Returns the absolute value.

Expression categories: Numeric

Type variable bounds: T accepts Numeric

Output type: T

Example

Argument values:

  • Expression: numeric_column
numeric_column | Output
0.0 | 0.0
1.1 | 1.1
-1.1 | 1.1

See details.


Add numbers

Supported in: Batch, Streaming

Calculates the sum of all input columns. Returns null on overflow.

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Expressions: [col_a, col_b]
col_a | col_b | Output
1 | 2 | 3

See details.


Add or update map

Supported in: Batch, Streaming

Updates a value by key in a map, or adds a new key-value pair if the key is not present.

Expression categories: Map

Type variable bounds: K accepts AnyType, V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Expression: 4
  • Key: k
  • Map: map_col
map_col | Output
{ a -> 1, b -> 2, k -> 2 } | { a -> 1, b -> 2, k -> 4 }
{ a -> 1, b -> 2 } | { a -> 1, b -> 2, k -> 4 }

See details.


Add or update struct field

Supported in: Batch, Streaming

Updates a field of a struct or adds a new field.

Expression categories: Struct

Output type: Struct

Example

Argument values:

  • Expression: value
  • Locator: airline.id
  • Struct: struct
struct | value | Output
{ airline: { id: NA } } | 1 | { airline: { id: 1 } }
{ airline: { id: FE } } | 2 | { airline: { id: 2 } }

See details.


Add value to date

Supported in: Batch, Streaming

Returns the date that is 'value' days, weeks, months, quarters, or years after 'start'.

Expression categories: Datetime

Output type: Date

Example

Argument values:

  • Date: 2022-02-01
  • Unit: DAYS
  • Value: 2

Output: 2022-02-03
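
As a rough illustration of the example above, the same arithmetic can be sketched with Python's standard library. The helper name `add_to_date` is hypothetical, and only the DAYS and WEEKS units are covered here because month-based units need calendar-aware logic.

```python
from datetime import date, timedelta

def add_to_date(start, value, unit):
    # Sketch only: DAYS and WEEKS map directly onto a number of days.
    days_per_unit = {"DAYS": 1, "WEEKS": 7}
    return start + timedelta(days=value * days_per_unit[unit])

print(add_to_date(date(2022, 2, 1), 2, "DAYS"))  # 2022-02-03
```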

See details.


All array elements satisfy

Supported in: Batch, Streaming

Returns true if the expression is true for all elements in the array.

Expression categories: Array

Output type: Boolean

Example

Argument values:

  • Array: miles
  • Boolean condition:
    isNull(
     expression: element,
    )
miles | Output
[ 12300, null ] | false
[ null, null ] | true

See details.


And

Supported in: Batch, Streaming

Returns true if all of the specified conditions are true. Nulls are considered false.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Conditions: [left_boolean, right_boolean]
left_boolean | right_boolean | Output
true | true | true
true | false | false
false | true | false
false | false | false
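
The null handling described above can be sketched in Python, with None standing in for null. The function name `and_all` is hypothetical and this is not the Pipeline Builder implementation.

```python
def and_all(conditions):
    # A condition counts only when it is exactly True; None (null) is treated as false.
    return all(c is True for c in conditions)

print(and_all([True, True]))
print(and_all([True, None]))
```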

See details.


Any array element satisfy

Supported in: Batch, Streaming

Returns true if the expression is true for any element in the array.

Expression categories: Array

Output type: Boolean

Example

Argument values:

  • Array: miles
  • Boolean condition:
    isNull(
     expression: element,
    )
miles | Output
[ 12300, null ] | true
[ 12300, 12000 ] | false

See details.


Arccos

Supported in: Batch, Streaming

Inverse cosine function.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: radians
  • Value: 1.0

Output: 0.0

See details.


Arcsin

Supported in: Batch, Streaming

Inverse sine function.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: radians
  • Value: 0.0

Output: 0.0

See details.


Arctan

Supported in: Batch, Streaming

Inverse tangent function.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Value: angle
angle | Output
-1.0 | -45.0
0.0 | 0.0
1.0 | 45.0

See details.


Arctan2

Supported in: Batch, Streaming

Returns the angle θ between the ray from the origin to the point (x, y) and the positive x-axis, confined to −π<θ<=π.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • X: x
  • Y: y
y | x | Output
0.0 | 0.0 | 0.0
1.0 | 0.0 | 90.0
0.0 | -1.0 | 180.0
-1.0 | 0.0 | -90.0
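
The rows above can be reproduced with Python's `math.atan2`, which takes y before x, converting the result to degrees. The wrapper name `arctan2_degrees` is hypothetical.

```python
import math

def arctan2_degrees(y, x):
    # math.atan2 returns radians in the range (-pi, pi].
    return math.degrees(math.atan2(y, x))

for y, x in [(0.0, 0.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0)]:
    print(y, x, arctan2_degrees(y, x))
```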

See details.


Area

Supported in: Batch, Streaming

Calculates area of a geometry in meters squared using a spherical approximation of the globe. For a line string or a point, this equals 0.

Expression categories: Geospatial

Output type: Double

See details.


Array add

Supported in: Batch, Streaming

Adds a value to the array at a specified index.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: numbers
  • Index: 1
  • Value: 1
numbers | Output
[ 3, 5 ] | [ 1, 3, 5 ]
[ 2 ] | [ 1, 2 ]
[ ] | [ 1 ]

See details.


Array cartesian product

Supported in: Batch, Streaming

Compute the cartesian product of arrays.

Expression categories: Array

Output type: Array<Struct>

Example

Argument values:

  • Expression: [first, second]
first | second | Output
[ 1, 2 ] | [ 3, 4 ] | [ { first: 1, second: 3 }, { first: 1, second...

See details.


Array concat

Supported in: Batch, Streaming

Concatenates the provided arrays into a single array, without de-duplication.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 4, 5 ]]

Output: [ 1, 2, 3, 4, 5 ]

See details.


Array contains

Supported in: Batch, Streaming

Returns true if the array contains the value.

Expression categories: Array, Boolean

Output type: Boolean

Example

Argument values:

  • Array: part_ids
  • Value: BRR-123
part_ids | Output
[ AWE-112, BRR-123 ] | true
[ AWE-222, ABC-543 ] | false

See details.


Array contains null

Supported in: Batch, Streaming

Returns true if the array contains null.

Expression categories: Array, Boolean

Output type: Boolean

Example

Argument values:

  • Expression: part_ids
part_ids | Output
[ AWE-112, BRR-123, null ] | true
[ AWE-222, ABC-543 ] | false

See details.


Array difference

Supported in: Batch, Streaming

Returns all unique elements in the left array that are not in the right array.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Left array: [ 1, 2, 3 ]
  • Right array: [ 2, 3, 4 ]

Output: [ 1 ]
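
A minimal Python sketch of this behavior follows. The function name `array_difference` is hypothetical, and keeping the left array's order is an assumption that the description does not state.

```python
def array_difference(left, right):
    excluded = set(right)
    seen = set()
    result = []
    for item in left:
        # Keep each left element once, and only if it is absent from the right array.
        if item not in excluded and item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(array_difference([1, 2, 3], [2, 3, 4]))  # [1]
```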

See details.


Array distinct

Supported in: Batch, Streaming

Removes duplicates and returns distinct values from the array.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expression: [ 1, 1, 2, 3 ]

Output: [ 1, 2, 3 ]

See details.


Array element

Supported in: Batch, Streaming

Returns the element at a given position from the input array. Positions outside of the array will return null.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Array: [ 10, 11, 12 ]
  • Position: 1

Output: 10

See details.


Array elements are distinct

Supported in: Batch, Streaming

Returns true if the array's elements are distinct, false otherwise. If the array is null, the returned value is false.

Expression categories: Array, Boolean

Output type: Boolean

Example

Argument values:

  • Expression: part_ids
part_ids | Output
[ ABC-123, DCE-123, EFG-123 ] | true
[ ABC-123, ABC-123, EFG-123 ] | false

See details.


Array flatten

Supported in: Batch, Streaming

Creates a single array from an input nested array by unioning the elements within the first level of nesting.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expression: array
array | Output
[ [ 1, 2, 3 ], [ 4, 5, 6 ] ] | [ 1, 2, 3, 4, 5, 6 ]

See details.


Array intersect

Supported in: Batch, Streaming

Removes duplicates and intersects a list of arrays.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 3, 4 ]]

Output: [ 3 ]

See details.


Array maximum

Supported in: Batch, Streaming

Returns the maximum value of an array column.

Expression categories: Array

Type variable bounds: T accepts ComparableType

Output type: T

Example

Argument values:

  • Expression: [ 1, 2, 3 ]

Output: 3

See details.


Array minimum

Supported in: Batch, Streaming

Returns the minimum value of an array column.

Expression categories: Array

Type variable bounds: T accepts ComparableType

Output type: T

Example

Argument values:

  • Expression: [ 1, 2, 3 ]

Output: 1

See details.


Array position

Supported in: Batch, Streaming

Returns the position (index) of the first occurrence of 'value' in the given array. Returns null when the value is not found or when any of the arguments is null.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Long

Example

Argument values:

  • Array: [ 10, 11, 12 ]
  • Value: 10

Output: 1

See details.


Array remove

Supported in: Batch, Streaming

Returns an array after removing all occurrences of 'value' from the given array.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: [ 1, 2, 3 ]
  • Value: 1

Output: [ 2, 3 ]

See details.


Array repeat

Supported in: Batch, Streaming

Returns an array with the contents of 'array' concatenated 'value' times.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: [ 1, 2 ]
  • Value: 2

Output: [ 1, 2, 1, 2 ]

See details.


Array reverse

Supported in: Batch, Streaming

Reverse the order of elements in 'array'.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expression: [ 1, 2, 3 ]

Output: [ 3, 2, 1 ]

See details.


Array sort

Supported in: Batch, Streaming

Returns a sorted array of the given input array. All null values are placed at the end of a descending array and at the front of an ascending array.

Expression categories: Array

Type variable bounds: T accepts ComparableType

Output type: Array<T>

Example

Argument values:

  • Direction: ASCENDING
  • Expression: [ 5, 3, 6 ]

Output: [ 3, 5, 6 ]
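
The null placement rule described above can be sketched in Python, with None standing in for null. The function name `array_sort` and the DESCENDING direction label are assumptions made for illustration.

```python
def array_sort(values, direction="ASCENDING"):
    nulls = [v for v in values if v is None]
    present = sorted(v for v in values if v is not None)
    if direction == "ASCENDING":
        # Nulls go to the front of an ascending array.
        return nulls + present
    # Nulls go to the end of a descending array.
    return present[::-1] + nulls

print(array_sort([5, 3, 6]))  # [3, 5, 6]
print(array_sort([5, None, 3], "DESCENDING"))
```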

See details.


Array sort by struct key

Supported in: Batch, Streaming

Returns a sorted array of the given input array of structs sorted by the values of the given struct keys.

Expression categories: Array

Output type: Array<Struct>

Example

Argument values:

  • Input array: [ {
    age: 20,
    }, {
    age: 10,
    }, {
    age: 30,
    } ]
  • Sort keys: [(age, ASCENDING)]

Output: [ {
age: 10,
}, {
age: 20,
}, {
age: 30,
} ]

See details.


Array union

Supported in: Batch, Streaming

Removes duplicates and unions a list of arrays.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 3, 4 ]]

Output: [ 1, 2, 3, 4 ]

See details.


Arrays have intersection

Supported in: Batch, Streaming

Checks if given arrays have at least one shared element.

Expression categories: Array, Boolean

Type variable bounds: T accepts AnyType

Output type: Boolean

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 3, 4 ]]

Output: true

See details.


Arrays zip

Supported in: Batch, Streaming

Zips a list of given arrays into a merged array of structs in which the n-th struct contains all n-th values of input arrays.

Expression categories: Array

Output type: Array<Struct>

Example

Argument values:

  • Expressions: [first_array, second_array]
first_array | second_array | Output
[ 1, 2, 3 ] | [ 4, 5, 6 ] | [ { first_array: 1, second_array: 4 }, { first_array: 2,...

See details.


Base 64 decode to string

Supported in: Batch, Streaming

Base64 decode the given expression. Uses utf-8 encoding for binary.

Expression categories: Binary, Cast, String

Output type: String

Example

Argument values:

  • Expression: encoded
encoded | Output
Zm9v | foo
YmFy | bar

See details.


Base64 decode

Supported in: Batch, Streaming

Base64 decode the given expression.

Expression categories: Binary, Cast

Output type: Binary

Example

Argument values:

  • Expression: city_base64
city_base64 | Output
TG9uZG9u | TG9uZG9u
Q29wZW5oYWdlbg== | Q29wZW5oYWdlbg==
TmV3IFlvcms= | TmV3IFlvcms=

See details.


Base64 encode

Supported in: Batch, Streaming

Base64 encode the given expression.

Expression categories: Binary, Cast

Output type: String

Example

Argument values:

  • Expression: city
city | Output
London | TG9uZG9u
Copenhagen | Q29wZW5oYWdlbg==
New York | TmV3IFlvcms=
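
The encode and decode examples above can be checked with Python's standard `base64` module. The helper names are hypothetical, and UTF-8 is assumed for the string-to-bytes step, in line with the "Base 64 decode to string" description.

```python
import base64

def base64_encode(text):
    # Encode the UTF-8 bytes of the string and return the Base64 text.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def base64_decode_to_string(encoded):
    # Decode Base64 to bytes, then interpret the bytes as UTF-8.
    return base64.b64decode(encoded).decode("utf-8")

for city in ["London", "Copenhagen", "New York"]:
    print(city, base64_encode(city))
print(base64_decode_to_string("Zm9v"))
```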

See details.


Bit shift left

Supported in: Batch, Streaming

Shift the given value a number of bits left.

Expression categories: Binary

Type variable bounds: E accepts Byte | Integer | Long | Short

Output type: E

Example

Argument values:

  • Expression: 1
  • Number of bits: 1

Output: 2

See details.


Bit shift right

Supported in: Batch, Streaming

Shift the given value a number of bits right.

Expression categories: Binary

Type variable bounds: E accepts Byte | Integer | Long | Short

Output type: E

Example

Argument values:

  • Expression: 1
  • Number of bits: 1

Output: 0

See details.


Buffer H3 indices

Supported in: Batch, Streaming

Creates a buffer of distance k from an array of H3 indices.

Expression categories: Geospatial

Output type: Array<H3 Index>

See details.


Calculate destination point

Supported in: Batch, Streaming

Calculates the destination point along a specified path given a starting point, course, and distance.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Course: course
  • Distance: distance
  • Starting point: point_a
  • Calculation method: GREAT_CIRCLE
point_a | course | distance | Output
{ latitude: 48.8567, longitude: 2.3508 } | 225.0 | 32000.0 | { latitude: 48.65279552300661, longitude: 2.0427666779658806 }

See details.


Calculate haversine distance

Supported in: Batch, Streaming

Calculates the haversine distance between two latitude and longitude point pairs in meters.

Expression categories: Geospatial

Output type: Double

Example

Argument values:

  • Point a: point_a
  • Point b: point_b
point_a | point_b | Output
{ latitude: 41.507483, longitude: -99.436554 } | { latitude: 38.504048, longitude: -98.315949 } | 347328.82778977347
{ latitude: 22.308919, longitude: 113.914603 } | { latitude: -33.946111, longitude: 151.177222 } | 7393894.00134442
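
For reference, the haversine formula itself can be sketched in Python as follows. The Earth radius used by the expression is not stated in this document, so the constant below is an assumption and the results will only approximately match the outputs above.

```python
import math

# Assumed mean Earth radius in meters; the value used by the expression is not documented here.
EARTH_RADIUS_M = 6371008.8

def haversine_m(lat1, lon1, lat2, lon2):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

print(haversine_m(41.507483, -99.436554, 38.504048, -98.315949))
print(haversine_m(22.308919, 113.914603, -33.946111, 151.177222))
```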

See details.


Case

Supported in: Batch, Streaming

Choose between different branches based on conditions.

Expression categories: Popular

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Default: Yes
  • Branches: [(
    lessThan(
     left: miles,
     right: 15000,
    ), No)]
miles | Output
20053 | Yes
10210 | No
34120 | Yes

See details.


Cast

Supported in: Batch, Streaming

Cast expression to given type.

Expression categories: Cast, Popular

Type variable bounds: C accepts AnyType

Output type: C

Example

Description: Casting long to string

Argument values:

  • Expression: 1234
  • Type: String

Output: 1234

See details.


Ceil

Supported in: Batch, Streaming

Returns the ceiling of a given fractional value.

Expression categories: Numeric

Output type: Decimal | Long

Example

Argument values:

  • Expression: 10.123

Output: 11

See details.


Change timestamp time zone

Supported in: Batch

Changes the time zone of a timestamp.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Output time zone: America/Chicago
  • Timestamp: 2020-04-28T05:09:00Z
  • Input time zone: US/Eastern

Output: 2020-04-28T04:09:00Z

See details.


Character-wise translate string

Supported in: Batch, Streaming

Replaces individual characters from the input column that are found in the matching string with the corresponding character in the replacement string. If the matching string is longer than the replacement string, the characters at the end of the matching string that have no replacement are removed from the output.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: translate
  • Matching string: rnlt
  • Replacement string: 123

Output: 1a2s3ae
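
The example above can be traced with Python's `str.translate`. The helper name `translate_chars` is hypothetical; it maps each matching character to the replacement character at the same position and deletes matching characters that have no counterpart.

```python
def translate_chars(text, matching, replacement):
    # Characters with a counterpart are replaced; the rest of the matching string is deleted.
    paired = matching[:len(replacement)]
    dropped = matching[len(replacement):]
    table = str.maketrans(paired, replacement[:len(paired)], dropped)
    return text.translate(table)

print(translate_chars("translate", "rnlt", "123"))  # 1a2s3ae
```

Here 'r', 'n', and 'l' become '1', '2', and '3', while 't' has no replacement and is removed.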

See details.


Chunk string

Supported in: Batch, Streaming

Chunk string into chunks of a specified size and on specified separators.

Expression categories: String

Output type: Array<String>

Example

Argument values:

  • Expression: string
  • Chunk overlap: null
  • Chunk size: 10
  • Keep separator: null
  • Separators: null
string | Output
hello | [ hello ]
hello world. the quick brown fox jumps over the fence. | [ hello, world., the quick, brown fox, jumps, over the, fence. ]
hello world.
the quick brown fox
jumps over the fence. | [ hello, world., the quick, brown fox, jumps, over the, fence. ]
hello world.
the quick brown fox
jumps over the fence. | [ hello, world., the quick, brown fox, jumps, over the, fence. ]

See details.


Cipher decrypt

Supported in: Batch, Streaming

Decrypts expression with cipher.

Expression categories: Other

Output type: String

Example

Argument values:

  • Cipher license rid: ri.bellaso.main.cipher-license.1-decrypt
  • Expression: string
string | Output
CIPHER::ri.bellaso.main.cipher-channel.1::OCRBIW3iHDltOGa6MEHwb7f/Dw==::CIPHER | bar

See details.


Cipher encrypt

Supported in: Batch, Streaming

Encrypts expression with cipher.

Expression categories: Other

Output type: Cipher Text

Example

Argument values:

  • Cipher license rid: ri.bellaso.main.cipher-license.1-encrypt
  • Expression: string
string | Output
bar | CIPHER::ri.bellaso.main.cipher-channel.1::OCRBIW3iHDltOGa6MEHwb7f/Dw==::CIPHER

See details.


Cipher hash

Supported in: Batch, Streaming

Hashes expression with cipher.

Expression categories: Other

Output type: Cipher Text

Example

Argument values:

  • Cipher license rid: ri.bellaso.main.cipher-license.1-hash
  • Expression: string
string | Output
bar | CIPHER::ri.bellaso.main.cipher-channel.1::c70a14f5cc57c940e3265045a5554d641bd549ee27a571a05cdbc75c77762eb86b1144c12f1bb7811a0bcec08b2f143989c44022e4664f615d6885ad640332cb::CIPHER

See details.


Clean string

Supported in: Batch, Streaming

Applies the set of clean actions on the expression.

Expression categories: Data preparation, String

Output type: String

Example

Argument values:

  • Clean actions: {trim}
  • Expression: hello world

Output: hello world

See details.


Compact a set of H3 indices

Supported in: Batch, Streaming

Compact H3 indices into a subset of mixed resolutions if possible. Running the inverse operation uncompact is guaranteed to yield the same set of indices that were compacted if the input indices were all the same resolution. If any of the input indices are invalid this transform will return null. Output indices are sorted in ascending order.

Expression categories: Geospatial

Output type: Array<H3 Index>

Example

Argument values:

  • H3 indices: h3_set
h3_set | Output
[ 87754a914ffffff, 87754a916ffffff, 87754a930ffffff, 87754a932ffffff, 87754a933ffffff, 87754a934ffff... | [ 86754e64fffffff, 87754a914ffffff, 87754a916ffffff, 87754a930ffffff, 87754a932ffffff, 87754a933ffff...

See details.


Concatenate strings

Supported in: Batch, Streaming

Concatenates a list of strings with the specified separator.

Expression categories: String

Output type: String

Example

Argument values:

  • Expressions: [hello, world]
  • Null output if any input is null: null
  • Separator: _

Output: hello_world

See details.


Construct delegated media Gotham identifier (GID)

Supported in: Batch, Streaming

Expression to construct a valid delegated media Gotham identifier (GID) from components. If result is more than 1024 characters, produces a null row.

Expression categories: Other

Output type: Delegated media Gotham identifier (GID)

Example

Argument values:

  • Media locator: locator
  • Media type: mediaType
  • Producer instance: invalidUuid
mediaType | locator | Output
testaudiotype | empty string | null

See details.


Convert DMS to GeoPoint

Supported in: Batch, Streaming

Converts a geospatial coordinate string in degrees, minutes, seconds (DMS) format to a GeoPoint in accordance with user-provided formats. The default formats are DDD*°MM*'SS*"H and DDD*MMSSssH. The formats are tried in order, and the result of the first matching format is returned. See the formatting guide for how to write user-generated formats.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Coordinates: coordinates
  • Formats: null
coordinates | Output
078261594N075220923E | { latitude: 78.43776111111112, longitude: 75.36923055555555 }
046115095S069524119W | { latitude: -46.19748611111111, longitude: -69.87810833333333 }
023°45'55"N 069°52'11"W | { latitude: 23.76527777777777, longitude: -69.86972222222222 }
-123°55'55"N 069°53'00"W | { latitude: -123.93194444444445, longitude: -69.88333333333334 }
123456789N23456789E | { latitude: 123.76885833333333, longitude: 23.768858333333334 }

See details.


Convert GeoPoint to DMS

Supported in: Batch, Streaming

Converts a GeoPoint to a geospatial coordinate string in degrees, minutes, seconds (DMS) format in accordance with a user-chosen format. Possible formats are DDD°MM'SS"H and DDDMMSSssH.

Expression categories: Geospatial

Output type: String

See details.


Convert GeoPoint to Geohash

Supported in: Batch, Streaming

Converts a GeoPoint to a base32-encoded Geohash with specified precision that contains the GeoPoint. For more information on Geohash, see https://en.wikipedia.org/wiki/Geohash.

Expression categories: Geospatial

Output type: Geohash

See details.


Convert GeoPoint to MGRS

Supported in: Batch, Streaming

Converts a GeoPoint following the WGS84 coordinate system (which is EPSG:4326) to an MGRS (military grid reference system) coordinate. The output MGRS will follow a space-delimited format with 5 digits of precision.

Expression categories: Geospatial

Output type: MGRS

Example

Argument values:

  • Expression: geoPoint
geoPoint | Output
{ latitude -> 88.99999659707431, longitude -> 0.9996456505181999 } | Z AF 01937 88990

See details.


Convert GeoPoint to geometry

Supported in: Batch, Streaming

Convert GeoPoint to a GeoJSON of type point.

Expression categories: Geospatial

Output type: Geometry

See details.


Convert MGRS to GeoPoint

Supported in: Batch, Streaming

Converts an MGRS (military grid reference system) coordinate into a GeoPoint following the WGS84 coordinate system (which is EPSG:4326).

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Expression: mgrs
mgrs | Output
ZAF0193788990 | { latitude: 88.99999659707431, longitude: 0.9996456505181999 }

See details.


Convert a string to date

Supported in: Batch, Streaming

Returns the date given a formatted string in accordance with the Java DateTimeFormatter. The default formats are yyyy-MM-dd and yyyy-MM-dd'T'HH:mm:ss.SSSXXX. The formats are tried in order, and the result of the first matching format is returned.

Expression categories: Cast, Datetime

Output type: Date

Example

Description: Date formats are optional

Argument values:

  • String: 2020-04-28
  • Formats: null

Output: 2020-04-28

See details.


Convert a string to timestamp

Supported in: Batch, Streaming

Returns the timestamp given a formatted string in accordance with the Java DateTimeFormatter. The default formats are yyyy-MM-dd'T'HH:mm:ss.SSSXXX and yyyy-MM-dd. The formats are tried in order, and the result of the first matching format is returned.

Expression categories: Cast, Datetime

Output type: Timestamp

Example

Argument values:

  • String: timestamp
  • Formats: [dd-yyyy-MM HH:mm, yyyy-MM-dd]
  • Time zone: null
timestamp | Output
28-2020-04 10:09:00 | 2020-04-28T10:09:00Z
2020-04-28 | 2020-04-28T00:00:00Z
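
The "try each format in order and return the first match" behavior can be sketched in Python. This is not the Java DateTimeFormatter: the sketch uses Python `strptime` patterns, the helper name `parse_first_match` is hypothetical, the sample inputs are chosen to suit those patterns, and returning None when nothing matches is an assumption.

```python
from datetime import datetime

def parse_first_match(text, formats):
    # Try each format in the given order; the first one that parses wins.
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None  # assumption: no matching format yields null

formats = ["%d-%Y-%m %H:%M", "%Y-%m-%d"]
print(parse_first_match("28-2020-04 10:09", formats))
print(parse_first_match("2020-04-28", formats))
```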

See details.


Convert base

Supported in: Batch, Streaming

Converts a number (or its string representation) from one base to another.

Expression categories: Binary, Cast, Numeric

Output type: String

Example

Argument values:

  • Expression: 4A801
  • From base: 16
  • To base: 10

Output: 305153
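
The conversion in the example above can be reproduced with a short Python sketch. The helper name `convert_base` is hypothetical, and the uppercase digit alphabet is an assumption based on the hexadecimal examples elsewhere in this document.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def convert_base(value, from_base, to_base):
    # Parse the input in the source base, then emit digits in the target base.
    number = int(str(value), from_base)
    if number == 0:
        return "0"
    sign = "-" if number < 0 else ""
    number = abs(number)
    out = []
    while number:
        number, remainder = divmod(number, to_base)
        out.append(DIGITS[remainder])
    return sign + "".join(reversed(out))

print(convert_base("4A801", 16, 10))  # 305153
```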

See details.


Convert between angle units

Supported in: Batch, Streaming

Expression categories: Geospatial, Numeric

Output type: Double

See details.


Convert between distance units

Supported in: Batch, Streaming

Expression categories: Numeric

Output type: Double

See details.


Convert between time units

Supported in: Batch, Streaming

Expression categories: Datetime

Output type: Double

See details.


Convert between weight units

Supported in: Batch, Streaming

Expression categories: Numeric

Output type: Double

See details.


Convert data to JSON

Supported in: Batch, Streaming

Transforms the input into a JSON string.

Expression categories: File, String

Output type: String

Example

Argument values:

  • Input: struct
struct | Output
{ airline: { id: NA } } | {"airline":{"id":"NA"}}

See details.


Convert from Ontology GeoPoint

Supported in: Batch, Streaming

Convert an Ontology GeoPoint into a regular GeoPoint. Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180. Regular GeoPoints are structures of the format {"longitude": {long},"latitude": {lat}}.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Expression: geopoint
geopoint | Output
-20.0000000,80.0000000 | { latitude: -20.0, longitude: 80.0 }
38.9031000,-77.0599000 | { latitude: 38.9031, longitude: -77.0599 }
41.9876543,-99.1234568 | { latitude: 41.9876543, longitude: -99.1234568 }

See details.


Convert from hexadecimal

Supported in: Batch

Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of the number.

Expression categories: Numeric, String

Output type: Binary

Example

Argument values:

  • Expression: string_hex
string_hex | Output
68656C6C6F | aGVsbG8=
3039 | MDk=
FFFFFFFFFFFFCFC7 | ////////z8c=
4C6F6E646F6E | TG9uZG9u
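
The outputs above appear to show the resulting binary values in Base64; that reading is an inference from the examples, not something the description states. Under that assumption, the rows can be reproduced in Python with hypothetical helper names:

```python
import base64

def unhex(hex_string):
    # Each pair of hexadecimal characters becomes one byte.
    return bytes.fromhex(hex_string)

def as_base64(data):
    # Render binary data as Base64 text for display.
    return base64.b64encode(data).decode("ascii")

for value in ["68656C6C6F", "3039", "FFFFFFFFFFFFCFC7", "4C6F6E646F6E"]:
    print(value, as_base64(unhex(value)))
```

Decoding the same bytes as UTF-8, for example `unhex("4C6F6E646F6E").decode("utf-8")`, gives the string form described under "Convert from hexadecimal to string".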

See details.


Convert from hexadecimal to string

Supported in: Batch, Streaming

Inverse of hex, interprets each pair of characters as a hexadecimal number and converts to the utf-8 string of the byte representation of the number.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: string_hex
string_hex | Output
68656C6C6F | hello
4C6F6E646F6E | London

See details.


Convert geocentric coordinates to WGS 84 geodesic coordinates

Supported in: Batch, Streaming

Converts geocentric cartesian coordinates (also known as Earth-centered, Earth-fixed or ECEF coordinates) to geodesic polar coordinates. Altitude is defined as height-above-ellipsoid. If any coordinates are null, the output will be null.

Expression categories: Geospatial

Output type: GeoPoint with altitude

Example

Argument values:

  • X coordinate: x_coordinate
  • Y coordinate: y_coordinate
  • Z coordinate: z_coordinate
x_coordinate | y_coordinate | z_coordinate | Output
0.0 | 6378137.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> 90.0 } }
0.0 | -6378137.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> -90.0 } }
-6378137.0 | 0.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> 180.0 } }
-6378137.0 | -0.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> -180.0 } }
0.0 | 0.0 | 6356752.314245179 | { altitude -> 0.0, geoPoint -> { latitude -> 90.0, longitude -> 0.0 } }
0.0 | 0.0 | -6356752.314245179 | { altitude -> 0.0, geoPoint -> { latitude -> -90.0, longitude -> 0.0 } }

See details.


Convert legacy OffsetDateTime

Supported in: Batch

Converts a legacy OffsetDateTime column to a timestamp that can be used in all Foundry pipelines. The timestamp is returned in UTC.

Expression categories: Datetime

Output type: Timestamp

See details.


Convert linestring to polygon

Supported in: Batch, Streaming

Convert a linestring geometry to a polygon geometry. This expression assumes the linestring geometry is closed. If not, the expression will return null.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: polygon_points
polygon_points | Output
{"type":"LineString","coordinates":[[-77.49,38.01],[-77.47,38.15],[-77.19,38.14],[-77.49,38.01]]} | {"type":"Polygon","coordinates":[[[-77.49,38.01],[-77.47,38.15],[-77.19,38.14],[-77.49,38.01]]]}

See details.


Convert timestamp from UTC

Supported in: Batch, Streaming

Converts a timestamp from UTC to a given time zone.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Time zone: EST
  • Timestamp: 2020-04-28T10:09:00Z

Output: 2020-04-28T05:09:00Z

See details.


Convert timestamp to UTC

Supported in: Batch, Streaming

Converts a timestamp to UTC based on a given time zone.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Time zone: EST
  • Timestamp: 2020-04-28T10:09:00Z

Output: 2020-04-28T15:09:00Z
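
The two UTC conversions above are inverses of each other. A minimal Python sketch, assuming EST is a fixed UTC-5 offset with no daylight saving and using hypothetical helper names, reproduces both examples:

```python
from datetime import datetime, timedelta

# Assumption: EST is treated as a fixed offset of UTC-5.
EST_OFFSET = timedelta(hours=-5)

def from_utc(timestamp, offset):
    # Shift a UTC wall-clock time into the target zone.
    return timestamp + offset

def to_utc(timestamp, offset):
    # Shift a wall-clock time in the given zone back to UTC.
    return timestamp - offset

ts = datetime(2020, 4, 28, 10, 9)
print(from_utc(ts, EST_OFFSET))  # 2020-04-28 05:09:00
print(to_utc(ts, EST_OFFSET))    # 2020-04-28 15:09:00
```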

See details.


Convert to Ontology GeoPoint

Supported in: Batch, Streaming

Convert a GeoPoint into a string that the Ontology will accept for a geo-indexed column (a geohash type column). Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180.

Expression categories: Geospatial

Output type: Ontology GeoPoint

Example

Argument values:

  • Expression: point
point | Output
{ latitude: -20.0, longitude: 80.0 } | -20.0000000,80.0000000
{ latitude: 38.9031, longitude: -77.0599 } | 38.9031000,-77.0599000
{ latitude: 41.987654321, longitude: -99.123456789 } | 41.9876543,-99.1234568
null | null
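
The '{lat},{lon}' string format can be sketched in Python as follows. The seven decimal places are inferred from the example outputs above rather than stated in the description, and the function name `to_ontology_geopoint` is hypothetical.

```python
def to_ontology_geopoint(point):
    # A null point yields null, as in the last example row.
    if point is None:
        return None
    # Assumption: seven decimal places, as seen in the example outputs.
    return f"{point['latitude']:.7f},{point['longitude']:.7f}"

print(to_ontology_geopoint({"latitude": -20.0, "longitude": 80.0}))
print(to_ontology_geopoint({"latitude": 41.987654321, "longitude": -99.123456789}))
```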

See details.


Convert to hexadecimal

Supported in: Batch, Streaming

Computes hex value of given expression.

Expression categories: Numeric, String

Output type: String

Example

Argument values:

  • Expression: city_hex
city_hex | Output
TG9uZG9u | 4C6F6E646F6E

See details.


Convert to octal

Supported in: Batch, Streaming

Computes octal value of given expression.

Expression categories: Numeric

Output type: String

Example

Argument values:

  • Expression: 12345

Output: 30071

See details.


Cosine

Supported in: Batch, Streaming

Takes the cosine of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle | Output
0.0 | 1.0
90.0 | 0.0
180.0 | -1.0

See details.


Create GeoPoint

Supported in: Batch, Streaming

Creates a GeoPoint column from a latitude and longitude column. Validates that the latitude parameter is between -90 and 90, inclusive, and that the longitude parameter is between -180 and 180, inclusive; if not, returns a null value.

Expression categories: Geospatial

Output type: GeoPoint

See details.


Create GeoPoint from coordinate system

Supported in: Batch, Streaming

Takes a pair of coordinates from a source coordinate system and transforms them into WGS 84 latitude/longitude values. Coordinate systems (also known as coordinate reference systems or spatial reference systems) represent different systems for identifying the location of a point on the globe and are often identified by key in standardized databases such as EPSG. If the given projection is not supported or either coordinate is null, returns null. This expression is for advanced users. It is recommended to use the "Create GeoPoint" expression if you do not need to deal with coordinate systems.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Source coordinate system: EPSG:32618
  • X coordinate: x_coordinate
  • Y coordinate: y_coordinate
x_coordinate | y_coordinate | Output
322190.2233952965 | 4306505.703879281 | { latitude -> 38.88944258, longitude -> -77.05014581 }
323243.1361536059 | 4318298.06539618 | { latitude -> 38.99585379643137, longitude -> -77.04105678275415 }
407063.63465300016 | 4764873.719585404 | { latitude -> 43.03086518778498, longitude -> -76.14077251822197 }

See details.


Create an empty array

Supported in: Batch, Streaming

Returns an empty array of the given type.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Type: String

Output: [ ]

See details.


Create array

Supported in: Batch, Streaming

Creates an array from the columns provided.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [1, 2, 3]

Output: [ 1, 2, 3 ]

See details.


Create ellipse geometry

Supported in: Batch, Streaming

Approximates an ellipse as a polygon centered at the given geo coordinate. The distance between points is computed along the surface of the WGS84 ellipsoid approximating the surface of the earth.

Expression categories: Geospatial

Output type: Geometry

See details.


Create geodesic line string

Supported in: Batch, Streaming

Creates a geodesic line between two points.

Expression categories: Geospatial

Output type: Geometry

See details.


Create linestring geometry

Supported in: Batch, Streaming

Creates a GeoJSON linestring geometry from the given points.

Expression categories: Geospatial

Type variable bounds: T accepts Struct<longitude, latitude>

Output type: Geometry

Example

Argument values:

  • Points: points
points | Output
[ { latitude: 10.0, longitude: 0.0 }, { latitude: 10.0, longitude: 10.0 } ] | {"type":"LineString","coordinates":[[0.0,10.0],[10.0,10.0]]}
[ { latitude: 10.0, longitude: 10.0 }, { latitude: 20.0,<... | {"type":"LineString","coordinates":[[10.0,10.0],[20.0,20.0],[30.0,30.0]]}
[ { latitude: 0.0, longitude: 179.0 }, { latitude: 0.0, longitude: 181.0 } ] | {"type":"MultiLineString","coordinates":[[[179.0,0.0],[180.0,0.0]],[[-180.0,0.0],[-179.0,0.0]]]}
[ { latitude: 0.0, longitude: -179.0 }, { latitude: 0.0, longitude: -181.0 } ] | {"type":"MultiLineString","coordinates":[[[180.0,0.0],[179.0,0.0]],[[-179.0,0.0],[-180.0,0.0]]]}

See details.


Create map from arrays

Supported in: Batch, Streaming

Returns a map using key-value pairs from the zipped arrays. Null values are not allowed as keys and will cause a runtime error.

Expression categories: Array, Map

Type variable bounds: K accepts AnyType, V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Array of keys: [ 1, 2, 3 ]
  • Array of values: [ 4, 5, 6 ]

Output: {
 1 -> 4,
 2 -> 5,
 3 -> 6,
}

See details.
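The zipping behavior above can be approximated in plain Python as shown below. This is a sketch rather than the platform's implementation: the name `map_from_arrays` is hypothetical, and `ValueError` merely stands in for the runtime error mentioned in the description.

```python
from typing import Dict, Hashable, List, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def map_from_arrays(keys: List[Optional[K]], values: List[V]) -> Dict[K, V]:
    """Pair each key with the value at the same position."""
    # Null keys are rejected, mirroring the documented runtime error.
    if any(key is None for key in keys):
        raise ValueError("null values are not allowed as map keys")
    return dict(zip(keys, values))


print(map_from_arrays([1, 2, 3], [4, 5, 6]))  # {1: 4, 2: 5, 3: 6}
```

The source does not say what happens when the two arrays differ in length, so the sketch does not attempt to model that case.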


Create null value

Supported in: Batch, Streaming

Returns a null value of the given type.

Expression categories: Data preparation

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Type: String

Output: null

See details.


Create range fan geometry

Supported in: Batch, Streaming

Approximates a range fan as a polygon, specifying the region of all points whose haversine distance to the origin point is between the minimum and maximum radii, and to which the bearing from the origin is contained within the angular range centered around the specified bearing parameter. The left and right sides of the range fan are drawn as geodesic lines computed along the surface of the WGS84 ellipsoid approximating the surface of the earth. Returns null if the range spans more than 180 degrees while also crossing the anti-meridian, or if the maximum radius spans more than half of the circumference of the earth.

Expression categories: Geospatial

Output type: Geometry

See details.


Create struct column

Supported in: Batch, Streaming

Combines multiple columns into a single structured column.

Expression categories: Struct

Output type: Struct

Example

Argument values:

  • Struct elements: [tail_number, id]
tail_number | id | Output
MT-112 | 1 | { id: 1, tail_number: MT-112 }
XB-123 | 2 | { id: 2, tail_number: XB-123 }
PA-654 | 3 | { id: 3, tail_number: PA-654 }

See details.


Create time series reference values

Supported in: Batch, Streaming

Creates time series reference values.

Expression categories: String

Output type: String

Example

Argument values:

  • Series identifier: seriesId
  • Time series sync RID: ri.time-series-catalog.main.sync.11111111
seriesId | Output
seriesOne | {"seriesId":"seriesOne","syncRid":"ri.time-series-catalog.main.sync.11111111"}

See details.


Current date

Supported in: Batch, Streaming

Returns the date on which computation started.

Expression categories: Datetime

Output type: Date

See details.


Current timestamp

Supported in: Batch, Streaming

Returns the timestamp at which computation started.

Expression categories: Datetime

Output type: Timestamp

See details.


Date sequence

Supported in: Batch

Creates an array with dates in range from start to end.

Expression categories: Datetime

Output type: Array<Date>

Example

Argument values:

  • End date: last_planned_flight
  • Start date: first_planned_flight
  • Step unit: DAYS
  • Step size: null
first_planned_flight | last_planned_flight | Output
2023-01-01 | 2023-01-03 | [ 2023-01-01, 2023-01-02, 2023-01-03 ]
2023-01-31 | 2023-02-02 | [ 2023-01-31, 2023-02-01, 2023-02-02 ]
2023-02-28 | 2023-03-01 | [ 2023-02-28, 2023-03-01 ]

See details.
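A minimal Python sketch of the day-step case shown in the example follows. It assumes, based only on the sample rows, that the end date is inclusive and that a null step size behaves like a step of 1; the function name `date_sequence` and its parameters are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def date_sequence(start: date, end: date, step_days: int = 1) -> List[date]:
    """Return the dates from start to end (inclusive) in steps of step_days."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    result = []
    current = start
    while current <= end:
        result.append(current)
        current += timedelta(days=step_days)
    return result


# Month boundaries are handled by the calendar arithmetic in datetime.
print(date_sequence(date(2023, 2, 28), date(2023, 3, 1)))
```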


Decode

Supported in: Batch, Streaming

Decodes the given expression using the specified charset.

Expression categories: Binary, Cast

Output type: String

Example

Argument values:

  • Charset: UTF_16
  • Expression: city
city | Output
/v8ATABvAG4AZABvAG4= | London
/v8AQwBvAHAAZQBuAGgAYQBnAGUAbg== | Copenhagen
/v8ATgBlAHcAIABZAG8AcgBr | New York

See details.
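The sample rows above can be reproduced in Python, assuming that the binary `city` values are displayed as Base64 text (an inference from the sample data, not something the source states). The helper name `decode_binary` is hypothetical.

```python
import base64


def decode_binary(encoded_base64: str, charset: str = "utf-16") -> str:
    """Decode Base64-rendered bytes into a string using the given charset."""
    raw_bytes = base64.b64decode(encoded_base64)
    # Python's "utf-16" codec reads the leading byte order mark (FE FF here)
    # to pick the endianness and strips it from the result.
    return raw_bytes.decode(charset)


print(decode_binary("/v8ATABvAG4AZABvAG4="))  # London
```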


Decode Geobuf as GeoJSON

Supported in: Batch, Streaming

Decode Geobuf geometry as GeoJSON.

Expression categories: Geospatial

Output type: Geometry

See details.


Divide numbers

Supported in: Batch, Streaming

Divide one number by another number.

Expression categories: Numeric

Output type: Decimal | Double

Example

Argument values:

  • Left: col_a
  • Right: col_b
col_a | col_b | Output
4 | 2 | 2.0
11 | 2 | 5.5

See details.


Edit distance

Supported in: Batch, Streaming

Compute the edit distance between two strings. Supports Levenshtein, indel, and Damerau-Levenshtein distance.

Expression categories: Distance measurement, String

Output type: Double | Integer

Example

Description: String edit distance calculated using Levenshtein distance.

Argument values:

  • Distance function: levenshtein
  • Ignore case: false
  • Left: left
  • Right: right
  • Normalize distance: false
left | right | Output
hello | hello | 0
hallo | hello | 1
hlelo | hello | 2
hello | hEllO | 2
hello | hello, world! | 8
hello | farewell | 6

See details.
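For readers unfamiliar with the metric, the following is a minimal Python sketch of unnormalized Levenshtein distance using the standard dynamic-programming recurrence. It is illustrative only and does not cover the indel or Damerau-Levenshtein variants; the function name and the `ignore_case` parameter are modeled on the argument names above but are otherwise hypothetical.

```python
def levenshtein(left: str, right: str, ignore_case: bool = False) -> int:
    """Count the single-character insertions, deletions, and substitutions
    needed to turn left into right."""
    if ignore_case:
        left, right = left.lower(), right.lower()
    # previous[j] holds the distance between the processed prefix of left
    # and the first j characters of right.
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]
        for j, right_char in enumerate(right, start=1):
            substitution = previous[j - 1] + (left_char != right_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(levenshtein("hlelo", "hello"))  # 2: a swap costs two substitutions
```

Under plain Levenshtein distance a transposition such as "hlelo" versus "hello" costs 2, which is why the table reports 2 for that row.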


Encode GeoJSON as Geobuf

Supported in: Batch, Streaming

Encodes GeoJSON geometry as Geobuf.

Expression categories: Geospatial

Output type: Geobuf

See details.


Ends with

Supported in: Batch, Streaming

Expression categories: Boolean, String

Output type: Boolean

Example

Argument values:

  • Expression: Hello World
  • Ignore case: true
  • Value: world

Output: true

See details.


Epoch milliseconds to date

Supported in: Batch, Streaming

Converts from epoch milliseconds to date in UTC.

Expression categories: Cast, Datetime

Output type: Date

Example

Description: You can convert epoch timestamps in milliseconds to the date type.

Argument values:

  • Expression: 1673964111000

Output: 2023-01-17

See details.


Epoch milliseconds to timestamp

Supported in: Batch, Streaming

Converts from epoch milliseconds to timestamp in UTC.

Expression categories: Cast, Datetime

Output type: Timestamp

Example

Description: You can convert epoch timestamps in milliseconds to the timestamp type.

Argument values:

  • Expression: 1673964111000

Output: 2023-01-17T14:01:51Z

See details.


Epoch seconds to date

Supported in: Batch, Streaming

Converts from epoch seconds to date in UTC.

Expression categories: Cast, Datetime

Output type: Date

Example

Description: You can convert epoch timestamps to the date type.

Argument values:

  • Expression: 1673964111

Output: 2023-01-17

See details.


Epoch seconds to timestamp

Supported in: Batch, Streaming

Converts from epoch seconds to timestamp in UTC.

Expression categories: Cast, Datetime

Output type: Timestamp

Example

Description: You can convert epoch timestamps to the timestamp type.

Argument values:

  • Expression: 1673964111

Output: 2023-01-17T14:01:51Z

See details.
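The four epoch conversions above (seconds or milliseconds, to date or timestamp) can be cross-checked with a short Python sketch. The function names are hypothetical; only the standard `datetime` module is used.

```python
from datetime import date, datetime, timedelta, timezone

# The Unix epoch in UTC; all conversions are offsets from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_seconds_to_timestamp(seconds: int) -> datetime:
    return EPOCH + timedelta(seconds=seconds)


def epoch_millis_to_timestamp(millis: int) -> datetime:
    return EPOCH + timedelta(milliseconds=millis)


def epoch_seconds_to_date(seconds: int) -> date:
    return epoch_seconds_to_timestamp(seconds).date()


def epoch_millis_to_date(millis: int) -> date:
    return epoch_millis_to_timestamp(millis).date()


print(epoch_seconds_to_timestamp(1673964111).isoformat())
print(epoch_millis_to_date(1673964111000).isoformat())
```

Python renders the UTC offset as `+00:00` rather than `Z`, but the instant is the same as in the documented outputs.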


Equals

Supported in: Batch, Streaming

Returns true if left and right are equal.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: a
  • Right: b
a | b | Output
1 | 1 | true
1 | 0 | false

See details.


Exponential

Supported in: Batch, Streaming

Calculates the exponential, e^x, of a column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: 2.0

Output: 7.38905609893

See details.


Extract all regex matches

Supported in: Batch, Streaming

Extract all instances of a regex match into an array.

Expression categories: Regex, String

Output type: Array<String>

Example

Description: Extract the first two initials from each code.

Argument values:

  • Expression: MT-112, XB-967
  • Group: 1
  • Pattern: (\w\w)(-)

Output: [ MT, XB ]

See details.
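The example above can be mirrored with Python's `re` module, as sketched below. Python's regex dialect may differ from the one Pipeline Builder uses, so treat this as an illustration of the group-selection idea only; `extract_all` is a hypothetical helper name.

```python
import re
from typing import List


def extract_all(text: str, pattern: str, group: int = 0) -> List[str]:
    """Collect the chosen capture group from every match of pattern in text."""
    return [match.group(group) for match in re.finditer(pattern, text)]


# Group 1 is the two word characters that precede each hyphen.
print(extract_all("MT-112, XB-967", r"(\w\w)(-)", group=1))  # ['MT', 'XB']
```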


Extract content from spreadsheets in JSON

Supported in: Batch

Extracts content from all sheets of a spreadsheet in JSON format.

Expression categories: Media

Output type: Map<String, Struct>

See details.


Extract date part

Supported in: Batch, Streaming

Extracts a part of a date, such as the year or day of week.

Expression categories: Datetime

Output type: Integer

See details.


Extract document metadata

Supported in: Batch

Extracts metadata fields from a document.

Expression categories: Media

Output type: Struct

Example

Argument values:

  • Media reference: Media Reference
  • Metadata to include: [Document Author, Page Count, Document Title]
Media Reference | Output
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | { author: Jane Doe, page_count: 23, title: Document Title }

See details.


Extract imagery metadata

Supported in: Batch, Streaming

Extracts metadata fields from an image.

Expression categories: Media

Output type: Struct

Example

Argument values:

  • Media reference: Media Reference
  • Metadata to include: [Attributes, Bands, Bytes, Dimensions, Format, Geographic Metadata, ICC Profile, EXIF Image Location]
Media ReferenceOutput
{"mimeType":"image/tiff","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}{
attributes: {
 outer_key1 -> {
 inner_key1 -> inner_value1,
},
...

See details.


Extract layout-aware content from PDF

Supported in: Batch

Extracts content from the specified document, while preserving the document's layout.

Expression categories: Media

Output type: Array<Array<Struct<block_index, block_id, page, block_type, content, bounding_box, languages<String>, confidence>>> | Array<String>

See details.


Extract layout-aware content from images

Supported in: Batch

Extracts content from images, while preserving the original layout.

Expression categories: Media

Output type: Array<Struct<block_index, block_id, block_type, content, bounding_box, languages<String>, confidence>> | String

See details.


Extract map keys

Supported in: Batch, Streaming

Returns the map keys as an array. Note that the order of the array elements is not deterministic.

Expression categories: Map

Type variable bounds: K accepts AnyType

Output type: Array<K>

Example

Argument values:

  • Map: flight_number
flight_number | Output
{ MT-111 -> 2, XB-134 -> 1 } | [ XB-134, MT-111 ]

See details.


Extract map values

Supported in: Batch, Streaming

Returns the map values as an array. Note that the order of the array elements is not deterministic.

Expression categories: Map

Type variable bounds: V accepts AnyType

Output type: Array<V>

Example

Argument values:

  • Map: flight_number
flight_number | Output
{ MT-111 -> 2, XB-134 -> 1 } | [ 1, 2 ]

See details.


Extract offset from legacy OffsetDateTime

Supported in: Batch

Extracts the offset from a legacy OffsetDateTime column. This is the offset in seconds of the origin timezone of the timestamp from UTC timezone.

Expression categories: Datetime

Output type: Integer

Example

Argument values:

  • Expression: col_a
col_a | Output
{ offset: 0, timestamp: 2024-09-09T09:00:00.001Z } | 0
{ offset: 19800, timestamp: 2024-09-09T09:00:00.001Z } | 19800
{ offset: -3600, timestamp: 2024-09-09T09:00:00.001Z } | -3600

See details.


Extract table of contents from PDF

Supported in: Batch

Produces a table of contents from a PDF based on the headings used within the document.

Expression categories: Media

Output type: Array<Struct<level, title, page>>

Example

Argument values:

  • Media reference: Media Reference
Media ReferenceOutput
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}[ {
level: 0,
page: 2,
title: Chapter 1,
}, {
 **l...

See details.


Extract text from PDF

Supported in: Batch

Extracts raw text from the pages in a PDF.

Expression categories: Media

Output type: Array<String>

Example

Argument values:

  • Media reference: Media Reference
  • End page: End Page
  • Start page: Start Page
Media Reference | Start Page | End Page | Output
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | 1 | 2 | [ first page, second page ]

See details.


Extract text from PDF (using OCR)

Supported in: Batch

Extracts text from the pages in a PDF file using optical character recognition (OCR).

Expression categories: Media

Output type: Array<String>

See details.


Extract text from images (using OCR)

Supported in: Batch

Extracts text from an image using optical character recognition (OCR).

Expression categories: Media

Output type: String

See details.


Extract timestamp part

Supported in: Batch, Streaming

Extracts a part of a timestamp, such as the year or day of week.

Expression categories: Datetime

Output type: Integer

See details.


Filter array elements

Supported in: Batch, Streaming

Filters an array based on the filter expression. Note that array indexing starts at 1.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: array
  • Expression to filter:
    isNotNull(
     expression: element,
    )
array | Output
[ 2, 5, null, 11 ] | [ 2, 5, 11 ]

See details.
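The filtering in the example above corresponds to keeping only the elements for which the filter expression evaluates to true. A minimal Python sketch follows; the name `filter_array` is hypothetical, and returning `None` for a null array is an assumption rather than documented behavior.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def filter_array(
    array: Optional[List[T]], predicate: Callable[[T], bool]
) -> Optional[List[T]]:
    """Keep the elements for which predicate returns True, preserving order."""
    if array is None:
        return None  # assumption: a null array stays null
    return [element for element in array if predicate(element)]


# Equivalent of isNotNull(expression: element) from the example.
print(filter_array([2, 5, None, 11], lambda element: element is not None))
```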


Filter by geometry type

Supported in: Batch, Streaming

Nulls any values in the geometry column that are not of the provided geometry types.

Expression categories: Geospatial

Output type: Geometry

See details.


First non null value (coalesce)

Supported in: Batch, Streaming

Picks the first non-null value of the inputs. Known as coalesce in SQL.

Expression categories: Data preparation

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Expressions: [tail_number, airline]
  • Treat empty strings as null: null
tail_number | airline | Output
XB-123 | null | XB-123
null | MT | MT

See details.
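The coalesce logic can be sketched in Python as below. The function name and the `treat_empty_strings_as_null` parameter are hypothetical stand-ins for the arguments listed above, and the empty-string handling is an assumption about what that option does.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_non_null(
    values: Iterable[Optional[T]], treat_empty_strings_as_null: bool = False
) -> Optional[T]:
    """Return the first value that is not null, or None if every value is null."""
    for value in values:
        if value is None:
            continue
        # Assumption: with the option enabled, "" is skipped like a null.
        if treat_empty_strings_as_null and value == "":
            continue
        return value
    return None


print(first_non_null([None, "MT"]))  # MT
```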


Floor

Supported in: Batch, Streaming

Returns the floor of a given fractional value.

Expression categories: Numeric

Output type: Decimal | Long

Example

Argument values:

  • Expression: 10.123

Output: 10

See details.


Format date as string

Supported in: Batch, Streaming

Returns the date as a formatted string in accordance with the Java DateTimeFormatter. The default format is ISO 8601.

Expression categories: Cast, String

Output type: String

Example

Argument values:

  • Date: 2022-12-20
  • Format: yy-MM-dd

Output: 22-12-20

See details.


Format number

Supported in: Batch, Streaming

Formats a number to a specific number of decimal places.

Expression categories: Cast, Numeric, String

Output type: String

Example

Description: Formats a number to 2 decimal places.

Argument values:

  • Decimal places: 2
  • Number: 1234.5678

Output: 1,234.57

See details.
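The output in the example above, with thousands separators and a fixed number of decimals, can be reproduced with Python's format specification mini-language. This is only an analogy: the helper name `format_number` is hypothetical, and the expression's exact rounding and locale rules may differ from Python's.

```python
def format_number(number: float, decimal_places: int) -> str:
    """Format with comma thousands separators and a fixed number of decimals."""
    return f"{number:,.{decimal_places}f}"


print(format_number(1234.5678, 2))  # 1,234.57
```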


Format string

Supported in: Batch, Streaming

Formats a string printf-style.

Expression categories: String

Output type: String

Example

Argument values:

  • Format arguments: [argument1, argument2]
  • Format string: Hello %s, my name is %s
argument1 | argument2 | Output
Alice | Bob | Hello Alice, my name is Bob
Jane | John | Hello Jane, my name is John

See details.
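Python's `%` operator follows the same printf-style convention for `%s` placeholders, so the example above can be sketched as follows. The helper name `format_string` is hypothetical, and other format specifiers may not behave identically in Pipeline Builder.

```python
def format_string(template: str, *arguments: object) -> str:
    """Substitute the arguments into the printf-style template, in order."""
    return template % arguments


print(format_string("Hello %s, my name is %s", "Alice", "Bob"))
```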


Format timestamp as string

Supported in: Batch, Streaming

Returns the timestamp as a formatted string (ISO8601 by default).

Expression categories: Cast, Datetime, String

Output type: String

Example

Argument values:

  • Timestamp: 2022-10-01T09:00:00Z
  • Format: yyyy-MM-dd
  • Time zone: null

Output: 2022-10-01

See details.


Geometries have intersection

Supported in: Batch, Streaming

Determines if two geometries intersect.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_ageometry_bOutput
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [...{"coordinates":[[[-103.78627755867336,33.162750522563925],[-103.78627755867336,28.29724741894266],[-...true
{"coordinates":[[[0.3651446504365481,15.159518507965103],[0.3651446504365481,13.427462911044273],[3....{"coordinates":[[[5.656394524666183,13.405417496831944],[5.656394524666183,11.29869961209053],[8.551...false

See details.


Geometry 3d affine transformation

Supported in: Batch, Streaming

Applies a three dimensional affine transformation to the input geometry. This transformation occurs in the user-provided projected coordinate system, and the result is projected back to WGS84. Two dimensional geometries will have their z-coordinates set to 0 before the affine transformation is applied. The returned geometry is three dimensional and for each coordinate [x,y,z] represents the matrix multiplication [[x0, x1, x2, x-offset], [y0, y1, y2, y-offset], [z0, z1, z2, z-offset], [0, 0, 0, 1]] * [x, y, z, 1], where the first three ordinates of the result are returned.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry column: geometry
  • Projected coordinate system: EPSG:4326
  • X offset: 0.0
  • X0: 0.0
  • X1: -1.0
  • X2: 0.0
  • Y offset: 0.0
  • Y0: 1.0
  • Y1: 0.0
  • Y2: 0.0
  • Z offset: 0.0
  • Z0: 0.0
  • Z1: 0.0
  • Z2: 0.0
geometry | Output
{"type":"Polygon","coordinates":[[[0.0, 0.0],[1.0, 0.0],[1.0, 1.0],[0.0, 1.0],[0.0, 0.0]]]} | {"type":"Polygon","coordinates":[[[0.0, 0.0, 0.0],[0.0, 1.0, 0.0],[-1.0, 1.0, 0.0],[-1.0, 0.0, 0.0],[0.0, 0.0, 0.0]]]}

See details.
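The matrix multiplication described above can be worked through in a few lines of Python. This sketch applies only the 4x4 affine matrix to each coordinate and deliberately skips the projection to and from the projected coordinate system; `apply_affine` and `ROTATE_90` are hypothetical names, with the matrix built from the argument values in the example.

```python
from typing import List, Sequence


def apply_affine(matrix: Sequence[Sequence[float]], point: Sequence[float]) -> List[float]:
    """Multiply a 4x4 affine matrix by [x, y, z, 1] and return [x', y', z']."""
    x, y = point[0], point[1]
    # Two dimensional points get z = 0 before the transformation.
    z = point[2] if len(point) > 2 else 0.0
    vector = (x, y, z, 1.0)
    transformed = [sum(row[i] * vector[i] for i in range(4)) for row in matrix]
    # Only the first three ordinates of the result are kept.
    return transformed[:3]


# Rows are [x0, x1, x2, x-offset], [y0, y1, y2, y-offset], [z0, z1, z2, z-offset].
ROTATE_90 = [
    [0.0, -1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

ring = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
rotated = [apply_affine(ROTATE_90, point) for point in ring]
print(rotated)
```

With these values each point (x, y) maps to (-y, x, 0), which matches the rotated ring in the example output.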


Geometry array (unary) union

Supported in: Batch, Streaming

Given an array of geometries, combine these into a single geometry, merging without overlap.

Expression categories: Geospatial

Type variable bounds: T accepts Geometry

Output type: T

Example

Argument values:

  • Expression: geometries
geometries | Output
[ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}, {"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]} ] | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]}
[ ] | null
null | null

See details.


Geometry array line dissolve

Supported in: Batch, Streaming

Given an array of geometries, combine these into a linear geometry. Dissolve simplifies an input set of line-strings by removing unnecessary nodes and concatenating line-strings that can be combined. Z-coordinates will be ignored for the purpose of the dissolve operation, but the vertices in the resultant geometry will have the same z-coordinate as the corresponding points in the input.

Expression categories: Geospatial

Type variable bounds: T accepts Geometry

Output type: T

Example

Argument values:

  • Expression: geometries
geometriesOutput
[ {"type":"LineString","coordinates":[[0,0],[0,1],[1,1]]}, {"type":"LineString","coordinates":[[1,1]...{"type":"MultiLineString","coordinates":[[[5.0, 5.0],[4.0, 4.0],[3.0, 3.0],[2.0, 2.0],[1.0, 1.0],[0.0, 1.0],[0.0, 0.0]],[[7.0, 7.0], [6.0, 7.0], [6.0, 6.0]]]}
[ {"type":"LineString","coordinates":[[0,0,1],[0,1,1],[1,1,1]]}, {"type":"LineString","coordinates":[[1,1,1],[2,2,2]]}, {"type":"LineString","coordinates":[[1,1,2],[2,2,2],[3,3,3]]} ]{"type":"LineString","coordinates":[[0.0, 0.0, 1.0],[0.0, 1.0, 1.0],[1.0, 1.0, 1.0],[2.0, 2.0, 2.0],[3.0, 3.0, 3.0]]}

See details.


Geometry buffer

Supported in: Batch, Streaming

Computes the buffer of a geometry for both positive and negative buffer distances. Returns an approximate representation of all points within a given distance of this geometric object (or for negative buffers, all points minus those within the buffer distance of the boundary). Buffer drops any z coordinates, and zero/negative distance buffers of lines and points will return null.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Buffer distance: distance
  • Geometry column: geometry
  • Projected coordinate system: EPSG:32618
  • Buffer cap style: ROUND
  • Buffer join style: ROUND
  • Line segments per quadrant: 8
  • Single or double sided: DOUBLE_SIDED
geometrydistanceOutput
{"type":"Point","coordinates":[-77.07368071728229,38.83040844313318]}10.0{"type":"Polygon","coordinates":[[[-77.07356558299462, 38.83041048767274],[-77.07356728534256, 38.83...
{"type":"LineString","coordinates":[[-77.07368071728229,38.83040844313318, 1],[-77.0725293738795,38.83042888342659, 1]]}10.0{"type":"Polygon","coordinates":[[[-77.07253198637027, 38.83051894052714],[-77.07250947453703, 38.83...
{"type":"Polygon","coordinates":[[[-77.07368071728229,38.83040844313318, 1],[-77.0725293738795,38.83...10.0{"type":"Polygon","coordinates":[[[-77.07379585155829, 38.83040639848026],[-77.07382199292853, 38.83...

See details.


Geometry centroid

Supported in: Batch, Streaming

Return the centroid, or "center of mass", of the geometry using a spherical approximation of the globe. If the geometry is a collection of mixed dimensions, only the elements of the highest dimension will contribute to the centroid (e.g. in a collection of points, lines and polygons, points and lines are ignored). This operation will round to 32-bit floating point precision for coordinates in the geometry.

Expression categories: Geospatial

Output type: GeoPoint

See details.


Geometry contains

Supported in: Batch, Streaming

Determines if geometry a contains geometry b. Points or lines lying on the boundary of a polygon are not considered to be contained by it.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_ageometry_bOutput
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [...{"type":"Point","coordinates":[-100.0,32.0]}true
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [...{"type":"LineString","coordinates":[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323]]}false
{"type":"LineString","coordinates":[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323]]}{"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]}false
{"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]}{"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]}true
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [...{"coordinates":[[[-111.94377956164206,33.81725414459382],[-111.94377956164206,31.006795384733323], [...true

See details.


Geometry difference

Supported in: Batch, Streaming

Calculates the portion of geometry a that is not intersecting geometry b.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_ageometry_bOutput
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[0.5,0.0],[0.5,1.0],[0.0,1.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"LineString","coordinates":[[0.0,0.0],[0.0,1.0]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}

See details.


Geometry explode to array

Supported in: Batch, Streaming

Converts a geometry to an array of its constituent simple geometries.

Expression categories: Geospatial

Output type: Array<Geometry>

Example

Argument values:

  • Expression: geometry
geometryOutput
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}[ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} ]
{"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]}[ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}, {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} ]

See details.


Geometry intersection

Supported in: Batch, Streaming

Calculates the portion of geometry a that is intersecting geometry b.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_ageometry_bOutput
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]}{"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]}{"type":"Polygon","coordinates":[[]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]}{"type":"LineString","coordinates":[[1.0,1.0],[1.0,0.0]]}
{"type":"Point","coordinates":[0.0,0.0]}{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]}{"type":"Point","coordinates":[0.0,0.0]}
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]}{"type":"Polygon","coordinates":[[[2.0,0.0],[2.0,1.0],[3.0,1.0],[3.0,0.0],[2.0,0.0]]]}{"type":"LineString","coordinates":[]}

See details.


Geometry length

Supported in: Batch, Streaming

Get the length of the line strings and multi line strings in the geometry in meters. Uses a spherical approximation of the globe. Non-linear geometries (polygons and points) count as 0.

Expression categories: Geospatial

Output type: Double

Example

Argument values:

  • Expression: geometry
geometry | Output
{"type":"LineString","coordinates":[[-73.778128,40.641195],[-118.408535,33.941563]]} | 3974344.7433354934
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0],[1.0,1.0],[1.0,2.0]]} | 333585.2407005987
{"type":"MultiLineString","coordinates":[[[0.0,0.0],[1.0,0.0],[1.0,1.0]], [[1.0,2.0],[2.0,2.0]]]} | 333517.50194413937

See details.
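To make the spherical approximation concrete, here is a hedged Python sketch that sums haversine distances along a line string. The Earth radius of 6371008.8 m is an assumption (the mean Earth radius, chosen because it is consistent with the second example row); the source does not state which radius or formula the expression actually uses, and the function names are hypothetical.

```python
import math
from typing import Sequence

EARTH_RADIUS_M = 6371008.8  # assumption: mean Earth radius in meters


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def line_length_m(coordinates: Sequence[Sequence[float]]) -> float:
    """Sum the segment lengths of a GeoJSON-style [lon, lat] coordinate list."""
    return sum(
        haversine_m(start[0], start[1], end[0], end[1])
        for start, end in zip(coordinates, coordinates[1:])
    )


print(line_length_m([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]))
```

Each of the three segments in this line spans exactly one degree of arc, so the total is three degrees of arc on the assumed sphere, roughly 333.6 km.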


Geometry rotate 2d

Supported in: Streaming

Applies a two dimensional clockwise rotation centered at the provided GeoPoint to the supplied geometry. This rotation occurs in the provided coordinate reference system and is then projected back to WGS84.

Expression categories: Geospatial

Output type: Geometry

See details.


Geometry set z-coordinate

Supported in: Batch, Streaming

Sets the z-coordinate of a geometry. If the geometry has an existing z-coordinate it will be overwritten.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry: geometry
  • Z coordinate: zCoordinate
geometry | zCoordinate | Output
{"type":"Point","coordinates":[1.0, 2.0]} | 1.0 | {"type":"Point","coordinates":[1.0, 2.0, 1.0]}
{"type":"Point","coordinates":[1.0, 2.0, 3.0]} | 1.0 | {"type":"Point","coordinates":[1.0, 2.0, 1.0]}

See details.


Geometry shortest distance

Supported in: Batch, Streaming

Given two valid geometries, calculates the shortest (great circle) distance in meters between them. Uses a spherical approximation of the globe. Overlapping geometries have a distance of zero.

Expression categories: Geospatial

Output type: Double

See details.


Geometry standardize

Supported in: Batch, Streaming

Given a valid geometry, standardizes it by enforcing the right-hand rule on the input, which is the convention for GeoJSON. This enables equality comparisons between equivalent geometries. This expression may reverse linestrings.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry
geometryOutput
{"type":"Polygon","coordinates":[[[32.26868,-26.53253],[32.26465,-26.45873],[32.25262,-26.38563],[32.26868,-26.53253]]]}{"type":"Polygon","coordinates":[[[32.25262, -26.38563],[32.26868, -26.53253],[32.26465, -26.45873],[32.25262, -26.38563]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.25,0.5]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]], [[0.25,0.25],[0.25,0.5],[0.5,0.25],[0.25,0.25]]]}
{"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"}{"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"}
{"coordinates": [[[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]]], "type":"MultiPolygon"}{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"}
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"}{"coordinates": [[5.0, 5.0],[-1.0, -1.0]], "type":"LineString"}
{"coordinates": [[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 10.0, 0.0], [10.0, 10.0, 10.0], [0.0, 10.0, 10.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]]], "type": "Polygon"}{"coordinates": [[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 10.0, 0.0], [10.0, 10.0, 10.0], [0.0, 10.0, 10.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]]], "type": "Polygon"}

See details.


Geometry symmetric difference

Supported in: Batch, Streaming

Calculates the portion that is in either geometry, but not in their intersection.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_ageometry_bOutput
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[2.0,1.0],[2.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[3.0,1.0],[3.0,0.0],[1.0,0.0]]]}{"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[2.0,0.0],[2.0,1.0],[3.0,1.0],[3.0,0.0],[2.0,0.0]]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[0.5,0.0],[0.5,1.0],[0.0,1.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]}{"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]}

See details.


Geometry translate expression

Supported in: Batch, Streaming

Applies a translation to a geometry. Two-dimensional geometries are converted to three-dimensional geometries only if a z offset is supplied.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry column: geometry
  • Projected coordinate system: EPSG:4326
  • X offset: 1.0
  • Y offset: -1.0
  • Z offset: null
geometry | Output
{"type":"Point","coordinates":[0.0, 0.0]} | {"type":"Point","coordinates":[1.0, -1.0]}
{"type":"LineString","coordinates":[[0.0, 0.0], [1.0, 1.0]]} | {"type":"LineString","coordinates":[[1.0, -1.0], [2.0, 0.0]]}
{"type":"Polygon","coordinates":[[[0.0, 0.0],[1.0, 0.0],[1.0, 1.0],[0.0, 1.0], [0.0, 0.0]]]} | {"type":"Polygon","coordinates":[[[1.0, -1.0],[2.0, -1.0],[2.0, 0.0],[1.0, 0.0],[1.0, -1.0]]]}

See details.
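
The translation above can be sketched in plain Python as a recursive shift over the GeoJSON coordinates. This is only an illustration: the names `translate_coords` and `translate_geometry` are hypothetical, the sketch ignores the projected coordinate system argument and shifts raw coordinate values, and treating a missing z as 0.0 when a z offset is supplied is an assumption.

```python
from typing import Any, Optional


def translate_coords(coords: Any, dx: float, dy: float, dz: Optional[float] = None) -> Any:
    """Recursively shift every position in a GeoJSON coordinates array."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [x, y] or [x, y, z].
        x, y, *rest = coords
        if dz is None:
            # No z offset: keep the dimensionality of the input.
            return [x + dx, y + dy, *rest]
        z = rest[0] if rest else 0.0  # assumption: a missing z starts at 0.0
        return [x + dx, y + dy, z + dz]
    return [translate_coords(c, dx, dy, dz) for c in coords]


def translate_geometry(geometry: dict, dx: float, dy: float, dz: Optional[float] = None) -> dict:
    """Return a copy of the geometry with translated coordinates."""
    return {**geometry, "coordinates": translate_coords(geometry["coordinates"], dx, dy, dz)}


point = {"type": "Point", "coordinates": [0.0, 0.0]}
line = {"type": "LineString", "coordinates": [[0.0, 0.0], [1.0, 1.0]]}
print(translate_geometry(point, 1.0, -1.0))
print(translate_geometry(line, 1.0, -1.0))
```

With a null z offset the two-dimensional inputs stay two-dimensional, which mirrors the behavior described for this expression.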


Geometry union

Supported in: Batch, Streaming

Combines the two geometries to create a single geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_ageometry_bOutput
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]}{"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]}{"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]}{"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]},{"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]}]}

See details.


Get H3 index

Supported in: Batch, Streaming

Converts a GeoPoint to an H3 index at the given resolution. Returns null for resolutions below 0 or above 15.

Expression categories: Geospatial

Output type: H3 Index

See details.


Get H3 indices covering a geometry

Supported in: Batch, Streaming

Converts a geometry to H3 indices at a given resolution. Resolution must be between 0 and 15, inclusive. For a polygon, three conversions are supported: a) H3 indices that fully cover the polygon, b) H3 indices that are fully contained by the polygon, c) H3 indices whose centroids are contained in the polygon. Returns null when the expected number of H3 indices exceeds 7 million.

Expression categories: Geospatial

Output type: Array<H3 Index>

See details.


Get MIME type

Supported in:

Returns the IANA MIME type of a media reference.

Expression categories: Media

Output type: String

See details.


Get PDF page dimensions

Supported in: Batch

Get the dimensions in points of each page of the PDF.

Expression categories: Media

Output type: Array<Struct<height, width>>

See details.


Get XZ curve index of an envelope

Supported in: Batch, Streaming

Encodes the envelope in an XZ curve.

Expression categories: Geospatial

Output type: Long

Example

Argument values:

  • Curve preset: LON_LAT_10KM
  • Envelope: envelope
envelopeOutput
{
 maxLat -> 2.0,
 maxLon -> 3.0,
 minLat -> 0.0,
 minLon -> 1.0,
}
16777222
{
 maxLat -> 2.0,
 maxLon -> 3.0,
 minLat -> null,
 minLon -> 1.0,
}
null

See details.


Get bearing from start point to end point

Supported in: Batch, Streaming

Calculates the absolute true bearing (clockwise angle relative to geographical north) from the first point to the second point in degrees using a spherical approximation of the earth.

Expression categories: Geospatial

Output type: Double

Example

Argument values:

  • Ending point: end_point
  • Starting point: start_point
start_pointend_pointOutput
{
latitude: 40.69325025929194,
longitude: -74.00522662934995,
}
{
latitude: 51.4988509390695,
longitude: -0.1238396067697046,
}
51.20964213763489

See details.
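
A minimal sketch of a true bearing on a sphere, using the standard initial-bearing formula. The function name `initial_bearing` is hypothetical, and whether Pipeline Builder uses exactly this formula is an assumption; for the example row above it gives a value close to 51.21 degrees.

```python
import math


def initial_bearing(start_lat: float, start_lon: float, end_lat: float, end_lon: float) -> float:
    """Clockwise angle from geographic north, in degrees within [0, 360)."""
    phi1, phi2 = math.radians(start_lat), math.radians(end_lat)
    dlon = math.radians(end_lon - start_lon)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


bearing = initial_bearing(40.69325025929194, -74.00522662934995,
                          51.4988509390695, -0.1238396067697046)
print(bearing)
```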


Get geometry envelope

Supported in: Batch, Streaming

Given a valid geometry or array of geometries, return a geometry representing the envelope of the input. The envelope is the smallest axis-aligned rectangular region containing the minimum and maximum x and y values of the geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry
geometryOutput
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0]]]}{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}

See details.


Get lat/long bounding box struct

Supported in: Batch, Streaming

Given a valid geometry or array of geometries, return a struct containing the bounds of the geometry or geometries.

Expression categories: Geospatial

Output type: LatLonBoundingBox

Example

Argument values:

  • Expression: geometry
geometryOutput
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0]]]}{
 maxLat -> 1.0,
 maxLon -> 1.0,
 minLat -> 0.0,
 minLon -> 0.0,
}

See details.
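
As a rough illustration of how the bounds of a geometry can be derived, the sketch below walks every position in a GeoJSON geometry and takes the minimum and maximum longitude and latitude. The helper names are hypothetical, and the sketch assumes a single geometry with a `coordinates` member (no geometry collections or arrays of geometries).

```python
def iter_positions(coords):
    """Yield every [lon, lat, ...] position from a nested coordinates array."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for child in coords:
            yield from iter_positions(child)


def bounding_box(geometry: dict) -> dict:
    """Return the bounds using the field names shown in the example output."""
    positions = list(iter_positions(geometry["coordinates"]))
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return {"maxLat": max(lats), "maxLon": max(lons), "minLat": min(lats), "minLon": min(lons)}


triangle = {"type": "Polygon", "coordinates": [[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]}
print(bounding_box(triangle))
```

The same four values describe the corners of the axis-aligned envelope rectangle.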


Get neighbors of an H3 index

Supported in: Batch, Streaming

Get all neighbors of an H3 index.

Expression categories: Geospatial

Output type: Array<H3 Index>

See details.


Get struct field

Supported in: Batch, Streaming

Extracts a field from a struct.

Expression categories: Struct

Output type: AnyType

Example

Argument values:

  • Locator: airline.id
  • Struct: struct
structOutput
{
airline: {
id: NA,
},
}
NA
{
airline: {
id: FE,
},
}
FE

See details.
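
The dotted locator can be pictured as a path through nested fields. The sketch below models a struct as a Python dict; `get_field` is a hypothetical name, and returning `None` for a missing field is an assumption rather than documented behavior.

```python
from typing import Any, Optional


def get_field(struct: Optional[dict], locator: str) -> Any:
    """Follow a dotted locator such as 'airline.id' through nested dicts."""
    current: Any = struct
    for key in locator.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # assumption: a missing field yields null
        current = current[key]
    return current


print(get_field({"airline": {"id": "NA"}}, "airline.id"))
print(get_field({"airline": {"id": "FE"}}, "airline.id"))
```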


Get the convex hull of a geometry

Supported in: Batch, Streaming

Given a valid GeoJSON input string, return a GeoJSON string that is the convex hull for the geometry. The convex hull is the smallest convex polygon containing the geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry
geometryOutput
{"type":"Polygon","coordinates":[[[0.0,0.0],[2.0,0.0],[2.0,1.0],[1.0,1.0],[1.0,2.0],[0.0,2.0],[0.0,0.0]]]}{"type":"Polygon", "coordinates":[[[0.0, 0.0], [0.0, 2.0], [1.0, 2.0], [2.0, 1.0], [2.0, 0.0], [0.0, 0.0]]]}
nullnull

See details.


Greater than

Supported in: Batch, Streaming

Returns true if left is greater than right.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: a
  • Right: b
a | b | Output
1 | 0 | true
1 | 1 | false
0 | 1 | false

See details.


Greater than or equals

Supported in: Batch, Streaming

Returns true if left is greater than or equal to right.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: a
  • Right: b
a | b | Output
1 | 0 | true
1 | 1 | true
0 | 1 | false

See details.


Greatest

Supported in: Batch, Streaming

Computes the greatest value amongst all input columns, skipping null values.

Expression categories: Numeric

Type variable bounds: T accepts ComparableType

Output type: T

Example

Argument values:

  • Expressions: [a, b, c]
a | b | c | Output
1 | 2 | 3 | 3
1 | 3 | 2 | 3
3 | 2 | 1 | 3

See details.
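
The null-skipping behavior can be sketched as follows. `greatest` and `least` are hypothetical helper names, and returning `None` when every input is null is an assumption.

```python
def greatest(*values):
    """Largest non-null value; None if every value is null (assumption)."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


def least(*values):
    """Smallest non-null value; None if every value is null (assumption)."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


print(greatest(1, 2, 3))
print(greatest(1, None, 2))
print(least(3, None, 1))
```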


Gzip decompress

Supported in: Batch, Streaming

Decompresses gzip-compressed binary into a string.

Expression categories: File

Output type: String

Example

Argument values:

  • Expression: gzip
gzip | Output
H4sIAAAAAAAA//NIzcnJ11Eozy/KSVEEAObG5usNAAAA | Hello, world!

See details.
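
The example input can be reproduced with the Python standard library, assuming the binary column is displayed as base64 text (the `H4sI` prefix is the usual base64 rendering of the gzip magic bytes) and that the decompressed bytes are UTF-8.

```python
import base64
import gzip

encoded = "H4sIAAAAAAAA//NIzcnJ11Eozy/KSVEEAObG5usNAAAA"

# Decode the base64 display form back to raw gzip bytes, then decompress.
raw = base64.b64decode(encoded)
text = gzip.decompress(raw).decode("utf-8")
print(text)
```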


H3 cell to children

Supported in: Batch, Streaming

Gets the children of an H3 index at the given resolution, which specifies the coarseness of the children. Returns null for resolutions below 0 or above 15, or for a children resolution lower than the resolution of the given H3 index.

Expression categories: Geospatial

Output type: Array<H3 Index>

See details.


H3 cell to parent

Supported in: Batch, Streaming

Gets the parent of an H3 index at the given resolution, which specifies the coarseness of the parent. Returns null for resolutions below 0 or above 15, or for a resolution higher than that of the given index.

Expression categories: Geospatial

Output type: H3 Index

See details.


H3 to geometry

Supported in: Batch, Streaming

Converts an H3 index to a polygon.

Expression categories: Geospatial

Output type: Geometry

See details.


Hash sha256

Supported in: Batch, Streaming

Hashes the input using the SHA-256 hashing algorithm.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello World!

Output: 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069

See details.
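
The example output can be checked with Python's `hashlib`, assuming the input string is encoded as UTF-8 before hashing and the digest is rendered as lowercase hexadecimal.

```python
import hashlib

# Hash the UTF-8 bytes of the example input and render the digest as hex.
digest = hashlib.sha256("Hello World!".encode("utf-8")).hexdigest()
print(digest)
```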


Image to embeddings

Supported in: Batch

Converts images into embeddings using the provided model.

Expression categories: Media

Output type: Embedded vector

Example

Description: Example embeddings for an image.

Argument values:

  • Media reference: mediaRef
  • Model:
    googleSiglip2Embedding(

    )
  • Output mode: null
mediaRefOutput
{
"mimeType": "image/jpeg",
"reference": {
 "type": "mediaSetViewItem",
 "...
embeddings-result

See details.


Interpolate geo point along linestring

Supported in: Batch, Streaming

Returns a point interpolated along a line. Implementation interprets lines as the shortest path, using a spherical approximation of the globe.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Fraction: fraction
  • Linestring: linestring
linestringfractionOutput
{"type":"LineString","coordinates":[[0.0,2.0],[30.0,0.0]]}0.5{
latitude: 1.0352686301676643,
longitude: 15.004677545504547,
}
{"type":"LineString","coordinates":[[30.0,2.0],[50.0,3.0]]}0.8{
latitude: 2.8256098405656185,
longitude: 45.99752305664789,
}
{"type":"LineString","coordinates":[[45.0,9.0],[90.0,4.0]]}0.2{
latitude: 8.363732883448177,
longitude: 54.073497456494955,
}

See details.
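
For a linestring with only two points, interpolating along the shortest path on a sphere can be sketched with spherical linear interpolation. This is an illustrative approximation, not the documented implementation: `interpolate` is a hypothetical name, and linestrings with more than two points would additionally need the per-segment lengths, which this sketch leaves out.

```python
import math


def to_vector(lon: float, lat: float):
    """Convert a lon/lat pair in degrees to a unit vector on the sphere."""
    lam, phi = math.radians(lon), math.radians(lat)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def interpolate(start, end, fraction: float) -> dict:
    """Point at `fraction` of the great-circle arc from start to end ([lon, lat] pairs)."""
    a, b = to_vector(*start), to_vector(*end)
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)  # angular distance between the two points
    if omega == 0.0:
        return {"latitude": start[1], "longitude": start[0]}
    wa = math.sin((1.0 - fraction) * omega) / math.sin(omega)
    wb = math.sin(fraction * omega) / math.sin(omega)
    x, y, z = (wa * p + wb * q for p, q in zip(a, b))
    return {
        "latitude": math.degrees(math.atan2(z, math.hypot(x, y))),
        "longitude": math.degrees(math.atan2(y, x)),
    }


midpoint = interpolate([0.0, 2.0], [30.0, 0.0], 0.5)
print(midpoint)
```

For the first example row this lands near latitude 1.035 and longitude 15.005.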


Is NaN

Supported in: Batch, Streaming

Returns true if the input is NaN, false otherwise.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: NaN

Output: true

See details.


Is empty struct

Supported in: Batch, Streaming

Returns true if the input is an empty struct, with recursive checking of inner arrays and structs.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: struct
structOutput
{
airline: {
id: null,
name: null,
},
tail_no: null,
}
true
{
airline: {
id: NA,
name: null,
},
tail_no: null,
}
false

See details.


Is in

Supported in: Batch, Streaming

Returns true if the list contains the value.

Expression categories: Boolean

Type variable bounds: T accepts ComparableType

Output type: Boolean

Example

Description: You can check if the list contains the value.

Argument values:

  • Contains: [AWE-112, BRR-123]
  • Value: value
value | Output
BRR-123 | true
ABC-543 | false

See details.


Is not null

Supported in: Batch, Streaming

Returns true if the input is not null. Can optionally treat empty strings as null.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: hello
  • Treat empty strings as null: null

Output: true

See details.


Is null

Supported in: Batch, Streaming

Returns true if the input is null. Can optionally treat empty strings as null.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: null
  • Treat empty strings as null: null

Output: true

See details.


Is valid GeoJSON

Supported in: Batch, Streaming

Returns true if the input is a valid GeoJSON input string. Not all GeoJSON strings are indexable by the Ontology; use the "prepare geometry" expression to prepare a geometry before using it in the Ontology.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: geoJson
geoJson | Output
{"type":"Point","coordinates":[3.0, 5.0, 2.0]} | true
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]} | true
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} | true
not a GeoJSON string | false

See details.


Is valid Geohash

Supported in: Batch, Streaming

Returns true if the input is a valid Geohash input string.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: geohash
geohash | Output
sk4d | true
dt9zy9cg36j7 | true
not a Geohash string | false
null | false

See details.


Is valid H3 index

Supported in: Batch, Streaming

Returns true if the input is a valid H3 index string.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: h3
h3 | Output
862a1072fffffff | true
not an h3 value | false

See details.


Is valid MGRS

Supported in: Batch, Streaming

Returns true if the input is a valid MGRS (military grid reference system) string.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: mgrs
mgrs | Output
4Q FJ 1 6 | true
4Q FJ 12345 67890 | true

See details.


Is valid MIME type

Supported in: Batch, Streaming

Returns true if the input is a valid MIME type.

Expression categories: Boolean, Other

Output type: Boolean

See details.


Is valid Ontology GeoPoint

Supported in: Batch, Streaming

Returns true if the input is a valid Ontology GeoPoint. Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: geopoint
geopoint | Output
-35.307428203,149.122686883 | true
149.122686883,-35.307428203 | false
10.0, 20.0 | true
10.0, 20.0 | true
not a GeoPoint | false
null | false
(10.0,20.0) | false

See details.
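
The `{lat},{lon}` rule with its range limits can be sketched as a small validator. The name `is_valid_geopoint` is hypothetical, and the exact grammar is an assumption: the sketch accepts plain decimal numbers and tolerates whitespace around the comma, which is inferred from the example rows rather than stated in the description.

```python
import re
from typing import Optional

# Two plain decimal numbers separated by a comma; surrounding whitespace allowed (assumption).
_GEOPOINT = re.compile(r"^\s*([+-]?\d+(?:\.\d+)?)\s*,\s*([+-]?\d+(?:\.\d+)?)\s*$")


def is_valid_geopoint(value: Optional[str]) -> bool:
    """True if value looks like '{lat},{lon}' with both numbers in range."""
    if value is None:
        return False
    match = _GEOPOINT.match(value)
    if match is None:
        return False
    lat, lon = float(match.group(1)), float(match.group(2))
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


print(is_valid_geopoint("-35.307428203,149.122686883"))
print(is_valid_geopoint("149.122686883,-35.307428203"))
```

The second call fails because 149.12 is outside the latitude range, which shows why the order of the two numbers matters.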


Is valid delegated media gid

Supported in: Batch, Streaming

Returns true if the input is a valid Gotham delegated media GID. See Gotham's delegated media documentation for more details.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: ri.gotham-delegated-media.12345678-1234-1234-1234-123456789012.testaudiotype.testlocator

Output: true

See details.


Is valid media reference

Supported in: Batch, Streaming

Returns true if the input is a valid Foundry media reference.

Expression categories: Boolean

Output type: Boolean

See details.


Is valid rid

Supported in: Batch, Streaming

Returns true if the input is a valid Foundry resource identifier.

Expression categories: Boolean

Output type: Boolean

See details.


Is valid uuid

Supported in: Batch, Streaming

Returns true if the input is a valid UUID.

Expression categories: Boolean

Output type: Boolean

See details.


Join array

Supported in: Batch, Streaming

Joins the elements of an array using the specified separator.

Expression categories: Array

Output type: String

Example

Argument values:

  • Array to join: [ hello, world ]
  • Separator: -

Output: hello-world

See details.


Last day of the week/month/quarter/year

Supported in: Batch

Returns the last day of the week/month/quarter/year.

Expression categories: Datetime

Output type: Date

See details.


Least

Supported in: Batch, Streaming

Computes the least value amongst all input columns, skipping null values.

Expression categories: Boolean, Numeric

Type variable bounds: T accepts ComparableType

Output type: T

Example

Argument values:

  • Expressions: [a, b, c]
a | b | c | Output
1 | 2 | 3 | 1
1 | 3 | 2 | 1
3 | 2 | 1 | 1

See details.


Left of string

Supported in: Batch, Streaming

Extracts the left-hand side of a string based on an index.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 5

Output: Hello

See details.


Left pad string

Supported in: Batch, Streaming

Left-pads the string column with pad until it reaches a width of length.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 15
  • Pad: *

Output: ***Hello world!

See details.
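
A minimal sketch of left padding in Python. `left_pad` is a hypothetical name; the description does not say what happens when the input is already longer than the target length, so this sketch simply returns such strings unchanged, which is an assumption.

```python
def left_pad(value: str, length: int, pad: str) -> str:
    """Prepend `pad` (repeated and cut as needed) until `value` is `length` wide."""
    missing = length - len(value)
    if missing <= 0 or not pad:
        return value  # assumption: longer inputs are returned unchanged
    padding = (pad * missing)[:missing]
    return padding + value


print(left_pad("Hello world!", 15, "*"))
```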


Length

Supported in: Batch, Streaming

Returns the length of each value in a string column or an array column.

Expression categories: Array, Numeric

Output type: Integer

Example

Argument values:

  • Expression: string
string | Output
hello | 5
bye | 3

See details.


Less than

Supported in: Batch, Streaming

Returns true if left is less than right.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: left
  • Right: right
left | right | Output
1.0 | 10 | true
10.0 | 1 | false

See details.


Less than or equals

Supported in: Batch, Streaming

Returns true if left is less than or equal to right.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: left
  • Right: right
left | right | Output
1.0 | 10 | true
10.0 | 1 | false

See details.


Logarithm

Supported in: Batch, Streaming

Calculates the natural logarithm, ln(x), of a column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: 10.123

Output: 2.3148100626166146

See details.


Logarithm with base

Supported in: Batch, Streaming

Calculates the logarithm of an expression with a given base.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Base: 2.0
  • Expression: 8

Output: 3.0

See details.
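
Both logarithm examples can be reproduced with Python's `math` module. The wrapper name `log_with_base` is hypothetical, and returning `None` for non-positive inputs is an assumption about null handling, not documented behavior.

```python
import math
from typing import Optional


def log_with_base(value: float, base: float) -> Optional[float]:
    """log_base(value), computed as ln(value) / ln(base)."""
    if value <= 0 or base <= 0 or base == 1:
        return None  # assumption: undefined inputs map to null
    return math.log(value) / math.log(base)


natural = math.log(10.123)       # natural logarithm, ln(x)
base_two = log_with_base(8, 2.0)
print(natural, base_two)
```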


Logical type cast

Supported in: Batch, Streaming

Cast expression to given logical type. Unlike the regular cast expression, this expression will not change the underlying base representation of the data, but rather enforce the constraints associated with the specified logical type, so that the output can be used as the input to downstream expressions which specifically demand an instance of that logical type.

Expression categories: Cast

Type variable bounds: C accepts AnyType

Output type: C

Example

Description: Successful cast to natural number.

Argument values:

  • Expression: 1234
  • Logical type: Natural number
  • Default value: null

Output: 1234

See details.


Lowercase

Supported in: Batch, Streaming

Converts all characters in string to lowercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello World

Output: hello world

See details.


Map values

Supported in: Batch, Streaming

Changes the values of the input column to new values based on a map of key-value pairs. If the input value is not found in the map, the default value is used.

Expression categories: Data preparation

Type variable bounds: T1 accepts ComparableType; T2 accepts AnyType

Output type: T2

Example

Argument values:

  • Column to replace values in: country
  • Default value:
    cast(
     expression: null,
     type: String,
    )
  • Values map: {
     Denmark -> DNK,
     United Kingdom -> UK,
    }
country | Output
United Kingdom | UK
Denmark | DNK
United States of America | null

See details.
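
The lookup-with-default behavior is close to a dictionary lookup. In the sketch below, `map_values` is a hypothetical name and Python's `None` stands in for the null default used in the example.

```python
from typing import Any, Mapping


def map_values(value: Any, values_map: Mapping, default: Any = None) -> Any:
    """Return the mapped value, or the default when the key is absent."""
    return values_map.get(value, default)


codes = {"Denmark": "DNK", "United Kingdom": "UK"}
for country in ["United Kingdom", "Denmark", "United States of America"]:
    print(country, "->", map_values(country, codes))
```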


Modulo

Supported in: Batch, Streaming

Returns the modulus of an expression.

Expression categories: Numeric

Output type: DefiniteNumeric

Example

Argument values:

  • Denominator: 4
  • Numerator: 10.123

Output: 2.123

See details.


Multiply numbers

Supported in: Batch, Streaming

Calculates the product of all input columns.

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Expressions: [col_a, col_b, col_c]
col_a | col_b | col_c | Output
10 | 2 | 3 | 60

See details.


Natural random number

Supported in: Batch, Streaming

Returns a random natural number. This is not deterministic and will not produce the same result on repeated builds, even when using a seed.

Expression categories: Numeric

Output type: Long

Example

Description: The only natural number between 10 (inclusive) and 11 (exclusive) is 10.

Argument values:

  • Max value: 11
  • Min value: 10
  • Seed: null

Output: 10

See details.


Negate

Supported in: Batch, Streaming

Expression categories: Numeric

Output type: Numeric

See details.


Normal random number

Supported in: Batch, Streaming

Returns a column of normally distributed random numbers with zero mean and unit variance. This is not deterministic and will not produce the same result on repeated builds, even when using a seed.

Expression categories: Numeric

Output type: Double

See details.


Not

Supported in: Batch, Streaming

Returns the negated boolean value of a boolean expression.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: boolean
boolean | Output
true | false
false | true

See details.


Not any

Supported in: Batch, Streaming

Returns true only if all of the specified conditions are false. Nulls are considered false.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Conditions: [left_boolean, right_boolean]
left_boolean | right_boolean | Output
true | true | false
true | false | false
false | true | false
false | false | true

See details.
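
The "nulls are considered false" rule can be sketched by counting only values that are exactly true. `any_true` and `not_any` are hypothetical helper names, with Python's `None` standing in for null.

```python
from typing import Iterable, Optional


def any_true(conditions: Iterable[Optional[bool]]) -> bool:
    """True if at least one condition is true; None counts as false."""
    return any(c is True for c in conditions)


def not_any(conditions: Iterable[Optional[bool]]) -> bool:
    """True only if no condition is true; None counts as false."""
    return not any_true(conditions)


print(not_any([False, False]))
print(not_any([None, False]))
print(not_any([True, None]))
```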


Nth chain in polygon

Supported in: Batch, Streaming

Returns the nth ring in a single polygon in the geometry. Indexing is 1-based, and an index of 0 is out-of-bounds. An index equal to 1 returns an external ring. An index greater than 1 returns an internal ring. Returns null for any of the following conditions: geometry isn't a single polygon, a feature collection or geometry collection is provided, index is out-of-bounds, or at least one argument is null.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • N: n
  • Polygon: polygon
polygonnOutput
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]}1{"coordinates": [[0.0, 0.0], [0.0, 10.0], [10.0, 10.0], [10.0, 0.0], [0.0, 0.0]], "type": "LineString"}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]}2null
{"coordinates":[[[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]],[[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]]],"type":"Polygon"}1{"coordinates": [[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]], "type": "LineString"}
{"coordinates":[[[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]],[[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]]],"type":"Polygon"}2{"coordinates": [[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]], "type": "LineString"}

See details.


Nth point in linestring

Supported in: Batch, Streaming

Returns the nth point in a single linestring in the geometry. Indexing is 1-based, and an index of 0 is out-of-bounds. A negative index is counted backwards from the end of the linestring, so that -1 is the last point. Returns null for any of the following conditions: geometry isn't a single linestring, a feature collection or geometry collection is provided, index is out-of-bounds, or at least one argument is null.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Linestring: linestring
  • N: n
linestringnOutput
{"type":"LineString","coordinates":[[30.0,2.0],[35.0,0.0],[50.0,3.0]]}1{
latitude: 2.0,
longitude: 30.0,
}
{"type":"LineString","coordinates":[[30.0,2.0],[35.0,0.0],[50.0,3.0]]}3{
latitude: 3.0,
longitude: 50.0,
}
{"type":"LineString","coordinates":[[45.0,9.0],[90.0,4.0],[40.0,0.0]]}-1{
latitude: 0.0,
longitude: 40.0,
}

See details.
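
The 1-based and negative indexing rules can be sketched as follows, with the geometry modeled as a parsed GeoJSON dict. `nth_point` is a hypothetical name, and the sketch covers only the null conditions listed above.

```python
from typing import Optional


def nth_point(geometry: Optional[dict], n: Optional[int]) -> Optional[dict]:
    """Return the nth point of a LineString (1-based; negative counts from the end)."""
    if geometry is None or n is None:
        return None
    if geometry.get("type") != "LineString":
        return None
    coords = geometry["coordinates"]
    if n == 0 or abs(n) > len(coords):
        return None  # index 0 and indices past either end are out of bounds
    lon, lat = coords[n - 1 if n > 0 else n][:2]
    return {"latitude": lat, "longitude": lon}


line = {"type": "LineString", "coordinates": [[30.0, 2.0], [35.0, 0.0], [50.0, 3.0]]}
print(nth_point(line, 1))
print(nth_point(line, -1))
```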


Nullify empty string

Supported in: Batch, Streaming

Convert empty strings to null.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: empty string

Output: null

See details.


Or

Supported in: Batch, Streaming

Returns true if any of the specified conditions are true. Nulls are considered false.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Conditions: [left_boolean, right_boolean]
left_boolean | right_boolean | Output
true | true | true
true | false | true
false | true | true
false | false | false

See details.


Parse GeoJSON from a non-WGS 84 coordinate system

Supported in: Batch, Streaming

Convert GeoJSON string from a non-WGS 84 coordinate system to WGS 84 geometry. For GeoJSON already in WGS 84 (longitude, latitude), the "logical type cast" expression can convert directly with less overhead. Returns null for strings that fail during parsing or conversion.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • GeoJSON string: geojson_string
  • Source coordinate system: EPSG:32618
geojson_stringOutput
{"type":"Point","coordinates":[320000.0,4300000.0]}{"type":"Point","coordinates":[-77.07368071728229,38.83040844313318]}
{"type":"LineString","coordinates":[[320000.0,4300000.0],[320100.0,4300000.0]]}{"type":"LineString","coordinates":[[-77.07368071728229,38.83040844313318],[-77.0725293738795,38.83042888342659]]}
{"type":"Polygon","coordinates":[[[320000.0,4300000.0],[320100.0,4300000.0],[320000.0,4300100.0],[320000.0,4300000.0]]]}{"type":"Polygon","coordinates":[[[-77.07368071728229,38.83040844313318],[-77.0725293738795,38.83042888342659],[-77.07370685720375,38.83130901341597],[-77.07368071728229,38.83040844313318]]]}

See details.


Parse JSON string

Supported in: Batch, Streaming

Parses JSON string following the given schema definition, ignoring any fields not in the schema.

Expression categories: Data preparation, Popular, String, Struct

Output type: Array<AnyType> | Map<String, String> | Struct

Example

Argument values:

  • JSON: json
  • Schema: Struct<airline, airport<id, miles>>
  • Output mode: null
jsonOutput
{
 "airline": "XB-112",
 "airport": {
  "id": "JFK",
  "miles": 2000
 }
}
{
airline: XB-112,
airport: {
id: JFK,
miles: 2000,
},
}

See details.
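
Schema-driven parsing that ignores unknown fields can be sketched with the standard `json` module. Here the schema is modeled as a nested dict whose leaves are `None`; that representation, the names `project` and `parse_json`, and returning `None` for unparseable input are all assumptions made for illustration.

```python
import json
from typing import Any, Optional


def project(value: Any, schema: Any) -> Any:
    """Keep only the fields named in the schema, recursing into nested structs."""
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return None
        return {key: project(value.get(key), sub) for key, sub in schema.items()}
    return value  # leaf field: pass the value through unchanged


def parse_json(text: Optional[str], schema: dict) -> Optional[dict]:
    try:
        parsed = json.loads(text)
    except (TypeError, ValueError):
        return None  # assumption: unparseable input yields null
    return project(parsed, schema)


schema = {"airline": None, "airport": {"id": None, "miles": None}}
raw = '{"airline": "XB-112", "airport": {"id": "JFK", "miles": 2000}, "extra": 1}'
print(parse_json(raw, schema))
```

The `extra` field is dropped because it does not appear in the schema.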


Parse KML string as geometry

Supported in: Batch, Streaming

Parses KML geometry definitions as a GeoJSON. Ignores all attributes. This expression operates on already extracted text; please extract files to text before using this expression.

Expression categories: Geospatial

Output type: String | Struct<ok, error>

Example

Description: Basic polygons.

Argument values:

  • KML string to parse.: col
  • Output mode: null
  • Prepare geometry after parse: null
colOutput
<LineString>
<coordinates>
-71.1663,42.2614
-71.1667,42.2616
</coordinates>
</LineString>
{"type":"LineString","coordinates":[[-71.1663,42.2614],[-71.1667,42.2616]]}
<Polygon>
<extrude>1</extrude>
<altitudeMode>relativeToGround</altitudeMode>
<ou...
{"type":"Polygon","coordinates":[[[-122.0848938459612,37.42257124044786,17.0],[-122.0847882750515,37...
<Polygon>
<extrude>1</extrude>
<altitudeMode>relativeToGround</altitudeMode>
<ou...
{"type":"Polygon","coordinates":[[[-77.05465973756702,38.87291016281703,100.0],[-77.0531553685479,38...
<Point>
<coordinates>
-71.1663,42.2614
</coordinates>
</Point>
{"type":"Point","coordinates":[-71.1663,42.2614]}
<MultiGeometry>
<Polygon>
<outerBoundaryIs>
<coordinates> -71.1663,42.2614
-71.1...
{"type":"MultiPolygon","coordinates":[[[[-81.1679,32.2614],[-81.1679,32.28],[-81.1663,32.28],[-81.16...

See details.


Parse KML string as geometry list

Supported in: Batch, Streaming

Parses KML string as a list of GeoJSONs, ignoring all KML attributes.

Expression categories: Geospatial

Output type: Array<Geometry> | Struct<ok<Struct<ok, error>>, error>

Example

Argument values:

  • KML string to parse.: col
  • Output mode: simple
  • Prepare geometry after parse: true
colOutput
<?xml version="1.0" encoding="utf-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Do...
[ {"coordinates":[[-122.43193945401, 37.801983684521], [-122.431564131101, 37.8020327731402], [-122.43... ]

See details.


Parse XML as schema

Supported in: Batch, Streaming

Parses XML strings following the given schema definition, ignoring any fields not in the schema.

Expression categories: File, Struct

Output type: Struct

Example

Argument values:

  • Input schema: Struct<id, airport<id, miles>>
  • Xml: xml
  • Attribute prefix: null
  • Ignore namespace: null
  • Output mode: SIMPLE
  • Value tag: null
xmlOutput
<airline>
 <id>XB-112</id>
 <airport>
  <id>JFK</id>
  <miles>2000</miles>
 </airport>
</airline>
{
airport: {
id: JFK,
miles: 2000,
},
id: XB-112,
}

See details.


Parse classification string

Supported in: Batch

Returns the markings parsed from a given classification string. This output is formatted as a struct, where the first element of the struct is an array comprising the classification markings that represent the input. This list is null if the classification string is invalid, or if there are other errors that occur while parsing the markings. The second element of the struct is the string of error message(s). If there are no errors, the error field will be null. This expression is called asynchronously for performance.

Expression categories: Other

Output type: Struct<markingIds<Classification>, errors>

See details.


Parse duration

Supported in: Batch

Parses an ISO 8601 duration string, together with a start time, into its length in a specific time unit.

Expression categories: Datetime, String

Output type: Long

Example

Argument values:

  • Duration: PT1M30.5S
  • Start time: 2022-10-01T09:00:00Z
  • Unit: SECONDS

Output: 90

See details.


Parse phone number

Supported in: Batch, Streaming

Normalizes phone numbers to a common format, parsing them from various regions and formats. Phone numbers containing the + sign followed by the region code will be parsed correctly even if the region is not set. All other number formats require a region to be selected from the options provided in order for them to be correctly parsed. Phone numbers that cannot be parsed will result in nulls.

Expression categories: String

Output type: Phone Number

Example

Description: Return formatted US phone number.

Argument values:

  • Expression: phoneNumber
  • Format: E164
  • Region: US
phoneNumber | Output
(234) 235-5678 | +12342355678
+1 415 5552671 | +14155552671
(415) 5552671 | +14155552671
Whatsapp@14155552671 | +14155552671

See details.


Parse well known binary as geometry

Supported in: Batch, Streaming

Converts well-known binary (WKB) to geometry logical type. Invalid WKB input will be returned as null. Optionally supply a source coordinate system identifier to convert from the source coordinate system to WGS 84 if the WKB is not in WGS 84 already.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: wkb
  • Source coordinate system: null
wkbOutput
AAAAAAFACAAAAAAAAEAUAAAAAAAA{"type":"Point","coordinates":[3.0, 5.0]}
AIAAAAFACAAAAAAAAEAUAAAAAAAAQAAAAAAAAAA={"type":"Point","coordinates":[3.0, 5.0, 2.0]}
AAAAAAMAAAABAAAABAAAAAAAAAAAAAAAAAAAAAA/8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA={"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]}
AAAAAAIAAAACAAAAAAAAAAAAAAAAAAAAAD/wAAAAAAAAAAAAAAAAAAA={"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]}

See details.


Parse well known text as geometry

Supported in: Batch, Streaming

Converts well-known text (WKT) string to geometry logical type. Invalid WKT input will be returned as null. Optionally supply a source coordinate system identifier to convert from the source coordinate system to WGS 84 if the WKT is not in WGS 84 already.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: wkt
  • Source coordinate system: null
wkt | Output
POINT (3.0 5.0 2.0) | {"type":"Point","coordinates":[3.0, 5.0, 2.0]}
POLYGON ((0.0 0.0, 1.0 0.0, 0.0 1.0, 0.0 0.0)) | {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]}
LINESTRING (0.0 0.0, 1.0 0.0) | {"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]}

See details.


Perimeter

Supported in: Batch, Streaming

Calculates perimeter of a geometry in meters using a spherical approximation of the globe. For a line string or a point, this equals 0.

Expression categories: Geospatial

Output type: Double

See details.


Positive modulo

Supported in: Batch

Returns positive modulus of an expression.

Expression categories: Numeric

Type variable bounds: T1 accepts Byte | Integer | Long | Short; T2 accepts Byte | Integer | Long | Short

Output type: T1

Example

Argument values:

  • Denominator: 3
  • Numerator: 10

Output: 1

See details.
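
The arithmetic behind a positive modulus can be sketched in Python as follows. This is a minimal illustration that assumes a positive denominator; `positive_modulo` is a hypothetical helper name, not a Pipeline Builder API.

```python
def positive_modulo(numerator: int, denominator: int) -> int:
    """Return the remainder shifted into the range [0, denominator).

    Assumption: denominator > 0. Behaviour for other denominators is not
    described in the text above, so this sketch rejects them.
    """
    if denominator <= 0:
        raise ValueError("this sketch only handles positive denominators")
    # Python's % is already non-negative for a positive divisor; the extra
    # "+ denominator" step mirrors the usual formula in languages where
    # % can return a negative remainder.
    return ((numerator % denominator) + denominator) % denominator


print(positive_modulo(10, 3))   # 1
print(positive_modulo(-10, 3))  # 2
```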


Power of

Supported in: Batch, Streaming

Calculates power of expression to exponent. If any of the values is null, returns null.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Exponent: 3
  • Expression: 10

Output: 1000.0

See details.


Prepare geometry

Supported in: Batch, Streaming

Prepares a geometry for downstream use, for example indexing to the ontology, by converting a geometry string into valid GeoJSON. Polygons will be closed and deduplicated. Geometries which cross the anti-meridian (as indicated by width > 180 degrees) will be split into multiple features on each side of the anti-meridian. By default, this operation will return the converted geometry, or null if the string cannot be converted. Alternatively, in the "show errors" output mode, this operation will instead output a struct containing either the successfully parsed output or a descriptive error message.

Expression categories: Geospatial

Output type: Geometry | Struct<ok, error>

Example

Argument values:

  • Geometry string: geometry
  • Output mode: null
geometry | Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[10.0,0.0],[10.0,10.0],[0.0,10.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0,1.0],[1.0,0.0,1.0],[0.0,1.0,1.0],[0.0,0.0,1.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0,1.0],[0.0,1.0,1.0],[1.0,0.0,1.0],[0.0,0.0,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0], [0.0,1.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0], [1.0,0.0], [0.0,1.0], [0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[179.0,-30.0],[-179.0,-30.0],[-179.0,30.0],[179.0,30.0],[179.0,-30]]]} | {"type":"MultiPolygon","coordinates":[[[[-180.0,-30.0],[-180.0,30.0],[-179.0,30.0],[-179.0,-30.0],[-180.0,-30.0]]],[[[180.0,30.0],[180.0,-30.0],[179.0,-30.0],[179.0,30.0],[180.0,30.0]]]]}
{"type":"LineString","coordinates":[[179.0,30.0],[-179.0,30.0]]} | {"type":"MultiLineString","coordinates":[[[179.0,30.0],[180.0,30.0]],[[-180.0,30.0],[-179.0,30.0]]]}
{"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[40.0,10.0],[0.0,1.0]... | {"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[40.0,10.0],[0.0,1.0]...
{"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[1.0,0.0]},{"type":"LineString","coordinates":[[179.0,30.0],[-179.0,30.0]]}]} | {"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[1.0,0.0]},{"type":"MultiLineString","coordinates":[[[179.0,30.0],[180.0,30.0]],[[-180.0,30.0],[-179.0,30.0]]]}]}
{"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]}... | {"type":"MultiLineString","coordinates":[[[0.0,0.0],[1.0,0.0]],[[1.0,1.0],[2.0,1.0]]]}
{"type":"GeometryCollection","geometries":[{"type":"MultiLineString","coordinates":[[[0.0,0.0],[1.0,0.0]],[],[[1.0,1.0],[2.0,1.0]]]},{"type":"MultiPoint","coordinates":[[0.0,0.0],[1.0,1.0]]}]} | {"geometries":[{"coordinates":[[[0.0,0.0],[1.0,0.0]],[[1.0,1.0],[2.0,1.0]]],"type":"MultiLineString"},{"coordinates":[[0.0,0.0],[1.0,1.0]],"type":"MultiPoint"}],"type":"GeometryCollection"}
{"type":"MultiPolygon","coordinates":[[[[1.0,1.0],[2.0,1.0],[2.0,2.0],[1.0,2.0],[1.0,1.0]]],[[]],[[[10.0,10.0],[20.0,10.0],[20.0,20.0],[10.0,20.0],[10.0,10.0]]]]} | {"type":"MultiPolygon","coordinates":[[[[1.0,2.0],[2.0,2.0],[2.0,1.0],[1.0,1.0],[1.0,2.0]]],[[[10.0,20.0],[20.0,20.0],[20.0,10.0],[10.0,10.0],[10.0,20.0]]]]}

See details.


Reduce array elements

Supported in: Batch, Streaming

Reduces array elements using an expression.

Expression categories: Array

Type variable bounds: T accepts Array<Boolean | Byte | Date | Double | Float | Integer | Long | Map<AnyType, AnyType> | Short | String | Timestamp> | Boolean | Byte | Date | Double | Float | Integer | Long | Map<AnyType, AnyType> | Short | String | Timestamp

Output type: T

Example

Argument values:

  • Array: miles
  • Expression to reduce:
    add(
     expressions: [accumulator, element],
    )
  • Initial value: 0
miles | Output
[ 12300, 12342 ] | 24642

See details.
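
The accumulator/element pattern in this example corresponds to a left fold. A minimal Python sketch using `functools.reduce`, where `reduce_array` is a hypothetical helper name chosen for illustration:

```python
from functools import reduce


def reduce_array(array, initial_value, expression):
    """Fold the array from left to right, starting from initial_value.

    `expression` takes (accumulator, element) and returns the new accumulator.
    """
    return reduce(expression, array, initial_value)


miles = [12300, 12342]
total = reduce_array(miles, 0, lambda accumulator, element: accumulator + element)
print(total)  # 24642
```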


Regex extract

Supported in: Batch, Streaming

Extracts the specified group from a regex. Returns empty string when no match is found.

Expression categories: Regex, String

Output type: String

Example

Description: Extract the first two initials from the first match.

Argument values:

  • Expression: MT-112, XB-967
  • Group: 1
  • Pattern: (\w\w)(-)

Output: MT

See details.


Regex find

Supported in: Batch, Streaming

Matches an expression against a regular expression. Regular expression can match any part of the string.

Expression categories: Regex, String

Output type: Boolean

Example

Description: You can find regex patterns.

Argument values:

  • Expression: abcdefg
  • Regex: abc?d

Output: true

See details.


Regex index

Supported in: Batch

Returns an array of indices at which the regular expression pattern is found in the given expression.

Expression categories: Regex, String

Output type: Array<Integer>

Example

Description: You can find regex patterns and their indices.

Argument values:

  • Expression: ababab
  • Regex: ab

Output: [ 0, 2, 4 ]

See details.
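
For comparison, the same result can be reproduced with Python's `re` module. This sketch assumes zero-based indices and non-overlapping matches, which agrees with the example output but is not confirmed for other inputs; `regex_index` is a hypothetical helper name.

```python
import re
from typing import List


def regex_index(expression: str, regex: str) -> List[int]:
    """Return the start index of every non-overlapping match."""
    return [match.start() for match in re.finditer(regex, expression)]


print(regex_index("ababab", "ab"))  # [0, 2, 4]
```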


Regex match

Supported in: Batch, Streaming

Matches an expression against a regular expression. Regular expression must match the whole string.

Expression categories: Regex, String

Output type: Boolean

Example

Description: You can match regex patterns.

Argument values:

  • Expression: abcdefg
  • Regex: abc?d.+

Output: true

See details.
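
The difference between Regex find (match anywhere) and Regex match (match the whole string) can be illustrated with Python's `re.search` and `re.fullmatch`. This is an analogy using Python's regex engine, whose syntax may differ in places from the one Pipeline Builder uses; the helper names are hypothetical.

```python
import re


def regex_find(expression: str, regex: str) -> bool:
    """True if the pattern matches any part of the string."""
    return re.search(regex, expression) is not None


def regex_match(expression: str, regex: str) -> bool:
    """True only if the pattern matches the whole string."""
    return re.fullmatch(regex, expression) is not None


print(regex_find("abcdefg", "abc?d"))     # True: "abcd" occurs inside the string
print(regex_match("abcdefg", "abc?d"))    # False: "efg" is left unmatched
print(regex_match("abcdefg", "abc?d.+"))  # True: ".+" consumes the remainder
```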


Regex replace

Supported in: Batch, Streaming

Replace a string using a regex pattern.

Expression categories: Regex, String

Output type: String

Example

Argument values:

  • Expression: tail_number
  • Pattern: (\w\w)(-)
  • Replace: **-
tail_number | Output
MT-123 | **-123
XB-434 | **-434
MT-123, XB-434 | **-123, **-434

See details.
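
The example above replaces every match, not just the first one. A rough Python equivalent using `re.sub`, offered as an illustration rather than the product's implementation (`regex_replace` is a hypothetical helper name):

```python
import re


def regex_replace(expression: str, pattern: str, replace: str) -> str:
    """Replace every non-overlapping match of pattern with the replacement."""
    return re.sub(pattern, replace, expression)


print(regex_replace("MT-123", r"(\w\w)(-)", "**-"))          # **-123
print(regex_replace("MT-123, XB-434", r"(\w\w)(-)", "**-"))  # **-123, **-434
```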


Remove map entry by key

Supported in: Batch, Streaming

Removes a map entry by the given key.

Expression categories: Map

Type variable bounds: K accepts AnyType; V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Key: k
  • Map: map_col
map_col | Output
{ a -> 1, k -> 2 } | { a -> 1 }

See details.


Rename struct field

Supported in: Batch, Streaming

Rename fields within a struct.

Expression categories: Data preparation, Struct

Output type: Struct

Example

Argument values:

  • Expression: struct
  • Renames: [(airline.id, identifier)]
struct | Output
{ airline: { id: NA } } | { airline: { identifier: NA } }
{ airline: { id: FE } } | { airline: { identifier: FE } }

See details.


Right of string

Supported in: Batch, Streaming

Extract right hand side of a string based on index.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 6

Output: world!

See details.


Right pad string

Supported in: Batch, Streaming

Right-pads the string with the pad value until it reaches the given length. If the string is longer than the given length, it is truncated to that length.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 15
  • Pad: *

Output: Hello world!***

See details.
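
A minimal Python sketch of the pad-or-truncate behaviour described above. `right_pad` is a hypothetical helper name, and the handling of an empty pad string is an assumption made only so the sketch is total.

```python
def right_pad(expression: str, length: int, pad: str) -> str:
    """Pad on the right to exactly `length` characters, or truncate if longer."""
    length = max(length, 0)
    if not pad:
        # Assumption: with nothing to pad with, only truncation can apply.
        return expression[:length]
    # Append more than enough padding, then cut to the requested width.
    repeats = length // len(pad) + 1
    return (expression + pad * repeats)[:length]


print(right_pad("Hello world!", 15, "*"))  # Hello world!***
print(right_pad("Hello world!", 5, "*"))   # Hello
```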


Round number

Supported in: Batch, Streaming

Round number to 'scale' decimal places.

Expression categories: Numeric

Output type: Decimal | Double | Float

Example

Argument values:

  • Column: 10.123
  • Scale: 2

Output: 10.12

See details.


Secant

Supported in: Batch, Streaming

Takes the secant of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle | Output
0.0 | 1.0
90.0 | 1.633123935319537E16
180.0 | -1.0

See details.
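
The very large value at 90 degrees, rather than an error or infinity, is what floating-point arithmetic produces when the secant is computed as the reciprocal of the cosine: the cosine of 90 degrees converted to radians is tiny but not exactly zero. The Python sketch below assumes that this is how the value arises; `secant_degrees` is a hypothetical helper name.

```python
import math


def secant_degrees(angle: float) -> float:
    """Secant of an angle in degrees, computed as 1 / cos(angle)."""
    return 1.0 / math.cos(math.radians(angle))


print(secant_degrees(0.0))    # 1.0
print(secant_degrees(180.0))  # -1.0
print(secant_degrees(90.0))   # a huge finite number, not infinity
```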


Sentence case

Supported in: Batch, Streaming

Converts the first character of the first word to be uppercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: hello world

Output: Hello world

See details.


Sequence

Supported in: Batch, Streaming

Creates an array with numbers in range from start to end.

Expression categories: Array

Type variable bounds: T accepts Byte | Integer | Long | Short

Output type: Array<T>

Example

Description: Sequences increase by 1 unless otherwise specified.

Argument values:

  • End: 10
  • Start: 0
  • Step size: null

Output: [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]

See details.
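
Note that the end value is included in the output. A small Python sketch of an inclusive sequence, assuming a default step of 1 when the step size is null (`sequence` is a hypothetical helper name):

```python
from typing import List, Optional


def sequence(start: int, end: int, step: Optional[int] = None) -> List[int]:
    """Return start, start + step, ... up to and including end."""
    if step is None:
        step = 1  # assumption: a null step size means "increase by 1"
    if step == 0:
        raise ValueError("step must not be zero")
    # range() excludes its stop value, so move the stop one unit past `end`.
    stop = end + (1 if step > 0 else -1)
    return list(range(start, stop, step))


print(sequence(0, 10))     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(sequence(0, 10, 5))  # [0, 5, 10]
```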


Similarity score

Supported in: Batch

Returns the similarity score of two embedding vectors.

Expression categories: Distance measurement, Numeric

Type variable bounds: T accepts Array<Float>

Output type: Double

See details.


Simplify geometry

Supported in: Batch, Streaming

This expression simplifies GeoJSON geometry by removing points within the given tolerance distance using a spherical model of the globe. Loops smaller than the tolerance may be removed entirely.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry: Geometry
  • Tolerance: Tolerance
  • Coordinate precision: null
Geometry | Tolerance | Output
{"type":"LineString","coordinates":[[30.0,0.0],[35.0,0.0],[40.0,0.0]]} | 1000 | {"type":"LineString","coordinates":[[30.0,0.0],[40.0,0.0]]}
{"type":"Polygon","coordinates":[[[-1.0,-1.0],[1.0,-1.0],[1.0,1.0],[0.0,1.0],[-1.0,1.0],[-1.0,-1.0]]]} | 12000 | {"type":"Polygon","coordinates":[[[-1.0,1.0],[1.0,1.0],[1.0,-1.0],[-1.0,-1.0],[-1.0,1.0]]]}
{"type":"MultiLineString","coordinates":[[[0.0,0.0],[5.0,0.1],[10.0,0.0]], [[0.0,-5.0],[5.0,0.1],[10.0,5.0]]]} | 12000 | {"type":"MultiLineString","coordinates":[[[0.0,0.0],[10.0,0.0]],[[0.0,-5.0],[10.0,5.0]]]}
{"type":"MultiPolygon","coordinates":[[[[-2.0,-2.0],[2.0,-2.0],[2.0,2.0],[0.0,2.1],[-2.0,2.0],[-2.0,... | 12000 | {"type":"MultiPolygon","coordinates":[[[[-2.0,2.0],[2.0,2.0],[2.0,-2.0],[-2.0,-2.0],[-2.0,2.0]], [[1...

See details.


Sine

Supported in: Batch, Streaming

Takes the sine of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle | Output
0.0 | 0.0
90.0 | 1.0
180.0 | 0.0

See details.


Skip bytes

Supported in: Batch, Streaming

Skip a given number of bytes in a binary column.

Expression categories: Binary

Output type: Binary

Example

Argument values:

  • Bytes: aGk=
  • Number of bytes to skip: 1

Output: aQ==

See details.
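
The argument and output in this example are shown base64-encoded: `aGk=` is the two bytes `hi`, and `aQ==` is the single byte `i`. A short Python check of that reading (`skip_bytes` is a hypothetical helper name):

```python
import base64


def skip_bytes(data: bytes, count: int) -> bytes:
    """Drop the first `count` bytes and return the rest."""
    return data[max(count, 0):]


raw = base64.b64decode("aGk=")  # b"hi"
remaining = skip_bytes(raw, 1)  # b"i"
print(base64.b64encode(remaining).decode("ascii"))  # aQ==
```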


Slice array

Supported in: Batch, Streaming

Returns the array sliced from the first position to the second position. First position must be 1 or higher. If second position is longer than the array, the entire rest of the array will be returned.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

See details.


Soundex

Supported in: Batch

Compute the soundex encoding (a phonetic representation) for a word.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: input_string
input_string | Output
cat | C300
caat | C300
two | T000
too | T000
to | T000
four | F600
for | F600
fore | F600
fur | F600
meow | M000
me ow | M000

See details.
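
For readers unfamiliar with the encoding, the following is a sketch of the classic American Soundex rules in Python. It reproduces the outputs in the table above, but it is not the product's implementation; in particular, ignoring non-letter characters such as the space in `me ow` is an assumption.

```python
# Digit groups of classic Soundex; vowels and h, w, y have no digit.
_CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code (letter plus three digits)."""
    letters = [c for c in word.lower() if c.isalpha()]  # assumption: skip non-letters
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != previous:
            encoded += code
        if c not in "hw":  # h and w do not separate letters with equal codes
            previous = code
    return (encoded + "000")[:4]


print(soundex("cat"), soundex("four"), soundex("me ow"))  # C300 F600 M000
```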


Split string

Supported in: Batch, Streaming

Split string on specified regex pattern.

Expression categories: String

Output type: Array<String>

Example

Argument values:

  • Expression: string
  • Pattern:
  • Limit: 2
string | Output
hello | [ hello ]
hello world | [ hello, world ]
hello there world | [ hello, there world ]

See details.
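
In the example above the limit caps the number of output elements, so the last element keeps any remaining delimiters. A Python sketch of that behaviour, assuming the pattern in the example is a single space (it is not visible in the text) and using the hypothetical helper name `split_string`:

```python
import re
from typing import List, Optional


def split_string(expression: str, pattern: str, limit: Optional[int] = None) -> List[str]:
    """Split on a regex, returning at most `limit` elements when limit is set."""
    if limit is None or limit <= 0:
        return re.split(pattern, expression)
    if limit == 1:
        return [expression]  # re.split treats maxsplit=0 as "no limit"
    return re.split(pattern, expression, maxsplit=limit - 1)


print(split_string("hello", " ", 2))              # ['hello']
print(split_string("hello world", " ", 2))        # ['hello', 'world']
print(split_string("hello there world", " ", 2))  # ['hello', 'there world']
```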


Square root

Supported in: Batch, Streaming

Calculates the square root of a column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: 9.0

Output: 3.0

See details.


Starts with

Supported in: Batch, Streaming

Expression categories: Boolean, String

Output type: Boolean

Example

Argument values:

  • Expression: Hello world
  • Ignore case: true
  • Value: hello

Output: true

See details.


String after delimiter

Supported in: Batch, Streaming

Extract the string after the first delimiter. Return full string if no matches are found.

Expression categories: String

Output type: String

Example

Argument values:

  • Delimiter: hello
  • Expression: ... Hello world
  • Ignore case: true

Output: world

See details.


String before delimiter

Supported in: Batch, Streaming

Extract the string before the first delimiter. Return the full string if no matches are found.

Expression categories: String

Output type: String

Example

Argument values:

  • Delimiter: hello
  • Expression: ... Hello world
  • Ignore case: true

Output: ...

See details.


String contains

Supported in: Batch, Streaming

Expression categories: Boolean, String

Output type: Boolean

Example

Argument values:

  • Expression: ... Hello world
  • Ignore case: true
  • Value: hello

Output: true

See details.


Substring

Supported in: Batch, Streaming

Extract substring.

Expression categories: Numeric

Output type: String

Example

Argument values:

  • Expression: string
  • Start: start
  • Length: length
string | start | length | Output
hello, world | 1 | 5 | hello
hello, world | 8 | 5 | world
hello, world | -5 | 5 | world

See details.
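
The example rows suggest that the start position is 1-based and that a negative start counts back from the end of the string. The Python sketch below encodes that reading; treating a start of 0 like 1 is an assumption, and `substring` is a hypothetical helper name.

```python
def substring(string: str, start: int, length: int) -> str:
    """Extract `length` characters beginning at a 1-based start position."""
    if start > 0:
        begin = start - 1                    # convert 1-based to 0-based
    elif start < 0:
        begin = max(len(string) + start, 0)  # count back from the end
    else:
        begin = 0                            # assumption: 0 behaves like 1
    return string[begin:begin + max(length, 0)]


print(substring("hello, world", 1, 5))   # hello
print(substring("hello, world", 8, 5))   # world
print(substring("hello, world", -5, 5))  # world
```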


Subtract multiple expressions

Supported in: Batch, Streaming

Calculates the difference between a number and all input columns.

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Expressions list: [col_b, col_c]
  • Value to be subtracted: col_a
col_a | col_b | col_c | Output
5 | 3 | 2 | 0
2 | 4 | 0 | -2
-2 | -4 | -2 | 4

See details.


Subtract numbers

Supported in: Batch, Streaming

Subtract one number from another number.

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Left: col_a
  • Right: col_b
col_a | col_b | Output
32 | 4 | 28
-5 | -3 | -2

See details.


Subtract value from date

Supported in: Batch, Streaming

Returns the date that is 'value' days/weeks/months/quarters/years before 'start'.

Expression categories: Datetime

Output type: Date

Example

Argument values:

  • Date: 2022-04-05
  • Unit: DAYS
  • Value: 2

Output: 2022-04-03

See details.


Sum of array elements

Supported in: Batch, Streaming

Sums the elements contained within the array.

Expression categories: Array

Type variable bounds: T accepts DefiniteNumeric

Output type: T

Example

Argument values:

  • Expression: [ 1, 2, 3 ]
  • Treat null as zero: true

Output: 6

See details.


Tangent

Supported in: Batch, Streaming

Takes the tangent of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle | Output
0.0 | 0.0
90.0 | 1.633123935319537E16
180.0 | 0.0

See details.


Text segmentation

Supported in: Batch, Streaming

Extract a series of text segments using sliding window segmentation.

Expression categories: String

Output type: Array<String>

See details.


Text to embeddings

Supported in: Batch

Converts text into embeddings.

Expression categories: String

Output type: Embedded vector

Example

Description: Example embeddings for the word 'palantir'.

Argument values:

  • Model:
    ada002Embedding(

    )
  • Text column: text
  • Output mode: null
text | Output
palantir | [ -0.019182289, -0.02127992, 0.009529043, -0.008066221, -0.0014429842, 0.019154688, -0.023556953, -0...

See details.


Timestamp add

Supported in: Batch, Streaming

Add value to timestamp in specified unit.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Timestamp: 2022-02-01T00:00:00Z
  • Unit: MILLISECONDS
  • Value to add: 2

Output: 2022-02-01T00:00:00.002Z

See details.


Timestamp difference

Supported in: Batch, Streaming

Returns the difference between two timestamps in the given time unit.

Expression categories: Datetime

Output type: Long

Example

Argument values:

  • End: 2022-10-01T10:00:00Z
  • Start: 2022-10-01T09:00:00Z
  • Unit: HOURS

Output: 1

See details.


Timestamp sequence

Supported in: Batch

Creates an array with timestamps in range from start to end.

Expression categories: Datetime

Output type: Array<Timestamp>

Example

Argument values:

  • End time: end_time
  • Start time: start_time
  • Step unit: DAYS
  • Step size: 1.0
start_time | end_time | Output
2023-01-01T00:00:00Z | 2023-01-03T00:00:00Z | [ 2023-01-01T00:00:00Z, 2023-01-02T00:00:00Z, 2023-01-03T00:00:00Z ]
2023-01-01T01:50:00Z | 2023-01-03T00:00:00Z | [ 2023-01-01T01:50:00Z, 2023-01-02T01:50:00Z ]

See details.
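
The second row shows that the sequence stops at the last step that does not pass the end time, so the end time itself appears only when a step lands on it exactly. A Python sketch of that stepping rule for fixed-length units such as days (`timestamp_sequence` is a hypothetical helper name):

```python
from datetime import datetime, timedelta, timezone
from typing import List


def timestamp_sequence(start: datetime, end: datetime, step: timedelta) -> List[datetime]:
    """Return start, start + step, ... for every value not later than end."""
    if step <= timedelta(0):
        raise ValueError("this sketch only handles positive steps")
    result = []
    current = start
    while current <= end:
        result.append(current)
        current += step
    return result


start = datetime(2023, 1, 1, 1, 50, tzinfo=timezone.utc)
end = datetime(2023, 1, 3, tzinfo=timezone.utc)
for ts in timestamp_sequence(start, end, timedelta(days=1)):
    print(ts.isoformat())  # 2023-01-01T01:50:00+00:00, then 2023-01-02T01:50:00+00:00
```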


Timestamp subtract

Supported in: Batch, Streaming

Subtract value from timestamp in specified unit.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Timestamp: 2022-02-02T00:00:00Z
  • Unit: MILLISECONDS
  • Value to subtract: 2

Output: 2022-02-01T23:59:59.998Z

See details.


Timestamp to epoch millis

Supported in: Batch, Streaming

Converts from timestamp in UTC to epoch milliseconds.

Expression categories: Cast, Datetime

Output type: Long

Example

Argument values:

  • Timestamp: 2022-10-01T09:00:00Z

Output: 1664614800000

See details.


Timestamp to epoch seconds

Supported in: Batch, Streaming

Converts from timestamp in UTC to epoch seconds.

Expression categories: Cast, Datetime

Output type: Long

Example

Argument values:

  • Timestamp: 2022-10-01T09:01:13.47Z

Output: 1664614873

See details.
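
Both epoch conversions can be checked with Python's `datetime`. The sketch assumes that fractional seconds are dropped (rounded down) in the seconds variant, which is consistent with the example above; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(timestamp: datetime) -> int:
    """Whole milliseconds elapsed since 1970-01-01T00:00:00Z."""
    return (timestamp - EPOCH) // timedelta(milliseconds=1)


def to_epoch_seconds(timestamp: datetime) -> int:
    """Whole seconds elapsed since 1970-01-01T00:00:00Z."""
    return (timestamp - EPOCH) // timedelta(seconds=1)


print(to_epoch_millis(datetime(2022, 10, 1, 9, 0, 0, tzinfo=timezone.utc)))            # 1664614800000
print(to_epoch_seconds(datetime(2022, 10, 1, 9, 1, 13, 470000, tzinfo=timezone.utc)))  # 1664614873
```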


Title case

Supported in: Batch, Streaming

Converts the first character of each word to be uppercase and the rest lowercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: hello world

Output: Hello World

See details.


Transcribe audio into JSON using CPU

Supported in: Batch

Transcribe audio files into JSON using CPU.

Expression categories: Media

Output type: String

See details.


Transcribe audio into JSON using GPU

Supported in: Batch

Transcribe audio files into JSON using GPU.

Expression categories: Media

Output type: String

See details.


Transcribe audio into text

Supported in: Batch

Transcribes an audio file into text.

Expression categories: Media

Output type: String | Struct<ok, error>

See details.


Transform array element

Supported in: Batch, Streaming

Maps each element of an array using an expression. Note, array index starts at 1.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: flight_number
  • Expression to apply:
    stringBeforeDelimiter(
     delimiter: -,
     expression: element,
     ignoreCase: false,
    )
flight_number | Output
[ XB-134, MT-111 ] | [ XB, MT ]

See details.


Transform map keys

Supported in: Batch, Streaming

Transforms keys of a map by applying an expression to every key-value pair.

Expression categories: Map

Type variable bounds: K accepts AnyType; V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Expression to apply:
    stringBeforeDelimiter(
     delimiter: -,
     expression: key,
     ignoreCase: false,
    )
  • Map: flight_number
flight_number | Output
{ MT-111 -> 2, XB-134 -> 1 } | { MT -> 2, XB -> 1 }

See details.


Transform map values

Supported in: Batch

Transforms values of a map by applying an expression to every key-value pair.

Expression categories: Map

Type variable bounds: K accepts AnyType; V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Expression to apply:
    stringBeforeDelimiter(
     delimiter: -,
     expression: value,
     ignoreCase: false,
    )
  • Map: flight_number
flight_number | Output
{ 1 -> XB-134, 2 -> MT-111 } | { 1 -> XB, 2 -> MT }

See details.


Trim whitespace

Supported in: Batch, Streaming

Trims whitespace at beginning and end of string. Whitespace is defined as characters in any of: 1) Unicode's \p{whitespace} set, 2) Java's String#trim() method, or 3) Java's Character#isWhitespace() method.

Expression categories: Data preparation, String

Output type: String

Example

Argument values:

  • Expression: hello world

Output: hello world

See details.


Truncate date

Supported in: Batch

Returns the date rounded down to the nearest day/week/month/quarter/year.

Expression categories: Datetime

Output type: Date

See details.


Truncate timestamp

Supported in: Batch

Returns the timestamp truncated to the specified unit.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Start: 2022-02-01T10:10:10.0022Z
  • Unit: MILLISECONDS

Output: 2022-02-01T10:10:10.002Z

See details.


Uncompact a set of H3 indices

Supported in: Batch, Streaming

Uncompact H3 indices to the specified resolution. All input indices must be at a resolution less than or equal to the requested resolution or this transform will return null. If any of the input indices are invalid this transform will return null. Output indices are sorted in ascending order.

Expression categories: Geospatial

Output type: Array<H3 Index>

See details.


Unicode normalize

Supported in: Batch, Streaming

Perform unicode normalization as per Unicode Standard Annex #15.

Expression categories: Data preparation, String

Output type: String

Example

Argument values:

  • Expression: string
  • Normalization form: nfkc
string | Output
123 | 123
イナゴ | イナゴ

See details.
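
To show what NFKC normalization changes, the Python sketch below uses `unicodedata.normalize` on full-width digits and half-width katakana. These inputs are assumptions chosen for illustration and are written as escapes so they survive copying; `unicode_normalize` is a hypothetical helper name.

```python
import unicodedata


def unicode_normalize(expression: str, normalization_form: str = "nfkc") -> str:
    """Apply a Unicode normalization form (NFC, NFD, NFKC or NFKD)."""
    return unicodedata.normalize(normalization_form.upper(), expression)


fullwidth_digits = "\uff11\uff12\uff13"          # full-width 1, 2, 3
halfwidth_katakana = "\uff72\uff85\uff7a\uff9e"  # half-width i, na, ko + voiced mark

print(unicode_normalize(fullwidth_digits))    # 123
print(unicode_normalize(halfwidth_katakana))  # イナゴ
```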


Uniform random number

Supported in: Batch, Streaming

Returns a column of uniform random numbers drawn between 0 and 1. This is not deterministic and will not produce the same result on repeated builds, even when using a seed.

Expression categories: Numeric

Output type: Double

See details.


Universally unique identifier (uuid) (unstable)

Supported in: Batch, Streaming

Returns a column of uuids. This is not deterministic and will not produce the same result on repeated builds. This is not the preferred way to build an id column and users should look into sha256 or others that are deterministic.

Expression categories: String

Output type: String

See details.


Uppercase

Supported in: Batch, Streaming

Converts all characters in string to uppercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: hello World

Output: HELLO WORLD

See details.


Url decode

Supported in: Batch, Streaming

Decodes a percent-encoded string to plain text.

Expression categories: Cast, String

Output type: String

Example

Argument values:

  • Expression: string
string | Output
raw_string_with_no_special_characters | raw_string_with_no_special_characters
test%2Fapi%3Fstring%3D3 | test/api?string=3

See details.


Url encode

Supported in: Batch, Streaming

Percent-encodes a string to be sent in a url.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: string
string | Output
raw_string_with_no_special_characters | raw_string_with_no_special_characters
test/api?string=3 | test%2Fapi%3Fstring%3D3

See details.
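
Url encode and Url decode are inverse operations. The round trip from the two examples can be reproduced with Python's `urllib.parse`; passing `safe=""` is needed so that `/` is also encoded, as it is in the example output. The helper names are hypothetical.

```python
from urllib.parse import quote, unquote


def url_encode(expression: str) -> str:
    """Percent-encode every reserved character, including '/'."""
    return quote(expression, safe="")


def url_decode(expression: str) -> str:
    """Decode percent-encoded sequences back to plain text."""
    return unquote(expression)


encoded = url_encode("test/api?string=3")
print(encoded)              # test%2Fapi%3Fstring%3D3
print(url_decode(encoded))  # test/api?string=3
```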


Use LLM

Supported in: Batch

Call an LLM with a configurable prompt.

Expression categories: String

Output type: Array<AnyType> | Boolean | Date | Decimal | Double | Float | Integer | Long | Short | String | Struct | Struct<ok<AnyType> | Boolean | Date | Decimal | Double | Float | Integer | Long | Short | String | Struct | Timestamp, error> | Timestamp

Example

Argument values:

  • Model:
    gpt4ChatModel(
     temperature: 0.0,
    )
  • Prompt: prompt
  • System prompt: [In the context of a food delivery app, your job is to rate reviews given in the following user promp...]
  • Output mode: null
  • Output type: null
prompt | Output
The food was great! | 5

See details.


Value from map

Supported in: Batch, Streaming

Get a value from a map using a key.

Expression categories: Map

Type variable bounds: K accepts ComparableType; V accepts AnyType

Output type: V

Example

Argument values:

  • Key: Foo
  • Map: {
     Bar -> World,
     Foo -> Hello,
    }

Output: Hello

See details.


Aggregate expressions


All of

Supported in: Batch

Calculate the boolean 'and' of an aggregate. Nulls are considered false.

Expression categories: Aggregate

Output type: Boolean

Example

Argument values:

  • Expression: values

Given input table:

values
true
false
true

Outputs: false

See details.


Any of

Supported in: Batch

Calculate the boolean 'or' of an aggregate. Nulls are considered false.

Expression categories: Aggregate

Output type: Boolean

Example

Argument values:

  • Expression: values

Given input table:

values
true
false
true

Outputs: true

See details.


Approximate median

Supported in: Batch

Computes approximate median of values in the column.

Expression categories: Aggregate

Output type: Numeric

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3

See details.


Approximate percentile

Supported in: Batch

Returns the approximate percentile of the expression: the smallest value in the ordered expression values (sorted from least to greatest) such that no more than the given percentage of the values is less than or equal to that value.

Expression categories: Aggregate

Output type: Array<Numeric> | Byte | Decimal | Double | Float | Integer | Long | Short

Example

Argument values:

  • Expression: values
  • Percentiles: [0.5]
  • Accuracy: null

Given input table:

values
2
4
3

Outputs: 3

See details.


Collect array

Supported in: Batch, Streaming

Collects an array of values within each group. Null values are ignored.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expression: factor

Given input table:

factor
2
2
3

Outputs: [ 2, 2, 3 ]

See details.


Collect distinct array

Supported in: Batch, Streaming

Collects an array of deduplicated values within each group. Null values are ignored.

Expression categories: Aggregate

Type variable bounds: T accepts ComparableType

Output type: Array<T>

Example

Argument values:

  • Expression: factor

Given input table:

factor
2
2
3

Outputs: [ 2, 3 ]

See details.


Covariance

Supported in: Batch, Streaming

Calculate the population covariance of values in two columns.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Left: left
  • Right: right

Given input table:

left | right
1 | 5
2 | 4
3 | 3
4 | 2
5 | 1

Outputs: -2.0

See details.
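
The result can be verified by hand: both columns have mean 3, the products of the deviations sum to -10, and dividing by the 5 rows gives -2.0. A Python sketch of the population covariance follows; skipping rows with a null on either side is an assumption, and `population_covariance` is a hypothetical helper name.

```python
from typing import Optional, Sequence


def population_covariance(left: Sequence[Optional[float]],
                          right: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of (l - mean_l) * (r - mean_r), dividing by n rather than n - 1."""
    # Assumption: rows with a null on either side are skipped.
    pairs = [(l, r) for l, r in zip(left, right) if l is not None and r is not None]
    if not pairs:
        return None
    n = len(pairs)
    mean_left = sum(l for l, _ in pairs) / n
    mean_right = sum(r for _, r in pairs) / n
    return sum((l - mean_left) * (r - mean_right) for l, r in pairs) / n


print(population_covariance([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -2.0
```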


Create simple geometries from ordered rows of GeoPoints

Supported in: Batch

Given a column of GeoPoints and an ordering, return either a polygon or a line string by connecting the GeoPoints in the specified order. This function assumes that the data is tabular, with a single row representing an individual GeoPoint in a line string or in the shell of a polygon, along with a column specifying the order of those points. For a polygon this ordering should identify the points as you move counter-clockwise around the shell. Given an ordering of these points and a partition (grouping), the function constructs the required geometry for that partition by joining the GeoPoints in ascending order of the order-by column.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • GeoPoint: geo_point
  • Order by (ascending): order
  • Output geometry type: LINE_STRING

Given input table:

geo_point | order
{ latitude -> 0.0, longitude -> 0.0 } | 0
{ latitude -> 1.0, longitude -> 0.0 } | 1
{ latitude -> 1.0, longitude -> 1.0 } | 2

Outputs: {"type":"LineString","coordinates": [[0.0,0.0],[0.0, 1.0],[1.0,1.0]]}

See details.


Dense rank

Supported in: Batch

Returns the rank of rows within a window partition, without any gaps. In case of ties the rows get same rank. The difference between rank and dense_rank is that dense_rank leaves no gaps in ranking sequence when there are ties.

Expression categories: Aggregate

Output type: Integer

See details.
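
The difference between rank and dense rank is easiest to see on a small list with a tie. The Python sketch below ranks plain values in ascending order and ignores window partitions; the function names are hypothetical.

```python
from typing import List, Sequence


def rank(values: Sequence[int]) -> List[int]:
    """1 + the number of strictly smaller values, so ties leave gaps."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def dense_rank(values: Sequence[int]) -> List[int]:
    """Position among the distinct sorted values, so there are no gaps."""
    position = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [position[v] for v in values]


scores = [10, 20, 20, 30]
print(rank(scores))        # [1, 2, 2, 4]
print(dense_rank(scores))  # [1, 2, 2, 3]
```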


Distinct count

Supported in: Batch, Streaming

Calculate distinct number of values in column.

Expression categories: Aggregate

Output type: Long

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3

See details.


First

Supported in: Batch, Streaming

First item in the group. Note, if used within an aggregate or unordered window, the row selected will be non-deterministic.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Expression: values
  • Ignore nulls: false

Given input table:

values
null
2
4
3

Outputs: null

See details.


Grouped geometry envelope

Supported in: Batch

Returns the envelope of all valid geometries in the given column. Invalid geometries are treated as null and ignored.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry

Given input table:

geometry
{"type":"LineString","coordinates":[[1,0],[0,8.4]]}
{"type":"Point","coordinates":[125.6, -92.3]}
{"type":"Polygon","coordinates":[[[0,0],[1,6.3],[-6,1],[0,0]]]}

Outputs: {"type":"Polygon","coordinates":[[[-6.0,-92.3],[-6.0,8.4],[125.6,8.4],[125.6,-92.3],[-6.0,-92.3]]]}

See details.


Grouped geometry union

Supported in: Batch

Combines the grouped geometries to create a single geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry

Given input table:

geometry
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]}

Outputs: {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]}

See details.


Grouped latitude/longitude bounding box

Supported in: Batch

Returns a struct containing the entire bounding box of all valid geometries in the given column. Invalid geometries are treated as null and ignored.

Expression categories: Geospatial

Output type: LatLonBoundingBox

Example

Argument values:

  • Expression: geometry

Given input table:

geometry
{"type":"LineString","coordinates":[[1,0],[0,8.4]]}
{"type":"Point","coordinates":[125.6, -92.3]}
{"type":"Polygon","coordinates":[[[0,0],[1,6.3],[-6,1],[0,0]]]}

Outputs: {
 maxLat -> 8.4,
 maxLon -> 125.6,
 minLat -> -92.3,
 minLon -> -6.0,
}

See details.


Lag

Supported in: Batch

Returns the value of the input at 'lag' before the current row in the window.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

See details.


Last

Supported in: Batch, Streaming

Last item in the group. Note, if used within an aggregate or unordered window, the row selected will be non-deterministic.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Expression: values
  • Ignore nulls: false

Given input table:

values
2
4
3
null

Outputs: null

See details.


Lead

Supported in: Batch

Returns the value of the input at 'lead' after the current row in the window.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

See details.


Linear regression gradient

Supported in: Batch

Returns the slope of the linear regression line for non-null pairs in a group. Returns null if there are insufficient non-null pairs or if the variance of the independent variable is zero.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Left: left
  • Right: right

Given input table:

| left | right |
| ----- | ----- |
| 1 | 5 |
| 2 | 4 |
| 3 | 3 |
| 4 | 2 |
| 5 | 1 |

Outputs: -1.0

See details.
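
The following is a minimal sketch of a least-squares slope over non-null pairs. It assumes that "Left" is the dependent variable and "Right" the independent variable, and that fewer than two pairs counts as insufficient; neither is stated above, and for the example data the slope is -1.0 either way.

```python
def linear_regression_gradient(pairs):
    """Least-squares slope over (left, right) pairs, skipping pairs with a null."""
    clean = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(clean)
    if n < 2:
        return None  # assumed threshold for "insufficient non-null pairs"
    mean_y = sum(y for y, _ in clean) / n
    mean_x = sum(x for _, x in clean) / n
    sxx = sum((x - mean_x) ** 2 for _, x in clean)
    if sxx == 0:
        return None  # zero variance in the independent variable
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in clean)
    return sxy / sxx


slope = linear_regression_gradient([(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)])
```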


Max

Supported in: Batch, Streaming

Calculates the maximum value in the column.

Expression categories: Numeric

Output type: ComparableType

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 4

See details.


Max by

Supported in: Streaming

Computes the maximum row according to the max column expression, after applying the provided filter condition. Returns null if there is no maximum row.

Expression categories: Aggregate

Output type: AnyType

Example

Argument values:

  • Expression: salary
  • Output projection expression: salary
  • Filter condition:
    lessThan(
     left: salary,
     right: 5000,
    )

Given input table:

| dep_name | salary |
| ----- | ----- |
| develop | 9900 |
| develop | 4000 |
| develop | 3000 |

Outputs: 4000

See details.
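
The Max by example can be read as "filter, pick the maximum row, then project". A small sketch under that reading follows; `max_by` and its parameter names are hypothetical, and rows are modelled as plain dictionaries.

```python
def max_by(rows, max_key, projection, condition=lambda row: True):
    """Project the row with the largest max_key among rows passing the condition."""
    candidates = [r for r in rows if condition(r) and max_key(r) is not None]
    if not candidates:
        return None  # no maximum row
    return projection(max(candidates, key=max_key))


rows = [
    {"dep_name": "develop", "salary": 9900},
    {"dep_name": "develop", "salary": 4000},
    {"dep_name": "develop", "salary": 3000},
]
result = max_by(
    rows,
    max_key=lambda r: r["salary"],
    projection=lambda r: r["salary"],
    condition=lambda r: r["salary"] < 5000,
)
```

Min by follows the same shape with `min` in place of `max`.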


Mean

Supported in: Batch, Streaming

Calculates the mean of the values in the column.

Expression categories: Numeric

Output type: Decimal | Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3.0

See details.


Median

Supported in: Batch

Calculates the median of the values in the column.

Expression categories: Numeric

Output type: Decimal | Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3.0

See details.


Min

Supported in: Batch, Streaming

Calculates the minimum value in the column.

Expression categories: Numeric

Output type: ComparableType

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 2

See details.


Min by

Supported in: Streaming

Computes the minimum row according to the min column expression, after applying the provided filter condition. Returns null if there is no minimum row.

Expression categories: Aggregate

Output type: AnyType

Example

Argument values:

  • Expression: salary
  • Output projection expression: salary
  • Filter condition:
    greaterThan(
     left: salary,
     right: 0,
    )

Given input table:

| dep_name | salary |
| ----- | ----- |
| develop | -999 |
| develop | 4000 |
| develop | 3000 |

Outputs: 3000

See details.


Mode

Supported in: Batch

Calculates the mode of the values in the column.

Expression categories: Aggregate

Type variable bounds: Any accepts Binary | Boolean | Byte | Date | Decimal | Double | Float | Integer | Long | Short | String | Timestamp

Output type: Any

Example

Argument values:

  • Expression: values

Given input table:

values
a
b
b
b
c
c
d

Outputs: b

See details.


Percent rank

Supported in: Batch

Returns the percentile rank of rows within a window partition. Tied rows are assigned the same percent rank.

Expression categories: Aggregate

Output type: Double

See details.
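
A sketch of one common definition of percent rank, `(rank - 1) / (rows in partition - 1)`, is shown below. This formula is an assumption borrowed from the usual SQL window function of the same name; this page does not state the exact formula. The input is assumed to be one partition, already sorted by the window ordering.

```python
def percent_rank(sorted_values):
    """Percent rank for each row of a sorted partition; ties share the same value."""
    n = len(sorted_values)
    result = []
    for value in sorted_values:
        rank = sorted_values.index(value) + 1  # first position of the value, so ties match
        result.append((rank - 1) / (n - 1) if n > 1 else 0.0)
    return result


ranks = percent_rank([10, 20, 20, 30])
```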


Pivot

Supported in: Streaming

Apply an aggregate expression in a pivot context. The aggregation will run as a set of separate aggregations scoped to each distinct value of the pivot expression. The output is a map from pivot value to aggregate expression value.

Expression categories: Aggregate

Type variable bounds: K accepts ComparableType, V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Aggregate expression:
    sum(
     expression: value,
    )
  • Pivot expression: pivot

Given input table:

| pivot | value |
| ----- | ----- |
| a | 1 |
| b | 2 |
| a | 3 |

Outputs: {
 a -> 4,
 b -> 2,
}

See details.
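
The pivot context described above amounts to running the aggregate once per distinct pivot value. A minimal sketch, with rows as dictionaries and a hypothetical `pivot` helper:

```python
from collections import defaultdict


def pivot(rows, pivot_key, aggregate):
    """Run `aggregate` separately over the rows of each distinct pivot value."""
    groups = defaultdict(list)
    for row in rows:
        groups[pivot_key(row)].append(row)
    return {value: aggregate(group) for value, group in groups.items()}


rows = [
    {"pivot": "a", "value": 1},
    {"pivot": "b", "value": 2},
    {"pivot": "a", "value": 3},
]
pivoted = pivot(
    rows,
    pivot_key=lambda r: r["pivot"],
    aggregate=lambda group: sum(r["value"] for r in group),
)
```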


Product

Supported in: Batch

Calculates the product of the values in the column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: factor

Given input table:

factor
2
4
3

Outputs: 24.0

See details.


Rank

Supported in: Batch

Returns the rank of rows within a window partition. Tied rows receive the same rank. The difference between rank and dense_rank is that rank leaves gaps in the ranking sequence when there are ties.

Expression categories: Aggregate

Output type: Integer

See details.
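
The difference between rank and dense_rank is easiest to see side by side. The sketch below assumes a single partition that is already sorted by the window ordering; both function names are illustrative.

```python
def rank(sorted_values):
    """Ties share a rank, and the following rank skips ahead (gaps)."""
    return [sorted_values.index(value) + 1 for value in sorted_values]


def dense_rank(sorted_values):
    """Ties share a rank, and the following rank is the next integer (no gaps)."""
    ranks, current, previous = [], 0, object()
    for value in sorted_values:
        if value != previous:
            current += 1
            previous = value
        ranks.append(current)
    return ranks


scores = [10, 20, 20, 30]
with_gaps = rank(scores)
without_gaps = dense_rank(scores)
```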


Row count

Supported in: Batch, Streaming

Counts the number of non-null rows in a group.

Expression categories: Aggregate

Output type: Long

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3

See details.


Row number

Supported in: Batch, Streaming

Returns a sequential number starting at 1 inside each partition.

Expression categories: Aggregate

Output type: Integer

See details.


Sample covariance

Supported in: Batch, Streaming

Calculates the sample covariance of the values in two columns.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Left: left
  • Right: right

Given input table:

| left | right |
| ----- | ----- |
| 1 | 5 |
| 2 | 4 |
| 3 | 3 |
| 4 | 2 |
| 5 | 1 |

Outputs: -2.5

See details.


Sample variance

Supported in: Batch, Streaming

Calculates the sample variance of the values in the column.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
2
3

Outputs: 0.33333333333

See details.


Standard deviation

Supported in: Batch

Calculates the standard deviation of the values in the column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 0.81649658092773

See details.
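
For the Standard deviation example, the documented output matches the population formula (divide by n) rather than the sample formula (divide by n - 1). The check below uses Python's standard `statistics` module and is only an observation about this one example, not a statement about how the expression is implemented.

```python
import statistics

values = [2, 4, 3]

# Population standard deviation: sqrt(sum((x - mean)^2) / n)
population = statistics.pstdev(values)

# Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))
sample = statistics.stdev(values)
```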


Sum

Supported in: Batch, Streaming

Sums the specified expression.

Expression categories: Numeric

Output type: Decimal | Double | Long

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 9

See details.


Variance

Supported in: Batch, Streaming

Calculates the population variance of the values in the column.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 0.66666666667

See details.


Generator expressions


Explode array

Supported in: Batch, Streaming

Explodes an array into one row per value.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: T

See details.


Explode array with position

Supported in: Batch, Streaming

Explodes an array into one row per value, where each row is a struct containing the element's relative position in the array and the element itself.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Struct<Optional[position], Optional[element]>

Example

Argument values:

  • Array: array
  • Keep empty / null arrays: null

Given input table:

array
[ one, two, three ]
[ four, five ]

Expected output table:

| array |
| ----- |
| { element -> one, position -> 1 } |
| { element -> two, position -> 2 } |
| { element -> three, position -> 3 } |
| { element -> four, position -> 1 } |
| { element -> five, position -> 2 } |

See details.
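
A small sketch of the Explode array with position example follows. Positions start at 1, as in the example output. The handling of the "Keep empty / null arrays" option (emitting one row with null fields) is an assumption for illustration, and `explode_with_position` is a hypothetical name.

```python
def explode_with_position(arrays, keep_empty=False):
    """Yield one {element, position} struct per array element."""
    for array in arrays:
        if not array:
            if keep_empty:
                # Assumed behavior: keep a placeholder row for empty or null arrays.
                yield {"element": None, "position": None}
            continue
        for position, element in enumerate(array, start=1):
            yield {"element": element, "position": position}


exploded = list(explode_with_position([["one", "two", "three"], ["four", "five"]]))
```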


Explode map

Supported in: Batch, Streaming

Explodes a map into one row per key-value pair.

Expression categories: Map

Type variable bounds: TKey accepts AnyType, TValue accepts AnyType

Output type: Struct<Optional[key], Optional[value]>

Example

Argument values:

  • Expression: map

Given input table:

map
{
 1 -> val1,
 2 -> val2,
}
{
 3 -> val3,
 4 -> val4,
}

Expected output table:

| map |
| ----- |
| { key -> 1, value -> val1 } |
| { key -> 2, value -> val2 } |
| { key -> 3, value -> val3 } |
| { key -> 4, value -> val4 } |

See details.


Transforms


Aggregate

Supported in: Batch

Performs the specified aggregations on the input dataset grouped by a set of columns.

Transform categories: Aggregate, Popular

Example

Argument values:

  • Aggregations: [
    alias(
     alias: factor,
     expression:
    sum(
     expression: factor,
    ),
    )]
  • Dataset: ri.foundry.main.dataset.aggregate
  • Group by columns: [tail_number]

Input:

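| tail_number | airline | miles | factor |
| ----- | ----- | ----- | ----- |
| XB-123 | foundry air | 124 | 2 |
| MT-222 | new airline | 1123 | 5 |
| XB-123 | foundry airline | 335 | 5 |
| MT-222 | new air | 565 | 4 |
| KK-452 | new air | 222 | 1 |
| XB-123 | foundry airline | 1134 | 3 |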

Output:

| tail_number | factor |
| ----- | ----- |
| XB-123 | 10 |
| MT-222 | 9 |
| KK-452 | 1 |

See details.
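
The Aggregate example (sum of `factor` grouped by `tail_number`) can be sketched in plain Python as below. This is only an illustration of the grouping semantics; `aggregate_sum` is a hypothetical helper, and the output row order here simply follows first appearance in the input.

```python
def aggregate_sum(rows, group_by, column):
    """Sum `column` for each distinct combination of the group-by columns."""
    totals = {}
    for row in rows:
        key = tuple(row[name] for name in group_by)
        totals[key] = totals.get(key, 0) + row[column]
    return [dict(zip(group_by, key), **{column: total}) for key, total in totals.items()]


flights = [
    {"tail_number": "XB-123", "airline": "foundry air", "miles": 124, "factor": 2},
    {"tail_number": "MT-222", "airline": "new airline", "miles": 1123, "factor": 5},
    {"tail_number": "XB-123", "airline": "foundry airline", "miles": 335, "factor": 5},
    {"tail_number": "MT-222", "airline": "new air", "miles": 565, "factor": 4},
    {"tail_number": "KK-452", "airline": "new air", "miles": 222, "factor": 1},
    {"tail_number": "XB-123", "airline": "foundry airline", "miles": 1134, "factor": 3},
]
aggregated = aggregate_sum(flights, ["tail_number"], "factor")
```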


Aggregate on condition

Supported in: Batch

Aggregate expressions based on a condition statement.

Transform categories: Aggregate, Popular

See details.


Aggregate over window

Supported in: Streaming

Performs the specified aggregations on the data within a window, emitting outputs as specified by the provided trigger.

Transform categories: Aggregate

See details.


Anti join

Supported in: Batch

Anti joins left and right dataset inputs, removing all rows from the left relation that match the provided condition.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

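Inputs:

ri.foundry.main.dataset.left

| tail_number | airline | miles | factor |
| ----- | ----- | ----- | ----- |
| XB-123 | foundry air | 124 | 2 |
| MT-222 | new airline | 1123 | 5 |
| XB-123 | foundry airline | 335 | 5 |
| MT-222 | new air | 565 | 4 |
| KK-452 | new air | 222 | 1 |
| PA-452 | new air | 212 | 2 |
| XB-123 | foundry airline | 1134 | 2 |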

ri.foundry.main.dataset.right

| tail_number | home_airport |
| ----- | ----- |
| XB-123 | LHR |
| MT-222 | CPH |
| KK-452 | JFK |
| JR-201 | IAD |

Output:

| tail_number | airline |
| ----- | ----- |
| PA-452 | new air |

See details.
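
An anti join keeps only the left rows that have no match on the right. The sketch below assumes a simple equality condition on one column, as in the example above; `anti_join` and its parameters are illustrative names.

```python
def anti_join(left, right, key, select):
    """Keep left rows whose key has no match on the right, projecting `select`."""
    right_keys = {row[key] for row in right}
    return [
        {column: row[column] for column in select}
        for row in left
        if row[key] not in right_keys
    ]


left = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "KK-452", "airline": "new air"},
    {"tail_number": "PA-452", "airline": "new air"},
]
right = [
    {"tail_number": "XB-123", "home_airport": "LHR"},
    {"tail_number": "MT-222", "home_airport": "CPH"},
    {"tail_number": "KK-452", "home_airport": "JFK"},
    {"tail_number": "JR-201", "home_airport": "IAD"},
]
unmatched = anti_join(left, right, "tail_number", ["tail_number", "airline"])
```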


Apply expression

Supported in: Batch, Streaming

Transforms input dataset by applying a single expression.

Transform categories: Popular

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Expression:
    alias(
     alias: kilometers,
     expression:
    convertDistance(
     amount: miles,
     currentUnit: mile,
     targetUnit: kilometer,
    ),
    )

Input:

| airline | miles |
| ----- | ----- |
| foundry airways | 2500 |
| new air | 3000 |

Output:

| kilometers | airline | miles |
| ----- | ----- | ----- |
| 4023.36 | foundry airways | 2500 |
| 4828.03 | new air | 3000 |

See details.


Apply multiple expressions

Supported in: Batch, Streaming

Transforms input dataset either by selecting columns or applying functions to columns.

Transform categories: Popular

Example

Argument values:

  • Columns: [
    alias(
     alias: airline,
     expression: airlin,
    )]
  • Dataset: ri.foundry.main.dataset.a
  • Keep remaining columns: false

Input:

| airlin | miles |
| ----- | ----- |
| foundry airways | 2500 |
| new air | 3000 |

Output:

airline
foundry airways
new air

See details.


Apply to multiple columns

Supported in: Batch, Streaming

Transforms input dataset either by selecting columns or applying functions to columns.

Transform categories: Popular

See details.


Array elements to columns

Supported in: Batch

Extracts elements from an array into columns.

Transform categories: Array

Example

Argument values:

  • Array: stats
  • Columns to extract: [miles, id]
  • Dataset: ri.foundry.main.dataset.a

Input:

stats
[ 1000, 2 ]

Output:

| miles | id | stats |
| ----- | ----- | ----- |
| 1000 | 2 | [ 1000, 2 ] |

See details.


Assign timestamps and watermarks

Supported in: Streaming

Assigns timestamps and watermarks to the input, filtering out records where the timestamp is null.

Transform categories: Other

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Timestamp expression: timestamp
  • Emit watermark on every record: null

Input:

| timestamp | temperature | sensor_id |
| ----- | ----- | ----- |
| 1969-12-31T23:59:50Z | 28 | sensor_1 |
| 1969-12-31T23:59:40Z | 30 | sensor_2 |
| 1969-12-31T23:59:35Z | 29 | sensor_1 |

Output:

| timestamp | temperature | sensor_id |
| ----- | ----- | ----- |
| 1969-12-31T23:59:50Z | 28 | sensor_1 |
| 1969-12-31T23:59:40Z | 30 | sensor_2 |
| 1969-12-31T23:59:35Z | 29 | sensor_1 |

See details.


Coalesce data

Supported in: Batch

Reduces the number of partitions. If you have 1000 partitions and coalesce to 100, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions. If a larger number of partitions is requested, the number of partitions stays unchanged.

Transform categories: Other

See details.
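
The partition arithmetic in the Coalesce data description can be illustrated with a toy model in which each partition is a Python list. The contiguous grouping used here is an assumption chosen for simplicity; the actual assignment of current partitions to new ones is decided by the execution engine.

```python
def coalesce(partitions, target):
    """Merge whole partitions into `target` groups without redistributing rows."""
    current = len(partitions)
    if target >= current:
        return partitions  # a larger request leaves the partitioning unchanged
    merged = [[] for _ in range(target)]
    for index, partition in enumerate(partitions):
        # Each new partition claims a contiguous run of current partitions.
        merged[index * target // current].extend(partition)
    return merged


before = [[i] for i in range(1000)]  # 1000 single-row partitions
after = coalesce(before, 100)
```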


Compute if expression absent

Supported in: Batch

Computes the expression for new rows. The value for a given key is only ever computed once, even across builds.

Transform categories: Other

See details.


Convert media set to table rows

Supported in: Batch

Produces a dataset containing media references and basic metadata for media items in a media set. Use this transform first, before applying other media transforms.

Transform categories: File, Media

See details.


Cross join

Supported in: Batch

Cross joins left and right dataset inputs together, matching all rows from each side against all rows from the other. The output is the cartesian product of the two datasets.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

| tail_number | airline | miles | factor |
| ----- | ----- | ----- | ----- |
| XB-123 | foundry air | 124 | 2 |
| MT-222 | new airline | 1123 | 5 |
| PA-452 | new air | 212 | 2 |

ri.foundry.main.dataset.right

| tail_number | home_airport |
| ----- | ----- |
| XB-123 | LHR |
| MT-222 | CPH |
| KK-452 | JFK |
| JR-201 | IAD |

Output:

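| tail_number | airline | home_airport |
| ----- | ----- | ----- |
| XB-123 | foundry air | LHR |
| XB-123 | foundry air | CPH |
| XB-123 | foundry air | JFK |
| XB-123 | foundry air | IAD |
| MT-222 | new airline | LHR |
| MT-222 | new airline | CPH |
| MT-222 | new airline | JFK |
| MT-222 | new airline | IAD |
| PA-452 | new air | LHR |
| PA-452 | new air | CPH |
| PA-452 | new air | JFK |
| PA-452 | new air | IAD |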

See details.


Date distribution

Supported in: Batch

Computes the distribution of dates/timestamps in a specified column.

Transform categories: Datetime

See details.


Drop columns

Supported in: Batch, Streaming

Transforms input dataset by dropping the specified columns.

Transform categories: Popular

Example

Argument values:

  • Columns to drop: {miles}
  • Dataset: ri.foundry.main.dataset.a

Input:

| airline | miles | airports |
| ----- | ----- | ----- |
| foundry airways | 3000 | [ JFK, SFO ] |

Output:

| airline | airports |
| ----- | ----- |
| foundry airways | [ JFK, SFO ] |

See details.


Drop duplicates

Supported in: Batch

Drops duplicate rows from the input.

Transform categories: Other

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.aggregate
  • Column subset: {tail_number}

Input:

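| tail_number | airline | miles | factor |
| ----- | ----- | ----- | ----- |
| XB-123 | foundry air | 124 | 2 |
| MT-222 | new airline | 1123 | 5 |
| XB-123 | foundry airline | 335 | 5 |
| MT-222 | new air | 565 | 4 |
| KK-452 | new air | 222 | 1 |
| XB-123 | foundry airline | 1134 | 3 |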

Output:

| tail_number | airline | miles | factor |
| ----- | ----- | ----- | ----- |
| XB-123 | foundry air | 124 | 2 |
| MT-222 | new airline | 1123 | 5 |
| KK-452 | new air | 222 | 1 |

See details.


Empty file

Supported in: Batch

Creates an empty file.

Transform categories: Other

See details.


Empty media set file

Supported in: Batch, Streaming

Creates an empty media set file with the given schema and snapshot read mode.

Transform categories: Other

See details.


Empty table

Supported in: Batch, Streaming

Creates an empty table with the given schema and read mode.

Transform categories: Other

Example

Argument values:

  • Schema: Struct<flight_code, flight_number, airline>

Output:

| flight_code | flight_number | airline |
| ----- | ----- | ----- |

See details.


Extract file metadata from dataset as rows

Supported in: Batch

Reads file metadata as rows from a dataset of files.

Transform categories: File

See details.


Extract many struct fields

Supported in: Batch

Extracts many fields from a struct. Original struct will be dropped.

Transform categories: Struct

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Locators: [(airline.name, airline), (tail_no, tail_number)]
  • Struct: raw

Input:

raw
{
airline: {
id: NA,
name: new air,
},
tail_no: NA-123,
}
{
airline: {
id: FA,
name: foundry airways,
},
tail_no: FA-123,
}

Output:

| airline | tail_number |
| ----- | ----- |
| new air | NA-123 |
| foundry airways | FA-123 |

See details.


Extract rows from a CSV file

Supported in: Batch

Reads a dataset of files and parses each CSV file into rows.

Transform categories: File

See details.


Extract rows from a GeoJSON file

Supported in: Batch

Reads a dataset of files and parses each GeoJSON file into rows. The output dataset will have a geometry column, and a column for each property listed by the user, apart from the _error and _file columns. If the user provides no properties to extract, the entire properties struct will be extracted into a properties column as a string. All GeoJSONs in the files must be one of the following:

  • Multiline FeatureCollection: an entire file with one GeoJSON of type FeatureCollection.
  • Single-line Feature: a file where every line is a fully valid GeoJSON of type Feature.

Transform categories: File, Geospatial

See details.
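
The two accepted GeoJSON file layouts can be told apart with a sketch like the one below. It is only an illustration of the layouts described above, written with Python's standard `json` module; `parse_geojson_file` is a hypothetical helper and does not reproduce the transform's error handling or its _error and _file columns.

```python
import json


def parse_geojson_file(text):
    """Return one row per Feature from either supported file layout."""
    try:
        document = json.loads(text)
    except ValueError:
        document = None  # not a single JSON document, so try one Feature per line
    if isinstance(document, dict) and document.get("type") == "FeatureCollection":
        features = document["features"]
    else:
        features = [json.loads(line) for line in text.splitlines() if line.strip()]
    return [
        {"geometry": json.dumps(f["geometry"]), "properties": f.get("properties")}
        for f in features
    ]


feature_a = {"type": "Feature", "geometry": {"type": "Point", "coordinates": [1, 2]}, "properties": {"name": "a"}}
feature_b = {"type": "Feature", "geometry": {"type": "Point", "coordinates": [3, 4]}, "properties": {"name": "b"}}

collection_text = json.dumps({"type": "FeatureCollection", "features": [feature_a, feature_b]}, indent=2)
line_delimited_text = json.dumps(feature_a) + "\n" + json.dumps(feature_b) + "\n"

rows_from_collection = parse_geojson_file(collection_text)
rows_from_lines = parse_geojson_file(line_delimited_text)
```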


Extract rows from a JSON file

Supported in: Batch

Reads a dataset of files and parses each JSON file into rows.

Transform categories: File, String, Struct

See details.


Extract rows from a dataset of email files

Supported in: Batch

Reads a dataset of email files and parses each file into a row. Supported file extensions: .eml, .emltpl, and .msg.

Transform categories: File, Media

See details.


Extract rows from a dataset of text files

Supported in: Batch

Reads a dataset of text files and parses each file into a row.

Transform categories: File, String

See details.


Extract rows from an Excel file

Supported in: Batch

Reads a dataset of Microsoft Excel files and parses each file into rows. Supported file formats: .xls, .xlt, .xltm, .xltx, .xlsx, .xlsm.

The processing of individual Excel files is not distributed across multiple Spark executors, so we recommend enabling the usage of local Spark in build settings if the input dataset is expected to have exactly one file.

Particularly large Excel files can require a lot of memory to process, so if you observe builds failing with out-of-memory errors, consider using custom build settings with increased executor memory (or increased driver memory in the case of local Spark). For such large files, it may not be possible to preview the output, but deployment can still succeed given appropriate build settings.

Transform categories: File

See details.


Extract rows from an XML file

Supported in: Batch

Reads a dataset of files and parses each XML file into rows.

Transform categories: File

See details.


Extract rows from shapefile

Supported in: Batch

Reads a dataset of files and parses each shapefile into rows. All files except .shp, .shx and .dbf files will be ignored. This shapefile parser only supports point, polyline, polygon and multipoint geometry types. The output dataset will have a geometry column, and a column for each property listed by the user, apart from the _error and _file columns. If the user provides no properties to extract, the entire properties struct will be extracted into a properties column as a string. UTF-8 is the only supported encoding for property names and values (even if a .cpg file that specifies an alternative encoding exists, it will be ignored).

Transform categories: File, Geospatial

See details.


Filter

Supported in: Batch, Streaming

Filters the input dataset based on the specified filter condition.

Transform categories: Data preparation, Popular

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Filter condition: recently_serviced

Input:

| recently_serviced | tail_number |
| ----- | ----- |
| true | KK-150 |
| false | XB-120 |
| true | MT-190 |

Output:

| recently_serviced | tail_number |
| ----- | ----- |
| true | KK-150 |
| true | MT-190 |

See details.


Filter files

Supported in: Batch

Filters a dataset of files.

Transform categories: File

See details.


First union by name

Supported in: Batch

Unions a set of datasets together on columns from the first dataset, adding nulls when columns are missing. Columns that are not present in the first dataset are removed.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs:

ri.foundry.main.dataset.a

| recently_serviced | tail_number | airline_code |
| ----- | ----- | ----- |
| true | KK-150 | KK |
| false | XB-120 | XB |
| true | MT-190 | MT |

ri.foundry.main.dataset.b

| recently_serviced | tail_number | home_country |
| ----- | ----- | ----- |
| true | AA-200 | US |
| true | BN-435 | UK |
| true | BN-111 | UK |

Output:

| recently_serviced | tail_number | airline_code |
| ----- | ----- | ----- |
| true | KK-150 | KK |
| false | XB-120 | XB |
| true | MT-190 | MT |
| true | AA-200 | null |
| true | BN-435 | null |
| true | BN-111 | null |

See details.
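
The First union by name behavior (keep only the first dataset's columns, fill missing ones with null) can be sketched as follows. Rows are modelled as dictionaries, the first dataset's column list is passed explicitly, and `first_union_by_name` is a hypothetical name.

```python
def first_union_by_name(first_columns, datasets):
    """Union rows on the first dataset's columns; missing columns become None."""
    return [
        {column: row.get(column) for column in first_columns}
        for rows in datasets
        for row in rows
    ]


dataset_a = [
    {"recently_serviced": True, "tail_number": "KK-150", "airline_code": "KK"},
    {"recently_serviced": False, "tail_number": "XB-120", "airline_code": "XB"},
]
dataset_b = [
    {"recently_serviced": True, "tail_number": "AA-200", "home_country": "US"},
]
unioned = first_union_by_name(
    ["recently_serviced", "tail_number", "airline_code"],
    [dataset_a, dataset_b],
)
```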


Flatten struct

Supported in: Batch, Streaming

Take all fields in a struct and turn them into columns in the output dataset.

Transform categories: Struct

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Expression: raw
  • Max depth: 2
  • Column prefix: new_
  • Separator: null

Input:

raw
{
airline: {
id: NA,
name: new air,
},
tail_no: NA-123,
}
{
airline: {
id: FA,
name: foundry airways,
},
tail_no: FA-123,
}

Output:

| new_airline_name | new_airline_id | new_tail_no | raw |
| ----- | ----- | ----- | ----- |
| new air | NA | NA-123 | { airline: { id: NA, name: new air }, tail_no: NA-123 } |
| foundry airways | FA | FA-123 | { airline: { id: FA, name: foundry airways }, tail_no: FA-123 } |

See details.
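
The column names in the Flatten struct example can be reproduced with a recursive sketch. It assumes that a null separator falls back to an underscore (inferred from the example output) and that "Max depth" counts nesting levels starting at 1; both are assumptions, and `flatten_struct` is a hypothetical helper operating on nested dictionaries.

```python
def flatten_struct(struct, prefix="", separator="_", max_depth=None, _depth=1):
    """Turn nested struct fields into flat column names, down to max_depth levels."""
    columns = {}
    for name, value in struct.items():
        column = f"{prefix}{name}"
        if isinstance(value, dict) and (max_depth is None or _depth < max_depth):
            nested = flatten_struct(value, column + separator, separator, max_depth, _depth + 1)
            columns.update(nested)
        else:
            columns[column] = value  # leaf value, or a struct below the depth limit
    return columns


raw = {"airline": {"id": "NA", "name": "new air"}, "tail_no": "NA-123"}
flat = flatten_struct(raw, prefix="new_", max_depth=2)
shallow = flatten_struct(raw, prefix="new_", max_depth=1)
```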


Frequent pattern growth

Supported in: Batch

Frequent pattern (FP) growth finds frequent patterns in your dataset.

Transform categories: Aggregate, Other

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Items column: customer_attributes
  • Minimum support: 0.6

Input:

customer_attributes
[ age_group: 20-30, country: Germany, gender: Female ]
[ age_group: 20-30, country: Germany, gender: Male ]

Output:

| pattern | pattern_occurrence | total_count |
| ----- | ----- | ----- |
| [ country: Germany, age_group: 20-30 ] | 2 | 2 |
| [ age_group: 20-30 ] | 2 | 2 |
| [ country: Germany ] | 2 | 2 |

See details.


Geo distance inner join

Supported in: Batch

Inner joins left and right datasets together based on the distance between input geometries. Internally converts geometries into the given projected coordinate reference system prior to the join and back to WGS84.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, arrayCol],
    )
  • Distance: 10.0
  • Join key: (geometryColLhs, geometryCol)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: EPSG:4326
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

| geometryColLhs | lhs-1 |
| ----- | ----- |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 |
| {"coordinates": [55.0, 5.0], "type":"Point"} | 43.0 |
| {"coordinates": [[25.0, 0.0], [0.0, 25.0]], "type":"LineString"} | 44.0 |

ri.foundry.main.dataset.right

| geometryCol | col1 | arrayCol |
| ----- | ----- | ----- |
| {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | rhsVal1 | [ 0.0, 1.0 ] |
| {"coordinates": [[[21.0, 21.0], [27.0, 21.0], [27.0, 27.0], [21.0, 27.0], [21.0, 21.0]]], "type": "Polygon"} | rhsVal2 | [ 0.0, 1.0 ] |

Output:

| geometryColLhs | lhs-1 | rhs_geometryCol | rhs_arrayCol |
| ----- | ----- | ----- | ----- |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 | {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | [ 0.0, 1.0 ] |
| {"coordinates": [[25.0, 0.0], [0.0, 25.0]], "type":"LineString"} | 44.0 | {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | [ 0.0, 1.0 ] |

See details.


Geo distance left join

Supported in: Batch

Left joins datasets together if the distance between input geometries is less than or equal to the specified distance. Internally converts geometries into the given projected coordinate reference system prior to the join and back to WGS84.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryColRhs, rhs-1],
    )
  • Distance: 1640.42
  • Join key: (geometryColLhs, geometryColRhs)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: epsg:2868
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

| geometryColLhs | lhs-1 |
| ----- | ----- |
| {"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 |
| null | 43.0 |

ri.foundry.main.dataset.right

| geometryColRhs | rhs-1 |
| ----- | ----- |
| {"coordinates": [-112.14560508728029,33.44082430962016], "type":"Point"} | rhsVal1 |
| {"coordinates": [-112.11796760559083,33.440895931474124], "type":"Point"} | rhsVal2 |

Output:

| geometryColLhs | lhs-1 | geometryColRhs | rhs-1 |
| ----- | ----- | ----- | ----- |
| {"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | {"coordinates": [-112.14560508728029,33.44082430962016], "type":"Point"} | rhsVal1 |
| null | 43.0 | null | null |

See details.


Geo intersection inner join

Supported in: Batch, Streaming

Inner joins left and right datasets together based on whether input geometries overlap. Includes just touching geometries in the results.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Condition for columns to select on the right:
    allColumns(

    )
  • Join key: (geometryColLhs, geometryColRhs)
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

| geometryColLhs | col1Lhs |
| ----- | ----- |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 |

ri.foundry.main.dataset.right

| geometryColRhs | col1Rhs |
| ----- | ----- |
| {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1 |
| {"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} | rhsVal2 |
| {"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3 |
| {"coordinates": [15.0, 15.0], "type":"Point"} | rhsVal4 |
| {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5 |
| {"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal6 |
| {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7 |
| {"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal8 |
| {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9 |
| {"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal10 |

Output:

| geometryColLhs | col1Lhs | geometryColRhs | col1Rhs |
| ----- | ----- | ----- | ----- |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 | {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1 |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 | {"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3 |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5 |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7 |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 | {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9 |

See details.


Geo intersection left join

Supported in: Batch

Left joins input datasets based on whether input geometries overlap. Includes just touching geometries in the results.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Condition for columns to select on the right:
    allColumns(

    )
  • Join key: (geometryColLhs, geometryColRhs)
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

| geometryColLhs | col1Lhs |
| ----- | ----- |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 |
| {"coordinates": [55.0, 5.0], "type":"Point"} | 43.0 |

ri.foundry.main.dataset.right

| geometryColRhs | col1Rhs |
| ----- | ----- |
| {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1 |
| {"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} | rhsVal2 |
| {"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3 |
| {"coordinates": [15.0, 15.0], "type":"Point"} | rhsVal4 |
| {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5 |
| {"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal6 |
| {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7 |
| {"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal8 |
| {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9 |
| {"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal10 |

Output:

| geometryColLhs | col1Lhs | geometryColRhs | col1Rhs |
| ----- | ----- | ----- | ----- |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 | {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1 |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 | {"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3 |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5 |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7 |
| {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} | 42.0 | {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9 |
| {"coordinates": [55.0, 5.0], "type":"Point"} | 43.0 | null | null |

See details.


GeoPoint-to-GeoPoint 3d distance inner join

Supported in: Batch

Inner joins left and right datasets together based on the distance between point geometries. The geometries must represent points, and may optionally include a z-coordinate. Internally converts geometries into the given projected coordinate reference system prior to the join and back to WGS84. Non-point geometries are ignored, and the entire right dataset must be able to fit into driver and executor memory. A 3 GB executor should be able to handle up to 4 million points in the neighbors dataset.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, arrayCol],
    )
  • Distance: 2.5
  • Join key: (geometryColLhs, geometryCol)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: EPSG:4326
  • Right dataset: ri.foundry.main.dataset.right
  • Use z-coordinate: false
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

| geometryColLhs | lhs-1 |
| ----- | ----- |
| {"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0 |
| {"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 |
| {"coordinates": [0.0, 0.0], "type":"Point"} | 44.0 |

ri.foundry.main.dataset.right

| geometryCol | col1 | arrayCol |
| ----- | ----- | ----- |
| {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | rhsVal1 | [ 0.0, 1.0 ] |
| {"coordinates": [0.0, 1.0], "type":"Point"} | rhsVal2 | [ 0.0, 1.0 ] |

Output:

geometryColLhslhs-1rhs_geometryColrhs_arrayCol
{"coordinates": [0.0, 0.0, 0.0], "type":"Point"}42.0{"coordinates": [0.0, 0.0, 2.0], "type":"Point"}[ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0, 0.0], "type":"Point"}42.0{"coordinates": [0.0, 1.0], "type":"Point"}[ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0, 5.0], "type":"Point"}43.0{"coordinates": [0.0, 0.0, 2.0], "type":"Point"}[ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0, 5.0], "type":"Point"}43.0{"coordinates": [0.0, 1.0], "type":"Point"}[ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0], "type":"Point"}44.0{"coordinates": [0.0, 0.0, 2.0], "type":"Point"}[ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0], "type":"Point"}44.0{"coordinates": [0.0, 1.0], "type":"Point"}[ 0.0, 1.0 ]

See details.
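
The distance condition in the example above can be sketched in plain Python. This is only an illustration: it assumes the coordinates are already in projected units (no coordinate system conversion is performed), and the handling of a missing z-coordinate as 0 is an assumption, since the text does not say how mixed 2D and 3D points are compared. The function name `distance_join` is hypothetical.

```python
import math

def distance_join(left, right, max_distance, use_z=False):
    """Pair every left point with every right point within max_distance."""
    def dist(p, q):
        dx, dy = p[0] - q[0], p[1] - q[1]
        dz = 0.0
        if use_z:
            # Assumption: a point without a z-coordinate is treated as z = 0.
            pz = p[2] if len(p) > 2 else 0.0
            qz = q[2] if len(q) > 2 else 0.0
            dz = pz - qz
        return math.sqrt(dx * dx + dy * dy + dz * dz)

    return [(p, q) for p in left for q in right if dist(p, q) <= max_distance]

# Point coordinates taken from the example datasets above.
left_points = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0)]
right_points = [(0.0, 0.0, 2.0), (0.0, 1.0)]

pairs_2d = distance_join(left_points, right_points, 2.5)              # z ignored
pairs_3d = distance_join(left_points, right_points, 2.5, use_z=True)  # z included
```

With the z-coordinate ignored, every left point pairs with every right point, which matches the six output rows shown in the example.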


Geometry intersection join

Supported in: Batch

Inner joins left and right datasets together based on whether input geometries overlap. Returns a row containing all of the columns from both datasets if the join key column pair has geometries which intersect. Currently does not support joining on multiple join keys. Silently filters null join key geometry values. Left and right datasets must not have the same column names. Silently nullifies invalid GeoJSON in join columns.

Transform categories: Geospatial, Join

Example

Argument values:

  • Join key: [(geometryColLhs, geometryColRhs)]
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs: ri.foundry.main.dataset.left

geometryColLhslhs-1
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]}42.0

ri.foundry.main.dataset.right

geometryColRhsrhs-1
{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"}rhsVal1
{"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"}rhsVal2
{"coordinates": [0.0, 0.0], "type":"Point"}rhsVal3
{"coordinates": [15.0, 15.0], "type":"Point"}rhsVal4
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"}rhsVal5
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"}rhsVal6
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"}rhsVal7
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"}rhsVal8
{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"}rhsVal9
{"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"}rhsVal10

Output:

geometryColLhslhs-1geometryColRhsrhs-1
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]}42.0{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"}rhsVal1
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]}42.0{"coordinates": [0.0, 0.0], "type":"Point"}rhsVal3
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]}42.0{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"}rhsVal5
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]}42.0{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"}rhsVal7
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]}42.0{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"}rhsVal9

See details.


Geometry knn inner join

Supported in: Batch

Selects the k closest points from the neighbors dataset for each valid input geometry from the base dataset. Internally converts the input datasets to the given coordinate reference system, and back to WGS84. The entire neighbors dataset must be able to fit into driver and executor memory. A 3 GB executor should be able to handle up to 1 million points in the neighbors dataset.

Transform categories: Geospatial, Join

Example

Argument values:

  • Base dataset: ri.foundry.main.dataset.left
  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryCol, lhsCol],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, col],
    )
  • Join key: (geometryCol, geometryCol)
  • K: 2
  • Neighbors dataset: ri.foundry.main.dataset.right
  • Projected coordinate system: epsg:2868
  • Prefix for columns from right: rhs_

Inputs: ri.foundry.main.dataset.left

geometryCollhsCol
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"}42.0

ri.foundry.main.dataset.right

geometryColcol
{
latitude: 33.440609443703586,
longitude: -112.14843750000001,
}
rhsVal1
{
latitude: 33.44082430962016,
longitude: -112.14560508728029,
}
rhsVal2
{
latitude: 33.440895931474124,
longitude: -112.11796760559083,
}
rhsVal3

Output:

geometryCollhsColrhs_geometryColrhs_col
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"}42.0{
latitude: 33.440609443703586,
longitude: -112.14843750000001,
}
rhsVal1
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"}42.0{
latitude: 33.44082430962016,
longitude: -112.14560508728029,
}
rhsVal2

See details.
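
The selection of the k closest points can be sketched as follows. Note the substitution: the transform measures distance in the given projected coordinate system, whereas this sketch uses the haversine great-circle distance on the raw longitude and latitude values as a stand-in. The function names are hypothetical.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres on a sphere of radius 6,371 km."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def k_nearest(base, neighbors, k):
    """Return the k neighbors closest to the base (lon, lat) point."""
    lon, lat = base
    ranked = sorted(
        neighbors,
        key=lambda n: haversine_m(lon, lat, n["longitude"], n["latitude"]),
    )
    return ranked[:k]

# Values taken from the example datasets above.
base_point = (-112.14843750000001, 33.440609443703586)
neighbors = [
    {"latitude": 33.440609443703586, "longitude": -112.14843750000001, "col": "rhsVal1"},
    {"latitude": 33.44082430962016, "longitude": -112.14560508728029, "col": "rhsVal2"},
    {"latitude": 33.440895931474124, "longitude": -112.11796760559083, "col": "rhsVal3"},
]

nearest = k_nearest(base_point, neighbors, k=2)
```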


Geometry knn left join

Supported in: Batch

Selects the k closest points from the neighbors dataset for each valid input geometry from the base dataset. Internally converts the input datasets to the given coordinate reference system, and back to WGS84. The entire neighbors dataset must be able to fit into driver and executor memory. A 3 GB executor should be able to handle up to 1 million points in the neighbors dataset.

Transform categories: Geospatial, Join

Example

Argument values:

  • Base dataset: ri.foundry.main.dataset.left
  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryCol, lhsCol],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, col],
    )
  • Join key: (geometryCol, geometryCol)
  • K: 2
  • Neighbors dataset: ri.foundry.main.dataset.right
  • Projected coordinate system: epsg:2868
  • Prefix for columns from right: rhs_

Inputs: ri.foundry.main.dataset.left

geometryCollhsCol
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"}42.0

ri.foundry.main.dataset.right

geometryColcol
{
latitude: 33.440609443703586,
longitude: -112.14843750000001,
}
rhsVal1
{
latitude: 33.44082430962016,
longitude: -112.14560508728029,
}
rhsVal2
{
latitude: 33.440895931474124,
longitude: -112.11796760559083,
}
rhsVal3

Output:

geometryCollhsColrhs_geometryColrhs_col
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"}42.0{
latitude: 33.440609443703586,
longitude: -112.14843750000001,
}
rhsVal1
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"}42.0{
latitude: 33.44082430962016,
longitude: -112.14560508728029,
}
rhsVal2

See details.


Get media references (datasets)

Supported in: Batch

Produces a dataset containing media references and basic metadata for files in a dataset.

Transform categories: File

See details.


Heartbeat detection

Supported in: Streaming

Detects when a record hasn't been seen for a configurable amount of time for a set of keys.

Transform categories: Other

See details.


Inner join

Supported in: Batch

Joins two datasets together, keeping only rows that satisfy the provided condition from each table.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airport
XB-123LHR
MT-222CPH
KK-452JFK
JR-201IAD

Output:

tail_number | airline | home_airport
XB-123 | foundry air | LHR
MT-222 | new airline | CPH
XB-123 | foundry airline | LHR
MT-222 | new air | CPH
KK-452 | new air | JFK
XB-123 | foundry airline | LHR

See details.
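
As a rough illustration of the inner join semantics in the example above, the following Python sketch joins two lists of dictionaries on an equality condition. The `miles` and `factor` columns are omitted because they are not selected in the example; the function name `inner_join` is hypothetical.

```python
def inner_join(left, right, key, right_columns):
    """Keep only left/right row pairs whose key values are equal."""
    out = []
    for lrow in left:
        for rrow in right:
            if lrow[key] == rrow[key]:
                out.append({**lrow, **{c: rrow[c] for c in right_columns}})
    return out

left = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
    {"tail_number": "MT-222", "airline": "new air"},
    {"tail_number": "KK-452", "airline": "new air"},
    {"tail_number": "PA-452", "airline": "new air"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
]
right = [
    {"tail_number": "XB-123", "home_airport": "LHR"},
    {"tail_number": "MT-222", "home_airport": "CPH"},
    {"tail_number": "KK-452", "home_airport": "JFK"},
    {"tail_number": "JR-201", "home_airport": "IAD"},
]

joined = inner_join(left, right, "tail_number", ["home_airport"])
```

The `PA-452` row has no match on the right and `JR-201` has no match on the left, so neither appears in `joined`.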


Join

Supported in: Batch, Streaming

Joins left and right dataset inputs together.

Transform categories: Join

See details.


K-means clustering

Supported in: Batch

K-means clustering is an unsupervised machine learning algorithm that groups dataset vectors into k clusters. The k value is chosen by computing the best silhouette score over the specified range between minimum k and maximum k. The number of k values defines how many k values are tried within this range, inclusive of the boundaries.

Transform categories: Other

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Maximum k: 12
  • Minimum k: 3
  • Number of k values: 4
  • Vector column: feature_column

Input:

feature_column
[ 0.05, 3.1, 2.3 ]
[ 1.0, 3.1, 2.3 ]
[ 1.0, 3.5, 2.3 ]
[ 19.0, 12.3, -1.4 ]

Output:

feature_column | cluster_id
[ 1.0, 3.1, 2.3 ] | 0
[ 1.0, 3.5, 2.3 ] | 0
[ 19.0, 12.3, -1.4 ] | 1
[ 0.05, 3.1, 2.3 ] | 2

See details.


KNN join

Supported in: Batch

Returns the 'k' nearest rows from the right dataset for each row in the left dataset, based on the distance measure.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [fuzzy_airline, home_airport],
    )
  • Distance measure expression:
    alias(
     alias: distance,
     expression:
    levenshteinDistance(
     ignoreCase: true,
     left: airline,
     right: fuzzy_airline,
    ),
    )
  • K nearest: 2
  • Left dataset: ri.foundry.main.dataset.left
  • Rank column name: rank
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
PA-452new air2122

ri.foundry.main.dataset.right

fuzzy_airlinehome_airport
airLHR
new airlineCPH
new planeJFK
old airIAD

Output:

rank | distance | tail_number | airline | fuzzy_airline | home_airport
1 | 3 | PA-452 | new air | old air | IAD
2 | 4 | PA-452 | new air | air | LHR
2 | 4 | PA-452 | new air | new airline | CPH
2 | 4 | PA-452 | new air | new plane | JFK
1 | 0 | MT-222 | new airline | new airline | CPH
2 | 4 | MT-222 | new airline | new plane | JFK
1 | 5 | XB-123 | foundry air | old air | IAD
2 | 8 | XB-123 | foundry air | air | LHR

See details.
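
The example above can be approximated with a small Python sketch that ranks right-hand rows by case-insensitive Levenshtein distance. One assumption is labeled in the code: rows with equal distance share a rank, which is what the `PA-452` rows in the example output suggest. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def knn_join(left, right, k):
    out = []
    for lrow in left:
        scored = sorted(
            ((levenshtein(lrow["airline"].lower(), r["fuzzy_airline"].lower()), r) for r in right),
            key=lambda pair: pair[0],
        )
        rank, last = 0, None
        for distance, rrow in scored:
            # Assumption: tied distances share a rank, so ties at rank k are all kept.
            if distance != last:
                rank, last = rank + 1, distance
            if rank > k:
                break
            out.append({"rank": rank, "distance": distance, **lrow, **rrow})
    return out

left = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "PA-452", "airline": "new air"},
]
right = [
    {"fuzzy_airline": "air", "home_airport": "LHR"},
    {"fuzzy_airline": "new airline", "home_airport": "CPH"},
    {"fuzzy_airline": "new plane", "home_airport": "JFK"},
    {"fuzzy_airline": "old air", "home_airport": "IAD"},
]

result = knn_join(left, right, k=2)
```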


Keeps duplicates

Supported in: Batch

Keeps only the rows of the input that are duplicated on the given column subset.

Transform categories: Other

Example

Argument values:

  • Column subset: {tail_number}
  • Dataset: ri.foundry.main.dataset.aggregate

Input:

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
XB-123foundry airline11343

Output:

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
XB-123foundry airline11343

See details.
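
A minimal Python sketch of the behavior shown in the example above: a row is kept when its values for the column subset occur more than once in the input. The `miles` and `factor` columns are left out for brevity, and the function name is hypothetical.

```python
from collections import Counter

def keep_duplicates(rows, subset):
    """Keep rows whose values for the subset columns appear more than once."""
    counts = Counter(tuple(r[c] for c in subset) for r in rows)
    return [r for r in rows if counts[tuple(r[c] for c in subset)] > 1]

rows = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
    {"tail_number": "MT-222", "airline": "new air"},
    {"tail_number": "KK-452", "airline": "new air"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
]

duplicates = keep_duplicates(rows, ["tail_number"])
```

Only `KK-452` occurs once, so it is the only row removed.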


Key by

Supported in: Streaming

Keys the input by the provided key-by columns. Note that this does not re-sort the data; it only maintains per-key ordering from the point the keys are set. Re-keying data may be unsafe: if the newly keyed data depends on a specific ordering, that ordering cannot be guaranteed unless it was already maintained by the previous keying. Additionally sets the primary key if CDC (change data capture) mode is enabled. The primary key defines the columns that indicate which rows are updates and deletes, and the ordering used when the data is read as a current view.

Transform categories: Other

See details.


Left join

Supported in: Batch

Joins two datasets together, keeping all rows from the left table and only rows which satisfy the provided condition from the right table.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airport
XB-123LHR
MT-222CPH
KK-452JFK
JR-201IAD

Output:

tail_numberairlinehome_airport
XB-123foundry airLHR
MT-222new airlineCPH
XB-123foundry airlineLHR
MT-222new airCPH
KK-452new airJFK
PA-452new airnull
XB-123foundry airlineLHR

See details.


Left lookup join

Supported in: Streaming

Joins two datasets together, keeping all rows from the left table and only matching rows from the right dataset.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition: [(tail_number, tail_number)]
  • Left dataset: ri.foundry.main.dataset.left
  • Max rows to join with a single row: 10
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airport
XB-123LHR
MT-222CPH
KK-452JFK
JR-201IAD

Output:

tail_numberairlinehome_airport
XB-123foundry airLHR
MT-222new airlineCPH
XB-123foundry airlineLHR
MT-222new airCPH
KK-452new airJFK
PA-452new airnull
XB-123foundry airlineLHR

See details.


Manually entered table

Supported in: Batch, Streaming

Uses manually entered table data to create an output.

Transform categories: Other

Example

Argument values:

  • Rows: [{
    airline: foundry airlines,
    flight_code: 112,
    flight_number: XB-123,
    }, {
    airline: foundry airlines,
    flight_code: 533,
    flight_number: MT-444,
    }, {
    airline: new air,
    flight_code: 934,
    flight_number: KK-123,
    }]
  • Schema: Struct<flight_code, flight_number, airline>

Output:

flight_code | flight_number | airline
112 | XB-123 | foundry airlines
533 | MT-444 | foundry airlines
934 | KK-123 | new air

See details.


Mapping join

Supported in: Batch

Replaces values from the target columns in the source dataset with values in the mapping dataset.

Transform categories: Join

Type variable bounds: T1 accepts AnyType, T2 accepts AnyType

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.input
  • Key column for mapping values: flight_code
  • Mapping dataset: ri.foundry.main.dataset.mapping
  • Target columns: [flight_no, next_flight]
  • Values to use for mapping: flight_number
  • Assume unique mappings: null
  • Default value: unknown

Inputs: ri.foundry.main.dataset.input

flight_no | next_flight | departure_time
533 | 112 | 2022-01-20T10:45:00Z
934 | 533 | 2022-01-20T11:20:00Z
222 | 934 | 2022-01-20T11:20:00Z

ri.foundry.main.dataset.mapping

flight_code | flight_number | airline
112 | XB-123 | foundry airlines
533 | MT-444 | foundry airlines
934 | KK-123 | new air

Output:

flight_no | next_flight | departure_time
MT-444 | XB-123 | 2022-01-20T10:45:00Z
KK-123 | MT-444 | 2022-01-20T11:20:00Z
unknown | KK-123 | 2022-01-20T11:20:00Z

See details.
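
The replacement logic in the example above can be sketched as a dictionary lookup. This is a simplified illustration with a hypothetical function name; it assumes each key appears at most once in the mapping dataset.

```python
def mapping_join(rows, mapping_rows, key_col, value_col, targets, default):
    """Replace values in the target columns using a key -> value lookup."""
    lookup = {m[key_col]: m[value_col] for m in mapping_rows}
    return [
        {c: (lookup.get(v, default) if c in targets else v) for c, v in row.items()}
        for row in rows
    ]

input_rows = [
    {"flight_no": 533, "next_flight": 112, "departure_time": "2022-01-20T10:45:00Z"},
    {"flight_no": 934, "next_flight": 533, "departure_time": "2022-01-20T11:20:00Z"},
    {"flight_no": 222, "next_flight": 934, "departure_time": "2022-01-20T11:20:00Z"},
]
mapping = [
    {"flight_code": 112, "flight_number": "XB-123", "airline": "foundry airlines"},
    {"flight_code": 533, "flight_number": "MT-444", "airline": "foundry airlines"},
    {"flight_code": 934, "flight_number": "KK-123", "airline": "new air"},
]

mapped = mapping_join(
    input_rows, mapping, "flight_code", "flight_number",
    targets=["flight_no", "next_flight"], default="unknown",
)
```

Flight code `222` has no entry in the mapping dataset, so it is replaced by the default value.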


Narrow union by name

Supported in: Batch

Unions a set of datasets together on the intersection of their column names. Columns that are not present in all input datasets are removed.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs: ri.foundry.main.dataset.a

recently_servicedtail_number
trueKK-150
falseXB-120
trueMT-190

ri.foundry.main.dataset.b

recently_servicedtail_numberairline_code
trueAA-200AA
trueBN-435BN
trueBN-111BN

Output:

recently_servicedtail_number
trueKK-150
falseXB-120
trueMT-190
trueAA-200
trueBN-435
trueBN-111

See details.
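
A short Python sketch of the narrow union shown above. It assumes each dataset is a non-empty list of dictionaries whose rows all share the same keys; the function name is hypothetical.

```python
def narrow_union(*datasets):
    """Union datasets, keeping only the columns present in every dataset."""
    first_columns = list(datasets[0][0])
    common = [c for c in first_columns if all(c in ds[0] for ds in datasets[1:])]
    return [{c: row[c] for c in common} for ds in datasets for row in ds]

dataset_a = [
    {"recently_serviced": True, "tail_number": "KK-150"},
    {"recently_serviced": False, "tail_number": "XB-120"},
    {"recently_serviced": True, "tail_number": "MT-190"},
]
dataset_b = [
    {"recently_serviced": True, "tail_number": "AA-200", "airline_code": "AA"},
    {"recently_serviced": True, "tail_number": "BN-435", "airline_code": "BN"},
    {"recently_serviced": True, "tail_number": "BN-111", "airline_code": "BN"},
]

narrow = narrow_union(dataset_a, dataset_b)
```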


New operator chain

Supported in: Streaming

Advanced Flink feature. Starts a new operator chain at this point.

Transform categories: Other

See details.


Normalize column names

Supported in: Batch, Streaming

Normalizes column names to use lower_snake_case.

Transform categories: Data preparation

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Remove special characters: null

Input:

recentlyServicedtailNumber_airlineCode
trueKK-150KK
falseXB-120XB
trueMT-190MT

Output:

recently_servicedtail_numberairline_code
trueKK-150KK
falseXB-120XB
trueMT-190MT

See details.
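
One plausible way to convert names to lower_snake_case is sketched below. The exact rules the transform applies are not stated in this document, so the regular expressions here are an assumption chosen to reproduce the output names in the example.

```python
import re

def to_lower_snake_case(name):
    """Convert a camelCase or mixed column name to lower_snake_case."""
    # Insert an underscore at each lower-case/digit to upper-case boundary.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse any run of non-alphanumeric characters into one underscore.
    s = re.sub(r"[^0-9A-Za-z]+", "_", s)
    return s.strip("_").lower()

normalized = [to_lower_snake_case(n) for n in ["recentlyServiced", "tailNumber", "_airlineCode"]]
```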


Numeric distribution

Supported in: Batch

Computes the distribution of numeric values in a specified column.

Transform categories: Numeric

See details.


Outer caching join

Supported in: Streaming

Returns rows from the left and right inputs that meet all of the match conditions and are within the caching window, along with unmatched rows from both inputs.

Transform categories: Join

See details.


Outer caching join

Supported in: Streaming

Joins left and right dataset inputs together, caching the record with the highest event time from each side for use in subsequent joins. Processing time of a record is used as a tiebreaker. In the case of a time results are optimistically emitted if there's no value to join against.

Transform categories: Join

See details.


Outer join

Supported in: Batch

Outer joins the provided dataset inputs together, keeping all rows from both datasets. Columns have nulls when there is no row satisfying the provided condition.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airport
XB-123LHR
MT-222CPH
KK-452JFK
JR-201IAD

Output:

tail_numberairlinehome_airport
XB-123foundry airLHR
MT-222new airlineCPH
XB-123foundry airlineLHR
MT-222new airCPH
KK-452new airJFK
PA-452new airnull
XB-123foundry airlineLHR
JR-201nullIAD

See details.
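
The outer join in the example above can be sketched in Python as follows. The function name is hypothetical, and one detail is inferred from the example output rather than stated in the text: for an unmatched right row, the join key column is filled from the right side (the `JR-201` row).

```python
def outer_join(left, right, key, left_columns, right_columns):
    """Keep all rows from both sides, filling missing columns with None."""
    out, matched = [], set()
    for lrow in left:
        hits = [r for r in right if r[key] == lrow[key]]
        matched.update(r[key] for r in hits)
        for rrow in hits or [None]:
            row = {c: lrow[c] for c in left_columns}
            row.update({c: (rrow[c] if rrow else None) for c in right_columns})
            out.append(row)
    for rrow in right:
        if rrow[key] not in matched:
            row = {c: None for c in left_columns}
            row[key] = rrow[key]  # assumption: key taken from the right side
            row.update({c: rrow[c] for c in right_columns})
            out.append(row)
    return out

left = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
    {"tail_number": "MT-222", "airline": "new air"},
    {"tail_number": "KK-452", "airline": "new air"},
    {"tail_number": "PA-452", "airline": "new air"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
]
right = [
    {"tail_number": "XB-123", "home_airport": "LHR"},
    {"tail_number": "MT-222", "home_airport": "CPH"},
    {"tail_number": "KK-452", "home_airport": "JFK"},
    {"tail_number": "JR-201", "home_airport": "IAD"},
]

joined = outer_join(left, right, "tail_number", ["tail_number", "airline"], ["home_airport"])
```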


Parse KML files into geometry lists

Supported in: Batch

Parses each raw KML file into a list of typed geometries.

Transform categories: File

See details.


Pivot

Supported in: Batch

Performs the specified aggregations on the input dataset grouped by a set of columns. Unique values to pivot on must be provided such that the output schema is known ahead of runtime. This improves runtime stability over time.

Transform categories: Aggregate, Popular

Type variable bounds: T accepts Boolean | Byte | Integer | Long | Short | String

Example

Argument values:

  • Aggregations: [
    alias(
     alias: miles,
     expression:
    mean(
     expression: miles,
    ),
    )]
  • Dataset: ri.foundry.main.dataset.a
  • Group by columns: [airline]
  • Pivot by column: airport
  • Pivot by values: [(JFK, new_york), (LHR, london)]
  • Prefix or suffix alias: null

Input:

airline | airport | miles
foundry airways | JFK | 1002345
foundry airways | LHR | 2221324
new air | SFO | 21356673
new air | JFK | 12323456
foundry airways | LHR | 12542352
new air | JFK | 12232355

Output:

airline | new_york_miles | london_miles
foundry airways | 1002345.0 | 7381838.0
new air | 1.22779055E7 | null

See details.
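
The pivot in the example above can be reproduced with a small Python sketch. It hard-codes the mean aggregation for brevity, and the function name `pivot_mean` is hypothetical. The pivot values are passed explicitly, mirroring the requirement that they be known ahead of runtime.

```python
from collections import defaultdict

def pivot_mean(rows, group_col, pivot_col, pivot_values, value_col):
    """Mean of value_col per group, with one output column per pivot value."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[(r[group_col], r[pivot_col])].append(r[value_col])
    groups = list(dict.fromkeys(r[group_col] for r in rows))  # first-seen order
    out = []
    for g in groups:
        row = {group_col: g}
        for value, alias in pivot_values:
            vals = buckets.get((g, value))
            row[f"{alias}_{value_col}"] = sum(vals) / len(vals) if vals else None
        out.append(row)
    return out

rows = [
    {"airline": "foundry airways", "airport": "JFK", "miles": 1002345},
    {"airline": "foundry airways", "airport": "LHR", "miles": 2221324},
    {"airline": "new air", "airport": "SFO", "miles": 21356673},
    {"airline": "new air", "airport": "JFK", "miles": 12323456},
    {"airline": "foundry airways", "airport": "LHR", "miles": 12542352},
    {"airline": "new air", "airport": "JFK", "miles": 12232355},
]

pivoted = pivot_mean(rows, "airline", "airport", [("JFK", "new_york"), ("LHR", "london")], "miles")
```

The `SFO` row is ignored because `SFO` is not among the pivot values.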


Project over window

Supported in: Batch, Streaming

Performs the specified aggregations on the data within the window. Emits one row each time a new row is received.

Transform categories: Aggregate

See details.


Rename columns

Supported in: Batch, Streaming

Renames a set of columns.

Transform categories: Data preparation, Popular

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Renames: [(recently_serviced, does_not_require_service)]

Input:

recently_servicedtail_numberairline_code
trueKK-150KK
falseXB-120XB
trueMT-190MT

Output:

does_not_require_servicetail_numberairline_code
trueKK-150KK
falseXB-120XB
trueMT-190MT

See details.


Repartition data

Supported in: Batch

Forces a shuffle of the data based on optionally provided partitioning columns and a resulting number of partitions. If these are not provided, the partitioning will be determined automatically.

Transform categories: Other

See details.


Rollup

Supported in: Batch

Performs the specified aggregations on the input dataset at different levels of granularity, providing both intermediate and super aggregates.

Transform categories: Aggregate

Example

Argument values:

  • Aggregations: [
    alias(
     alias: mean_price,
     expression:
    mean(
     expression: price,
    ),
    )]
  • Dataset: ri.foundry.main.dataset.rollupBaseCase
  • Rollup columns: [city, model]

Input:

city | model | price | store
London | new phone | 900.0 | MegaMart
London | new phone | 850.75 | AA
London | new phone | 870.75 | ABC Zone
San Francisco | new phone | 1000.0 | Prescos
San Francisco | new phone | 950.25 | XZY Force
San Francisco | new phone | 1105.7 | Phone Mart
London | forestX 20 | 750.1 | MegaMart
London | forestX 20 | 690.0 | AA
London | forestX 20 | 730.0 | ABC Zone
San Francisco | forestX 20 | 890.4 | Prescos
San Francisco | forestX 20 | 900.1 | XZY Force
San Francisco | forestX 20 | 1050.75 | Phone Mart

Output:

city | model | mean_price
London | new phone | 873.8333333333334
London | forestX 20 | 723.3666666666667
London | null | 798.6
San Francisco | new phone | 1018.65
San Francisco | forestX 20 | 947.0833333333334
San Francisco | null | 982.8666666666667
null | null | 890.7333333333335

See details.
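
The levels of granularity in the example above can be sketched in Python: each row contributes to its full `(city, model)` group, to its `(city, null)` subtotal, and to the `(null, null)` grand total. The sketch hard-codes the mean aggregation, and the function name is hypothetical.

```python
from collections import defaultdict

def rollup_mean(rows, cols, value_col):
    """Mean of value_col for every prefix of cols, from most to least detailed."""
    groups = defaultdict(list)
    for r in rows:
        for level in range(len(cols), -1, -1):
            key = tuple(r[c] for c in cols[:level]) + (None,) * (len(cols) - level)
            groups[key].append(r[value_col])
    return {k: sum(v) / len(v) for k, v in groups.items()}

prices = [
    ("London", "new phone", 900.0), ("London", "new phone", 850.75),
    ("London", "new phone", 870.75), ("San Francisco", "new phone", 1000.0),
    ("San Francisco", "new phone", 950.25), ("San Francisco", "new phone", 1105.7),
    ("London", "forestX 20", 750.1), ("London", "forestX 20", 690.0),
    ("London", "forestX 20", 730.0), ("San Francisco", "forestX 20", 890.4),
    ("San Francisco", "forestX 20", 900.1), ("San Francisco", "forestX 20", 1050.75),
]
rows = [{"city": c, "model": m, "price": p} for c, m, p in prices]

means = rollup_mean(rows, ["city", "model"], "price")
```

For two rollup columns this produces seven groups: four detailed groups, two per-city subtotals, and one grand total.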


Row size

Supported in: Batch

Estimates the size of a single row in the JVM.

Transform categories: Other

See details.


Select columns

Supported in: Batch, Streaming

Selects a set of columns from the input dataset.

Transform categories: Popular

See details.


Semi join

Supported in: Batch

Semi joins left and right dataset inputs together. This removes all rows that don't match the join condition.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs: ri.foundry.main.dataset.left

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
PA-452new air2122
XB-123foundry airline11342

ri.foundry.main.dataset.right

tail_numberhome_airport
XB-123LHR
MT-222CPH
KK-452JFK
JR-201IAD

Output:

tail_numberairlinemilesfactor
XB-123foundry air1242
MT-222new airline11235
XB-123foundry airline3355
MT-222new air5654
KK-452new air2221
XB-123foundry airline11342

See details.
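
A semi join can be sketched as a filter on the left rows: a left row is kept if at least one right row matches, and no right columns are added. The function name is hypothetical, and the `miles` and `factor` columns are omitted for brevity.

```python
def semi_join(left, right, key):
    """Keep left rows that have at least one match on the right."""
    right_keys = {r[key] for r in right}
    return [row for row in left if row[key] in right_keys]

left = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
    {"tail_number": "MT-222", "airline": "new air"},
    {"tail_number": "KK-452", "airline": "new air"},
    {"tail_number": "PA-452", "airline": "new air"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
]
right = [
    {"tail_number": "XB-123", "home_airport": "LHR"},
    {"tail_number": "MT-222", "home_airport": "CPH"},
    {"tail_number": "KK-452", "home_airport": "JFK"},
    {"tail_number": "JR-201", "home_airport": "IAD"},
]

kept = semi_join(left, right, "tail_number")
```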


Sort

Supported in: Batch

Sorts the input dataset according to the provided sort specification.

Transform categories: Other

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Sort specification: [(b, DESCENDING)]

Input:

a | b
1 | 2
3 | 4
5 | 6

Output:

a | b
5 | 6
3 | 4
1 | 2

See details.


Split on condition

Supported in: Batch

Splits an input into two outputs based on a chosen condition.

Transform categories: Other

See details.


Text block

Supported in: Batch, Streaming

Insert a text description between your transformations. This does not transform the input data in any way.

Transform categories: Other

See details.


Time bounded drop duplicates

Supported in: Streaming

Drops duplicate rows from the input for a given column subset. Rows that have been seen expire after a configured amount of event time. Rows that arrive late by more than the configured amount of event time are always dropped. Partitions by the specified keys; duplicates are computed separately for each distinct combination of key column values.

Transform categories: Other

See details.


Time bounded drop out of order

Supported in: Streaming

Drops rows with the same values for all key columns that are out of order. A row is out of order if it would have come before an already received row with the same key values, based on the sort columns and directions. Two rows are compared by evaluating the first sort column and direction, moving on to the next sort column and direction only if there is a tie, and so on until the order is determined or all sort columns are tied, in which case the rows are equal. The current maximum for each key is stored until no new rows have been seen for that key for an event time greater than or equal to the expiry. After a key has received no new rows for an event time greater than or equal to the expiry, any new row for that key will never be dropped and will always be stored as the new current maximum.

Transform categories: Other

See details.
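
The rule described above can be sketched in Python as a small stateful filter. This is a simplified illustration under several assumptions: only ascending sort directions are supported, the event time is read from a hypothetical numeric `event_time` field, and a dropped row still counts as a row seen for expiry purposes. It is not the streaming implementation.

```python
def drop_out_of_order(rows, key_cols, sort_cols, expiry):
    """Drop rows that sort before the current maximum for their key."""
    state = {}  # key -> (current maximum sort tuple, event time of last row seen)
    kept = []
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        order = tuple(row[c] for c in sort_cols)  # tuples compare column by column
        now = row["event_time"]
        if key in state:
            current_max, last_seen = state[key]
            if now - last_seen < expiry and order < current_max:
                # Assumption: a dropped row still refreshes the last-seen time.
                state[key] = (current_max, now)
                continue
        # Either the key is new or expired, or the row is in order: keep it.
        state[key] = (order, now)
        kept.append(row)
    return kept

events = [
    {"id": "a", "seq": 1, "event_time": 0},
    {"id": "a", "seq": 3, "event_time": 1},
    {"id": "a", "seq": 2, "event_time": 2},   # out of order: dropped
    {"id": "a", "seq": 1, "event_time": 20},  # arrives after expiry: kept
]

kept = drop_out_of_order(events, ["id"], ["seq"], expiry=10)
```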


Time bounded event time sort

Supported in: Streaming

Emits rows by key in ascending event time order, allowing for late arriving records up until at least the allowed lateness. Records arriving after the allowed lateness plus some small buffer interval will be dropped.

Transform categories: Other

See details.


Top rows

Supported in: Batch

Picks the top rows in each sorted partition.

Transform categories: Aggregate

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Partition by columns: {airline}
  • Sort specification: [(airport, DESCENDING), (miles, ASCENDING)]
  • Number of rows: null

Input:

airline | airport | miles
foundry airways | JFK | 1002345
foundry airways | LHR | 2221324
new air | SFO | 21356673
new air | JFK | 12323456
foundry airways | LHR | 12542352
new air | JFK | 12232355

Output:

airline | airport | miles
foundry airways | LHR | 2221324
new air | SFO | 21356673

See details.
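
The example above can be sketched in Python by sorting on each key in turn and then taking the first rows per partition. The default of one row per partition is an assumption inferred from the example, where the number of rows is null and one row per airline is returned. The function name is hypothetical.

```python
def top_rows(rows, partition_cols, sort_spec, n=1):
    """Return the first n rows of each partition after sorting."""
    ordered = list(rows)
    # Stable sorts applied from the last sort key to the first give a multi-key sort.
    for col, direction in reversed(sort_spec):
        ordered.sort(key=lambda r: r[col], reverse=(direction == "DESCENDING"))
    counts, out = {}, []
    for r in ordered:
        key = tuple(r[c] for c in partition_cols)
        counts[key] = counts.get(key, 0) + 1
        if counts[key] <= n:
            out.append(r)
    return out

rows = [
    {"airline": "foundry airways", "airport": "JFK", "miles": 1002345},
    {"airline": "foundry airways", "airport": "LHR", "miles": 2221324},
    {"airline": "new air", "airport": "SFO", "miles": 21356673},
    {"airline": "new air", "airport": "JFK", "miles": 12323456},
    {"airline": "foundry airways", "airport": "LHR", "miles": 12542352},
    {"airline": "new air", "airport": "JFK", "miles": 12232355},
]

top = top_rows(rows, ["airline"], [("airport", "DESCENDING"), ("miles", "ASCENDING")])
```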


Union by name

Supported in: Batch, Streaming

Unions a set of datasets together on matching column names.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs: ri.foundry.main.dataset.a

recently_servicedtail_numberairline_code
trueKK-150KK
falseXB-120XB
trueMT-190MT

ri.foundry.main.dataset.b

recently_servicedtail_numberairline_code
trueAA-200AA
trueBN-435BN
trueBN-111BN

Output:

recently_servicedtail_numberairline_code
trueKK-150KK
falseXB-120XB
trueMT-190MT
trueAA-200AA
trueBN-435BN
trueBN-111BN

See details.


Union files

Supported in: Batch

Unions datasets of files.

Transform categories: File

See details.


Unpivot

Supported in: Batch, Streaming

Unpivot is the opposite operation of pivot. This converts multiple columns into rows, transforming data from a wide format to a long format. To do so it creates two new columns: one containing the original column names as values, and another containing the corresponding data values. All other columns that are not unpivoted are kept as is.

Transform categories: Aggregate, Popular

Type variable bounds: T accepts AnyType

Example

Argument values:

  • Columns to unpivot: [new_york_miles, london_miles]
  • Dataset: ri.foundry.main.dataset.a
  • Name column: city
  • Value column: miles

Input:

airline | new_york_miles | london_miles
foundry airways | 1000 | 6000
new air | null | 8000

Output:

city | miles | airline
new_york_miles | 1000 | foundry airways
london_miles | 6000 | foundry airways
new_york_miles | null | new air
london_miles | 8000 | new air

See details.
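
The wide-to-long conversion described above can be sketched in Python as follows; the function name is hypothetical. Each input row yields one output row per unpivoted column, and the remaining columns are copied unchanged.

```python
def unpivot(rows, columns, name_col, value_col):
    """Turn each listed column into its own row of (name, value)."""
    out = []
    for r in rows:
        rest = {c: v for c, v in r.items() if c not in columns}
        for c in columns:
            out.append({name_col: c, value_col: r[c], **rest})
    return out

rows = [
    {"airline": "foundry airways", "new_york_miles": 1000, "london_miles": 6000},
    {"airline": "new air", "new_york_miles": None, "london_miles": 8000},
]

long_rows = unpivot(rows, ["new_york_miles", "london_miles"], "city", "miles")
```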


Unzip files

Supported in: Batch

Unzips each file in a dataset of zipped files. Any non-zip files are ignored. Note that users must have editor permission to be able to preview the unzip file transform and all downstream nodes.

Transform categories: File

See details.


Wide union by name

Supported in: Batch, Streaming

Unions a set of datasets together on the superset of their column names, adding nulls when columns are missing.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs: ri.foundry.main.dataset.a

recently_servicedtail_number
trueKK-150
falseXB-120
trueMT-190

ri.foundry.main.dataset.b

recently_servicedtail_numberairline_code
trueAA-200AA
trueBN-435BN
trueBN-111BN

Output:

recently_serviced | tail_number | airline_code
true | KK-150 | null
false | XB-120 | null
true | MT-190 | null
true | AA-200 | AA
true | BN-435 | BN
true | BN-111 | BN

See details.
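
A short Python sketch of the wide union shown above: the output uses every column seen in any input, and missing values are filled with `None`. It assumes each dataset is a list of dictionaries; the function name is hypothetical.

```python
def wide_union(*datasets):
    """Union datasets on the superset of their columns, filling gaps with None."""
    columns = []
    for ds in datasets:
        for row in ds:
            for c in row:
                if c not in columns:
                    columns.append(c)  # preserve first-seen column order
    return [{c: row.get(c) for c in columns} for ds in datasets for row in ds]

dataset_a = [
    {"recently_serviced": True, "tail_number": "KK-150"},
    {"recently_serviced": False, "tail_number": "XB-120"},
    {"recently_serviced": True, "tail_number": "MT-190"},
]
dataset_b = [
    {"recently_serviced": True, "tail_number": "AA-200", "airline_code": "AA"},
    {"recently_serviced": True, "tail_number": "BN-435", "airline_code": "BN"},
    {"recently_serviced": True, "tail_number": "BN-111", "airline_code": "BN"},
]

wide = wide_union(dataset_a, dataset_b)
```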


Window

Supported in: Batch

Performs the specified aggregations on the input dataset grouped by a set of columns.

Transform categories: Aggregate, Popular

See details.