Pipeline Builder provides expressions that operate at different levels. They can generally be categorized as row-level expressions, aggregations, or generators.
Row-level functions operate on values from a single row. Most expressions fall into this category, for example 'add'.
Aggregations combine multiple row values into one, for example the 'sum' expression.
Generators produce multiple values from a single row, for example the 'explode_array' expression.
Transforms are functions that operate on a whole table or on multiple tables, for example the 'drop' transform.
This document outlines the available expressions and transforms.
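To illustrate the expression categories described above, the following is a minimal Python sketch. It is not Pipeline Builder code; the sample rows and the helper names (`add_row_level`, `sum_aggregation`, `explode_array_generator`) are hypothetical and only mirror the three categories.

```python
# Hypothetical sample table: a list of rows, each row a dict.
rows = [
    {"a": 1, "b": 2, "tags": ["x", "y"]},
    {"a": 3, "b": 4, "tags": ["z"]},
]

def add_row_level(row):
    # Row level: uses values from a single row only.
    return row["a"] + row["b"]

def sum_aggregation(table, column):
    # Aggregation: many row values collapse into one value.
    return sum(row[column] for row in table)

def explode_array_generator(row, column):
    # Generator: a single row yields one output row per array element.
    return [{**row, column: element} for element in row[column]]
```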
Supported in: Batch, Streaming
Returns the absolute value.
Expression categories: Numeric
Type variable bounds: T accepts Numeric
Output type: T
Argument values:
numeric_column
numeric_column | Output |
---|---|
0.0 | 0.0 |
1.1 | 1.1 |
-1.1 | 1.1 |
See more details here.
Supported in: Batch, Streaming
Calculates the sum of all input columns.
Expression categories: Numeric
Output type: Numeric
Argument values:
[ col_a, col_b ]
col_a | col_b | Output |
---|---|---|
0 | 1 | 1 |
3 | -2 | 1 |
See more details here.
Supported in: Batch, Streaming
Updates a field of a struct or adds a new field.
Expression categories: Struct
Output type: Struct
Argument values:
value
struct
struct | value | Output |
---|---|---|
{ airline: { id: NA, }, } | 1 | { airline: { id: 1, }, } |
{ airline: { id: FE, }, } | 2 | { airline: { id: 2, }, } |
See more details here.
Supported in: Batch, Streaming
Returns the date that is 'value' days/weeks/months/quarter/years after 'start'.
Expression categories: Datetime
Output type: Date
Argument values:
DAYS
Output: 2022-02-03
See more details here.
Supported in: Batch, Streaming
Returns true if the expression is true for all elements in the array.
Expression categories: Array
Output type: Boolean
Argument values:
miles
element
miles | Output |
---|---|
[ 12300, null ] | false |
[ null, null ] | true |
See more details here.
Supported in: Batch, Streaming
Returns true if all of the specified conditions are true. Nulls are considered false.
Expression categories: Boolean
Output type: Boolean
Argument values:
[ left_boolean, right_boolean ]
left_boolean | right_boolean | Output |
---|---|---|
true | true | true |
true | false | false |
false | true | false |
false | false | false |
See more details here.
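The rule that nulls are considered false can be sketched in Python, with `None` standing in for null. The function name `all_true` is hypothetical, and this is an illustration of the stated semantics rather than the actual implementation.

```python
def all_true(*conditions):
    # Only a literal True counts; None (null) is treated like False.
    return all(condition is True for condition in conditions)
```

For example, `all_true(True, None)` evaluates to `False` under this reading.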
Supported in: Batch, Streaming
Returns true if the expression is true for any element in the array.
Expression categories: Array
Output type: Boolean
Argument values:
miles
element
miles | Output |
---|---|
[ 12300, null ] | true |
[ 12300, 12000 ] | false |
See more details here.
Supported in: Batch, Streaming
Inverse cosine function.
Expression categories: Numeric
Output type: Double
Argument values:
radians
Output: 0.0
See more details here.
Supported in: Batch, Streaming
Inverse sine function.
Expression categories: Numeric
Output type: Double
Argument values:
radians
Output: 0.0
See more details here.
Supported in: Batch, Streaming
Inverse tangent function.
Expression categories: Numeric
Output type: Double
Argument values:
degrees
angle
angle | Output |
---|---|
-1.0 | -45.0 |
0.0 | 0.0 |
1.0 | 45.0 |
See more details here.
Supported in: Batch, Streaming
Returns the angle θ between the ray from the origin to the point (x, y) and the positive x-axis, confined to −π < θ ≤ π.
Expression categories: Numeric
Output type: Double
Argument values:
degrees
x
y
y | x | Output |
---|---|---|
0.0 | 0.0 | 0.0 |
1.0 | 0.0 | 90.0 |
0.0 | -1.0 | 180.0 |
-1.0 | 0.0 | -90.0 |
See more details here.
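The table above can be reproduced with Python's standard `math.atan2`, converting the result to degrees. The function name `atan2_degrees` is hypothetical; this is a sketch of the same calculation, not the Pipeline Builder implementation.

```python
import math

def atan2_degrees(y, x):
    # math.atan2 takes y first, then x, and returns radians in (-pi, pi].
    return math.degrees(math.atan2(y, x))
```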
Supported in: Batch, Streaming
Calculates the area of a geometry in square meters using a spherical approximation of the globe. For a line string or a point, the area is 0.
Expression categories: Geospatial
Output type: Double
See more details here.
Supported in: Batch, Streaming
Adds a value to the array at a specified index.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
numbers
numbers | Output |
---|---|
[ 3, 5 ] | [ 1, 3, 5 ] |
[ 2 ] | [ 1, 2 ] |
[ ] | [ 1 ] |
See more details here.
Supported in: Batch, Streaming
Computes the Cartesian product of arrays.
Expression categories: Array
Output type: Array<Struct>
Argument values:
[ first, second ]
first | second | Output |
---|---|---|
[ 1, 2 ] | [ 3, 4 ] | [ { first: 1, second: 3, }, { first: 1, *second... |
See more details here.
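As a rough illustration of the output shape, the following Python sketch builds a Cartesian product of two arrays as a list of structs (dicts). The function name is hypothetical, and the field names `first` and `second` are taken from the example columns above.

```python
from itertools import product

def array_cartesian_product(first, second):
    # Every element of `first` is paired with every element of `second`.
    return [{"first": a, "second": b} for a, b in product(first, second)]
```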
Supported in: Batch, Streaming
Concatenates the provided arrays into a single array, without de-duplication.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
Output: [ 1, 2, 3, 4, 5 ]
See more details here.
Supported in: Batch, Streaming
Returns true if the array contains the value.
Expression categories: Array, Boolean
Output type: Boolean
Argument values:
part_ids
part_ids | Output |
---|---|
[ AWE-112, BRR-123 ] | true |
[ AWE-222, ABC-543 ] | false |
See more details here.
Supported in: Batch, Streaming
Returns true if the array contains null.
Expression categories: Array, Boolean
Output type: Boolean
Argument values:
part_ids
part_ids | Output |
---|---|
[ AWE-112, BRR-123, null ] | true |
[ AWE-222, ABC-543 ] | false |
See more details here.
Supported in: Batch, Streaming
Returns all unique elements in the left array that are not in the right array.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
Output: [ 1 ]
See more details here.
Supported in: Batch, Streaming
Removes duplicates and returns distinct values from the array.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
Output: [ 1, 2, 3 ]
See more details here.
Supported in: Batch, Streaming
Returns the element at a given position from the input array. Positions outside of the array will return null.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: T
Argument values:
Output: 10
See more details here.
Supported in: Batch, Streaming
Returns true if the array's elements are distinct, false otherwise. If the array is null, the returned value is false.
Expression categories: Array, Boolean
Output type: Boolean
Argument values:
part_ids
part_ids | Output |
---|---|
[ ABC-123, DCE-123, EFG-123 ] | true |
[ ABC-123, ABC-123, EFG-123 ] | false |
See more details here.
Supported in: Batch, Streaming
Creates a single array from an input nested array by unioning the elements within the first level of nesting.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
array
array | Output |
---|---|
[ [ 1, 2, 3 ], [ 4, 5, 6 ] ] | [ 1, 2, 3, 4, 5, 6 ] |
See more details here.
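Flattening only the first level of nesting can be sketched in Python as follows. The function name `flatten_one_level` is hypothetical; deeper nesting is left untouched, which matches the description above.

```python
def flatten_one_level(nested):
    # Concatenate the inner arrays; elements of the inner arrays are not inspected.
    return [element for inner in nested for element in inner]
```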
Supported in: Batch, Streaming
Removes duplicates and intersects a list of arrays.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
Output: [ 3 ]
See more details here.
Supported in: Batch, Streaming
Returns the maximum value of an array column.
Expression categories: Array
Type variable bounds: T accepts Numeric
Output type: T
Argument values:
Output: 3
See more details here.
Supported in: Batch, Streaming
Returns the minimum value of an array column.
Expression categories: Array
Type variable bounds: T accepts Numeric
Output type: T
Argument values:
Output: 1
See more details here.
Supported in: Batch, Streaming
Returns the position (index) of the first occurrence of 'value' in a given array. Returns null when the value is not found or when any of the arguments are null.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Long
Argument values:
Output: 1
See more details here.
Supported in: Batch, Streaming
Returns an array after removing all provided 'value' from the given array.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
Output: [ 2, 3 ]
See more details here.
Supported in: Batch, Streaming
Returns an array with the contents of 'array' concatenated 'value' times.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
Output: [ 1, 2, 1, 2 ]
See more details here.
Supported in: Batch, Streaming
Reverses the order of elements in 'array'.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
Output: [ 3, 2, 1 ]
See more details here.
Supported in: Batch, Streaming
Returns a sorted array of the given input array. All null values are placed at the end of a descending array and at the front of an ascending array.
Expression categories: Array
Type variable bounds: T accepts ComparableType
Output type: Array<T>
Argument values:
ASCENDING
Output: [ 3, 5, 6 ]
See more details here.
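The null placement rule described above (nulls first when ascending, last when descending) can be sketched in Python with `None` standing in for null. The function name and the `ascending` parameter are hypothetical.

```python
def sort_array(values, ascending=True):
    # Separate nulls from comparable values before sorting.
    nulls = [value for value in values if value is None]
    present = sorted(
        (value for value in values if value is not None),
        reverse=not ascending,
    )
    # Nulls lead an ascending result and trail a descending one.
    return nulls + present if ascending else present + nulls
```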
Supported in: Batch, Streaming
Returns a sorted array of the given input array of structs sorted by the values of the given struct keys.
Expression categories: Array
Output type: Array<Struct>
Argument values:
ASCENDING
Output: [ { age: 10, }, { age: 20, }, { age: 30, } ]
See more details here.
Supported in: Batch, Streaming
Removes duplicates and unions a list of arrays.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
Output: [ 1, 2, 3, 4 ]
See more details here.
Supported in: Batch, Streaming
Checks if given arrays have at least one shared element.
Expression categories: Array, Boolean
Type variable bounds: T accepts AnyType
Output type: Boolean
Argument values:
Output: true
See more details here.
Supported in: Batch, Streaming
Zips a list of given arrays into a merged array of structs in which the n-th struct contains all n-th values of input arrays.
Expression categories: Array
Output type: Array<Struct>
Argument values:
[ first_array, second_array ]
first_array | second_array | Output |
---|---|---|
[ 1, 2, 3 ] | [ 4, 5, 6 ] | [ { first_array: 1, second_array: 4, }, { first_array: 2,<... |
See more details here.
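The zipping behavior can be sketched in Python, where the n-th struct holds the n-th value of each input array. The function name is hypothetical. Python's `zip` stops at the shorter input; how Pipeline Builder treats arrays of unequal length is not stated here, so that part is an assumption of the sketch.

```python
def arrays_zip(first_array, second_array):
    # Pair up elements by position and label them with the input column names.
    return [
        {"first_array": a, "second_array": b}
        for a, b in zip(first_array, second_array)
    ]
```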
Supported in: Batch, Streaming
Base64 decode the given expression. Uses utf-8 encoding for binary.
Expression categories: Binary, Cast, String
Output type: String
Argument values:
encoded
encoded | Output |
---|---|
Zm9v | foo |
YmFy | bar |
See more details here.
Supported in: Batch, Streaming
Base64 decode the given expression.
Expression categories: Binary, Cast
Output type: Binary
Argument values:
city_base64
city_base64 | Output |
---|---|
TG9uZG9u | TG9uZG9u |
Q29wZW5oYWdlbg== | Q29wZW5oYWdlbg== |
TmV3IFlvcms= | TmV3IFlvcms= |
See more details here.
Supported in: Batch, Streaming
Base64 encode the given expression.
Expression categories: Binary, Cast
Output type: String
Argument values:
city
city | Output |
---|---|
London | TG9uZG9u |
Copenhagen | Q29wZW5oYWdlbg== |
New York | TmV3IFlvcms= |
See more details here.
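The Base64 examples in the tables above can be checked with Python's standard `base64` module. The helper names are hypothetical, and UTF-8 is assumed for converting between strings and bytes.

```python
import base64

def base64_encode(text):
    # Encode the UTF-8 bytes of the string and return the Base64 text.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def base64_decode_to_string(encoded):
    # Decode the Base64 text and interpret the bytes as UTF-8.
    return base64.b64decode(encoded).decode("utf-8")
```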
Supported in: Batch, Streaming
Shift the given value a number of bits left.
Expression categories: Binary
Type variable bounds: E accepts Byte | Integer | Long | Short
Output type: E
Argument values:
Output: 2
See more details here.
Supported in: Batch, Streaming
Shift the given value a number of bits right.
Expression categories: Binary
Type variable bounds: E accepts Byte | Integer | Long | Short
Output type: E
Argument values:
Output: 0
See more details here.
Supported in: Batch, Streaming
Creates a buffer of distance k from an array of H3 indices.
Expression categories: Geospatial
Output type: Array<H3 Index>
See more details here.
Supported in: Batch, Streaming
Calculates the destination point along a specified path given a starting point, course, and distance.
Expression categories: Geospatial
Output type: GeoPoint
Argument values:
course
distance
point_a
GREAT_CIRCLE
point_a | course | distance | Output |
---|---|---|---|
{ latitude: 48.8567, longitude: 2.3508, } | 225.0 | 32000.0 | { latitude: 48.65279552300661, longitude: 2.0427666779658806, } |
See more details here.
Supported in: Batch, Streaming
Calculates the haversine distance between two latitude and longitude point pairs in meters.
Expression categories: Geospatial
Output type: Double
Argument values:
point_a
point_b
point_a | point_b | Output |
---|---|---|
{ latitude: 41.507483, longitude: -99.436554, } | { latitude: 38.504048, longitude: -98.315949, } | 347328.82778977347 |
{ latitude: 22.308919, longitude: 113.914603, } | { latitude: -33.946111, longitude: 151.177222, } | 7393894.00134442 |
See more details here.
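For reference, the haversine formula can be sketched in Python as below. The Earth radius constant is an assumption (a commonly used mean radius); the radius used by Pipeline Builder is not stated, so results may differ slightly from the table above.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in meters

def haversine_m(point_a, point_b):
    # Points are dicts with "latitude" and "longitude" in degrees.
    lat1 = math.radians(point_a["latitude"])
    lat2 = math.radians(point_b["latitude"])
    dlat = lat2 - lat1
    dlon = math.radians(point_b["longitude"] - point_a["longitude"])
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```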
Supported in: Batch, Streaming
Choose between different branches based on conditions.
Expression categories: Popular
Type variable bounds: T accepts AnyType
Output type: T
Argument values:
miles
miles | Output |
---|---|
20053 | Yes |
10210 | No |
34120 | Yes |
See more details here.
Supported in: Batch, Streaming
Cast expression to given type.
Expression categories: Cast, Popular
Type variable bounds: C accepts AnyType
Output type: C
Description: Casting long to string
Argument values:
Output: 1234
See more details here.
Supported in: Batch, Streaming
Returns the ceiling of a given fractional value.
Expression categories: Numeric
Output type: Decimal | Long
Argument values:
Output: 11
See more details here.
Supported in: Batch
Changes the time zone of a timestamp.
Expression categories: Datetime
Output type: Timestamp
Argument values:
Output: 2020-04-28T04:09:00Z
See more details here.
Supported in: Batch, Streaming
Replaces each character of the input column that appears in the matching string with the character at the same position in the replacement string. If the matching string is longer than the replacement string, the characters at the end of the matching string that have no counterpart are dropped from the output.
Expression categories: String
Output type: String
Argument values:
Output: 1a2s3ae
See more details here.
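The character-by-character replacement can be sketched in Python with `str.translate`. The function name and the sample input are hypothetical. Characters in the matching string that have no counterpart in the replacement string are removed; how repeated characters in the matching string are handled is not specified here, and in this sketch the last mapping wins.

```python
def translate(text, matching, replacement):
    # Map each matching character to its positional replacement,
    # or to None (delete) when the replacement string is too short.
    table = {
        ord(char): (replacement[index] if index < len(replacement) else None)
        for index, char in enumerate(matching)
    }
    return text.translate(table)
```

With the hypothetical input `translate("hello world", "lo", "0")`, every `l` becomes `0` and every `o` is dropped.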
Supported in: Batch
Chunk string into chunks of a specified size and on specified separators.
Expression categories: String
Output type: Array<String>
Argument values:
string
string | Output |
---|---|
hello | [ hello ] |
hello world. the quick brown fox jumps over the fence. | [ hello, world., the quick, brown fox, jumps, over the, fence. ] |
hello world. the quick brown fox jumps over the fence. | [ hello, world., the quick, brown fox, jumps, over the, fence. ] |
hello world. the quick brown fox jumps over the fence. | [ hello, world., the quick, brown fox, jumps, over the, fence. ] |
See more details here.
Supported in: Batch, Streaming
Decrypts expression with cipher.
Expression categories: Other
Output type: String
Argument values:
string
string | Output |
---|---|
CIPHER::ri.bellaso.main.cipher-channel.1::OCRBIW3iHDltOGa6MEHwb7f/Dw==::CIPHER | bar |
See more details here.
Supported in: Batch, Streaming
Encrypts expression with cipher.
Expression categories: Other
Output type: Cipher Text
Argument values:
string
string | Output |
---|---|
bar | CIPHER::ri.bellaso.main.cipher-channel.1::OCRBIW3iHDltOGa6MEHwb7f/Dw==::CIPHER |
See more details here.
Supported in: Batch, Streaming
Hashes expression with cipher.
Expression categories: Other
Output type: Cipher Text
Argument values:
string
string | Output |
---|---|
bar | CIPHER::ri.bellaso.main.cipher-channel.1::c70a14f5cc57c940e3265045a5554d641bd549ee27a571a05cdbc75c77762eb86b1144c12f1bb7811a0bcec08b2f143989c44022e4664f615d6885ad640332cb::CIPHER |
See more details here.
Supported in: Batch, Streaming
Applies the set of clean actions on the expression.
Expression categories: Data preparation, String
Output type: String
Argument values:
trim
Output: hello world
See more details here.
Supported in: Batch, Streaming
Compact H3 indices into a subset of mixed resolutions if possible. Running the inverse operation uncompact is guaranteed to yield the same set of indices that were compacted if the input indices were all the same resolution. If any of the input indices are invalid this transform will return null. Output indices are sorted in ascending order.
Expression categories: Geospatial
Output type: Array<H3 Index>
Argument values:
h3_set
h3_set | Output |
---|---|
[ 87754a914ffffff, 87754a916ffffff, 87754a930ffffff, 87754a932ffffff, 87754a933ffffff, 87754a934ffff... | [ 86754e64fffffff, 87754a914ffffff, 87754a916ffffff, 87754a930ffffff, 87754a932ffffff, 87754a933ffff... |
See more details here.
Supported in: Batch, Streaming
Concatenates a list of strings with the specified separator.
Expression categories: String
Output type: String
Argument values:
Output: hello_world
See more details here.
Supported in: Batch, Streaming
Constructs a GeoPoint column from a latitude and longitude column. Validates that the latitude parameter is between -90 and 90, inclusive, and that the longitude parameter is between -180 and 180, inclusive; if not, returns a null value.
Expression categories: Geospatial
Output type: GeoPoint
See more details here.
Supported in: Batch, Streaming
Expression to construct a valid delegated media Gotham identifier (GID) from components. If result is more than 1024 characters, produces a null row.
Expression categories: Other
Output type: Delegated media Gotham identifier (GID)
Argument values:
locator
mediaType
mediaType | locator | Output |
---|---|---|
testaudiotype | empty string | null |
See more details here.
Supported in: Batch, Streaming
Converts a geospatial coordinate string in degrees, minutes, seconds (DMS) format to a GeoPoint in accordance with user-provided formats. The default formats are DDD*°MM*'SS*"H and DDD*MMSSssH. The formats are tried in order, and the result of the first matching format is returned. See the formatting guide for how to write custom formats.
Expression categories: Geospatial
Output type: GeoPoint
Argument values:
coordinates
coordinates | Output |
---|---|
078261594N075220923E | { latitude: 78.43776111111112, longitude: 75.36923055555555, } |
046115095S069524119W | { latitude: -46.19748611111111, longitude: -69.87810833333333, } |
023°45'55"N 069°52'11"W | { latitude: 23.76527777777777, longitude: -69.86972222222222, } |
-123°55'55"N 069°53'00"W | { latitude: -123.93194444444445, longitude: -69.88333333333334, } |
123456789N23456789E | { latitude: 123.76885833333333, longitude: 23.768858333333334, } |
See more details here.
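The packed default format (DDD*MMSSssH) can be illustrated with a small Python parser. This is a sketch under assumptions: the function names are hypothetical, only the packed format is handled, and the reading of the format (variable-width degrees, then two digits each for minutes, seconds, and hundredths of a second, then a hemisphere letter) is inferred from the example rows above. Like the examples in the table, it does not reject out-of-range latitudes.

```python
import re

# Latitude part ending in N or S, followed by longitude part ending in E or W.
_PACKED_DMS = re.compile(
    r"(\d+)(\d{2})(\d{2})(\d{2})([NS])(\d+)(\d{2})(\d{2})(\d{2})([EW])"
)

def _to_degrees(degrees, minutes, seconds, hundredths, hemisphere):
    value = int(degrees) + int(minutes) / 60 + (int(seconds) + int(hundredths) / 100) / 3600
    # Southern and western hemispheres are negative.
    return -value if hemisphere in "SW" else value

def parse_packed_dms(text):
    match = _PACKED_DMS.fullmatch(text)
    if match is None:
        return None  # assumed behavior for non-matching input
    parts = match.groups()
    return {
        "latitude": _to_degrees(*parts[:5]),
        "longitude": _to_degrees(*parts[5:]),
    }
```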
Supported in: Batch, Streaming
Converts a GeoPoint to a base32-encoded Geohash with specified precision that contains the GeoPoint. For more information on Geohash, see: https://en.wikipedia.org/wiki/Geohash .
Expression categories: Geospatial
Output type: Geohash
See more details here.
Supported in: Batch, Streaming
Converts a GeoPoint following the WGS84 coordinate system (EPSG:4326) to an MGRS (Military Grid Reference System) coordinate. The output MGRS follows a space-delimited format with 5 digits of precision.
Expression categories: Geospatial
Output type: MGRS
Argument values:
geoPoint
geoPoint | Output |
---|---|
{ latitude -> 88.99999659707431, longitude -> 0.9996456505181999, } | Z AF 01937 88990 |
See more details here.
Supported in: Batch, Streaming
Convert GeoPoint to a GeoJSON of type point.
Expression categories: Geospatial
Output type: Geometry
See more details here.
Supported in: Batch, Streaming
Converts an MGRS (Military Grid Reference System) coordinate into a GeoPoint following the WGS84 coordinate system (EPSG:4326).
Expression categories: Geospatial
Output type: GeoPoint
Argument values:
mgrs
mgrs | Output |
---|---|
ZAF0193788990 | { latitude: 88.99999659707431, longitude: 0.9996456505181999, } |
See more details here.
Supported in: Batch, Streaming
Returns the date given a formatted string in accordance with the Java DateTimeFormatter. The default formats are yyyy-MM-dd and yyyy-MM-dd'T'HH:mm:ss.SSSXXX. The formats are tried in order, and the result of the first matching format is returned.
Expression categories: Cast, Datetime
Output type: Date
Description: Date formats are optional
Argument values:
Output: 2020-04-28
See more details here.
Supported in: Batch, Streaming
Returns the timestamp given a formatted string in accordance with the Java DateTimeFormatter. The default formats are yyyy-MM-dd'T'HH:mm:ss.SSSXXX and yyyy-MM-dd. The formats are tried in order, and the result of the first matching format is returned.
Expression categories: Cast, Datetime
Output type: Timestamp
Argument values:
timestamp
timestamp | Output |
---|---|
28-2020-04 10:09:00 | 2020-04-28T10:09:00Z |
2020-04-28 | 2020-04-28T00:00:00Z |
See more details here.
Supported in: Batch
Converts a number (or its string representation) from one base to another.
Expression categories: Binary, Cast, Numeric
Output type: String
Argument values:
Output: 305153
See more details here.
Supported in: Batch, Streaming
Expression categories: Geospatial, Numeric
Output type: Double
See more details here.
Supported in: Batch, Streaming
Expression categories: Numeric
Output type: Double
See more details here.
Supported in: Batch, Streaming
Expression categories: Datetime
Output type: Double
See more details here.
Supported in: Batch, Streaming
Expression categories: Numeric
Output type: Double
See more details here.
Supported in: Batch, Streaming
Transforms input into json string.
Expression categories: File, String
Output type: String
Argument values:
struct
struct | Output |
---|---|
{ airline: { id: NA, }, } | {"airline":{"id":"NA"}} |
See more details here.
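The compact JSON form shown in the table can be reproduced in Python with `json.dumps` and explicit separators. The function name is hypothetical; a struct is modeled as a nested dict.

```python
import json

def to_json_string(value):
    # Compact separators avoid the spaces json.dumps adds by default.
    return json.dumps(value, separators=(",", ":"))
```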
Supported in: Batch, Streaming
Convert an Ontology GeoPoint into a regular GeoPoint. Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180. Regular GeoPoints are structures of the format {"longitude": {long},"latitude": {lat}}.
Expression categories: Geospatial
Output type: GeoPoint
Argument values:
geopoint
geopoint | Output |
---|---|
-20.0000000,80.0000000 | { latitude: -20.0, longitude: 80.0, } |
38.9031000,-77.0599000 | { latitude: 38.9031, longitude: -77.0599, } |
41.9876543,-99.1234568 | { latitude: 41.9876543, longitude: -99.1234568, } |
See more details here.
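Parsing the '{lat},{lon}' string format can be sketched in Python as follows. The function name is hypothetical, and returning `None` for malformed or out-of-range input is an assumption of this sketch, not documented behavior.

```python
def from_ontology_geopoint(text):
    try:
        lat_text, lon_text = text.split(",")
        latitude, longitude = float(lat_text), float(lon_text)
    except (AttributeError, ValueError):
        return None  # assumed: not a 'lat,lon' string
    # Enforce the ranges given in the description.
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        return None  # assumed: out-of-range values are rejected
    return {"latitude": latitude, "longitude": longitude}
```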
Supported in: Batch
Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of the number.
Expression categories: Numeric, String
Output type: Binary
Argument values:
string_hex
string_hex | Output |
---|---|
68656C6C6F | aGVsbG8= |
3039 | MDk= |
FFFFFFFFFFFFCFC7 | ////////z8c= |
4C6F6E646F6E | TG9uZG9u |
See more details here.
Supported in: Batch, Streaming
Inverse of hex, interprets each pair of characters as a hexadecimal number and converts to the utf-8 string of the byte representation of the number.
Expression categories: String
Output type: String
Argument values:
string_hex
string_hex | Output |
---|---|
68656C6C6F | hello |
4C6F6E646F6E | London |
See more details here.
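The hex conversions in the tables above can be checked with Python's built-in `bytes.fromhex` and `bytes.hex`. The helper names are hypothetical.

```python
def hex_to_string(string_hex):
    # Each pair of hex characters becomes one byte, then the bytes are read as UTF-8.
    return bytes.fromhex(string_hex).decode("utf-8")

def string_to_hex(text):
    # Inverse direction: UTF-8 bytes rendered as uppercase hex.
    return text.encode("utf-8").hex().upper()
```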
Supported in: Batch, Streaming
Converts geocentric cartesian coordinates to geodesic polar coordinates. Altitude is defined as height-above-ellipsoid. If any coordinates are null, the output will be null.
Expression categories: Geospatial
Output type: GeoPoint with altitude
Argument values:
x_coordinate
y_coordinate
z_coordinate
x_coordinate | y_coordinate | z_coordinate | Output |
---|---|---|---|
0.0 | 6378137.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> 90.0, }, } |
0.0 | -6378137.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> -90.0, }, } |
-6378137.0 | 0.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> 180.0, }, } |
-6378137.0 | -0.0 | 0.0 | { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> -180.0, }, } |
0.0 | 0.0 | 6356752.314245179 | { altitude -> 0.0, geoPoint -> { latitude -> 90.0, longitude -> 0.0, }, } |
0.0 | 0.0 | -6356752.314245179 | { altitude -> 0.0, geoPoint -> { latitude -> -90.0, longitude -> 0.0, }, } |
See more details here.
Supported in: Batch
Converts a legacy OffsetDateTime column to a timestamp that can be used in all Foundry pipelines. The timestamp is returned in UTC.
Expression categories: Datetime
Output type: Timestamp
See more details here.
Supported in: Batch, Streaming
Convert a linestring geometry to a polygon geometry. This expression assumes the linestring geometry is closed. If not, the expression will return null.
Expression categories: Geospatial
Output type: Geometry
Argument values:
polygon_points
polygon_points | Output |
---|---|
{"type":"LineString","coordinates":[[-77.49,38.01],[-77.47,38.15],[-77.19,38.14],[-77.49,38.01]]} | {"type":"Polygon","coordinates":[[[-77.49,38.01],[-77.47,38.15],[-77.19,38.14],[-77.49,38.01]]]} |
See more details here.
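The closed-linestring check can be sketched in Python on GeoJSON text. The function name is hypothetical, and "closed" is taken here to mean that the first and last coordinates are equal, which is an assumption about how the expression decides.

```python
import json

def linestring_to_polygon(geojson_text):
    geometry = json.loads(geojson_text)
    coordinates = geometry.get("coordinates") or []
    is_linestring = geometry.get("type") == "LineString"
    # A closed linestring starts and ends at the same position.
    if not is_linestring or not coordinates or coordinates[0] != coordinates[-1]:
        return None
    polygon = {"type": "Polygon", "coordinates": [coordinates]}
    return json.dumps(polygon, separators=(",", ":"))
```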
Supported in: Batch, Streaming
Converts a timestamp from UTC to a given time zone.
Expression categories: Datetime
Output type: Timestamp
Argument values:
Output: 2020-04-28T05:09:00Z
See more details here.
Supported in: Batch, Streaming
Converts a timestamp to UTC based on a given time zone.
Expression categories: Datetime
Output type: Timestamp
Argument values:
Output: 2020-04-28T15:09:00Z
See more details here.
Supported in: Batch, Streaming
Convert a GeoPoint into a string that the Ontology will accept for a geo-indexed column (a geohash type column). Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180.
Expression categories: Geospatial
Output type: Ontology GeoPoint
Argument values:
point
point | Output |
---|---|
{ latitude: -20.0, longitude: 80.0, } | -20.0000000,80.0000000 |
{ latitude: 38.9031, longitude: -77.0599, } | 38.9031000,-77.0599000 |
{ latitude: 41.987654321, longitude: -99.123456789, } | 41.9876543,-99.1234568 |
null | null |
See more details here.
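The output strings in the table above are consistent with formatting each coordinate to seven decimal places. A Python sketch under that assumption (the function name is hypothetical, and range validation is omitted):

```python
def to_ontology_geopoint(point):
    if point is None:
        return None  # null input yields null, as in the last table row
    # Seven decimal places, latitude first, separated by a comma.
    return f"{point['latitude']:.7f},{point['longitude']:.7f}"
```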
Supported in: Batch, Streaming
Computes hex value of given expression.
Expression categories: Numeric, String
Output type: String
Argument values:
city_hex
city_hex | Output |
---|---|
TG9uZG9u | 4C6F6E646F6E |
See more details here.
Supported in: Batch, Streaming
Computes octal value of given expression.
Expression categories: Numeric
Output type: String
Argument values:
Output: 30071
See more details here.
Supported in: Batch, Streaming
Takes the cosine of an angle.
Expression categories: Numeric
Output type: Double
Argument values:
degrees
angle
angle | Output |
---|---|
0.0 | 1.0 |
90.0 | 0.0 |
180.0 | -1.0 |
See more details here.
Supported in: Batch, Streaming
Takes a pair of coordinates from a source coordinate system and transforms them into WGS 84 latitude/longitude values. Coordinate systems (also known as coordinate reference systems or spatial reference systems) represent different systems for identifying the location of a point on the globe and are often identified by key in standardized databases such as EPSG. If the given projection is not supported or either coordinate is null, returns null.
Expression categories: Geospatial
Output type: GeoPoint
Argument values:
x_coordinate
y_coordinate
x_coordinate | y_coordinate | Output |
---|---|---|
322190.2233952965 | 4306505.703879281 | { latitude -> 38.88944258, longitude -> -77.05014581, } |
323243.1361536059 | 4318298.06539618 | { latitude -> 38.99585379643137, longitude -> -77.04105678275415, } |
407063.63465300016 | 4764873.719585404 | { latitude -> 43.03086518778498, longitude -> -76.14077251822197, } |
See more details here.
Supported in: Batch, Streaming
Returns an empty array of the given type.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
Output: [ ]
See more details here.
Supported in: Batch, Streaming
Creates an array from the columns provided.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
Output: [ 1, 2, 3 ]
See more details here.
Supported in: Batch, Streaming
Approximates an ellipse as a polygon centered at the given geo coordinate. The distance between points is computed along the surface of the WGS84 ellipsoid approximating the surface of the earth.
Expression categories: Geospatial
Output type: Geometry
See more details here.
Supported in: Batch, Streaming
Creates a geodesic line between two points.
Expression categories: Geospatial
Output type: Geometry
See more details here.
Supported in: Batch, Streaming
Creates a GeoJSON linestring geometry from the given points.
Expression categories: Geospatial
Type variable bounds: T accepts Struct<longitude, latitude>
Output type: Geometry
Argument values:
points
points | Output |
---|---|
[ { latitude: 10.0, longitude: 0.0, }, { latitude: 10.0, longitude: 10.0, } ] | {"type":"LineString","coordinates":[[0.0,10.0],[10.0,10.0]]} |
[ { latitude: 10.0, longitude: 10.0, }, { latitude: 20.0,<... | {"type":"LineString","coordinates":[[10.0,10.0],[20.0,20.0],[30.0,30.0]]} |
[ { latitude: 0.0, longitude: 179.0, }, { latitude: 0.0, longitude: 181.0, } ] | {"type":"MultiLineString","coordinates":[[[179.0,0.0],[180.0,0.0]],[[-180.0,0.0],[-179.0,0.0]]]} |
[ { latitude: 0.0, longitude: -179.0, }, { latitude: 0.0, longitude: -181.0, } ] | {"type":"MultiLineString","coordinates":[[[180.0,0.0],[179.0,0.0]],[[-179.0,0.0],[-180.0,0.0]]]} |
See more details here.
Supported in: Batch, Streaming
Returns a map using key-value pairs from the zipped arrays. Null values are not allowed as keys and will cause a runtime error.
Expression categories: Array, Map
Type variable bounds: K accepts AnyType, V accepts AnyType
Output type: Map<K, V>
Argument values:
Output: {
1 -> 4,
2 -> 5,
3 -> 6,
}
See more details here.
Supported in: Batch, Streaming
Returns a null value of the given type.
Expression categories: Data preparation
Type variable bounds: T accepts AnyType
Output type: T
Argument values:
Output: null
See more details here.
Supported in: Batch, Streaming
Approximates a range fan as a polygon, specifying the region of all points whose haversine distance to the origin point is between the minimum and maximum radii, and to which the bearing from the origin is contained with the angular range centered around the specified bearing parameter. The left and right sides of the range fan are drawn as geodesic lines computed along the surface of the WGS84 ellipsoid approximating the surface of the earth. Returns null if the range spans more than 180 degrees while also crossing the anti-meridian, or if the maximum radius spans more than half of the circumference of the earth.
Expression categories: Geospatial
Output type: Geometry
See more details here.
Supported in: Batch, Streaming
Combines multiple columns into a single structured column.
Expression categories: Struct
Output type: Struct
Argument values:
[ tail_number, id ]
tail_number | id | Output |
---|---|---|
MT-112 | 1 | { id: 1, tail_number: MT-112, } |
XB-123 | 2 | { id: 2, tail_number: XB-123, } |
PA-654 | 3 | { id: 3, tail_number: PA-654, } |
See more details here.
Supported in: Batch, Streaming
Creates time series reference values.
Expression categories: String
Output type: String
Argument values:
seriesId
seriesId | Output |
---|---|
seriesOne | {"seriesId":"seriesOne","syncRid":"ri.time-series-catalog.main.sync.11111111"} |
See more details here.
Supported in: Batch, Streaming
Returns the current date as of when computation started.
Expression categories: Datetime
Output type: Date
See more details here.
Supported in: Batch, Streaming
Returns the current timestamp as of when computation started.
Expression categories: Datetime
Output type: Timestamp
See more details here.
Supported in: Batch
Creates an array with dates in range from start to end.
Expression categories: Datetime
Output type: Array<Date>
Argument values:
last_planned_flight
first_planned_flight
DAYS
first_planned_flight | last_planned_flight | Output |
---|---|---|
2023-01-01 | 2023-01-03 | [ 2023-01-01, 2023-01-02, 2023-01-03 ] |
2023-01-31 | 2023-02-02 | [ 2023-01-31, 2023-02-01, 2023-02-02 ] |
2023-02-28 | 2023-03-01 | [ 2023-02-28, 2023-03-01 ] |
See more details here.
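The inclusive date range with a step of one day can be sketched in Python with `datetime`. The function name and the `step_days` parameter are hypothetical; other units such as weeks or months are not covered by this sketch.

```python
from datetime import date, timedelta

def date_sequence(start, end, step_days=1):
    # Both endpoints are included when the step lands on the end date.
    total_days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(0, total_days + 1, step_days)]
```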
Supported in: Batch, Streaming
Decode Geobuf geometry as GeoJSON.
Expression categories: Geospatial
Output type: Geometry
See more details here.
Supported in: Batch, Streaming
Divide one number by another number.
Expression categories: Numeric
Output type: Decimal | Double
Argument values:
col_a
col_b
col_a | col_b | Output |
---|---|---|
4 | 2 | 2.0 |
11 | 2 | 5.5 |
See more details here.
Supported in: Batch, Streaming
Encodes GeoJSON geometry as Geobuf.
Expression categories: Geospatial
Output type: Geobuf
See more details here.
Supported in: Batch, Streaming
Expression categories: Boolean, String
Output type: Boolean
Argument values:
Output: true
See more details here.
Supported in: Batch, Streaming
Converts from epoch milliseconds to date, UTC.
Expression categories: Cast, Datetime
Output type: Date
Description: You can convert epoch timestamps in milliseconds to the date type.
Argument values:
Output: 2023-01-17
See more details here.
Supported in: Batch, Streaming
Converts from epoch milliseconds to timestamp in UTC.
Expression categories: Cast, Datetime
Output type: Timestamp
Description: You can convert epoch timestamps in milliseconds to the timestamp type.
Argument values:
Output: 2023-01-17T14:01:51Z
See more details here.
Supported in: Batch, Streaming
Converts from epoch seconds to date in UTC.
Expression categories: Cast, Datetime
Output type: Date
Description: You can convert epoch timestamps to the date type.
Argument values:
Output: 2023-01-17
See more details here.
Supported in: Batch, Streaming
Converts from epoch seconds to timestamp in UTC.
Expression categories: Cast, Datetime
Output type: Timestamp
Description: You can convert epoch timestamps to the timestamp type.
Argument values:
Output: 2023-01-17T14:01:51Z
See more details here.
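The four epoch conversions above can be sketched in plain Python as follows. The function names are hypothetical, and the input value 1673964111 is not taken from the source: it was chosen because it corresponds to the example output 2023-01-17T14:01:51Z. All conversions are done in UTC, as the descriptions state.

```python
from datetime import date, datetime, timedelta, timezone

# Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_seconds_to_timestamp(seconds: int) -> datetime:
    """Convert epoch seconds to a UTC timestamp."""
    return EPOCH + timedelta(seconds=seconds)


def epoch_millis_to_timestamp(millis: int) -> datetime:
    """Convert epoch milliseconds to a UTC timestamp."""
    return EPOCH + timedelta(milliseconds=millis)


def epoch_seconds_to_date(seconds: int) -> date:
    """Convert epoch seconds to the UTC calendar date."""
    return epoch_seconds_to_timestamp(seconds).date()


def epoch_millis_to_date(millis: int) -> date:
    """Convert epoch milliseconds to the UTC calendar date."""
    return epoch_millis_to_timestamp(millis).date()


print(epoch_seconds_to_timestamp(1673964111).isoformat())
print(epoch_millis_to_date(1673964111000).isoformat())
```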
Supported in: Batch, Streaming
Returns true if left and right are equal.
Expression categories: Boolean
Output type: Boolean
Argument values:
a
b
a | b | Output |
---|---|---|
1 | 1 | true |
1 | 0 | false |
See more details here.
Supported in: Batch, Streaming
Calculates the exponential, e^x, of a column.
Expression categories: Numeric
Output type: Double
Argument values:
Output: 7.38905609893
See more details here.
Supported in: Batch
Extract all instances of a regex match into an array.
Expression categories: Regex, String
Output type: Array<String>
Description: Extract the first two initials from each code.
Argument values:
Output: [ MT, XB ]
See more details here.
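To illustrate the idea of collecting every regex match into an array, here is a small Python sketch. The input string and the pattern are hypothetical (the source does not show them), and Python's `re` dialect may differ from the regex dialect the expression itself uses.

```python
import re


def extract_all(pattern: str, text: str) -> list[str]:
    """Return every non-overlapping match of pattern in text, in order."""
    return re.findall(pattern, text)


# Hypothetical input: two codes whose leading two letters are the "initials".
codes = "MT-112 XB-123"
print(extract_all(r"\b[A-Z]{2}", codes))
```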
Supported in: Batch, Streaming
Extracts a part of a date like year or day of week.
Expression categories: Datetime
Output type: Integer
See more details here.
Supported in: Batch
Extract metadata fields from a document.
Expression categories: Media
Output type: Struct
Argument values:
[Document Author, Page Count, Document Title]
Media Reference
Media Reference | Output |
---|---|
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | { author: Jane Doe, page_count: 23, title: Document Title, } |
See more details here.
Supported in: Batch
Extract metadata fields from an image.
Expression categories: Media
Output type: Struct
Argument values:
[Attributes, Bands, Bytes, Dimensions, Format, Geographic Metadata, ICC Profile]
Media Reference
Media Reference | Output |
---|---|
{"mimeType":"image/tiff","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | { attributes: { outer_key1 -> { inner_key1 -> inner_value1, }, ... |
See more details here.
Supported in: Batch, Streaming
Return map keys as an array. Note the order of array elements is not deterministic.
Expression categories: Map
Type variable bounds: K accepts AnyType
Output type: Array<K>
Argument values:
flight_number
flight_number | Output |
---|---|
{ MT-111 -> 2, XB-134 -> 1, } | [ XB-134, MT-111 ] |
See more details here.
Supported in: Batch, Streaming
Return map values as an array. Note the order of array elements is not deterministic.
Expression categories: Map
Type variable bounds: V accepts AnyType
Output type: Array<V>
Argument values:
flight_number
flight_number | Output |
---|---|
{ MT-111 -> 2, XB-134 -> 1, } | [ 1, 2 ] |
See more details here.
Supported in: Batch
Extract raw text from pages in PDF files.
Expression categories: Media
Output type: Array<String>
Argument values:
Media Reference
End Page
Start Page
Media Reference | Start Page | End Page | Output |
---|---|---|---|
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | 1 | 2 | [ first page, second page ] |
See more details here.
Supported in: Batch
Run OCR on PDF files in a media set to extract text.
Expression categories: Media
Output type: Array<String>
See more details here.
Supported in: Batch
Run OCR on image files in a media set to extract text.
Expression categories: Media
Output type: String
See more details here.
Supported in: Batch, Streaming
Extracts a part of a timestamp like year or day of week.
Expression categories: Datetime
Output type: Integer
See more details here.
Supported in: Batch, Streaming
Filters an array based on the filter expression. Note, array index starts at 1.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
array
element
array | Output |
---|---|
[ 2, 5, null, 11 ] | [ 2, 5, 11 ] |
See more details here.
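The table row above is consistent with a filter expression that keeps only non-null elements. A minimal Python sketch under that assumption follows; the function name and the choice to pass a null array through unchanged are illustrative, not documented behavior.

```python
def filter_array(values, predicate):
    """Keep the elements for which predicate returns True, preserving order."""
    if values is None:
        # Assumption of this sketch: a null array stays null.
        return None
    return [v for v in values if predicate(v)]


# Assumed filter expression: "element is not null".
print(filter_array([2, 5, None, 11], lambda element: element is not None))
```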
Supported in: Batch, Streaming
Nulls any values in the geometry column that are not of the provided geometry types.
Expression categories: Geospatial
Output type: Geometry
See more details here.
Supported in: Batch, Streaming
Picks the first non-null value of the inputs. Known as coalesce in SQL.
Expression categories: Data preparation
Type variable bounds: T accepts AnyType
Output type: T
Argument values:
[tail_number, airline]
tail_number | airline | Output |
---|---|---|
XB-123 | null | XB-123 |
null | MT | MT |
See more details here.
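The coalesce behavior in the table above can be sketched in Python as follows. The function name `first_non_null` is hypothetical, and `None` stands in for null.

```python
def first_non_null(*values):
    """Return the first argument that is not None, or None if all are None."""
    return next((v for v in values if v is not None), None)


print(first_non_null("XB-123", None))
print(first_non_null(None, "MT"))
```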
Supported in: Batch, Streaming
Returns the floor of a given fractional value.
Expression categories: Numeric
Output type: Decimal | Long
Argument values:
Output: 10
See more details here.
Supported in: Batch, Streaming
Returns the date as a formatted string in accordance with the Java DateTimeFormatter. The default format is ISO8601.
Expression categories: Cast, String
Output type: String
Argument values:
Output: 22-12-20
See more details here.
Supported in: Batch
Formats a number to a specific number of decimal places.
Expression categories: Cast, Numeric, String
Output type: String
Description: Formats a number to 2 decimal places.
Argument values:
Output: 1,234.57
See more details here.
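For comparison, Python's format specification can produce the same style of output, with a thousands separator and a fixed number of decimal places. The input value 1234.567 is hypothetical, and the rounding mode used by the expression itself is not stated in the source.

```python
def format_number(value: float, decimal_places: int) -> str:
    """Format value with comma grouping and a fixed number of decimals."""
    return format(value, f",.{decimal_places}f")


print(format_number(1234.567, 2))
```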
Supported in: Batch, Streaming
Formats a string printf-style.
Expression categories: String
Output type: String
Argument values:
[argument1, argument2]
argument1 | argument2 | Output |
---|---|---|
Alice | Bob | Hello Alice, my name is Bob |
Jane | John | Hello Jane, my name is John |
See more details here.
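The outputs in the table above are consistent with a template such as `Hello %s, my name is %s`. The template itself is not shown in the source, so it is an assumption here. A Python sketch using the `%` operator, which follows similar printf conventions:

```python
# Assumed template, inferred from the example outputs above.
TEMPLATE = "Hello %s, my name is %s"


def format_string(template: str, *arguments) -> str:
    """Substitute the arguments into the printf-style template, in order."""
    return template % arguments


print(format_string(TEMPLATE, "Alice", "Bob"))
```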
Supported in: Batch, Streaming
Returns the timestamp as an ISO8601 formatted string.
Expression categories: Cast, Datetime, String
Output type: String
Argument values:
Output: 2022-10-01
See more details here.
Supported in: Batch, Streaming
Determines if two geometries intersect.
Expression categories: Geospatial
Output type: Boolean
Argument values:
geometry_a
geometry_b
geometry_a | geometry_b | Output |
---|---|---|
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... | {"coordinates":[[[-103.78627755867336,33.162750522563925],[-103.78627755867336,28.29724741894266],[-... | true |
{"coordinates":[[[0.3651446504365481,15.159518507965103],[0.3651446504365481,13.427462911044273],[3.... | {"coordinates":[[[5.656394524666183,13.405417496831944],[5.656394524666183,11.29869961209053],[8.551... | false |
See more details here.
Supported in: Batch, Streaming
Applies a three dimensional affine transformation to the input geometry. This transformation occurs in the user-provided projected coordinate system, and the result is projected back to WGS84. Two dimensional geometries will have their z-coordinates set to 0 before the affine transformation is applied. The returned geometry is three dimensional and for each coordinate [x,y,z] represents the matrix multiplication [[x0, x1, x2, x-offset], [y0, y1, y2, y-offset], [z0, z1, z2, z-offset], [0, 0, 0, 1]] * [x, y, z, 1], where the first three ordinates of the result are returned.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry
geometry | Output |
---|---|
{"type":"Polygon","coordinates":[[[0.0, 0.0],[1.0, 0.0],[1.0, 1.0],[0.0, 1.0],[0.0, 0.0]]]} | {"type":"Polygon","coordinates":[[[0.0, 0.0, 0.0],[0.0, 1.0, 0.0],[-1.0, 1.0, 0.0],[-1.0, 0.0, 0.0],[0.0, 0.0, 0.0]]]} |
See more details here.
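The matrix multiplication described above can be sketched in plain Python. This sketch deliberately skips the projection into and out of the user-provided coordinate system, so it only illustrates the arithmetic. The matrix below is an assumption: a 90-degree counterclockwise rotation about the z-axis, chosen because it maps the example input ring onto the example output ring.

```python
def affine_3d(matrix, coordinate):
    """Apply a 4x4 affine matrix to an [x, y] or [x, y, z] coordinate."""
    x, y = coordinate[0], coordinate[1]
    # Two-dimensional coordinates get z = 0 before the transformation.
    z = coordinate[2] if len(coordinate) > 2 else 0.0
    vector = (x, y, z, 1.0)
    product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
    # Only the first three ordinates of the result are returned.
    return product[:3]


# Assumed parameters: rotate 90 degrees counterclockwise, no offsets.
ROTATE_90 = [
    [0.0, -1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

ring = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
print([affine_3d(ROTATE_90, point) for point in ring])
```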
Supported in: Batch, Streaming
Given an array of geometries, combine these into a single geometry, merging without overlap.
Expression categories: Geospatial
Type variable bounds: T accepts Geometry
Output type: T
Argument values:
geometries
geometries | Output |
---|---|
[ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}, {"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]} ] | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]} |
[ ] | null |
null | null |
See more details here.
Supported in: Batch, Streaming
Given an array of geometries, combine these into a linear geometry. Dissolve simplifies an input set of line-strings by removing unnecessary nodes and concatenating line-strings that can be combined. Z-coordinates will be ignored for the purpose of the dissolve operation, but the vertices in the resultant geometry will have the same z-coordinate as the corresponding points in the input.
Expression categories: Geospatial
Type variable bounds: T accepts Geometry
Output type: T
Argument values:
geometries
geometries | Output |
---|---|
[ {"type":"LineString","coordinates":[[0,0],[0,1],[1,1]]}, {"type":"LineString","coordinates":[[1,1]... | {"type":"MultiLineString","coordinates":[[[5.0, 5.0],[4.0, 4.0],[3.0, 3.0],[2.0, 2.0],[1.0, 1.0],[0.0, 1.0],[0.0, 0.0]],[[7.0, 7.0], [6.0, 7.0], [6.0, 6.0]]]} |
[ {"type":"LineString","coordinates":[[0,0,1],[0,1,1],[1,1,1]]}, {"type":"LineString","coordinates":[[1,1,1],[2,2,2]]}, {"type":"LineString","coordinates":[[1,1,2],[2,2,2],[3,3,3]]} ] | {"type":"LineString","coordinates":[[0.0, 0.0, 1.0],[0.0, 1.0, 1.0],[1.0, 1.0, 1.0],[2.0, 2.0, 2.0],[3.0, 3.0, 3.0]]} |
See more details here.
Supported in: Batch, Streaming
Computes the buffer of a geometry for both positive and negative buffer distances. Returns an approximate representation of all points within a given distance of this geometric object (or for negative buffers, all points minus those within the buffer distance of the boundary). Buffer drops any z coordinates, and zero/negative distance buffers of lines and points will return null.
Expression categories: Geospatial
Output type: Geometry
Argument values:
distance
geometry
ROUND
ROUND
DOUBLE_SIDED
geometry | distance | Output |
---|---|---|
{"type":"Point","coordinates":[-77.07368071728229,38.83040844313318]} | 10.0 | {"type":"Polygon","coordinates":[[[-77.07356558299462, 38.83041048767274],[-77.07356728534256, 38.83... |
{"type":"LineString","coordinates":[[-77.07368071728229,38.83040844313318, 1],[-77.0725293738795,38.83042888342659, 1]]} | 10.0 | {"type":"Polygon","coordinates":[[[-77.07253198637027, 38.83051894052714],[-77.07250947453703, 38.83... |
{"type":"Polygon","coordinates":[[[-77.07368071728229,38.83040844313318, 1],[-77.0725293738795,38.83... | 10.0 | {"type":"Polygon","coordinates":[[[-77.07379585155829, 38.83040639848026],[-77.07382199292853, 38.83... |
See more details here.
Supported in: Batch, Streaming
Return the centroid, or "center of mass", of the geometry using a spherical approximation of the globe. If the geometry is a collection of mixed dimensions, only the elements of the highest dimension will contribute to the centroid (e.g. in a collection of points, lines and polygons, points and lines are ignored).
Expression categories: Geospatial
Output type: GeoPoint
See more details here.
Supported in: Batch, Streaming
Determines if geometry a contains geometry b. Points or lines lying on the boundary of a polygon are not contained within another geometry.
Expression categories: Geospatial
Output type: Boolean
Argument values:
geometry_a
geometry_b
geometry_a | geometry_b | Output |
---|---|---|
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... | {"type":"Point","coordinates":[-100.0,32.0]} | true |
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... | {"type":"LineString","coordinates":[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323]]} | false |
{"type":"LineString","coordinates":[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323]]} | {"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]} | false |
{"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]} | {"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]} | true |
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... | {"coordinates":[[[-111.94377956164206,33.81725414459382],[-111.94377956164206,31.006795384733323], [... | true |
See more details here.
Supported in: Batch, Streaming
Calculates the portion of geometry a that is not intersecting geometry b.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry_a
geometry_b
geometry_a | geometry_b | Output |
---|---|---|
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.5,0.0],[0.5,1.0],[0.0,1.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"LineString","coordinates":[[0.0,0.0],[0.0,1.0]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} |
See more details here.
Supported in: Batch, Streaming
Converts a geometry to an array of its constituent simple geometries.
Expression categories: Geospatial
Output type: Array<Geometry>
Argument values:
geometry
geometry | Output |
---|---|
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | [ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} ] |
{"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]} | [ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}, {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} ] |
See more details here.
Supported in: Batch, Streaming
Calculates the portion of geometry a that is intersecting geometry b.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry_a
geometry_b
geometry_a | geometry_b | Output |
---|---|---|
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]} | {"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} | {"type":"Polygon","coordinates":[[]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]} | {"type":"LineString","coordinates":[[1.0,1.0],[1.0,0.0]]} |
{"type":"Point","coordinates":[0.0,0.0]} | {"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} | {"type":"Point","coordinates":[0.0,0.0]} |
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} | {"type":"Polygon","coordinates":[[[2.0,0.0],[2.0,1.0],[3.0,1.0],[3.0,0.0],[2.0,0.0]]]} | {"type":"LineString","coordinates":[]} |
See more details here.
Supported in: Batch, Streaming
Get the length of the line strings and multi line strings in the geometry in meters. Uses a spherical approximation of the globe. Non-linear geometries (polygons and points) count as 0.
Expression categories: Geospatial
Output type: Double
Argument values:
geometry
geometry | Output |
---|---|
{"type":"LineString","coordinates":[[-73.778128,40.641195],[-118.408535,33.941563]]} | 3974344.7433354934 |
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0],[1.0,1.0],[1.0,2.0]]} | 333585.2407005987 |
{"type":"MultiLineString","coordinates":[[[0.0,0.0],[1.0,0.0],[1.0,1.0]], [[1.0,2.0],[2.0,2.0]]]} | 333517.50194413937 |
See more details here.
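A spherical length calculation of this kind can be sketched with the haversine formula. The Earth radius below (6371008.8 m, a commonly used mean radius) is an assumption; the source does not state which radius is used, although this value approximately reproduces the second table row. The sketch handles only a single list of [longitude, latitude] positions.

```python
import math

# Assumed mean Earth radius in meters; not stated in the source.
EARTH_RADIUS_M = 6371008.8


def haversine_m(a, b):
    """Great-circle distance in meters between two [lon, lat] positions."""
    lon1, lat1 = math.radians(a[0]), math.radians(a[1])
    lon2, lat2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def line_length_m(coordinates):
    """Sum the segment lengths of a line string's coordinate list."""
    return sum(haversine_m(p, q) for p, q in zip(coordinates, coordinates[1:]))


print(line_length_m([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]))
```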
Supported in: Streaming
Applies a two dimensional clockwise rotation centered at the provided GeoPoint to the supplied geometry. This rotation occurs in the provided coordinate reference system and is then projected back to WGS84.
Expression categories: Geospatial
Output type: Geometry
See more details here.
Supported in: Batch, Streaming
Sets the z-coordinate of a geometry. If the geometry has an existing z-coordinate it will be overwritten.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry
zCoordinate
geometry | zCoordinate | Output |
---|---|---|
{"type":"Point","coordinates":[1.0, 2.0]} | 1.0 | {"type":"Point","coordinates":[1.0, 2.0, 1.0]} |
{"type":"Point","coordinates":[1.0, 2.0, 3.0]} | 1.0 | {"type":"Point","coordinates":[1.0, 2.0, 1.0]} |
See more details here.
Supported in: Batch, Streaming
Given two valid geometries, calculates the shortest (great circle) distance in meters between them. Uses a spherical approximation of the globe. Overlapping geometries have a distance of zero.
Expression categories: Geospatial
Output type: Double
See more details here.
Supported in: Batch, Streaming
Given a valid geometry, standardizes it by enforcing the right-hand rule on the input, which is the convention for GeoJSON. This enables equality comparisons between equivalent geometries. This expression may reverse linestrings.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry
geometry | Output |
---|---|
{"type":"Polygon","coordinates":[[[32.26868,-26.53253],[32.26465,-26.45873],[32.25262,-26.38563],[32.26868,-26.53253]]]} | {"type":"Polygon","coordinates":[[[32.25262, -26.38563],[32.26868, -26.53253],[32.26465, -26.45873],[32.25262, -26.38563]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.25,0.5]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]], [[0.25,0.25],[0.25,0.5],[0.5,0.25],[0.25,0.25]]]} |
{"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} |
{"coordinates": [[[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]]], "type":"MultiPolygon"} | {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} |
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | {"coordinates": [[5.0, 5.0],[-1.0, -1.0]], "type":"LineString"} |
{"coordinates": [[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 10.0, 0.0], [10.0, 10.0, 10.0], [0.0, 10.0, 10.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]]], "type": "Polygon"} | {"coordinates": [[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 10.0, 0.0], [10.0, 10.0, 10.0], [0.0, 10.0, 10.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]]], "type": "Polygon"} |
See more details here.
Supported in: Batch, Streaming
Calculates the portion that is in either geometry, but not in their intersection.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry_a
geometry_b
geometry_a | geometry_b | Output |
---|---|---|
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[2.0,1.0],[2.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[3.0,1.0],[3.0,0.0],[1.0,0.0]]]} | {"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[2.0,0.0],[2.0,1.0],[3.0,1.0],[3.0,0.0],[2.0,0.0]]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.5,0.0],[0.5,1.0],[0.0,1.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} | {"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]} |
See more details here.
Supported in: Batch, Streaming
Applies a translation to a geometry. Two dimensional geometries are only converted to three dimensional geometries if a z offset is supplied.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry
geometry | Output |
---|---|
{"type":"Point","coordinates":[0.0, 0.0]} | {"type":"Point","coordinates":[1.0, -1.0]} |
{"type":"LineString","coordinates":[[0.0, 0.0], [1.0, 1.0]]} | {"type":"LineString","coordinates":[[1.0, -1.0], [2.0, 0.0]]} |
{"type":"Polygon","coordinates":[[[0.0, 0.0],[1.0, 0.0],[1.0, 1.0],[0.0, 1.0], [0.0, 0.0]]]} | {"type":"Polygon","coordinates":[[[1.0, -1.0],[2.0, -1.0],[2.0, 0.0],[1.0, 0.0],[1.0, -1.0]]]} |
See more details here.
Supported in: Batch, Streaming
Combines the two geometries to create a single geometry.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry_a
geometry_b
geometry_a | geometry_b | Output |
---|---|---|
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} | {"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0],[0.0,0.0]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} |
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} | {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]} | {"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]},{"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]}]} |
See more details here.
Supported in: Batch, Streaming
Convert GeoPoint to H3 index at given resolution. Returns null for resolution <0 or >15.
Expression categories: Geospatial
Output type: H3 Index
See more details here.
Supported in: Batch, Streaming
Convert geometry to H3 indices at a certain resolution. Resolution must be between 0 and 15, inclusive. For a polygon, three conversions are supported: a) H3 indices that fully cover the polygon, b) H3 indices that are fully contained by the polygon, c) H3 indices whose centroids are contained in the polygon. Returns null when the expected number of H3 indices exceeds 7 million.
Expression categories: Geospatial
Output type: Array<H3 Index>
See more details here.
Supported in: Batch, Streaming
Encodes the envelope in an XZ curve.
Expression categories: Geospatial
Output type: Long
Argument values:
LON_LAT_10KM
envelope
envelope | Output |
---|---|
{ maxLat -> 2.0, maxLon -> 3.0, minLat -> 0.0, minLon -> 1.0, } | 16777222 |
{ maxLat -> 2.0, maxLon -> 3.0, minLat -> null, minLon -> 1.0, } | null |
See more details here.
Supported in: Batch, Streaming
Calculates the absolute true bearing (clockwise angle relative to geographical north) from the first point to the second point in degrees using a spherical approximation of the earth.
Expression categories: Geospatial
Output type: Double
Argument values:
end_point
start_point
start_point | end_point | Output |
---|---|---|
{ latitude: 40.69325025929194, longitude: -74.00522662934995, } | { latitude: 51.4988509390695, longitude: -0.1238396067697046, } | 51.20964213763489 |
See more details here.
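The bearing calculation above can be sketched with the standard initial-bearing formula on a sphere. This is an illustrative implementation, not necessarily the one the expression uses; points are passed as hypothetical (latitude, longitude) tuples in degrees.

```python
import math


def initial_bearing_deg(start, end):
    """Clockwise angle from north, in degrees, from start toward end.

    Both points are (latitude, longitude) tuples in degrees.
    """
    lat1, lon1 = math.radians(start[0]), math.radians(start[1])
    lat2, lon2 = math.radians(end[0]), math.radians(end[1])
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    # Normalize atan2's (-180, 180] range into [0, 360).
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


# Start and end points taken from the table row above.
print(initial_bearing_deg((40.69325025929194, -74.00522662934995),
                          (51.4988509390695, -0.1238396067697046)))
```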
Supported in: Batch, Streaming
Given a valid geometry or array of geometries, return a geometry representing the envelope of the input. The envelope is the smallest axis-aligned rectangular region containing the minimum and maximum x and y values of the geometry.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry
geometry | Output |
---|---|
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} |
See more details here.
Supported in: Batch, Streaming
Given a valid geometry or array of geometries, return a struct containing the bounds of the geometry or geometries.
Expression categories: Geospatial
Output type: LatLonBoundingBox
Argument values:
geometry
geometry | Output |
---|---|
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0]]]} | { maxLat -> 1.0, maxLon -> 1.0, minLat -> 0.0, minLon -> 0.0, } |
See more details here.
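A bounding box of this shape can be computed by scanning every position in a GeoJSON geometry. The sketch below is a simplified illustration: it assumes a parsed GeoJSON dictionary with a `coordinates` member, and it does not handle geometry collections, empty geometries, or geometries that cross the anti-meridian.

```python
def iter_positions(coordinates):
    """Yield each [lon, lat, ...] position from nested GeoJSON coordinates."""
    if coordinates and isinstance(coordinates[0], (int, float)):
        yield coordinates
    else:
        for child in coordinates:
            yield from iter_positions(child)


def bounding_box(geometry):
    """Return min/max longitude and latitude over all positions."""
    positions = list(iter_positions(geometry["coordinates"]))
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return {
        "maxLat": max(lats),
        "maxLon": max(lons),
        "minLat": min(lats),
        "minLon": min(lons),
    }


triangle = {"type": "Polygon", "coordinates": [[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]}
print(bounding_box(triangle))
```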
Supported in: Batch, Streaming
Get all neighbors of an H3 index.
Expression categories: Geospatial
Output type: Array<H3 Index>
See more details here.
Supported in: Batch, Streaming
Extracts a field from a struct.
Expression categories: Struct
Output type: AnyType
Argument values:
struct
struct | Output |
---|---|
{ airline: { id: NA, }, } | NA |
{ airline: { id: FE, }, } | FE |
See more details here.
Supported in: Batch, Streaming
Given a valid GeoJSON input string, return a GeoJSON string that is the convex hull for the geometry. The convex hull is the smallest convex polygon containing the geometry.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry
geometry | Output |
---|---|
{"type":"Polygon","coordinates":[[[0.0,0.0],[2.0,0.0],[2.0,1.0],[1.0,1.0],[1.0,2.0],[0.0,2.0],[0.0,0.0]]]} | {"type":"Polygon", "coordinates":[[[0.0, 0.0], [0.0, 2.0], [1.0, 2.0], [2.0, 1.0], [2.0, 0.0], [0.0, 0.0]]]} |
null | null |
See more details here.
Supported in: Batch, Streaming
Returns true if left is greater than right.
Expression categories: Numeric
Output type: Boolean
Argument values:
a
b
a | b | Output |
---|---|---|
1 | 0 | true |
1 | 1 | false |
0 | 1 | false |
See more details here.
Supported in: Batch, Streaming
Returns true if left is greater than or equal to right.
Expression categories: Boolean
Output type: Boolean
Argument values:
a
b
a | b | Output |
---|---|---|
1 | 0 | true |
1 | 1 | true |
0 | 1 | false |
See more details here.
Supported in: Batch, Streaming
Computes the greatest value amongst all input columns, skipping null values.
Expression categories: Numeric
Type variable bounds: T accepts ComparableType
Output type: T
Argument values:
[a, b, c]
a | b | c | Output |
---|---|---|---|
1 | 2 | 3 | 3 |
1 | 3 | 2 | 3 |
3 | 2 | 1 | 3 |
See more details here.
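The null-skipping behavior described above can be sketched in Python as follows. The function name is hypothetical, and returning `None` when every input is null is an assumption of this sketch.

```python
def greatest(*values):
    """Return the largest non-None argument, or None if there is none."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


print(greatest(1, 3, 2))
print(greatest(None, 4, None))
```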
Supported in: Batch, Streaming
Decompresses gzip-compressed binary into a string.
Expression categories: File
Output type: String
Argument values:
gzip
gzip | Output |
---|---|
H4sIAAAAAAAA//NIzcnJ11Eozy/KSVEEAObG5usNAAAA | Hello, world! |
See more details here.
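The example above can be reproduced with Python's standard library, assuming the binary value in the table is displayed as Base64 and the decompressed bytes are UTF-8 text. Both are assumptions of this sketch, as is the function name.

```python
import base64
import gzip


def gunzip_to_string(compressed: bytes, encoding: str = "utf-8") -> str:
    """Decompress gzip bytes and decode the result as text."""
    return gzip.decompress(compressed).decode(encoding)


# The table value, treated here as Base64-encoded gzip bytes.
sample = base64.b64decode("H4sIAAAAAAAA//NIzcnJ11Eozy/KSVEEAObG5usNAAAA")
print(gunzip_to_string(sample))
```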
Supported in: Batch, Streaming
Get children of an H3 index at given resolution specifying children coarseness. Returns null for resolution <0 or >15 or for children resolution lower than given H3 index's resolution.
Expression categories: Geospatial
Output type: Array<H3 Index>
See more details here.
Supported in: Batch, Streaming
Get parent of an H3 index at given resolution specifying parent coarseness. Returns null for resolution <0 or >15 or resolution higher than given index.
Expression categories: Geospatial
Output type: H3 Index
See more details here.
Supported in: Batch, Streaming
Convert H3 index to polygon.
Expression categories: Geospatial
Output type: Geometry
See more details here.
Supported in: Batch, Streaming
Hashes the input using the SHA-256 hashing algorithm.
Expression categories: String
Output type: String
Argument values:
Output: 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069
See more details here.
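The output above is a 64-character lowercase hexadecimal digest. A Python sketch that produces the same style of digest follows; it assumes the input string is hashed as UTF-8 bytes, which the source does not state, and the sample input is hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text (as UTF-8) in lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(sha256_hex("Hello World!"))
```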
Supported in: Batch, Streaming
Returns a point interpolated along a line. Implementation interprets lines as the shortest path, using a spherical approximation of the globe.
Expression categories: Geospatial
Output type: GeoPoint
Argument values:
fraction
linestring
linestring | fraction | Output |
---|---|---|
{"type":"LineString","coordinates":[[0.0,2.0],[30.0,0.0]]} | 0.5 | { latitude: 1.0352686301676643, longitude: 15.004677545504547, } |
{"type":"LineString","coordinates":[[30.0,2.0],[50.0,3.0]]} | 0.8 | { latitude: 2.8256098405656185, longitude: 45.99752305664789, } |
{"type":"LineString","coordinates":[[45.0,9.0],[90.0,4.0]]} | 0.2 | { latitude: 8.363732883448177, longitude: 54.073497456494955, } |
See more details here.
Supported in: Batch, Streaming
Returns true if the input is NaN, false otherwise.
Expression categories: Boolean
Output type: Boolean
Argument values:
Output: true
See more details here.
Supported in: Batch, Streaming
Returns true if the input is an empty struct, with recursive checking of inner arrays and structs.
Expression categories: Boolean
Output type: Boolean
Argument values:
struct
struct | Output |
---|---|
{ airline: { id: null, name: null, }, tail_no: null, } | true |
{ airline: { id: NA, name: null, }, tail_no: null, } | false |
See more details here.
Supported in: Batch, Streaming
Returns true if the list contains the value.
Expression categories: Boolean
Type variable bounds: T accepts ComparableType
Output type: Boolean
Description: You can check if the list contains the value.
Argument values:
value
value | Output |
---|---|
BRR-123 | true |
ABC-543 | false |
See more details here.
Supported in: Batch, Streaming
Returns true if the input is not null, can optionally treat empty strings as null.
Expression categories: Boolean
Output type: Boolean
Argument values:
Output: true
See more details here.
Supported in: Batch, Streaming
Returns true if the input is null, can optionally treat empty strings as null.
Expression categories: Boolean
Output type: Boolean
Argument values:
Output: true
See more details here.
Supported in: Batch, Streaming
Returns true if the input is a valid GeoJSON input string. Not all GeoJSON strings are indexable by the ontology; use the "normalize geometry" expression to prepare geometry prior to Ontology use.
Expression categories: Geospatial
Output type: Boolean
Argument values:
geoJson
geoJson | Output |
---|---|
{"type":"Point","coordinates":[3.0, 5.0, 2.0]} | true |
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]} | true |
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} | true |
not a GeoJSON string | false |
See more details here.
Supported in: Batch, Streaming
Returns true if the input is a valid Geohash input string.
Expression categories: Geospatial
Output type: Boolean
Argument values:
geohash
geohash | Output |
---|---|
sk4d | true |
dt9zy9cg36j7 | true |
not a Geohash string | false |
null | false |
See more details here.
Supported in: Batch, Streaming
Returns true if the input is a valid H3 index string.
Expression categories: Geospatial
Output type: Boolean
Argument values:
h3
h3 | Output |
---|---|
862a1072fffffff | true |
not an h3 value | false |
See more details here.
Supported in: Batch, Streaming
Returns true if the input is a valid MGRS (military grid reference system) string.
Expression categories: Geospatial
Output type: Boolean
Argument values:
mgrs
mgrs | Output |
---|---|
4Q FJ 1 6 | true |
4Q FJ 12345 67890 | true |
See more details here.
Supported in: Batch, Streaming
Returns true if the input is a valid MIME type.
Expression categories: Boolean, Other
Output type: Boolean
See more details here.
Supported in: Batch, Streaming
Returns true if the input is a valid Ontology GeoPoint. Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180.
Expression categories: Geospatial
Output type: Boolean
Argument values:
geopoint
geopoint | Output |
---|---|
-35.307428203,149.122686883 | true |
149.122686883,-35.307428203 | false |
10.0, 20.0 | true |
10.0, 20.0 | true |
not a GeoPoint | false |
null | false |
(10.0,20.0) | false |
See more details here.
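The GeoPoint format check described above can be approximated in plain Python. This is a sketch under assumptions: the function name `is_valid_geopoint` is hypothetical, and the tolerance for whitespace around the comma is inferred from the `10.0, 20.0` example row rather than from a stated rule.

```python
import re
from typing import Optional

# Optional sign, digits, optional fractional part (assumed numeric syntax).
_NUMBER = r"-?\d+(?:\.\d+)?"
_GEOPOINT = re.compile(rf"^\s*({_NUMBER})\s*,\s*({_NUMBER})\s*$")


def is_valid_geopoint(value: Optional[str]) -> bool:
    """Check for '{lat},{lon}' with -90 <= lat <= 90 and -180 <= lon <= 180."""
    if value is None:
        return False
    match = _GEOPOINT.match(value)
    if match is None:
        return False
    lat, lon = float(match.group(1)), float(match.group(2))
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


print(is_valid_geopoint("-35.307428203,149.122686883"))
print(is_valid_geopoint("149.122686883,-35.307428203"))
```

The second call fails because the first number is read as latitude, and 149.12 is outside the allowed range. Swapped coordinates are a common cause of invalid GeoPoints.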
Supported in: Batch, Streaming
Returns true if the input is a valid Gotham delegated media GID. See Gotham's delegated media documentation for more details.
Expression categories: Boolean
Output type: Boolean
Argument values:
Output: true
See more details here.
Supported in: Batch, Streaming
Returns true if the input is a valid Foundry media reference.
Expression categories: Boolean
Output type: Boolean
See more details here.
Supported in: Batch, Streaming
Returns true if the input is a valid Foundry resource identifier.
Expression categories: Boolean
Output type: Boolean
See more details here.
Supported in: Batch, Streaming
Returns true if the input is a valid UUID.
Expression categories: Boolean
Output type: Boolean
See more details here.
Supported in: Batch, Streaming
Joins array with specified separator.
Expression categories: Array
Output type: String
Argument values:
Output: hello-world
See more details here.
Supported in: Batch
Returns the last day of the week/month/quarter/year.
Expression categories: Datetime
Output type: Date
See more details here.
Supported in: Batch, Streaming
Computes the least value amongst all input columns, skipping null values.
Expression categories: Boolean, Numeric
Type variable bounds: T accepts ComparableType
Output type: T
Argument values:
[a, b, c]
a | b | c | Output |
---|---|---|---|
1 | 2 | 3 | 1 |
1 | 3 | 2 | 1 |
3 | 2 | 1 | 1 |
See more details here.
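The null-skipping behavior of the least-value expression above can be sketched in Python, with `None` standing in for null. The function name `least` and the choice to return `None` when every input is null are assumptions; the entry does not state the all-null result.

```python
from typing import Any, Optional


def least(*values: Optional[Any]) -> Optional[Any]:
    """Return the smallest non-null value, or None if all values are null."""
    present = [v for v in values if v is not None]
    # Assumption: an all-null row yields null rather than an error.
    return min(present) if present else None


print(least(1, 2, 3))
print(least(3, None, 1))
```

Filtering nulls first matters because a plain `min` over mixed values would raise an error on `None` instead of ignoring it.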
Supported in: Batch, Streaming
Extract left hand side of a string based on index.
Expression categories: String
Output type: String
Argument values:
Output: Hello
See more details here.
Supported in: Batch, Streaming
Left-pads the string column with pad until it reaches a width of length.
Expression categories: String
Output type: String
Argument values:
Output: ***Hello world!
See more details here.
Supported in: Batch, Streaming
Returns the length of each value in a string column or an array column.
Expression categories: Array, Numeric
Output type: Integer
Argument values:
string
string | Output |
---|---|
hello | 5 |
bye | 3 |
See more details here.
Supported in: Batch, Streaming
Returns true if left is less than right.
Expression categories: Boolean
Output type: Boolean
Argument values:
left
right
left | right | Output |
---|---|---|
1.0 | 10 | true |
10.0 | 1 | false |
See more details here.
Supported in: Batch, Streaming
Returns true if left is less than or equal to right.
Expression categories: Boolean
Output type: Boolean
Argument values:
a
b
a | b | Output |
---|---|---|
1 | 0 | false |
1 | 1 | true |
0 | 1 | true |
See more details here.
Supported in: Batch, Streaming
Compute the Levenshtein distance between two strings.
Expression categories: Distance measurement, String
Output type: Integer
Argument values:
left
right
left | right | Output |
---|---|---|
hello | hello | 0 |
hallo | hello | 1 |
hello | hEllO | 2 |
hello | hello, world! | 8 |
hello | farewell | 6 |
See more details here.
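The Levenshtein distance in the table above counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. A minimal dynamic-programming sketch follows; the function name `levenshtein` is illustrative, and this is not necessarily how the expression is implemented.

```python
def levenshtein(left: str, right: str) -> int:
    """Return the edit distance between two strings (case-sensitive)."""
    # previous[j] is the distance between the processed prefix of `left`
    # and the first j characters of `right`.
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]  # distance from left[:i] to the empty string
        for j, right_char in enumerate(right, start=1):
            substitution_cost = 0 if left_char == right_char else 1
            current.append(
                min(
                    previous[j] + 1,  # delete left_char
                    current[j - 1] + 1,  # insert right_char
                    previous[j - 1] + substitution_cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("hallo", "hello"))
print(levenshtein("hello", "hEllO"))
```

The comparison is case-sensitive, which is why `hello` and `hEllO` are two edits apart in the table.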
Supported in: Batch, Streaming
Calculates the natural logarithm, ln(x), of a column.
Expression categories: Numeric
Output type: Double
Argument values:
Output: 2.3148100626166146
See more details here.
Supported in: Batch, Streaming
Calculates logarithm with a given base.
Expression categories: Numeric
Output type: Double
Argument values:
Output: 3.0
See more details here.
Supported in: Batch, Streaming
Cast expression to given logical type. Unlike the regular cast expression, this expression will not change the underlying base representation of the data, but rather enforce the constraints associated with the specified logical type, so that the output can be used as the input to downstream expressions which specifically demand an instance of that logical type.
Expression categories: Cast
Type variable bounds: C accepts AnyType
Output type: C
Description: Successful cast to natural number.
Argument values:
Output: 1234
See more details here.
Supported in: Batch, Streaming
Converts all characters in string to lowercase.
Expression categories: String
Output type: String
Argument values:
Output: hello world
See more details here.
Supported in: Batch, Streaming
Map a set of values in a column to new values.
Expression categories: Data preparation
Type variable bounds: T1 accepts ComparableType, T2 accepts AnyType
Output type: T2
Argument values:
country
country | Output |
---|---|
United Kingdom | UK |
Denmark | DNK |
United States of America | null |
See more details here.
Supported in: Batch, Streaming
Returns modulus of an expression.
Expression categories: Numeric
Output type: DefiniteNumeric
Argument values:
Output: 2.123
See more details here.
Supported in: Batch, Streaming
Calculates the product of all input columns.
Expression categories: Numeric
Output type: Numeric
Argument values:
[col_a, col_b, col_c]
col_a | col_b | col_c | Output |
---|---|---|---|
10 | 2 | 3 | 60 |
See more details here.
Supported in: Batch, Streaming
Expression categories: Numeric
Output type: Numeric
See more details here.
Supported in: Batch, Streaming
Returns a column of normally distributed random numbers with zero mean and unit variance. This is not deterministic and will not produce the same result on repeated builds, even when using a seed.
Expression categories: Numeric
Output type: Double
See more details here.
Supported in: Batch, Streaming
Returns the negated boolean value of a boolean expression.
Expression categories: Boolean
Output type: Boolean
Argument values:
boolean
boolean | Output |
---|---|
true | false |
false | true |
See more details here.
Supported in: Batch, Streaming
Returns the nth ring in a single polygon in the geometry. Indexing is 1-based, and an index of 0 is out-of-bounds. An index equal to 1 returns an external ring. An index greater than 1 returns an internal ring. Returns null for any of the following conditions: geometry isn't a single polygon, a feature collection or geometry collection is provided, index is out-of-bounds, or at least one argument is null.
Expression categories: Geospatial
Output type: Geometry
Argument values:
n
polygon
polygon | n | Output |
---|---|---|
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 1 | {"coordinates": [[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]], "type": "LineString"} |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 2 | null |
{"coordinates":[[[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]],[[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]]],"type":"Polygon"} | 1 | {"coordinates": [[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]], "type": "LineString"} |
{"coordinates":[[[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]],[[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]]],"type":"Polygon"} | 2 | {"coordinates": [[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]], "type": "LineString"} |
See more details here.
Supported in: Batch, Streaming
Returns the nth point in a single linestring in the geometry. Indexing is 1-based, and an index of 0 is out-of-bounds. A negative index is counted backwards from the end of the linestring, so that -1 is the last point. Returns null for any of the following conditions: geometry isn't a single linestring, a feature collection or geometry collection is provided, index is out-of-bounds, or at least one argument is null.
Expression categories: Geospatial
Output type: GeoPoint
Argument values:
linestring
n
linestring | n | Output |
---|---|---|
{"type":"LineString","coordinates":[[30.0,2.0],[35.0,0.0],[50.0,3.0]]} | 1 | { latitude: 2.0, longitude: 30.0, } |
{"type":"LineString","coordinates":[[30.0,2.0],[35.0,0.0],[50.0,3.0]]} | 3 | { latitude: 3.0, longitude: 50.0, } |
{"type":"LineString","coordinates":[[45.0,9.0],[90.0,4.0],[40.0,0.0]]} | -1 | { latitude: 0.0, longitude: 40.0, } |
See more details here.
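The indexing rules for the nth-point expression above (1-based, negative indices counted from the end, null when out of bounds) can be sketched in Python. The function name `nth_point` is hypothetical, and returning `None` for unparseable JSON is an assumption. GeoJSON stores coordinates as `[longitude, latitude]`, so the two values are swapped when building the output.

```python
import json
from typing import Optional


def nth_point(linestring_geojson: Optional[str], n: Optional[int]) -> Optional[dict]:
    """Return the nth point of a GeoJSON LineString, or None if not applicable."""
    if linestring_geojson is None or n is None:
        return None
    try:
        geometry = json.loads(linestring_geojson)
    except json.JSONDecodeError:
        return None  # assumption: invalid input yields null
    if not isinstance(geometry, dict) or geometry.get("type") != "LineString":
        return None
    coordinates = geometry.get("coordinates", [])
    # Index 0 is out of bounds because indexing is 1-based.
    if n == 0 or abs(n) > len(coordinates):
        return None
    point = coordinates[n - 1] if n > 0 else coordinates[n]
    return {"latitude": point[1], "longitude": point[0]}


line = '{"type":"LineString","coordinates":[[30.0,2.0],[35.0,0.0],[50.0,3.0]]}'
print(nth_point(line, 1))
print(nth_point(line, -1))
```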
Supported in: Batch, Streaming
Convert empty strings to null.
Expression categories: String
Output type: String
Argument values:
Output: null
See more details here.
Supported in: Batch, Streaming
Returns true if any of the specified conditions are true. Nulls are considered false.
Expression categories: Boolean
Output type: Boolean
Argument values:
[left_boolean, right_boolean]
left_boolean | right_boolean | Output |
---|---|---|
true | true | true |
true | false | true |
false | true | true |
false | false | false |
See more details here.
Supported in: Batch
Expression categories: Media
Output type: Array<Struct<level, title, page>>
Argument values:
Media Reference
Media Reference | Output |
---|---|
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | [ { level: 0, page: 2, title: Chapter 1, }, { **l... |
See more details here.
Supported in: Batch, Streaming
Convert GeoJSON string from a non-WGS 84 coordinate system to WGS 84 geometry. For GeoJSON already in WGS 84 (longitude, latitude), the "logical type cast" expression can convert directly with less overhead. Returns null for strings that fail during parsing or conversion.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geojson_string
geojson_string | Output |
---|---|
{"type":"Point","coordinates":[320000.0,4300000.0]} | {"type":"Point","coordinates":[-77.07368071728229,38.83040844313318]} |
{"type":"LineString","coordinates":[[320000.0,4300000.0],[320100.0,4300000.0]]} | {"type":"LineString","coordinates":[[-77.07368071728229,38.83040844313318],[-77.0725293738795,38.83042888342659]]} |
{"type":"Polygon","coordinates":[[[320000.0,4300000.0],[320100.0,4300000.0],[320000.0,4300100.0],[320000.0,4300000.0]]]} | {"type":"Polygon","coordinates":[[[-77.07368071728229,38.83040844313318],[-77.0725293738795,38.83042888342659],[-77.07370685720375,38.83130901341597],[-77.07368071728229,38.83040844313318]]]} |
See more details here.
Supported in: Batch, Streaming
Parses XML strings following the given schema definition, ignoring any fields not in the schema.
Expression categories: File, Struct
Output type: Struct
Argument values:
xml
xml | Output |
---|---|
<airline> <id>XB-112</id> <airport> <id>JFK</id> <miles>2000</miles> </airport> </airline> | { airport: { id: JFK, miles: 2000, }, id: XB-112, } |
See more details here.
Supported in: Batch
Returns the markings parsed from a given classification string. This output is formatted as a struct, where the first element of the struct is the list of strings of relevant markings. This list is null if the classification string is invalid. The second element of the struct is the string of error message(s). This string is null if there are no such messages (if the classification string is valid). Returns null if the classification string is null.
Expression categories: Other
Output type: Struct<groupNames<String>, errors>
See more details here.
Supported in: Batch
Parses an ISO8601 string duration and start time to its length in a specific time unit.
Expression categories: Datetime, String
Output type: Long
Argument values:
SECONDS
Output: 90
See more details here.
Supported in: Batch, Streaming
Parses JSON strings following the given schema definition, ignoring any fields not in the schema.
Expression categories: Data preparation, File, Popular, Struct
Output type: Array<AnyType> | Map<String, String> | Struct
Argument values:
json
json | Output |
---|---|
{ "airline": "XB-112", "airport": { "id": "JFK", "miles": 2000 } } | { airline: XB-112, airport: { id: JFK, miles: 2000, }, } |
See more details here.
Supported in: Batch, Streaming
Normalizes phone numbers to a common format, parsing them from various regions and formats. Phone numbers containing the + sign followed by the region code will be parsed correctly even if the region is not set. All other number formats require a region to be selected from the options provided in order for them to be correctly parsed. Phone numbers that cannot be parsed will result in nulls.
Expression categories: String
Output type: Phone Number
Description: Return formatted US phone number.
Argument values:
phoneNumber
E164
US
phoneNumber | Output |
---|---|
(234) 235-5678 | +12342355678 |
+1 415 5552671 | +14155552671 |
(415) 5552671 | +14155552671 |
Whatsapp@14155552671 | +14155552671 |
See more details here.
Supported in: Batch, Streaming
Converts well-known binary (WKB) to geometry logical type. Invalid WKB input will be returned as null. Optionally supply a source coordinate system identifier to convert from the source coordinate system to WGS 84 if the WKB is not in WGS 84 already.
Expression categories: Geospatial
Output type: Geometry
Argument values:
wkb
wkb | Output |
---|---|
AAAAAAFACAAAAAAAAEAUAAAAAAAA | {"type":"Point","coordinates":[3.0, 5.0]} |
AIAAAAFACAAAAAAAAEAUAAAAAAAAQAAAAAAAAAA= | {"type":"Point","coordinates":[3.0, 5.0, 2.0]} |
AAAAAAMAAAABAAAABAAAAAAAAAAAAAAAAAAAAAA/8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= | {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]} |
AAAAAAIAAAACAAAAAAAAAAAAAAAAAAAAAD/wAAAAAAAAAAAAAAAAAAA= | {"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} |
See more details here.
Supported in: Batch, Streaming
Converts well-known text (WKT) string to geometry logical type. Invalid WKT input will be returned as null. Optionally supply a source coordinate system identifier to convert from the source coordinate system to WGS 84 if the WKT is not in WGS 84 already.
Expression categories: Geospatial
Output type: Geometry
Argument values:
wkt
wkt | Output |
---|---|
POINT (3.0 5.0 2.0) | {"type":"Point","coordinates":[3.0, 5.0, 2.0]} |
POLYGON ((0.0 0.0, 1.0 0.0, 0.0 1.0, 0.0 0.0)) | {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]} |
LINESTRING (0.0 0.0, 1.0 0.0) | {"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} |
See more details here.
Supported in: Batch, Streaming
Calculates perimeter of a geometry in meters using a spherical approximation of the globe. For a line string or a point, this equals 0.
Expression categories: Geospatial
Output type: Double
See more details here.
Supported in: Batch
Returns positive modulus of an expression.
Expression categories: Numeric
Type variable bounds: T1 accepts Byte | Integer | Long | Short, T2 accepts Byte | Integer | Long | Short
Output type: T1
Argument values:
Output: 1
See more details here.
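The difference between a positive modulus and a remainder that follows the sign of the dividend only shows up for negative inputs. The sketch below illustrates it in Python. Both function names and the sample values are hypothetical, and it is an assumption that the plain modulus expression follows the sign-of-dividend convention used by Java-style runtimes.

```python
import math


def truncated_mod(dividend: int, divisor: int) -> int:
    """Remainder that takes the sign of the dividend (C/Java convention)."""
    return int(math.fmod(dividend, divisor))


def positive_mod(dividend: int, divisor: int) -> int:
    """Remainder shifted into the range [0, |divisor|)."""
    return dividend % abs(divisor)


# Hypothetical inputs chosen to show where the two conventions diverge.
print(truncated_mod(-5, 3), positive_mod(-5, 3))
print(truncated_mod(7, 3), positive_mod(7, 3))
```

A positive modulus is usually what you want when bucketing values, for example when assigning rows to a fixed number of partitions, because a bucket index must not be negative.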
Supported in: Batch, Streaming
Calculates power of expression to exponent. If any of the values is null, returns null.
Expression categories: Numeric
Output type: Double
Argument values:
Output: 1000.0
See more details here.
Supported in: Batch, Streaming
Prepares a geometry for downstream use, for example indexing to the ontology, by converting a geometry string into valid GeoJSON. Polygons will be closed and deduplicated. Geometries which cross the anti-meridian (as indicated by width > 180 degrees) will be split into multiple features on each side of the anti-meridian. Outputs null if the input string cannot be read as GeoJSON or if the geometry contains out-of-bounds coordinates.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry
geometry | Output |
---|---|
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0], [0.0,1.0], [0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0,1.0],[1.0,0.0,1.0], [0.0,1.0,1.0],[0.0,0.0,1.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0,1.0],[0.0,1.0,1.0],[1.0,0.0,1.0],[0.0,0.0,1.0]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0], [0.0,1.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]} |
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0], [1.0,0.0], [0.0,1.0], [0.0,0.0]]]} | {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]} |
{"type":"Polygon","coordinates":[[[179.0,-30.0],[-179.0,-30.0],[-179.0,30.0],[179.0,30.0],[179.0,-30]]]} | {"type":"MultiPolygon","coordinates":[[[[-180.0,-30.0],[-180.0,30.0],[-179.0,30.0],[-179.0,-30.0],[-180.0,-30.0]]],[[[180.0,30.0],[180.0,-30.0],[179.0,-30.0],[179.0,30.0],[180.0,30.0]]]]} |
{"type":"LineString","coordinates":[[179.0,30.0],[-179.0,30.0]]} | {"type":"MultiLineString","coordinates":[[[179.0,30.0],[180.0,30.0]],[[-180.0,30.0],[-179.0,30.0]]]} |
{"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[40.0,10.0],[0.0,1.0]... | {"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[40.0,10.0],[0.0,1.0]... |
{"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[1.0,0.0]},{"type":"LineString","coordinates":[[179.0,30.0],[-179.0,30.0]]}]} | {"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[1.0,0.0]},{"type":"MultiLineString","coordinates":[[[179.0,30.0],[180.0,30.0]],[[-180.0,30.0],[-179.0,30.0]]]}]} |
See more details here.
Supported in: Batch, Streaming
Reduces array elements using an expression.
Expression categories: Array
Type variable bounds: T accepts Array<Boolean | Byte | Date | Double | Float | Integer | Long | Map<AnyType, AnyType> | Short | String | Timestamp> | Boolean | Byte | Date | Double | Float | Integer | Long | Map<AnyType, AnyType> | Short | String | Timestamp
Output type: T
Argument values:
miles
[accumulator, element]
miles | Output |
---|---|
[ 12300, 12342 ] | 24642 |
See more details here.
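The accumulator-and-element pattern used by the array reduce expression above corresponds to a left fold. Here is a minimal Python sketch that reproduces the example row. The addition expression and the initial value of `0` are assumptions, since the entry does not show the configured expression.

```python
from functools import reduce

miles = [12300, 12342]

# Assumed reducing expression: accumulator + element, starting from 0.
total = reduce(lambda accumulator, element: accumulator + element, miles, 0)
print(total)
```

Each step receives the running result as `accumulator` and the next array item as `element`, so any associative operation (sum, product, maximum) can be expressed the same way.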
Supported in: Batch, Streaming
Extracts the specified group from a regex. Returns empty string when no match is found.
Expression categories: Regex, String
Output type: String
Description: Extract the first two initials from the first match.
Argument values:
Output: MT
See more details here.
Supported in: Batch, Streaming
Matches an expression against a regular expression. Regular expression can match any part of the string.
Expression categories: Regex, String
Output type: Boolean
Description: You can find regex patterns.
Argument values:
Output: true
See more details here.
Supported in: Batch
Returns an array of indices at which the regular expression pattern is found in the given expression.
Expression categories: Regex, String
Output type: Array<Integer>
Description: You can find regex patterns and their indices.
Argument values:
Output: [ 0, 2, 4 ]
See more details here.
Supported in: Batch, Streaming
Matches an expression against a regular expression. Regular expression must match the whole string.
Expression categories: Regex, String
Output type: Boolean
Description: You can match regex patterns.
Argument values:
Output: true
See more details here.
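The two regex matching expressions above differ only in whether the pattern may match part of the string or must cover all of it. Python's `re.search` and `re.fullmatch` show the same distinction. This is an illustration only: the pattern and inputs are hypothetical, and the regex dialect used by Pipeline Builder may differ from Python's.

```python
import re


def matches_anywhere(pattern: str, text: str) -> bool:
    """True if the pattern matches any part of the text."""
    return re.search(pattern, text) is not None


def matches_whole(pattern: str, text: str) -> bool:
    """True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# Hypothetical tail-number pattern: two capital letters, a dash, three digits.
pattern = r"[A-Z]{2}-\d{3}"
print(matches_anywhere(pattern, "flight MT-123 delayed"))
print(matches_whole(pattern, "flight MT-123 delayed"))
print(matches_whole(pattern, "MT-123"))
```

Use the whole-string variant for validation (the value must be a tail number) and the partial variant for detection (the value mentions a tail number).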
Supported in: Batch, Streaming
Replace a string using a regex pattern.
Expression categories: Regex, String
Output type: String
Argument values:
tail_number
tail_number | Output |
---|---|
MT-123 | **-123 |
XB-434 | **-434 |
MT-123, XB-434 | **-123, **-434 |
See more details here.
Supported in: Batch, Streaming
Rename fields within a struct.
Expression categories: Data preparation, Struct
Output type: Struct
Argument values:
struct
struct | Output |
---|---|
{ airline: { id: NA, }, } | { airline: { identifier: NA, }, } |
{ airline: { id: FE, }, } | { airline: { identifier: FE, }, } |
See more details here.
Supported in: Batch, Streaming
Extract right hand side of a string based on index.
Expression categories: String
Output type: String
Argument values:
Output: world!
See more details here.
Supported in: Batch, Streaming
Right-pads the string column with pad until it reaches a width of length. If the string is longer than the length provided, it will be trimmed to that length.
Expression categories: String
Output type: String
Argument values:
Output: Hello world!***
See more details here.
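The pad-or-trim behavior of the right-pad expression above can be sketched in Python. The function name `right_pad` is hypothetical, and the example arguments (length `15`, pad `*`) are inferred from the output shown rather than listed in the entry.

```python
def right_pad(text: str, length: int, pad: str) -> str:
    """Pad `text` on the right to `length` characters, trimming if too long."""
    length = max(length, 0)
    if len(text) >= length:
        return text[:length]  # longer strings are trimmed
    if not pad:
        return text  # nothing to pad with
    missing = length - len(text)
    repeats = -(-missing // len(pad))  # ceiling division
    return text + (pad * repeats)[:missing]


print(right_pad("Hello world!", 15, "*"))
print(right_pad("Hello world!", 5, "*"))
```

The second call demonstrates the trimming case: the input is longer than the requested width, so only the first five characters are kept.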
Supported in: Batch, Streaming
Round number to 'scale' decimal places.
Expression categories: Numeric
Output type: Decimal | Double | Float
Argument values:
Output: 10.12
See more details here.
Supported in: Batch, Streaming
Takes the secant of an angle.
Expression categories: Numeric
Output type: Double
Argument values:
degrees
angle
angle | Output |
---|---|
0.0 | 1.0 |
90.0 | 1.633123935319537E16 |
180.0 | -1.0 |
See more details here.
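The very large value at 90 degrees in the table above is a floating-point effect rather than a true result: the secant is undefined there, but the cosine of 90 degrees evaluates to a tiny nonzero number in double precision, so its reciprocal is huge but finite. A minimal Python sketch, assuming the secant is computed as the reciprocal of the cosine (the function name `secant_degrees` is illustrative):

```python
import math


def secant_degrees(angle: float) -> float:
    """Return 1 / cos(angle) for an angle given in degrees."""
    return 1.0 / math.cos(math.radians(angle))


print(secant_degrees(0.0))
print(secant_degrees(90.0))
print(secant_degrees(180.0))
```

If downstream logic depends on the secant near 90 or 270 degrees, consider filtering on the magnitude of the result instead of expecting null or infinity.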
Supported in: Batch, Streaming
Converts the first character of the first word to be uppercase.
Expression categories: String
Output type: String
Argument values:
Output: Hello world
See more details here.
Supported in: Batch, Streaming
Creates an array with numbers in range from start to end.
Expression categories: Array
Type variable bounds: T accepts Byte | Integer | Long | Short
Output type: Array<T>
Description: Sequences increase by 1 unless otherwise specified.
Argument values:
Output: [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
See more details here.
Supported in: Batch, Streaming
This expression simplifies GeoJSON geometry by removing points within the given tolerance distance using a spherical model of the globe. Loops smaller than the tolerance may be removed entirely.
Expression categories: Geospatial
Output type: Geometry
Argument values:
Geometry
Tolerance
Geometry | Tolerance | Output |
---|---|---|
{"type":"LineString","coordinates":[[30.0,0.0],[35.0,0.0],[40.0,0.0]]} | 1000 | {"type":"LineString","coordinates":[[30.0,0.0],[40.0,0.0]]} |
{"type":"Polygon","coordinates":[[[-1.0,-1.0],[1.0,-1.0],[1.0,1.0],[0.0,1.0],[-1.0,1.0],[-1.0,-1.0]]]} | 12000 | {"type":"Polygon","coordinates":[[[-1.0,1.0],[1.0,1.0],[1.0,-1.0],[-1.0,-1.0],[-1.0,1.0]]]} |
{"type":"MultiLineString","coordinates":[[[0.0,0.0],[5.0,0.1],[10.0,0.0]], [[0.0,-5.0],[5.0,0.1],[10.0,5.0]]]} | 12000 | {"type":"MultiLineString","coordinates":[[[0.0,0.0],[10.0,0.0]],[[0.0,-5.0],[10.0,5.0]]]} |
{"type":"MultiPolygon","coordinates":[[[[-2.0,-2.0],[2.0,-2.0],[2.0,2.0],[0.0,2.1],[-2.0,2.0],[-2.0,... | 12000 | {"type":"MultiPolygon","coordinates":[[[[-2.0,2.0],[2.0,2.0],[2.0,-2.0],[-2.0,-2.0],[-2.0,2.0]], [[1... |
See more details here.
Supported in: Batch, Streaming
Takes the sine of an angle.
Expression categories: Numeric
Output type: Double
Argument values:
degrees
angle
angle | Output |
---|---|
0.0 | 0.0 |
90.0 | 1.0 |
180.0 | 0.0 |
See more details here.
Supported in: Batch, Streaming
Skip a given number of bytes in a binary column.
Expression categories: Binary
Output type: Binary
Argument values:
Output: aQ==
See more details here.
Supported in: Batch, Streaming
Returns the array sliced from the first position to the second position. First position must be 1 or higher. If second position is longer than the array, the entire rest of the array will be returned.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
See more details here.
Supported in: Batch
Compute the soundex encoding (a phonetic representation) for a word.
Expression categories: String
Output type: String
Argument values:
input_string
input_string | Output |
---|---|
cat | C300 |
caat | C300 |
two | T000 |
too | T000 |
to | T000 |
four | F600 |
for | F600 |
fore | F600 |
fur | F600 |
meow | M000 |
me ow | M000 |
See more details here.
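The Soundex codes in the table above follow the classic scheme: keep the first letter, encode the remaining consonants as digits, skip vowels, collapse repeated codes, and pad with zeros to four characters. The following Python sketch reproduces the rows shown. Dropping non-letter characters such as spaces is an assumption inferred from the `me ow` row, and the function name `soundex` is illustrative.

```python
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(word: str) -> str:
    """Return the four-character Soundex code of a word ('' if no letters)."""
    # Assumption: anything that is not A-Z (spaces, punctuation) is ignored.
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    encoded = [letters[0]]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != previous:
            encoded.append(code)
        # H and W do not separate equal codes; vowels do.
        if ch not in "HW":
            previous = code
    return "".join(encoded)[:4].ljust(4, "0")


print(soundex("cat"), soundex("four"), soundex("me ow"))
```

Because vowels are discarded, homophones such as `two`, `too`, and `to` share a code, which is what makes Soundex useful for fuzzy matching of names.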
Supported in: Batch, Streaming
Split string on specified regex pattern.
Expression categories: String
Output type: Array<String>
Argument values:
string
string | Output |
---|---|
hello | [ hello ] |
hello world | [ hello, world ] |
hello there world | [ hello, there world ] |
See more details here.
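The third row of the split table above (`hello there world` becoming two elements) suggests that the example was configured with a limit on the number of parts. A Python sketch of that behavior follows. The whitespace pattern and the limit of `2` are assumptions inferred from the rows, and the function name `split_on_pattern` is hypothetical.

```python
import re
from typing import List, Optional


def split_on_pattern(text: str, pattern: str, limit: Optional[int] = None) -> List[str]:
    """Split `text` on a regex, producing at most `limit` parts if given."""
    if limit is None:
        return re.split(pattern, text)
    if limit <= 1:
        return [text]
    # re.split counts splits, not parts, hence limit - 1.
    return re.split(pattern, text, maxsplit=limit - 1)


print(split_on_pattern("hello world", r"\s", 2))
print(split_on_pattern("hello there world", r"\s", 2))
```

With a limit, everything after the last allowed split is kept together in the final element, which is why `there world` remains a single string.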
Supported in: Batch, Streaming
Calculates the square root of a column.
Expression categories: Numeric
Output type: Double
Argument values:
Output: 3.0
See more details here.
Supported in: Batch, Streaming
Expression categories: Boolean, String
Output type: Boolean
Argument values:
Output: true
See more details here.
Supported in: Batch, Streaming
Extract the string after the first delimiter. Return full string if no matches are found.
Expression categories: String
Output type: String
Argument values:
Output: world
See more details here.
Supported in: Batch, Streaming
Extract the string before the first delimiter. Return the full string if no matches are found.
Expression categories: String
Output type: String
Argument values:
Output: ...
See more details here.
Supported in: Batch, Streaming
Expression categories: Boolean, String
Output type: Boolean
Argument values:
Output: true
See more details here.
Supported in: Batch, Streaming
Extract substring.
Expression categories: Numeric
Output type: String
Argument values:
string
start
length
string | start | length | Output |
---|---|---|---|
hello, world | 1 | 5 | hello |
hello, world | 8 | 5 | world |
hello, world | -5 | 5 | world |
See more details here.
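The substring table above implies 1-based positions, with a negative start counted from the end of the string. A minimal Python sketch of those rules follows. The function name `substring` is illustrative, and treating a start of `0` like the beginning of the string is an assumption not covered by the rows.

```python
def substring(text: str, start: int, length: int) -> str:
    """Extract `length` characters beginning at a 1-based `start` position."""
    if length <= 0:
        return ""
    if start > 0:
        begin = start - 1  # convert 1-based to 0-based
    elif start < 0:
        begin = max(len(text) + start, 0)  # count back from the end
    else:
        begin = 0  # assumption: 0 behaves like the first position
    return text[begin:begin + length]


print(substring("hello, world", 1, 5))
print(substring("hello, world", 8, 5))
print(substring("hello, world", -5, 5))
```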
Supported in: Batch, Streaming
Calculates the difference between a number and all input columns.
Expression categories: Numeric
Output type: Numeric
Argument values:
[col_b, col_c]
col_a
col_a | col_b | col_c | Output |
---|---|---|---|
5 | 3 | 2 | 0 |
2 | 4 | 0 | -2 |
-2 | -4 | -2 | 4 |
See more details here.
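As the rows above show, the result is the first column minus each of the listed columns in turn (for example, 5 - 3 - 2 = 0). A short Python sketch of that calculation, using the hypothetical helper name `subtract_all`:

```python
from functools import reduce
from typing import Iterable


def subtract_all(minuend: float, subtrahends: Iterable[float]) -> float:
    """Subtract every value in `subtrahends` from `minuend`, left to right."""
    return reduce(lambda result, value: result - value, subtrahends, minuend)


print(subtract_all(5, [3, 2]))
print(subtract_all(-2, [-4, -2]))
```

This is equivalent to subtracting the sum of the listed columns from the first column.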
Supported in: Batch, Streaming
Subtract one number from another number.
Expression categories: Numeric
Output type: Numeric
Argument values:
col_a
col_b
col_a | col_b | Output |
---|---|---|
32 | 4 | 28 |
-5 | -3 | -2 |
See more details here.
Supported in: Batch, Streaming
Returns the difference in the given time unit.
Expression categories: Datetime
Output type: Long
Argument values:
HOURS
Output: 1
See more details here.
Supported in: Batch, Streaming
Returns the date that is 'value' days/weeks/months/quarter/years before 'start'.
Expression categories: Datetime
Output type: Date
Argument values:
DAYS
Output: 2022-04-03
See more details here.
Supported in: Batch, Streaming
Sums the elements contained within the array.
Expression categories: Array
Type variable bounds: T accepts DefiniteNumeric
Output type: T
Argument values:
Output: 6
See more details here.
Supported in: Batch, Streaming
Takes the tangent of an angle.
Expression categories: Numeric
Output type: Double
Argument values:
degrees
angle
angle | Output |
---|---|
0.0 | 0.0 |
90.0 | 1.633123935319537E16 |
180.0 | 0.0 |
See more details here.
Supported in: Batch, Streaming
Extract a series of text segments using sliding window segmentation.
Expression categories: String
Output type: Array<String>
See more details here.
Supported in: Batch
Converts text into embeddings.
Expression categories: String
Output type: Embedded vector
Description: Example embeddings for the word 'palantir'.
Argument values:
text
text | Output |
---|---|
palantir | [ -0.019182289, -0.02127992, 0.009529043, -0.008066221, -0.0014429842, 0.019154688, -0.023556953, -0... |
See more details here.
Supported in: Batch, Streaming
Add value to timestamp in specified unit.
Expression categories: Datetime
Output type: Timestamp
Argument values:
MILLISECONDS
Output: 2022-02-01T00:00:00.002Z
See more details here.
Supported in: Batch
Creates an array with timestamps in range from start to end.
Expression categories: Datetime
Output type: Array<Timestamp>
Argument values:
end_time
start_time
DAYS
start_time | end_time | Output |
---|---|---|
2023-01-01T00:00:00Z | 2023-01-03T00:00:00Z | [ 2023-01-01T00:00:00Z, 2023-01-02T00:00:00Z, 2023-01-03T00:00:00Z ] |
2023-01-01T01:50:00Z | 2023-01-03T00:00:00Z | [ 2023-01-01T01:50:00Z, 2023-01-02T01:50:00Z ] |
See more details here.
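The second row above shows that the sequence steps forward from the start time and stops once the next step would pass the end time, so the end is included only when a step lands on it exactly. A Python sketch of that rule, with the hypothetical function name `timestamp_sequence`:

```python
from datetime import datetime, timedelta, timezone
from typing import List


def timestamp_sequence(start: datetime, end: datetime, step: timedelta) -> List[datetime]:
    """Return timestamps from `start` to `end` (inclusive) in `step` increments."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    result = []
    current = start
    while current <= end:
        result.append(current)
        current += step
    return result


start = datetime(2023, 1, 1, 1, 50, tzinfo=timezone.utc)
end = datetime(2023, 1, 3, 0, 0, tzinfo=timezone.utc)
for timestamp in timestamp_sequence(start, end, timedelta(days=1)):
    print(timestamp.isoformat())
```

Here the third step would be 2023-01-03T01:50:00Z, which is after the end time, so only two timestamps are produced.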
Supported in: Batch, Streaming
Subtract value from timestamp in specified unit.
Expression categories: Datetime
Output type: Timestamp
Argument values:
MILLISECONDS
Output: 2022-02-01T23:59:59.998Z
See more details here.
Supported in: Batch, Streaming
Converts from timestamp in UTC to epoch milliseconds.
Expression categories: Cast, Datetime
Output type: Long
Argument values:
Output: 1664614800000
See more details here.
Supported in: Batch, Streaming
Converts from timestamp in UTC to epoch seconds.
Expression categories: Cast, Datetime
Output type: Long
Argument values:
Output: 1664614873
See more details here.
Supported in: Batch, Streaming
Converts the first character of each word to be uppercase and the rest lowercase.
Expression categories: String
Output type: String
Argument values:
Output: Hello World
See more details here.
Supported in: Batch
Transcribe audio files into JSON using CPU.
Expression categories: Media
Output type: String
See more details here.
Supported in: Batch
Transcribe audio files into JSON using GPU.
Expression categories: Media
Output type: String
See more details here.
Supported in: Batch
Transcribe audio files into text.
Expression categories: Media
Output type: String | Struct<ok, error>
See more details here.
Supported in: Batch, Streaming
Maps each element of an array using an expression. Note, array index starts at 1.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
flight_number
element
flight_number | Output |
---|---|
[ XB-134, MT-111 ] | [ XB, MT ] |
See more details here.
Supported in: Batch, Streaming
Transforms keys of a map by applying an expression to every key-value pair.
Expression categories: Map
Type variable bounds: K accepts AnyType, V accepts AnyType
Output type: Map<K, V>
Argument values:
key
flight_number
flight_number | Output |
---|---|
{ MT-111 -> 2, XB-134 -> 1, } | { MT -> 2, XB -> 1, } |
See more details here.
Supported in: Batch
Transforms values of a map by applying an expression to every key-value pair.
Expression categories: Map
Type variable bounds: K accepts AnyType, V accepts AnyType
Output type: Map<K, V>
Argument values:
value
flight_number
flight_number | Output |
---|---|
{ 1 -> XB-134, 2 -> MT-111, } | { 1 -> XB, 2 -> MT, } |
See more details here.
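The key and value transformations above can be sketched with Python dict comprehensions. The helper names and the expression (keep the text before the first hyphen) are assumptions; the documented argument expressions are not shown. If two keys map to the same new key, this sketch keeps the last one, which may not match the real behavior.

```python
from typing import Callable, Dict

def transform_keys(mapping: Dict, expression: Callable) -> Dict:
    """Rebuild the map, replacing each key with expression(key, value)."""
    return {expression(key, value): value for key, value in mapping.items()}

def transform_values(mapping: Dict, expression: Callable) -> Dict:
    """Rebuild the map, replacing each value with expression(key, value)."""
    return {key: expression(key, value) for key, value in mapping.items()}

# Assumed expressions: strip everything from the first hyphen onward.
print(transform_keys({"MT-111": 2, "XB-134": 1}, lambda key, value: key.split("-")[0]))
print(transform_values({1: "XB-134", 2: "MT-111"}, lambda key, value: value.split("-")[0]))
```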
Supported in: Batch, Streaming
Trims whitespace at beginning and end of string. Whitespace is defined as characters in any of: 1) Unicode's \p{whitespace} set, 2) Java's String#trim() method, or 3) Java's Character#isWhitespace() method.
Expression categories: Data preparation, String
Output type: String
Argument values:
Output: hello world
See more details here.
Supported in: Batch
Returns the date rounded down to the nearest day/week/month/quarter/year.
Expression categories: Datetime
Output type: Date
See more details here.
Supported in: Batch
Returns the timestamp truncated to the specified unit.
Expression categories: Datetime
Output type: Timestamp
Argument values:
MILLISECONDS
Output: 2022-02-01T10:10:10.002Z
See more details here.
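Truncation to a unit can be sketched in Python by zeroing every field smaller than that unit. Only `MILLISECONDS` appears in the example above; the other unit names, the helper name, and the input timestamp are assumptions for illustration.

```python
from datetime import datetime, timezone

def truncate_timestamp(ts: datetime, unit: str) -> datetime:
    """Drop all precision below the given unit (hypothetical helper)."""
    if unit == "MILLISECONDS":
        return ts.replace(microsecond=ts.microsecond // 1000 * 1000)
    if unit == "SECONDS":   # assumed unit name
        return ts.replace(microsecond=0)
    if unit == "MINUTES":   # assumed unit name
        return ts.replace(second=0, microsecond=0)
    if unit == "HOURS":     # assumed unit name
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")

# Assumed input: 2022-02-01T10:10:10.002345Z
ts = datetime(2022, 2, 1, 10, 10, 10, 2345, tzinfo=timezone.utc)
print(truncate_timestamp(ts, "MILLISECONDS").isoformat(timespec="milliseconds"))
```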
Supported in: Batch, Streaming
Uncompact H3 indices to the specified resolution. All input indices must be at a resolution less than or equal to the requested resolution or this transform will return null. If any of the input indices are invalid this transform will return null. Output indices are sorted in ascending order.
Expression categories: Geospatial
Output type: Array<H3 Index>
See more details here.
Supported in: Batch, Streaming
Performs Unicode normalization as per Unicode Standard Annex #15.
Expression categories: Data preparation, String
Output type: String
Argument values:
string
nfkc
string | Output |
---|---|
123 | 123 |
イナゴ | イナゴ |
See more details here.
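For a sense of what NFKC normalization does, here is a small Python sketch using the standard library. The inputs are assumptions: the table above appears to show already-normalized text in both columns, so the sketch uses fullwidth digits and halfwidth katakana, which NFKC folds into the forms shown in the Output column.

```python
import unicodedata

def normalize_nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization (compatibility decomposition, then composition)."""
    return unicodedata.normalize("NFKC", text)

fullwidth_digits = "\uff11\uff12\uff13"      # fullwidth digits 1, 2, 3
halfwidth_kana = "\uff72\uff85\uff7a\uff9e"  # halfwidth katakana i, na, ko + voiced sound mark

print(normalize_nfkc(fullwidth_digits))  # 123
print(normalize_nfkc(halfwidth_kana))    # イナゴ
```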
Supported in: Batch, Streaming
Returns a column of uniform random numbers drawn between 0 and 1. This is not deterministic and will not produce the same result on repeated builds, even when using a seed.
Expression categories: Numeric
Output type: Double
See more details here.
Supported in: Batch, Streaming
Returns a column of UUIDs. This is not deterministic and will not produce the same result on repeated builds. This is not the preferred way to build an ID column; consider deterministic alternatives such as sha256.
Expression categories: String
Output type: String
See more details here.
Supported in: Batch, Streaming
Converts all characters in string to uppercase.
Expression categories: String
Output type: String
Argument values:
Output: HELLO WORLD
See more details here.
Supported in: Batch, Streaming
Decodes a percent-encoded string to plain text.
Expression categories: Cast, String
Output type: String
Argument values:
string
string | Output |
---|---|
raw_string_with_no_special_characters | raw_string_with_no_special_characters |
test%2Fapi%3Fstring%3D3 | test/api?string=3 |
See more details here.
Supported in: Batch, Streaming
Percent-encodes a string to be sent in a URL.
Expression categories: String
Output type: String
Argument values:
string
string | Output |
---|---|
raw_string_with_no_special_characters | raw_string_with_no_special_characters |
test/api?string=3 | test%2Fapi%3Fstring%3D3 |
See more details here.
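Both percent-encoding directions shown above can be reproduced with Python's standard library. This is an illustrative sketch, not the platform's implementation; passing `safe=""` is a choice made here so that `/` is encoded as in the example table.

```python
from urllib.parse import quote, unquote

def url_encode(text: str) -> str:
    """Percent-encode every reserved character, including '/'."""
    return quote(text, safe="")

def url_decode(text: str) -> str:
    """Decode percent-encoded sequences back to plain text."""
    return unquote(text)

print(url_encode("test/api?string=3"))        # test%2Fapi%3Fstring%3D3
print(url_decode("test%2Fapi%3Fstring%3D3"))  # test/api?string=3
```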
Supported in: Batch
Call an LLM with a configurable prompt.
Expression categories: String
Output type: Array<AnyType> | Boolean | Date | Decimal | Double | Float | Integer | Long | Short | String | Struct | Struct<ok<AnyType> | Boolean | Date | Decimal | Double | Float | Integer | Long | Short | String | Struct | Timestamp, error> | Timestamp
Argument values:
prompt
prompt | Output |
---|---|
The food was great! | 5 |
See more details here.
Supported in: Batch, Streaming
Get a value from a map using a key.
Expression categories: Map
Type variable bounds: K accepts ComparableType, V accepts AnyType
Output type: V
Argument values:
Output: Hello
See more details here.
Supported in: Batch
Calculate the boolean 'and' of an aggregate. Nulls are considered false.
Expression categories: Aggregate
Output type: Boolean
Argument values:
values
Given input table:
values |
---|
true |
false |
true |
Outputs: false
See more details here.
Supported in: Batch
Calculate the boolean 'or' of an aggregate. Nulls are considered false.
Expression categories: Aggregate
Output type: Boolean
Argument values:
values
Given input table:
values |
---|
true |
false |
true |
Outputs: true
See more details here.
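The null handling of the two boolean aggregates above can be sketched in Python, with `None` standing in for null. The helper names are hypothetical, and the result for an empty group is whatever `all` and `any` return in Python, which is not confirmed by this document.

```python
from typing import Iterable, Optional

def bool_and(values: Iterable[Optional[bool]]) -> bool:
    """True only if every value is True; None (null) counts as False."""
    return all(value is True for value in values)

def bool_or(values: Iterable[Optional[bool]]) -> bool:
    """True if at least one value is True; None (null) counts as False."""
    return any(value is True for value in values)

values = [True, False, True]
print(bool_and(values))        # False
print(bool_or(values))         # True
print(bool_and([True, None]))  # False, because null is treated as false
```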
Supported in: Batch
Computes approximate median of values in the column.
Expression categories: Aggregate
Output type: Numeric
Argument values:
values
Given input table:
values |
---|
2 |
4 |
3 |
Outputs: 3
See more details here.
Supported in: Batch
Returns the approximate percentile of the expression: the smallest value in the ordered expression values (sorted from least to greatest) such that no more than the given percentage of expression values is less than or equal to that value.
Expression categories: Aggregate
Output type: Array<Numeric> | Byte | Decimal | Double | Float | Integer | Long | Short
Argument values:
values
Given input table:
values |
---|
2 |
4 |
3 |
Outputs: 3
See more details here.
Supported in: Batch, Streaming
Collects an array of values within each group. Null values are ignored.
Expression categories: Aggregate
Type variable bounds: T accepts AnyType
Output type: Array<T>
Argument values:
factor
Given input table:
factor |
---|
2 |
2 |
3 |
Outputs: [ 2, 2, 3 ]
See more details here.
Supported in: Batch, Streaming
Collects an array of deduplicated values within each group. Null values are ignored.
Expression categories: Aggregate
Type variable bounds: T accepts ComparableType
Output type: Array<T>
Argument values:
factor
Given input table:
factor |
---|
2 |
2 |
3 |
Outputs: [ 2, 3 ]
See more details here.
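The two collection aggregates above differ only in deduplication. A minimal Python sketch, with `None` standing in for null; the helper names are hypothetical, and the first-seen ordering of the deduplicated result is an assumption of this sketch rather than a documented guarantee.

```python
from typing import Hashable, List, Optional

def collect_list(values: List[Optional[Hashable]]) -> list:
    """Keep every non-null value, duplicates included."""
    return [value for value in values if value is not None]

def collect_set(values: List[Optional[Hashable]]) -> list:
    """Keep each distinct non-null value once, in first-seen order."""
    return list(dict.fromkeys(collect_list(values)))

factor = [2, 2, 3]
print(collect_list(factor))  # [2, 2, 3]
print(collect_set(factor))   # [2, 3]
```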
Supported in: Batch, Streaming
Calculate the population covariance of values in two columns.
Expression categories: Aggregate
Output type: Double
Argument values:
left
right
Given input table:
left | right |
---|---|
1 | 5 |
2 | 4 |
3 | 3 |
4 | 2 |
5 | 1 |
Outputs: -2.0
See more details here.
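The documented result can be checked by hand. The Python sketch below computes covariance directly from its definition; dividing by n gives the population covariance shown above, while dividing by n - 1 gives the sample covariance. The function name and the `sample` flag are illustrative.

```python
from typing import Sequence

def covariance(left: Sequence[float], right: Sequence[float], sample: bool = False) -> float:
    """Covariance of two equally long columns.

    Divides by n for the population form, or n - 1 for the sample form.
    """
    n = len(left)
    mean_left = sum(left) / n
    mean_right = sum(right) / n
    total = sum((x - mean_left) * (y - mean_right) for x, y in zip(left, right))
    return total / (n - 1 if sample else n)

left = [1, 2, 3, 4, 5]
right = [5, 4, 3, 2, 1]
print(covariance(left, right))               # -2.0
print(covariance(left, right, sample=True))  # -2.5
```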
Supported in: Batch
Given a column of GeoPoints and an ordering, return either a polygon or a line string by connecting the GeoPoints in the specified order. This function assumes that the data is tabular, with a single row representing an individual GeoPoint in a line string or in the shell of a polygon, along with a column specifying the order of those points. For a polygon this ordering should identify the points as you move counter-clockwise around the shell. Given an ordering of these points and a partition (grouping), the function constructs the required geometry for that partition by joining the GeoPoints in ascending order of the order-by column.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geo_point
order
LINE_STRING
Given input table:
geo_point | order |
---|---|
{ latitude -> 0.0, longitude -> 0.0, } | 0 |
{ latitude -> 1.0, longitude -> 0.0, } | 1 |
{ latitude -> 1.0, longitude -> 1.0, } | 2 |
Outputs: {"type":"LineString","coordinates": [[0.0,0.0],[0.0, 1.0],[1.0,1.0]]}
See more details here.
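The line string construction above can be sketched in Python: sort the rows by the order column, then emit each point as a GeoJSON position, which lists longitude before latitude. The function name and row layout are assumptions for illustration, and polygon output is not covered.

```python
from typing import Dict, List, Tuple

def build_line_string(rows: List[Tuple[Dict[str, float], int]]) -> dict:
    """Connect GeoPoints in ascending order of the order column.

    Each row is (geo_point, order); GeoJSON positions are [longitude, latitude].
    """
    ordered = sorted(rows, key=lambda row: row[1])
    coordinates = [[point["longitude"], point["latitude"]] for point, _ in ordered]
    return {"type": "LineString", "coordinates": coordinates}

rows = [
    ({"latitude": 1.0, "longitude": 1.0}, 2),
    ({"latitude": 0.0, "longitude": 0.0}, 0),
    ({"latitude": 1.0, "longitude": 0.0}, 1),
]
print(build_line_string(rows))
```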
Supported in: Batch
Returns the rank of rows within a window partition, without any gaps. In case of ties the rows get same rank. The difference between rank and dense_rank is that dense_rank leaves no gaps in ranking sequence when there are ties.
Expression categories: Aggregate
Output type: Integer
See more details here.
Supported in: Batch, Streaming
Calculates the number of distinct values in the column.
Expression categories: Aggregate
Output type: Long
Argument values:
values
Given input table:
values |
---|
2 |
4 |
3 |
Outputs: 3
See more details here.
Supported in: Batch
First item in the group. Note, if used within an aggregate or unordered window, the row selected will be non-deterministic.
Expression categories: Aggregate
Type variable bounds: T accepts AnyType
Output type: T
Argument values:
values
Given input table:
values |
---|
null |
2 |
4 |
3 |
Outputs: null
See more details here.
Supported in: Batch
Returns the envelope of all valid geometries in the given column. Invalid geometries are treated as null and ignored.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry
Given input table:
geometry |
---|
{"type":"LineString","coordinates":[[1,0],[0,8.4]]} |
{"type":"Point","coordinates":[125.6, -92.3]} |
{"type":"Polygon","coordinates":[[[0,0],[1,6.3],[-6,1],[0,0]]]} |
Outputs: {"type":"Polygon","coordinates":[[[-6.0,-92.3],[-6.0,8.4],[125.6,8.4],[125.6,-92.3],[-6.0,-92.3]]]}
See more details here.
Supported in: Batch
Combines the grouped geometries to create a single geometry.
Expression categories: Geospatial
Output type: Geometry
Argument values:
geometry
Given input table:
geometry |
---|
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} |
{"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]} |
Outputs: {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]}
See more details here.
Supported in: Batch
Returns a struct containing the entire bounding box of all valid geometries in the given column. Invalid geometries are treated as null and ignored.
Expression categories: Geospatial
Output type: LatLonBoundingBox
Argument values:
geometry
Given input table:
geometry |
---|
{"type":"LineString","coordinates":[[1,0],[0,8.4]]} |
{"type":"Point","coordinates":[125.6, -92.3]} |
{"type":"Polygon","coordinates":[[[0,0],[1,6.3],[-6,1],[0,0]]]} |
Outputs: {
maxLat -> 8.4,
maxLon -> 125.6,
minLat -> -92.3,
minLon -> -6.0,
}
See more details here.
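The bounding box in the example above can be reproduced by walking every position in each geometry's coordinates. The Python sketch below does exactly that; the helper names are hypothetical, and unlike the documented expression it performs no validity checks on the geometries.

```python
from typing import Iterator, List

def iter_positions(coordinates: list) -> Iterator[list]:
    """Yield every [longitude, latitude, ...] position in nested GeoJSON coordinates."""
    if coordinates and isinstance(coordinates[0], (int, float)):
        yield coordinates
    else:
        for nested in coordinates:
            yield from iter_positions(nested)

def bounding_box(geometries: List[dict]) -> dict:
    """Min/max latitude and longitude over all positions (no validity checks)."""
    positions = [p for g in geometries for p in iter_positions(g["coordinates"])]
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return {"maxLat": max(lats), "maxLon": max(lons), "minLat": min(lats), "minLon": min(lons)}

geometries = [
    {"type": "LineString", "coordinates": [[1, 0], [0, 8.4]]},
    {"type": "Point", "coordinates": [125.6, -92.3]},
    {"type": "Polygon", "coordinates": [[[0, 0], [1, 6.3], [-6, 1], [0, 0]]]},
]
print(bounding_box(geometries))
```

The same four extremes also give the corners of the envelope polygon shown in the earlier envelope example.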
Supported in: Batch
Returns the value of the input at 'lag' before the current row in the window.
Expression categories: Aggregate
Type variable bounds: T accepts AnyType
Output type: T
See more details here.
Supported in: Batch, Streaming
Last item in the group. Note, if used within an aggregate or unordered window, the row selected will be non-deterministic.
Expression categories: Aggregate
Type variable bounds: T accepts AnyType
Output type: T
Argument values:
values
Given input table:
values |
---|
2 |
4 |
3 |
null |
Outputs: null
See more details here.
Supported in: Batch
Returns the value of the input at 'lead' after the current row in the window.
Expression categories: Aggregate
Type variable bounds: T accepts AnyType
Output type: T
See more details here.
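The lag and lead window expressions above look backward and forward by a fixed number of rows. A minimal Python sketch over one ordered window follows; the helper names are hypothetical, and returning `None` when the offset falls outside the window is an assumption of this sketch.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")

def lag(values: List[T], offset: int = 1) -> List[Optional[T]]:
    """Value `offset` rows before each row; None when there is no such row."""
    return [values[i - offset] if i - offset >= 0 else None for i in range(len(values))]

def lead(values: List[T], offset: int = 1) -> List[Optional[T]]:
    """Value `offset` rows after each row; None when there is no such row."""
    return [values[i + offset] if i + offset < len(values) else None for i in range(len(values))]

window = [10, 20, 30]
print(lag(window))   # [None, 10, 20]
print(lead(window))  # [20, 30, None]
```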
Supported in: Batch
Calculate the linear regression gradient of the right-hand side (output variable) and the left-hand side (input variable).
Expression categories: Aggregate
Output type: Double
Argument values:
left
right
Given input table:
left | right |
---|---|
1 | 5 |
2 | 4 |
3 | 3 |
4 | 2 |
5 | 1 |
Outputs: -1.0
See more details here.
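The gradient in the example above follows from the ordinary least-squares formula, slope = cov(x, y) / var(x), with the left column as the input variable and the right column as the output variable. A short Python sketch with an illustrative function name:

```python
from typing import Sequence

def regression_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares gradient of y (output variable) against x (input variable)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

left = [1, 2, 3, 4, 5]
right = [5, 4, 3, 2, 1]
print(regression_slope(left, right))  # -1.0
```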
Supported in: Batch, Streaming
Calculate maximum value in column.
Expression categories: Numeric
Output type: ComparableType
Argument values:
values
Given input table:
values |
---|
2 |
4 |
3 |
Outputs: 4
See more details here.
Supported in: Streaming
This expression computes a max row according to the max column expression after applying the provided filter specification. If there is no maximum row, null will be returned.
Expression categories: Aggregate
Output type: AnyType
Argument values:
salary
salary
salary
Given input table:
dep_name | salary |
---|---|
develop | 9900 |
develop | 4000 |
develop | 3000 |
Outputs: 4000
See more details here.
Supported in: Batch, Streaming
Calculate mean of values in column.
Expression categories: Numeric
Output type: Decimal | Double
Argument values:
values
Given input table:
values |
---|
2 |
4 |
3 |
Outputs: 3.0
See more details here.
Supported in: Batch, Streaming
Calculate minimum value in column.
Expression categories: Numeric
Output type: ComparableType
Argument values:
values
Given input table:
values |
---|
2 |
4 |
3 |
Outputs: 2
See more details here.
Supported in: Streaming
This expression computes a min row according to the min column expression after applying the provided filter specification. If there is no minimum row, null will be returned.
Expression categories: Aggregate
Output type: AnyType
Argument values:
salary
salary
salary
Given input table:
dep_name | salary |
---|---|
develop | -999 |
develop | 4000 |
develop | 3000 |
Outputs: 3000
See more details here.
Supported in: Batch
Calculate mode of values in column.
Expression categories: Aggregate
Type variable bounds: String accepts String
Output type: String
Argument values:
values
Given input table:
values |
---|
a |
b |
b |
b |
c |
c |
d |
Outputs: b
See more details here.
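The mode in the example above can be computed in Python with `collections.Counter`. How the expression breaks ties is not stated here; this sketch returns the tied value that was seen first, which is an assumption, and the helper name is hypothetical.

```python
from collections import Counter
from typing import List

def mode(values: List[str]) -> str:
    """Most frequent value; ties resolve to the value seen first (an assumption)."""
    return Counter(values).most_common(1)[0][0]

values = ["a", "b", "b", "b", "c", "c", "d"]
print(mode(values))  # b
```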
Supported in: Batch
Returns the percentile of rows within a window partition. Tied rows are assigned the same percentile.
Expression categories: Aggregate
Output type: Double
See more details here.
Supported in: Streaming
Apply an aggregate expression in a pivot context. The aggregation will run as a set of separate aggregations scoped to each distinct value of the pivot expression. The output is a map from pivot value to aggregate expression value.
Expression categories: Aggregate
Type variable bounds: K accepts ComparableType, V accepts AnyType
Output type: Map<K, V>
Argument values:
value
pivot
Given input table:
pivot | value |
---|---|
a | 1 |
b | 2 |
a | 3 |
Outputs: {
a -> 4,
b -> 2,
}
See more details here.
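The pivot aggregation above amounts to running one aggregation per distinct pivot value and collecting the results in a map. The Python sketch below assumes the aggregate is a sum, which is consistent with the example output but not stated explicitly; the helper name is hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

def pivot_sum(rows: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Sum `value` separately for each distinct `pivot` (assumed aggregate: sum)."""
    result: Dict[Hashable, float] = {}
    for pivot, value in rows:
        result[pivot] = result.get(pivot, 0) + value
    return result

rows = [("a", 1), ("b", 2), ("a", 3)]
print(pivot_sum(rows))  # {'a': 4, 'b': 2}
```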
Supported in: Batch
Calculates the product of all input columns.
Expression categories: Numeric
Output type: Double
Argument values:
factor
Given input table:
factor |
---|
2 |
4 |
3 |
Outputs: 24.0
See more details here.
Supported in: Batch
Returns the rank of rows within a window partition. In case of ties the rows get same rank. The difference between rank and dense_rank is that rank leaves gaps in ranking sequence when there are ties.
Expression categories: Aggregate
Output type: Integer
See more details here.
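To make the difference between rank and dense_rank concrete, the Python sketch below ranks one window partition both ways. The helper names and the ascending ordering are assumptions for illustration.

```python
from typing import List

def rank(values: List[int]) -> List[int]:
    """1-based position of each value in ascending order; ties share a rank and leave gaps."""
    ordered = sorted(values)
    return [ordered.index(value) + 1 for value in values]

def dense_rank(values: List[int]) -> List[int]:
    """Like rank, but consecutive ranks are used, so ties leave no gaps."""
    distinct = sorted(set(values))
    return [distinct.index(value) + 1 for value in values]

partition = [10, 20, 20, 30]
print(rank(partition))        # [1, 2, 2, 4]
print(dense_rank(partition))  # [1, 2, 2, 3]
```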
Supported in: Batch, Streaming
Counts the number of non-null rows in a group.
Expression categories: Aggregate
Output type: Long
Argument values:
values
Given input table:
values |
---|
2 |
4 |
3 |
Outputs: 3
See more details here.
Supported in: Batch, Streaming
Returns a sequential number starting at 1 inside each partition.
Expression categories: Aggregate
Output type: Integer
See more details here.
Supported in: Batch, Streaming
Calculate the sample covariance of values in two columns.
Expression categories: Aggregate
Output type: Double
Argument values:
left
right
Given input table:
left | right |
---|---|
1 | 5 |
2 | 4 |
3 | 3 |
4 | 2 |
5 | 1 |
Outputs: -2.5
See more details here.
Supported in: Batch, Streaming
Calculate the sample variance of values in column.
Expression categories: Aggregate
Output type: Double
Argument values:
values
Given input table:
values |
---|
2 |
2 |
3 |
Outputs: 0.33333333333
See more details here.
Supported in: Batch
Calculate standard deviation of the values in column.
Expression categories: Numeric
Output type: Double
Argument values:
values
Given input table:
values |
---|
2 |
4 |
3 |
Outputs: 0.81649658092773
See more details here.
Supported in: Batch, Streaming
Sums the specified expression.
Expression categories: Numeric
Output type: Decimal | Double | Long
Argument values:
values
Given input table:
values |
---|
2 |
4 |
3 |
Outputs: 9
See more details here.
Supported in: Batch, Streaming
Calculate population variance of values in column.
Expression categories: Aggregate
Output type: Double
Argument values:
values
Given input table:
values |
---|
2 |
4 |
3 |
Outputs: 0.66666666667
See more details here.
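The sample variance, standard deviation, and population variance examples above can be cross-checked with Python's `statistics` module. Note that the documented standard deviation output for the values 2, 4, 3 matches the population formula (dividing by n); the sample formula would give 1.0 for the same data.

```python
import statistics

# Sample variance divides by n - 1.
print(statistics.variance([2, 2, 3]))   # approximately 0.3333

# Population variance divides by n.
print(statistics.pvariance([2, 4, 3]))  # approximately 0.6667

# Population standard deviation is the square root of the population variance.
print(statistics.pstdev([2, 4, 3]))     # approximately 0.8165

# Sample standard deviation of the same data, for comparison.
print(statistics.stdev([2, 4, 3]))      # 1.0
```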
Supported in: Batch, Streaming
Explode array into a row per value.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: T
See more details here.
Supported in: Batch, Streaming
Explode array into a row per value as a struct containing the element's relative position in the array and the element itself.
Expression categories: Array
Type variable bounds: T accepts AnyType
Output type: Struct<Optional[position], Optional[element]>
See more details here.
Supported in: Batch, Streaming
Explode map into a row per key, value pair.
Expression categories: Map
Type variable bounds: TKey accepts AnyType, TValue accepts AnyType
Output type: Struct<Optional[key], Optional[value]>
See more details here.
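The generator expressions above turn one input value into several output rows. A minimal Python sketch of the positional array explode and the map explode follows. The helper names are hypothetical, and the starting index for `position` is an assumption (0 here); check the expression's own documentation for the actual base.

```python
from typing import Any, Dict, List

def explode_array_with_position(values: List[Any], start: int = 0) -> List[Dict[str, Any]]:
    """One row per element, carrying its position (assumed to start at `start`)."""
    return [{"position": i, "element": v} for i, v in enumerate(values, start)]

def explode_map(mapping: Dict[Any, Any]) -> List[Dict[str, Any]]:
    """One row per key-value pair."""
    return [{"key": k, "value": v} for k, v in mapping.items()]

print(explode_array_with_position(["JFK", "SFO"]))
print(explode_map({"MT": 2, "XB": 1}))
```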
Supported in: Batch
Performs the specified aggregations on the input dataset grouped by a set of columns.
Transform categories: Aggregate, Popular
Argument values:
factor
tail_number
Input:
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
XB-123 | foundry airline | 335 | 5 |
MT-222 | new air | 565 | 4 |
KK-452 | new air | 222 | 1 |
XB-123 | foundry airline | 1134 | 3 |
Output:
tail_number | factor |
---|---|
XB-123 | 10 |
MT-222 | 9 |
KK-452 | 1 |
See more details here.
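The grouped aggregation above sums `factor` for each `tail_number`. The Python sketch below reproduces the example on the same rows (only the two relevant columns are kept); the helper name is hypothetical.

```python
from typing import Dict, List, Tuple

def sum_by_group(rows: List[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the second field for each distinct value of the first field."""
    totals: Dict[str, int] = {}
    for tail_number, factor in rows:
        totals[tail_number] = totals.get(tail_number, 0) + factor
    return totals

rows = [
    ("XB-123", 2), ("MT-222", 5), ("XB-123", 5),
    ("MT-222", 4), ("KK-452", 1), ("XB-123", 3),
]
print(sum_by_group(rows))  # {'XB-123': 10, 'MT-222': 9, 'KK-452': 1}
```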
Supported in: Batch
Aggregate expressions based on a condition statement.
Transform categories: Aggregate
See more details here.
Supported in: Streaming
Performs the specified aggregations on the data within a window, emitting outputs as specified by the provided trigger.
Transform categories: Aggregate
See more details here.
Supported in: Batch
Anti joins left and right dataset inputs together, removing all rows that match the provided condition.
Transform categories: Join
Argument values:
tail_number
tail_number
Inputs: ri.foundry.main.dataset.left
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
XB-123 | foundry airline | 335 | 5 |
MT-222 | new air | 565 | 4 |
KK-452 | new air | 222 | 1 |
PA-452 | new air | 212 | 2 |
XB-123 | foundry airline | 1134 | 2 |
ri.foundry.main.dataset.right
tail_number | home_airport |
---|---|
XB-123 | LHR |
MT-222 | CPH |
KK-452 | JFK |
JR-201 | IAD |
Output:
tail_number | airline |
---|---|
PA-452 | new air |
See more details here.
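An anti join keeps only the left rows that have no match on the right. The Python sketch below reproduces the example above using equality on `tail_number` as the join condition and projecting the two output columns shown; the helper name and the projection step are assumptions.

```python
from typing import Dict, List

def anti_join(left: List[Dict], right: List[Dict], key: str) -> List[Dict]:
    """Left rows whose key value does not appear in any right row."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]

left = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "KK-452", "airline": "new air"},
    {"tail_number": "PA-452", "airline": "new air"},
]
right = [{"tail_number": t} for t in ("XB-123", "MT-222", "KK-452", "JR-201")]

print(anti_join(left, right, "tail_number"))
```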
Supported in: Batch, Streaming
Transforms input dataset by applying a single expression.
Transform categories: Other
Argument values:
miles
mile
kilometer
Input:
airline | miles |
---|---|
foundry airways | 2500 |
new air | 3000 |
Output:
kilometers | airline | miles |
---|---|---|
4023.36 | foundry airways | 2500 |
4828.03 | new air | 3000 |
See more details here.
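The output values above are consistent with converting miles to kilometers (1 mile = 1.609344 km) and rounding to two decimal places. The Python sketch below reproduces them; the rounding step and the helper name are assumptions inferred from the example.

```python
from typing import Dict, List

KM_PER_MILE = 1.609344  # exact by definition of the international mile

def add_kilometers(rows: List[Dict]) -> List[Dict]:
    """Add a `kilometers` column derived from `miles`, rounded to 2 decimals."""
    return [{"kilometers": round(row["miles"] * KM_PER_MILE, 2), **row} for row in rows]

rows = [
    {"airline": "foundry airways", "miles": 2500},
    {"airline": "new air", "miles": 3000},
]
for row in add_kilometers(rows):
    print(row)
```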
Supported in: Batch
Extracts elements from an array into columns.
Transform categories: Array
Argument values:
stats
Input:
stats |
---|
[ 1000, 2 ] |
Output:
miles | id | stats |
---|---|---|
1000 | 2 | [ 1000, 2 ] |
See more details here.
Supported in: Streaming
Assigns timestamps and watermarks to the input, filtering out records where the timestamp is null.
Transform categories: Other
Argument values:
timestamp
Input:
timestamp | temperature | sensor_id |
---|---|---|
1969-12-31T23:59:50Z | 28 | sensor_1 |
1969-12-31T23:59:40Z | 30 | sensor_2 |
1969-12-31T23:59:35Z | 29 | sensor_1 |
Output:
timestamp | temperature | sensor_id |
---|---|---|
1969-12-31T23:59:50Z | 28 | sensor_1 |
1969-12-31T23:59:40Z | 30 | sensor_2 |
1969-12-31T23:59:35Z | 29 | sensor_1 |
See more details here.
Supported in: Batch
Operation to reduce the number of partitions. For example, if you have 1000 partitions and you coalesce to 100, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions. If a larger number of partitions is requested, the dataset will stay at the current number of partitions.
Transform categories: Other
See more details here.
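The partition arithmetic described above can be sketched in Python. Here each partition is a plain list of rows and contiguous partitions are merged; how partitions are actually assigned to one another is an assumption of this sketch, which only illustrates the counts.

```python
from typing import List

def coalesce(partitions: List[list], target: int) -> List[list]:
    """Merge existing partitions into at most `target` partitions without reshuffling rows."""
    current = len(partitions)
    if target >= current:
        # Requesting more partitions than exist leaves the partitioning unchanged.
        return partitions
    merged: List[list] = [[] for _ in range(target)]
    for index, partition in enumerate(partitions):
        merged[index * target // current].extend(partition)
    return merged

partitions = [[i] for i in range(1000)]   # 1000 partitions with one row each
result = coalesce(partitions, 100)
print(len(result), len(result[0]))        # 100 10
print(len(coalesce(partitions, 2000)))    # 1000
```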
Supported in: Batch
Computes the expression for new rows. The value for a given key will only ever be computed once, even across builds.
Transform categories: Other
See more details here.
Supported in: Batch
Produces a dataset containing media references and basic metadata for media items in a media set. Use this transform first to apply other media transforms.
Transform categories: File, Media
See more details here.
Supported in: Batch
Cross joins left and right dataset inputs together, matching all rows from each side against all rows from the other. The output is the cartesian product of the two datasets.
Transform categories: Join
Argument values:
Inputs: ri.foundry.main.dataset.left
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
PA-452 | new air | 212 | 2 |
ri.foundry.main.dataset.right
tail_number | home_airport |
---|---|
XB-123 | LHR |
MT-222 | CPH |
KK-452 | JFK |
JR-201 | IAD |
Output:
tail_number | airline | home_airport |
---|---|---|
XB-123 | foundry air | LHR |
XB-123 | foundry air | CPH |
XB-123 | foundry air | JFK |
XB-123 | foundry air | IAD |
MT-222 | new airline | LHR |
MT-222 | new airline | CPH |
MT-222 | new airline | JFK |
MT-222 | new airline | IAD |
PA-452 | new air | LHR |
PA-452 | new air | CPH |
PA-452 | new air | JFK |
PA-452 | new air | IAD |
See more details here.
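The cartesian product above can be sketched with `itertools.product`. The example output keeps `tail_number` and `airline` from the left side and `home_airport` from the right side; that column selection is mirrored here as an assumption, and the helper name is hypothetical.

```python
from itertools import product
from typing import Dict, List

def cross_join(left: List[Dict], right: List[Dict]) -> List[Dict]:
    """Pair every left row with every right row."""
    return [
        {"tail_number": l["tail_number"], "airline": l["airline"], "home_airport": r["home_airport"]}
        for l, r in product(left, right)
    ]

left = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "PA-452", "airline": "new air"},
]
right = [{"home_airport": code} for code in ("LHR", "CPH", "JFK", "IAD")]

rows = cross_join(left, right)
print(len(rows))  # 12
print(rows[0])
```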
Supported in: Batch
Computes the distribution of dates/timestamps in a specified column.
Transform categories: Datetime
See more details here.
Supported in: Batch, Streaming
Transforms input dataset by dropping the specified columns.
Transform categories: Popular
Argument values:
miles
Input:
airline | miles | airports |
---|---|---|
foundry airways | 3000 | [ JFK, SFO ] |
Output:
airline | airports |
---|---|
foundry airways | [ JFK, SFO ] |
See more details here.
Supported in: Batch
Drops duplicate rows from the input.
Transform categories: Other
Argument values:
tail_number
Input:
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
XB-123 | foundry airline | 335 | 5 |
MT-222 | new air | 565 | 4 |
KK-452 | new air | 222 | 1 |
XB-123 | foundry airline | 1134 | 3 |
Output:
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
KK-452 | new air | 222 | 1 |
See more details here.
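Deduplication on a column subset can be sketched in Python by remembering which key values have been seen. This sketch keeps the first row encountered per `tail_number`, which matches the example above; which row is retained in practice may not be deterministic, so treat that as an assumption.

```python
from typing import Dict, List

def drop_duplicates(rows: List[Dict], key: str) -> List[Dict]:
    """Keep the first row seen for each distinct key value."""
    seen = set()
    result = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            result.append(row)
    return result

rows = [
    {"tail_number": "XB-123", "miles": 124},
    {"tail_number": "MT-222", "miles": 1123},
    {"tail_number": "XB-123", "miles": 335},
    {"tail_number": "MT-222", "miles": 565},
    {"tail_number": "KK-452", "miles": 222},
    {"tail_number": "XB-123", "miles": 1134},
]
print(drop_duplicates(rows, "tail_number"))
```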
Supported in: Batch
Creates an empty file.
Transform categories: Other
See more details here.
Supported in: Batch, Streaming
Creates an empty media set file with the given schema and snapshot read mode.
Transform categories: Other
See more details here.
Supported in: Batch, Streaming
Creates an empty table with the given schema and read mode.
Transform categories: Other
Argument values:
Inputs:
Output:
flight_code | flight_number | airline |
---|---|---|
See more details here.
Supported in: Batch
Reads file metadata as rows from a dataset of files.
Transform categories: File
See more details here.
Supported in: Batch
Extracts many fields from a struct. Original struct will be dropped.
Transform categories: Struct
Argument values:
raw
Input:
raw |
---|
{ airline: { id: NA, name: new air, }, tail_no: NA-123, } |
{ airline: { id: FA, name: foundry airways, }, tail_no: FA-123, } |
Output:
airline | tail_number |
---|---|
new air | NA-123 |
foundry airways | FA-123 |
See more details here.
Supported in: Batch
Reads a dataset of files and parses each CSV file into rows.
Transform categories: File
See more details here.
Supported in: Batch
Reads a dataset of files and parses each GeoJSON file into rows. The output dataset will have a geometry column, and a column for each property listed by the user, apart from the _error and _file columns. If the user provides no properties to extract, the entire properties struct will be extracted into a properties column as a string. All GeoJSONs in the files must either be: a) multiline FeatureCollection: an entire file with one GeoJSON of type FeatureCollection b) single-line Feature: a file where every line is a fully valid GeoJSON of type Feature.
Transform categories: File, Geospatial
See more details here.
Supported in: Batch
Reads a dataset of files and parses each JSON file into rows.
Transform categories: File, String, Struct
See more details here.
Supported in: Batch
Reads a dataset of email files and parses each file into a row. Supported file extensions: .eml, .emltpl, and .msg.
Transform categories: File, Media
See more details here.
Supported in: Batch
Reads a dataset of text files and parses each file into a row.
Transform categories: File, String
See more details here.
Supported in: Batch
Reads a dataset of files and parses each XML file into rows.
Transform categories: File
See more details here.
Supported in: Batch
Reads a dataset of files and parses each shapefile into rows. All files except .shp, .shx and .dbf files will be ignored. This shapefile parser only supports point, polyline, polygon and multipoint geometry types. The output dataset will have a geometry column, and a column for each property listed by the user, apart from the _error and _file columns. If the user provides no properties to extract, the entire properties struct will be extracted into a properties column as a string.
Transform categories: File, Geospatial
See more details here.
Supported in: Batch, Streaming
Filters the input dataset based on the specified filter condition.
Transform categories: Popular
Argument values:
recently_serviced
Input:
recently_serviced | tail_number |
---|---|
true | KK-150 |
false | XB-120 |
true | MT-190 |
Output:
recently_serviced | tail_number |
---|---|
true | KK-150 |
true | MT-190 |
See more details here.
Supported in: Batch
Unions a set of datasets together on columns from the first dataset, adding nulls when columns are missing. Columns that are not present in the first dataset are removed.
Transform categories: Join
Argument values:
Inputs: ri.foundry.main.dataset.a
recently_serviced | tail_number | airline_code |
---|---|---|
true | KK-150 | KK |
false | XB-120 | XB |
true | MT-190 | MT |
ri.foundry.main.dataset.b
recently_serviced | tail_number | home_country |
---|---|---|
true | AA-200 | US |
true | BN-435 | UK |
true | BN-111 | UK |
Output:
recently_serviced | tail_number | airline_code |
---|---|---|
true | KK-150 | KK |
false | XB-120 | XB |
true | MT-190 | MT |
true | AA-200 | null |
true | BN-435 | null |
true | BN-111 | null |
See more details here.
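The union behavior above (project every dataset onto the first dataset's columns, filling gaps with null) can be sketched in Python, with `None` standing in for null. The helper name is hypothetical, and the sketch reads the column list from the first row of the first dataset, so it assumes that dataset is not empty.

```python
from typing import Dict, List

def union_on_first(datasets: List[List[Dict]]) -> List[Dict]:
    """Stack all rows, keeping only the first dataset's columns and filling gaps with None."""
    columns = list(datasets[0][0].keys())  # assumes the first dataset has at least one row
    return [{col: row.get(col) for col in columns} for dataset in datasets for row in dataset]

a = [
    {"recently_serviced": True, "tail_number": "KK-150", "airline_code": "KK"},
    {"recently_serviced": False, "tail_number": "XB-120", "airline_code": "XB"},
]
b = [
    {"recently_serviced": True, "tail_number": "AA-200", "home_country": "US"},
]
for row in union_on_first([a, b]):
    print(row)
```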
Supported in: Batch, Streaming
Take all fields in a struct and turn them into columns in the output dataset.
Transform categories: Struct
Argument values:
raw
Input:
raw |
---|
{ airline: { id: NA, name: new air, }, tail_no: NA-123, } |
{ airline: { id: FA, name: foundry airways, }, tail_no: FA-123, } |
Output:
new_airline_name | new_airline_id | new_tail_no | raw |
---|---|---|---|
new air | NA | NA-123 | { airline: { id: NA, name: new air, }, tail_no: NA-123, } |
foundry airways | FA | FA-123 | { airline: { id: FA, name: foundry airways, }, tail_no: FA-123, } |
See more details here.
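The flattening above can be sketched as a recursive walk over the struct that joins nested field names. The `new` prefix and the underscore separator are assumptions inferred from the output column names in the example; the helper name is hypothetical.

```python
from typing import Any, Dict

def flatten_struct(struct: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested fields into flat columns named prefix_parent_child."""
    columns: Dict[str, Any] = {}
    for field, value in struct.items():
        name = f"{prefix}_{field}" if prefix else field
        if isinstance(value, dict):
            columns.update(flatten_struct(value, name))
        else:
            columns[name] = value
    return columns

raw = {"airline": {"id": "NA", "name": "new air"}, "tail_no": "NA-123"}
print(flatten_struct(raw, prefix="new"))
```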
Supported in: Batch
Frequent pattern (FP) growth finds frequent patterns in your dataset.
Transform categories: Aggregate, Other
Argument values:
customer_attributes
Input:
customer_attributes |
---|
[ age_group: 20-30, country: Germany, gender: Female ] |
[ age_group: 20-30, country: Germany, gender: Male ] |
Output:
pattern | pattern_occurrence | total_count |
---|---|---|
[ country: Germany, age_group: 20-30 ] | 2 | 2 |
[ age_group: 20-30 ] | 2 | 2 |
[ country: Germany ] | 2 | 2 |
See more details here.
Supported in: Batch
Inner joins left and right datasets together based on the distance between input geometries. Internally converts geometries into the given projected coordinate reference system prior to the join and back to WGS84.
Transform categories: Geospatial, Join
Argument values:
geometryColLhs
geometryCol
Inputs: ri.foundry.main.dataset.left
geometryColLhs | lhs-1 |
---|---|
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 |
{"coordinates": [55.0, 5.0], "type":"Point"} | 43.0 |
{"coordinates": [[25.0, 0.0], [0.0, 25.0]], "type":"LineString"} | 44.0 |
ri.foundry.main.dataset.right
geometryCol | col1 | arrayCol |
---|---|---|
{"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | rhsVal1 | [ 0.0, 1.0 ] |
{"coordinates": [[[21.0, 21.0], [27.0, 21.0], [27.0, 27.0], [21.0, 27.0], [21.0, 21.0]]], "type": "Polygon"} | rhsVal2 | [ 0.0, 1.0 ] |
Output:
geometryColLhs | lhs-1 | rhs_geometryCol | rhs_arrayCol |
---|---|---|---|
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | [ 0.0, 1.0 ] |
{"coordinates": [[25.0, 0.0], [0.0, 25.0]], "type":"LineString"} | 44.0 | {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | [ 0.0, 1.0 ] |
See more details here.
Supported in: Batch
Left joins datasets together if the distance between input geometries is less than or equal to the specified distance. Internally converts geometries into the given projected coordinate reference system prior to the join and back to WGS84.
Transform categories: Geospatial, Join
Argument values:
geometryColLhs
geometryColRhs
Inputs: ri.foundry.main.dataset.left
geometryColLhs | lhs-1 |
---|---|
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 |
null | 43.0 |
ri.foundry.main.dataset.right
geometryColRhs | rhs-1 |
---|---|
{"coordinates": [-112.14560508728029,33.44082430962016], "type":"Point"} | rhsVal1 |
{"coordinates": [-112.11796760559083,33.440895931474124], "type":"Point"} | rhsVal2 |
Output:
geometryColLhs | lhs-1 | geometryColRhs | rhs-1 |
---|---|---|---|
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | {"coordinates": [-112.14560508728029,33.44082430962016], "type":"Point"} | rhsVal1 |
null | 43.0 | null | null |
See more details here.
Supported in: Batch, Streaming
Inner joins left and right datasets together based on whether input geometries overlap. Includes just touching geometries in the results.
Transform categories: Geospatial, Join
Argument values:
geometryColLhs
geometryColRhs
Inputs: ri.foundry.main.dataset.left
geometryColLhs | col1Lhs |
---|---|
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 |
ri.foundry.main.dataset.right
geometryColRhs | col1Rhs |
---|---|
{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1 |
{"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} | rhsVal2 |
{"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3 |
{"coordinates": [15.0, 15.0], "type":"Point"} | rhsVal4 |
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5 |
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal6 |
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7 |
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal8 |
{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9 |
{"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal10 |
Output:
geometryColLhs | col1Lhs | geometryColRhs | col1Rhs |
---|---|---|---|
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9 |
See more details here.
Supported in: Batch
Left joins input datasets based on whether input geometries overlap. Geometries that only touch are included in the results.
Transform categories: Geospatial, Join
Argument values:
(geometryColLhs, geometryColRhs)
Inputs:
ri.foundry.main.dataset.left
geometryColLhs | col1Lhs |
---|---|
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 |
{"coordinates": [55.0, 5.0], "type":"Point"} | 43.0 |
ri.foundry.main.dataset.right
geometryColRhs | col1Rhs |
---|---|
{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1 |
{"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} | rhsVal2 |
{"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3 |
{"coordinates": [15.0, 15.0], "type":"Point"} | rhsVal4 |
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5 |
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal6 |
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7 |
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal8 |
{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9 |
{"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal10 |
Output:
geometryColLhs | col1Lhs | geometryColRhs | col1Rhs |
---|---|---|---|
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9 |
{"coordinates": [55.0, 5.0], "type":"Point"} | 43.0 | null | null |
See more details here.
Supported in: Batch
Inner joins the left and right datasets based on the distance between point geometries. The geometries must represent points and may optionally include a z-coordinate. Geometries are internally converted to the given projected coordinate reference system before the join and back to WGS84 afterwards. Non-point geometries are ignored, and the entire right dataset must fit into driver and executor memory. A 3 GB executor should be able to handle up to 4 million points in the neighbors dataset.
Transform categories: Geospatial, Join
Argument values:
(geometryColLhs, geometryCol)
Inputs:
ri.foundry.main.dataset.left
geometryColLhs | lhs-1 |
---|---|
{"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0 |
{"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 |
{"coordinates": [0.0, 0.0], "type":"Point"} | 44.0 |
ri.foundry.main.dataset.right
geometryCol | col1 | arrayCol |
---|---|---|
{"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | rhsVal1 | [ 0.0, 1.0 ] |
{"coordinates": [0.0, 1.0], "type":"Point"} | rhsVal2 | [ 0.0, 1.0 ] |
Output:
geometryColLhs | lhs-1 | rhs_geometryCol | rhs_arrayCol |
---|---|---|---|
{"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0 | {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | [ 0.0, 1.0 ] |
{"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0 | {"coordinates": [0.0, 1.0], "type":"Point"} | [ 0.0, 1.0 ] |
{"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 | {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | [ 0.0, 1.0 ] |
{"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 | {"coordinates": [0.0, 1.0], "type":"Point"} | [ 0.0, 1.0 ] |
{"coordinates": [0.0, 0.0], "type":"Point"} | 44.0 | {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | [ 0.0, 1.0 ] |
{"coordinates": [0.0, 0.0], "type":"Point"} | 44.0 | {"coordinates": [0.0, 1.0], "type":"Point"} | [ 0.0, 1.0 ] |
See more details here.
Supported in: Batch
Inner joins left and right datasets together based on whether input geometries overlap. Returns a row containing all of the columns from both datasets if the join key column pair has geometries which intersect. Currently does not support joining on multiple join keys. Silently filters null join key geometry values. Left and right datasets must not have the same column names. Silently nullifies invalid GeoJSON in join columns.
Transform categories: Geospatial, Join
Argument values:
[(geometryColLhs, geometryColRhs)]
Inputs:
ri.foundry.main.dataset.left
geometryColLhs | lhs-1 |
---|---|
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 |
ri.foundry.main.dataset.right
geometryColRhs | rhs-1 |
---|---|
{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1 |
{"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} | rhsVal2 |
{"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3 |
{"coordinates": [15.0, 15.0], "type":"Point"} | rhsVal4 |
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5 |
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal6 |
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7 |
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} | rhsVal8 |
{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9 |
{"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal10 |
Output:
geometryColLhs | lhs-1 | geometryColRhs | rhs-1 |
---|---|---|---|
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} | rhsVal1 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal3 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal5 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} | rhsVal7 |
{"coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]], "type": "Polygon"} | 42.0 | {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} | rhsVal9 |
See more details here.
Supported in: Batch
Selects the k closest points from the neighbors dataset for each valid input geometry from the base dataset. Internally converts the input datasets to the given coordinate reference system and back to WGS84. The entire neighbors dataset must fit into driver and executor memory. A 3 GB executor should be able to handle up to 1 million points in the neighbors dataset.
Transform categories: Geospatial, Join
Argument values:
(geometryCol, geometryCol)
Inputs:
ri.foundry.main.dataset.left
geometryCol | lhsCol |
---|---|
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 |
ri.foundry.main.dataset.right
geometryCol | col |
---|---|
{ latitude: 33.440609443703586, longitude: -112.14843750000001, } | rhsVal1 |
{ latitude: 33.44082430962016, longitude: -112.14560508728029, } | rhsVal2 |
{ latitude: 33.440895931474124, longitude: -112.11796760559083, } | rhsVal3 |
Output:
geometryCol | lhsCol | rhs_geometryCol | rhs_col |
---|---|---|---|
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | { latitude: 33.440609443703586, longitude: -112.14843750000001, } | rhsVal1 |
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | { latitude: 33.44082430962016, longitude: -112.14560508728029, } | rhsVal2 |
See more details here.
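The following is a minimal sketch of the k-nearest selection shown in the example above, not the transform's implementation. It swaps in a haversine great-circle distance rather than converting to a projected coordinate reference system; the helper names `haversine_m` and `k_nearest_points`, the Earth radius constant, and the choice of k = 2 are assumptions made for illustration.

```python
import heapq
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumption of this sketch)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two WGS84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def k_nearest_points(base_rows, neighbor_rows, k):
    """Pair each base point with its k closest neighbor points (hypothetical helper)."""
    joined = []
    for base in base_rows:
        lon, lat = base["geometryCol"]["coordinates"]

        def distance(neighbor):
            point = neighbor["geometryCol"]
            return haversine_m(lon, lat, point["longitude"], point["latitude"])

        # nsmallest returns the k closest neighbors, nearest first.
        for neighbor in heapq.nsmallest(k, neighbor_rows, key=distance):
            joined.append({**base,
                           "rhs_geometryCol": neighbor["geometryCol"],
                           "rhs_col": neighbor["col"]})
    return joined


base = [{"geometryCol": {"type": "Point",
                         "coordinates": [-112.14843750000001, 33.440609443703586]},
         "lhsCol": 42.0}]
neighbors = [
    {"geometryCol": {"latitude": 33.440609443703586, "longitude": -112.14843750000001}, "col": "rhsVal1"},
    {"geometryCol": {"latitude": 33.44082430962016, "longitude": -112.14560508728029}, "col": "rhsVal2"},
    {"geometryCol": {"latitude": 33.440895931474124, "longitude": -112.11796760559083}, "col": "rhsVal3"},
]
nearest = k_nearest_points(base, neighbors, k=2)
```

With these inputs the two closest neighbors are `rhsVal1` and `rhsVal2`, matching the example output.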
Supported in: Batch
Selects the k closest points from the neighbors dataset for each valid input geometry from the base dataset. Internally converts the input datasets to the given coordinate reference system and back to WGS84. The entire neighbors dataset must fit into driver and executor memory. A 3 GB executor should be able to handle up to 1 million points in the neighbors dataset.
Transform categories: Geospatial, Join
Argument values:
(geometryCol, geometryCol)
Inputs:
ri.foundry.main.dataset.left
geometryCol | lhsCol |
---|---|
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 |
ri.foundry.main.dataset.right
geometryCol | col |
---|---|
{ latitude: 33.440609443703586, longitude: -112.14843750000001, } | rhsVal1 |
{ latitude: 33.44082430962016, longitude: -112.14560508728029, } | rhsVal2 |
{ latitude: 33.440895931474124, longitude: -112.11796760559083, } | rhsVal3 |
Output:
geometryCol | lhsCol | rhs_geometryCol | rhs_col |
---|---|---|---|
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | { latitude: 33.440609443703586, longitude: -112.14843750000001, } | rhsVal1 |
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | { latitude: 33.44082430962016, longitude: -112.14560508728029, } | rhsVal2 |
See more details here.
Supported in: Batch
Produces a dataset containing media references and basic metadata for files in a dataset.
Transform categories: File
See more details here.
Supported in: Streaming
Detects when a record hasn't been seen for a configurable amount of time for a set of keys.
Transform categories: Other
See more details here.
Supported in: Batch
Joins two datasets together, keeping only rows that satisfy the provided condition from each table.
Transform categories: Join
Argument values:
tail_number, tail_number
Inputs:
ri.foundry.main.dataset.left
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
XB-123 | foundry airline | 335 | 5 |
MT-222 | new air | 565 | 4 |
KK-452 | new air | 222 | 1 |
PA-452 | new air | 212 | 2 |
XB-123 | foundry airline | 1134 | 2 |
ri.foundry.main.dataset.right
tail_number | home_airport |
---|---|
XB-123 | LHR |
MT-222 | CPH |
KK-452 | JFK |
JR-201 | IAD |
Output:
tail_number | airline | home_airport |
---|---|---|
XB-123 | foundry air | LHR |
MT-222 | new airline | CPH |
XB-123 | foundry airline | LHR |
MT-222 | new air | CPH |
KK-452 | new air | JFK |
XB-123 | foundry airline | LHR |
See more details here.
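As a rough illustration of the inner join example above, the sketch below joins lists of dictionaries on an equality condition. The helper name `inner_join` and its parameters are hypothetical, and the column selection (`tail_number`, `airline`, `home_airport`) is inferred from the example output.

```python
def inner_join(left_rows, right_rows, left_key, right_key, left_columns, right_columns):
    """Keep only left/right row pairs whose key values are equal (hypothetical helper)."""
    # Index the right side by key so each left row is matched in constant time.
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for row in left_rows:
        for match in index.get(row[left_key], []):
            out = {c: row[c] for c in left_columns}
            out.update({c: match[c] for c in right_columns})
            joined.append(out)
    return joined


left = [
    {"tail_number": "XB-123", "airline": "foundry air", "miles": 124, "factor": 2},
    {"tail_number": "MT-222", "airline": "new airline", "miles": 1123, "factor": 5},
    {"tail_number": "XB-123", "airline": "foundry airline", "miles": 335, "factor": 5},
    {"tail_number": "MT-222", "airline": "new air", "miles": 565, "factor": 4},
    {"tail_number": "KK-452", "airline": "new air", "miles": 222, "factor": 1},
    {"tail_number": "PA-452", "airline": "new air", "miles": 212, "factor": 2},
    {"tail_number": "XB-123", "airline": "foundry airline", "miles": 1134, "factor": 2},
]
right = [
    {"tail_number": "XB-123", "home_airport": "LHR"},
    {"tail_number": "MT-222", "home_airport": "CPH"},
    {"tail_number": "KK-452", "home_airport": "JFK"},
    {"tail_number": "JR-201", "home_airport": "IAD"},
]
inner = inner_join(left, right, "tail_number", "tail_number",
                   ["tail_number", "airline"], ["home_airport"])
```

The `PA-452` row has no match on the right and `JR-201` has no match on the left, so neither appears in `inner`.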
Supported in: Batch, Streaming
Joins left and right dataset inputs together.
Transform categories: Join
See more details here.
Supported in: Batch
K-means clustering is an unsupervised machine learning algorithm. It groups dataset vectors into k clusters. The k value is determined by computing the best silhouette score of the specified range between minimum k and maximum k. Number of k values defines how many k values should be tried within this range, inclusive of the boundaries.
Transform categories: Other
Argument values:
feature_column
Input:
feature_column |
---|
[ 0.05, 3.1, 2.3 ] |
[ 1.0, 3.1, 2.3 ] |
[ 1.0, 3.5, 2.3 ] |
[ 19.0, 12.3, -1.4 ] |
Output:
feature_column | cluster_id |
---|---|
[ 1.0, 3.1, 2.3 ] | 0 |
[ 1.0, 3.5, 2.3 ] | 0 |
[ 19.0, 12.3, -1.4 ] | 1 |
[ 0.05, 3.1, 2.3 ] | 2 |
See more details here.
Supported in: Batch
Return the K nearest rows from the right dataset for each row in the left dataset, based on the distance measure.
Transform categories: Join
Argument values:
airline, fuzzy_airline
Inputs:
ri.foundry.main.dataset.left
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
PA-452 | new air | 212 | 2 |
ri.foundry.main.dataset.right
fuzzy_airline | home_airport |
---|---|
air | LHR |
new airline | CPH |
new plane | JFK |
old air | IAD |
Output:
rank | distance | tail_number | airline | fuzzy_airline | home_airport |
---|---|---|---|---|---|
1 | 3 | PA-452 | new air | old air | IAD |
2 | 4 | PA-452 | new air | air | LHR |
2 | 4 | PA-452 | new air | new airline | CPH |
2 | 4 | PA-452 | new air | new plane | JFK |
1 | 0 | MT-222 | new airline | new airline | CPH |
2 | 4 | MT-222 | new airline | new plane | JFK |
1 | 5 | XB-123 | foundry air | old air | IAD |
2 | 8 | XB-123 | foundry air | air | LHR |
See more details here.
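The distances in the example above are consistent with Levenshtein edit distance and k = 2, with tied rows sharing a rank. The sketch below reproduces that behaviour under those assumptions; the helper names `levenshtein` and `knn_join` are hypothetical, and the actual distance measure and tie handling are configured in the transform.

```python
from bisect import bisect_left


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def knn_join(left_rows, right_rows, left_col, right_col, k):
    """Return right rows ranked within the k nearest for each left row (hypothetical helper)."""
    out = []
    for left in left_rows:
        scored = sorted(((levenshtein(left[left_col], r[right_col]), r) for r in right_rows),
                        key=lambda pair: pair[0])
        distances = [d for d, _ in scored]
        for d, r in scored:
            rank = bisect_left(distances, d) + 1  # tied distances share the same rank
            if rank <= k:
                out.append({"rank": rank, "distance": d, **left, **r})
    return out


left = [{"tail_number": "PA-452", "airline": "new air"}]
right = [
    {"fuzzy_airline": "air", "home_airport": "LHR"},
    {"fuzzy_airline": "new airline", "home_airport": "CPH"},
    {"fuzzy_airline": "new plane", "home_airport": "JFK"},
    {"fuzzy_airline": "old air", "home_airport": "IAD"},
]
matches = knn_join(left, right, "airline", "fuzzy_airline", k=2)
```

For `new air`, `old air` is at distance 3 (rank 1) and the other three candidates tie at distance 4 (rank 2), which is why four rows are returned for k = 2.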
Supported in: Batch
Keep duplicate rows from the input.
Transform categories: Other
Argument values:
{tail_number}
Input:
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
XB-123 | foundry airline | 335 | 5 |
MT-222 | new air | 565 | 4 |
KK-452 | new air | 222 | 1 |
XB-123 | foundry airline | 1134 | 3 |
Output:
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
XB-123 | foundry airline | 335 | 5 |
MT-222 | new air | 565 | 4 |
XB-123 | foundry airline | 1134 | 3 |
See more details here.
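A minimal sketch of the behaviour shown above, assuming that a row counts as a duplicate when its key column values occur more than once in the input. The helper name `keep_duplicates` is hypothetical.

```python
from collections import Counter


def keep_duplicates(rows, key_columns):
    """Keep rows whose key column values appear more than once (hypothetical helper)."""
    def key_of(row):
        return tuple(row[c] for c in key_columns)

    counts = Counter(key_of(row) for row in rows)
    return [row for row in rows if counts[key_of(row)] > 1]


flights = [
    {"tail_number": "XB-123", "airline": "foundry air", "miles": 124, "factor": 2},
    {"tail_number": "MT-222", "airline": "new airline", "miles": 1123, "factor": 5},
    {"tail_number": "XB-123", "airline": "foundry airline", "miles": 335, "factor": 5},
    {"tail_number": "MT-222", "airline": "new air", "miles": 565, "factor": 4},
    {"tail_number": "KK-452", "airline": "new air", "miles": 222, "factor": 1},
    {"tail_number": "XB-123", "airline": "foundry airline", "miles": 1134, "factor": 3},
]
duplicates = keep_duplicates(flights, ["tail_number"])
```

Only `KK-452` occurs once, so it is the only row removed.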
Supported in: Streaming
Keys the input by the provided key-by columns. Note that this does not re-sort the data; per-key ordering is maintained only from the point at which the keys are set. Re-keying can be unsafe: if the newly keyed data depends on a specific ordering, that ordering cannot be guaranteed unless it was already maintained by the previous keying. Additionally sets the primary key if CDC (change data capture) mode is enabled. The primary key defines the columns that indicate which rows are updates or deletes, and the ordering used when the data is read as a current view.
Transform categories: Other
See more details here.
Supported in: Batch
Joins two datasets together, keeping all rows from the left table and only rows which satisfy the provided condition from the right table.
Transform categories: Join
Argument values:
tail_number, tail_number
Inputs:
ri.foundry.main.dataset.left
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
XB-123 | foundry airline | 335 | 5 |
MT-222 | new air | 565 | 4 |
KK-452 | new air | 222 | 1 |
PA-452 | new air | 212 | 2 |
XB-123 | foundry airline | 1134 | 2 |
ri.foundry.main.dataset.right
tail_number | home_airport |
---|---|
XB-123 | LHR |
MT-222 | CPH |
KK-452 | JFK |
JR-201 | IAD |
Output:
tail_number | airline | home_airport |
---|---|---|
XB-123 | foundry air | LHR |
MT-222 | new airline | CPH |
XB-123 | foundry airline | LHR |
MT-222 | new air | CPH |
KK-452 | new air | JFK |
PA-452 | new air | null |
XB-123 | foundry airline | LHR |
See more details here.
Supported in: Streaming
Joins two datasets together, keeping all rows from the left table and only matching rows from the right table.
Transform categories: Join
Argument values:
[(tail_number, tail_number)]
Inputs:
ri.foundry.main.dataset.left
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
XB-123 | foundry airline | 335 | 5 |
MT-222 | new air | 565 | 4 |
KK-452 | new air | 222 | 1 |
PA-452 | new air | 212 | 2 |
XB-123 | foundry airline | 1134 | 2 |
ri.foundry.main.dataset.right
tail_number | home_airport |
---|---|
XB-123 | LHR |
MT-222 | CPH |
KK-452 | JFK |
JR-201 | IAD |
Output:
tail_number | airline | home_airport |
---|---|---|
XB-123 | foundry air | LHR |
MT-222 | new airline | CPH |
XB-123 | foundry airline | LHR |
MT-222 | new air | CPH |
KK-452 | new air | JFK |
PA-452 | new air | null |
XB-123 | foundry airline | LHR |
See more details here.
Supported in: Batch, Streaming
Uses manually entered table data to create an output.
Transform categories: Other
Argument values:
Output:
flight_code | flight_number | airline |
---|---|---|
112 | XB-123 | foundry airlines |
533 | MT-444 | foundry airlines |
934 | KK-123 | new air |
See more details here.
Supported in: Batch
Replaces values from the target columns in the source dataset with values in the mapping dataset.
Transform categories: Join
Type variable bounds: T1 accepts AnyType, T2 accepts AnyType
Argument values:
flight_code
[flight_no, next_flight]
flight_number
Inputs:
ri.foundry.main.dataset.input
flight_no | next_flight | departure_time |
---|---|---|
533 | 112 | 2022-01-20T10:45:00Z |
934 | 533 | 2022-01-20T11:20:00Z |
222 | 934 | 2022-01-20T11:20:00Z |
ri.foundry.main.dataset.mapping
flight_code | flight_number | airline |
---|---|---|
112 | XB-123 | foundry airlines |
533 | MT-444 | foundry airlines |
934 | KK-123 | new air |
Output:
flight_no | next_flight | departure_time |
---|---|---|
MT-444 | XB-123 | 2022-01-20T10:45:00Z |
KK-123 | MT-444 | 2022-01-20T11:20:00Z |
unknown | KK-123 | 2022-01-20T11:20:00Z |
See more details here.
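The example above can be approximated with a dictionary lookup. This is a sketch under assumptions: the helper name `replace_with_mapping` and its parameters are hypothetical, and the `unknown` default for unmapped values is inferred from the example output rather than stated in the description.

```python
def replace_with_mapping(rows, mapping_rows, target_columns, key_column, value_column, default):
    """Replace values in target_columns using key -> value pairs from the mapping dataset."""
    lookup = {m[key_column]: m[value_column] for m in mapping_rows}
    return [
        {col: (lookup.get(val, default) if col in target_columns else val)
         for col, val in row.items()}
        for row in rows
    ]


source = [
    {"flight_no": 533, "next_flight": 112, "departure_time": "2022-01-20T10:45:00Z"},
    {"flight_no": 934, "next_flight": 533, "departure_time": "2022-01-20T11:20:00Z"},
    {"flight_no": 222, "next_flight": 934, "departure_time": "2022-01-20T11:20:00Z"},
]
mapping = [
    {"flight_code": 112, "flight_number": "XB-123", "airline": "foundry airlines"},
    {"flight_code": 533, "flight_number": "MT-444", "airline": "foundry airlines"},
    {"flight_code": 934, "flight_number": "KK-123", "airline": "new air"},
]
replaced = replace_with_mapping(source, mapping, ["flight_no", "next_flight"],
                                "flight_code", "flight_number", default="unknown")
```

Flight code 222 has no entry in the mapping dataset, so it falls back to the default value.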
Supported in: Batch
Unions a set of datasets together on the intersection of their column names; columns that are not present in all input datasets are removed.
Transform categories: Join
Argument values:
Inputs:
ri.foundry.main.dataset.a
recently_serviced | tail_number |
---|---|
true | KK-150 |
false | XB-120 |
true | MT-190 |
ri.foundry.main.dataset.b
recently_serviced | tail_number | airline_code |
---|---|---|
true | AA-200 | AA |
true | BN-435 | BN |
true | BN-111 | BN |
Output:
recently_serviced | tail_number |
---|---|
true | KK-150 |
false | XB-120 |
true | MT-190 |
true | AA-200 |
true | BN-435 |
true | BN-111 |
See more details here.
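A small sketch of a union on shared columns, assuming each dataset is a non-empty list of dictionaries whose keys are the column names. The helper name `union_on_common_columns` is hypothetical.

```python
def union_on_common_columns(datasets):
    """Union rows, keeping only columns present in every dataset (hypothetical helper)."""
    column_sets = [set(rows[0]) for rows in datasets]
    # Preserve the column order of the first dataset.
    common = [c for c in datasets[0][0] if all(c in cols for cols in column_sets)]
    return [{c: row[c] for c in common} for rows in datasets for row in rows]


dataset_a = [
    {"recently_serviced": True, "tail_number": "KK-150"},
    {"recently_serviced": False, "tail_number": "XB-120"},
    {"recently_serviced": True, "tail_number": "MT-190"},
]
dataset_b = [
    {"recently_serviced": True, "tail_number": "AA-200", "airline_code": "AA"},
    {"recently_serviced": True, "tail_number": "BN-435", "airline_code": "BN"},
    {"recently_serviced": True, "tail_number": "BN-111", "airline_code": "BN"},
]
common_union = union_on_common_columns([dataset_a, dataset_b])
```

Because `airline_code` exists only in the second dataset, it is dropped from every output row.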
Supported in: Batch, Streaming
Normalizes column names to use lower_snake_case.
Transform categories: Data preparation
Argument values:
Input:
recentlyServiced | tailNumber | _airlineCode |
---|---|---|
true | KK-150 | KK |
false | XB-120 | XB |
true | MT-190 | MT |
Output:
recently_serviced | tail_number | airline_code |
---|---|---|
true | KK-150 | KK |
false | XB-120 | XB |
true | MT-190 | MT |
See more details here.
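One plausible way to derive lower_snake_case names is sketched below. The exact normalization rules of the transform are not given here, so the regular expressions and the helper name `to_lower_snake_case` are assumptions that happen to reproduce the example above.

```python
import re


def to_lower_snake_case(name):
    """Convert a camelCase or loosely formatted column name to lower_snake_case."""
    # Insert an underscore at each lower-to-upper case boundary.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse any run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", spaced)
    return cleaned.strip("_").lower()


columns = ["recentlyServiced", "tailNumber", "_airlineCode"]
normalized = [to_lower_snake_case(c) for c in columns]
```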
Supported in: Batch
Computes the distribution of numeric values in a specified column.
Transform categories: Numeric
See more details here.
Supported in: Streaming
Outputs rows from the left and right inputs that meet all of the match conditions and are within the caching window, along with unmatched rows from both inputs.
Transform categories: Join
See more details here.
Supported in: Streaming
Joins left and right dataset inputs together, caching the record with the highest event time from each side for use in subsequent joins. Processing time of a record is used as a tiebreaker. In the case of a time results are optimistically emitted if there's no value to join against.
Transform categories: Join
See more details here.
Supported in: Batch
Outer joins the provided dataset inputs together, keeping all rows from both datasets. Columns have nulls when there is no row satisfying the provided condition.
Transform categories: Join
Argument values:
tail_number, tail_number
Inputs:
ri.foundry.main.dataset.left
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
XB-123 | foundry airline | 335 | 5 |
MT-222 | new air | 565 | 4 |
KK-452 | new air | 222 | 1 |
PA-452 | new air | 212 | 2 |
XB-123 | foundry airline | 1134 | 2 |
ri.foundry.main.dataset.right
tail_number | home_airport |
---|---|
XB-123 | LHR |
MT-222 | CPH |
KK-452 | JFK |
JR-201 | IAD |
Output:
tail_number | airline | home_airport |
---|---|---|
XB-123 | foundry air | LHR |
MT-222 | new airline | CPH |
XB-123 | foundry airline | LHR |
MT-222 | new air | CPH |
KK-452 | new air | JFK |
PA-452 | new air | null |
XB-123 | foundry airline | LHR |
JR-201 | null | IAD |
See more details here.
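The outer join semantics above can be sketched with plain dictionaries. The helper name `full_outer_join` is hypothetical; taking the key value from whichever side is present, and selecting only `airline` and `home_airport`, are assumptions inferred from the example output.

```python
def full_outer_join(left_rows, right_rows, key, left_columns, right_columns):
    """Keep all rows from both sides, filling missing columns with None."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    joined, matched_keys = [], set()
    for left in left_rows:
        matches = index.get(left[key], [])
        if matches:
            matched_keys.add(left[key])
        # An unmatched left row is still emitted once, with None on the right side.
        for right in matches or [None]:
            out = {key: left[key]}
            out.update({c: left[c] for c in left_columns})
            out.update({c: (right[c] if right else None) for c in right_columns})
            joined.append(out)

    # Right rows that never matched are emitted with None on the left side.
    for right in right_rows:
        if right[key] not in matched_keys:
            out = {key: right[key]}
            out.update({c: None for c in left_columns})
            out.update({c: right[c] for c in right_columns})
            joined.append(out)
    return joined


left = [
    {"tail_number": "XB-123", "airline": "foundry air"},
    {"tail_number": "MT-222", "airline": "new airline"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
    {"tail_number": "MT-222", "airline": "new air"},
    {"tail_number": "KK-452", "airline": "new air"},
    {"tail_number": "PA-452", "airline": "new air"},
    {"tail_number": "XB-123", "airline": "foundry airline"},
]
right = [
    {"tail_number": "XB-123", "home_airport": "LHR"},
    {"tail_number": "MT-222", "home_airport": "CPH"},
    {"tail_number": "KK-452", "home_airport": "JFK"},
    {"tail_number": "JR-201", "home_airport": "IAD"},
]
outer = full_outer_join(left, right, "tail_number", ["airline"], ["home_airport"])
```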
Supported in: Batch
Performs the specified aggregations on the input dataset grouped by a set of columns. Unique values to pivot on must be provided such that the output schema is known ahead of runtime. This improves runtime stability over time.
Transform categories: Aggregate, Popular
Type variable bounds: T accepts Boolean | Byte | Integer | Long | Short | String
Argument values:
miles
[airline]
airport
Input:
airline | airport | miles |
---|---|---|
foundry airways | JFK | 1002345 |
foundry airways | LHR | 2221324 |
new air | SFO | 21356673 |
new air | JFK | 12323456 |
foundry airways | LHR | 12542352 |
new air | JFK | 12232355 |
Output:
airline | new_york_miles | london_miles |
---|---|---|
foundry airways | 1002345.0 | 7381838.0 |
new air | 1.22779055E7 | null |
See more details here.
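The output values above are consistent with a mean of `miles` grouped by `airline` and pivoted on `airport`, with `JFK` and `LHR` supplied as the pivot values. The sketch below reproduces that under those assumptions; the helper name `pivot_mean` and the mapping of pivot values to output column names are hypothetical.

```python
def pivot_mean(rows, group_col, pivot_col, value_col, pivot_values):
    """Mean of value_col per group, one output column per listed pivot value.

    pivot_values maps each pivot value to its output column name; because the
    list is fixed up front, the output schema is known before any data is read.
    """
    buckets = {}
    for row in rows:
        per_group = buckets.setdefault(row[group_col],
                                       {name: [] for name in pivot_values.values()})
        name = pivot_values.get(row[pivot_col])
        if name is not None:  # values not listed (for example SFO) are ignored
            per_group[name].append(row[value_col])

    return [
        {group_col: group,
         **{name: (sum(vals) / len(vals) if vals else None) for name, vals in cols.items()}}
        for group, cols in buckets.items()
    ]


flights = [
    {"airline": "foundry airways", "airport": "JFK", "miles": 1002345},
    {"airline": "foundry airways", "airport": "LHR", "miles": 2221324},
    {"airline": "new air", "airport": "SFO", "miles": 21356673},
    {"airline": "new air", "airport": "JFK", "miles": 12323456},
    {"airline": "foundry airways", "airport": "LHR", "miles": 12542352},
    {"airline": "new air", "airport": "JFK", "miles": 12232355},
]
pivoted = pivot_mean(flights, "airline", "airport", "miles",
                     {"JFK": "new_york_miles", "LHR": "london_miles"})
```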
Supported in: Batch, Streaming
Transforms input dataset either by selecting columns or applying functions to columns.
Transform categories: Other
Argument values:
airlin
Input:
airlin | miles |
---|---|
foundry airways | 2500 |
new air | 3000 |
Output:
airline |
---|
foundry airways |
new air |
See more details here.
Supported in: Batch, Streaming
Transforms input dataset either by selecting columns or applying functions to columns.
Transform categories: Popular
See more details here.
Supported in: Batch, Streaming
Performs the specified aggregations on the data within the window. Emits one row each time a new row is received.
Transform categories: Aggregate
See more details here.
Supported in: Batch, Streaming
Renames a set of columns.
Transform categories: Data preparation, Popular
Argument values:
[(recently_serviced, does_not_require_service)]
Input:
recently_serviced | tail_number | airline_code |
---|---|---|
true | KK-150 | KK |
false | XB-120 | XB |
true | MT-190 | MT |
Output:
does_not_require_service | tail_number | airline_code |
---|---|---|
true | KK-150 | KK |
false | XB-120 | XB |
true | MT-190 | MT |
See more details here.
Supported in: Batch
Forces a shuffle of the data based on optionally provided partitioning columns and a resulting number of partitions. If these are not provided, the partitioning will be determined automatically.
Transform categories: Other
See more details here.
Supported in: Batch
Performs the specified aggregations on the input dataset at different levels of granularity, providing both intermediate and super aggregates.
Transform categories: Aggregate
Argument values:
price
[city, model]
Input:
city | model | price | store |
---|---|---|---|
London | new phone | 900.0 | MegaMart |
London | new phone | 850.75 | AA |
London | new phone | 870.75 | ABC Zone |
San Francisco | new phone | 1000.0 | Prescos |
San Francisco | new phone | 950.25 | XZY Force |
San Francisco | new phone | 1105.7 | Phone Mart |
London | forestX 20 | 750.1 | MegaMart |
London | forestX 20 | 690.0 | AA |
London | forestX 20 | 730.0 | ABC Zone |
San Francisco | forestX 20 | 890.4 | Prescos |
San Francisco | forestX 20 | 900.1 | XZY Force |
San Francisco | forestX 20 | 1050.75 | Phone Mart |
Output:
city | model | mean_price |
---|---|---|
London | new phone | 873.8333333333334 |
London | forestX 20 | 723.3666666666667 |
London | null | 798.6 |
San Francisco | new phone | 1018.65 |
San Francisco | forestX 20 | 947.0833333333334 |
San Francisco | null | 982.8666666666667 |
null | null | 890.7333333333335 |
See more details here.
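A rollup computes the same aggregation at every prefix of the grouping columns, from the full grouping down to the grand total. The sketch below illustrates this with a mean, which is what the example above appears to use; the helper name `rollup_mean` is hypothetical and the row order differs from the example output.

```python
def rollup_mean(rows, group_cols, value_col, out_col):
    """Mean of value_col for every prefix of group_cols, including the grand total."""
    results = []
    # depth == len(group_cols) is the finest level; depth == 0 is the grand total.
    for depth in range(len(group_cols), -1, -1):
        active = group_cols[:depth]
        groups = {}
        for row in rows:
            groups.setdefault(tuple(row[c] for c in active), []).append(row[value_col])
        for key, values in groups.items():
            out = {c: None for c in group_cols}  # rolled-up columns stay None
            out.update(zip(active, key))
            out[out_col] = sum(values) / len(values)
            results.append(out)
    return results


prices = [
    {"city": "London", "model": "new phone", "price": 900.0},
    {"city": "London", "model": "new phone", "price": 850.75},
    {"city": "London", "model": "new phone", "price": 870.75},
    {"city": "San Francisco", "model": "new phone", "price": 1000.0},
    {"city": "San Francisco", "model": "new phone", "price": 950.25},
    {"city": "San Francisco", "model": "new phone", "price": 1105.7},
    {"city": "London", "model": "forestX 20", "price": 750.1},
    {"city": "London", "model": "forestX 20", "price": 690.0},
    {"city": "London", "model": "forestX 20", "price": 730.0},
    {"city": "San Francisco", "model": "forestX 20", "price": 890.4},
    {"city": "San Francisco", "model": "forestX 20", "price": 900.1},
    {"city": "San Francisco", "model": "forestX 20", "price": 1050.75},
]
rollup = rollup_mean(prices, ["city", "model"], "price", "mean_price")
```

The result contains four (city, model) rows, two city-level rows with `model` set to `None`, and one grand-total row with both grouping columns set to `None`.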
Supported in: Batch
Estimates the size of a single row in the JVM.
Transform categories: Other
See more details here.
Supported in: Batch, Streaming
Selects a set of columns from the input dataset.
Transform categories: Popular
See more details here.
Supported in: Batch
Semi joins left and right dataset inputs together. This removes all rows that don't match the join condition.
Transform categories: Join
Argument values:
tail_number, tail_number
Inputs:
ri.foundry.main.dataset.left
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
XB-123 | foundry airline | 335 | 5 |
MT-222 | new air | 565 | 4 |
KK-452 | new air | 222 | 1 |
PA-452 | new air | 212 | 2 |
XB-123 | foundry airline | 1134 | 2 |
ri.foundry.main.dataset.right
tail_number | home_airport |
---|---|
XB-123 | LHR |
MT-222 | CPH |
KK-452 | JFK |
JR-201 | IAD |
Output:
tail_number | airline | miles | factor |
---|---|---|---|
XB-123 | foundry air | 124 | 2 |
MT-222 | new airline | 1123 | 5 |
XB-123 | foundry airline | 335 | 5 |
MT-222 | new air | 565 | 4 |
KK-452 | new air | 222 | 1 |
XB-123 | foundry airline | 1134 | 2 |
See more details here.
Supported in: Batch
Sorts the input dataset by the specified columns and sort directions.
Transform categories: Other
Argument values:
[(b, DESCENDING)]
Input:
a | b |
---|---|
1 | 2 |
3 | 4 |
5 | 6 |
Output:
a | b |
---|---|
5 | 6 |
3 | 4 |
1 | 2 |
See more details here.
Supported in: Batch, Streaming
Insert a text description between your transformations. This does not transform the input data in any way.
Transform categories: Other
See more details here.
Supported in: Streaming
Drops duplicate rows from the input for the given column subset. Rows that have been seen expire after the configured amount of event time. Rows that arrive late by more than the configured amount of event time are always dropped. The input is partitioned by the specified keys, and duplicates are dropped separately for each distinct set of key column values.
Transform categories: Other
See more details here.
Supported in: Streaming
Drops rows with the same values for all key columns that are out of order. A row is out of order if it would have come before an already received row with the same key values, based on the sort columns and directions. Two rows are compared by evaluating the first sort column and direction, moving on to the next sort column and direction only if there is a tie, and so on until the order is determined or all sort columns are tied, in which case the rows are equal. The current maximum for each key is stored until no new rows have been seen for that key for an event time greater than or equal to the expiry. After a key has received no new rows for a period greater than or equal to the expiry time, any new row for that key will never be dropped and will always be stored as the new current maximum.
Transform categories: Other
See more details here.
Supported in: Streaming
Emits rows by key in ascending event time order, allowing for late arriving records up until at least the allowed lateness. Records arriving after the allowed lateness plus some small buffer interval will be dropped.
Transform categories: Other
See more details here.
Supported in: Batch
Picks the top rows in each sorted partition.
Transform categories: Aggregate
Argument values:
{airline}
[(airport, DESCENDING), (miles, ASCENDING)]
Input:
airline | airport | miles |
---|---|---|
foundry airways | JFK | 1002345 |
foundry airways | LHR | 2221324 |
new air | SFO | 21356673 |
new air | JFK | 12323456 |
foundry airways | LHR | 12542352 |
new air | JFK | 12232355 |
Output:
airline | airport | miles |
---|---|---|
foundry airways | LHR | 2221324 |
new air | SFO | 21356673 |
See more details here.
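The example above is consistent with partitioning by `airline`, sorting each partition by `airport` descending and then `miles` ascending, and keeping one row per partition. The sketch below follows that reading; the helper name `top_rows` and the default of one row per partition are assumptions.

```python
def top_rows(rows, partition_cols, sort_spec, n=1):
    """Return the first n rows of each partition after a multi-column sort."""
    partitions = {}
    for row in rows:
        partitions.setdefault(tuple(row[c] for c in partition_cols), []).append(row)

    picked = []
    for part in partitions.values():
        ordered = list(part)
        # Python's sort is stable, so sorting by the last key first and the
        # first key last yields the combined multi-column ordering.
        for col, direction in reversed(sort_spec):
            ordered.sort(key=lambda r: r[col], reverse=(direction == "DESCENDING"))
        picked.extend(ordered[:n])
    return picked


flights = [
    {"airline": "foundry airways", "airport": "JFK", "miles": 1002345},
    {"airline": "foundry airways", "airport": "LHR", "miles": 2221324},
    {"airline": "new air", "airport": "SFO", "miles": 21356673},
    {"airline": "new air", "airport": "JFK", "miles": 12323456},
    {"airline": "foundry airways", "airport": "LHR", "miles": 12542352},
    {"airline": "new air", "airport": "JFK", "miles": 12232355},
]
top = top_rows(flights, ["airline"], [("airport", "DESCENDING"), ("miles", "ASCENDING")])
```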
Supported in: Batch, Streaming
Unions a set of datasets together on matching column names.
Transform categories: Join
Argument values:
Inputs:
ri.foundry.main.dataset.a
recently_serviced | tail_number | airline_code |
---|---|---|
true | KK-150 | KK |
false | XB-120 | XB |
true | MT-190 | MT |
ri.foundry.main.dataset.b
recently_serviced | tail_number | airline_code |
---|---|---|
true | AA-200 | AA |
true | BN-435 | BN |
true | BN-111 | BN |
Output:
recently_serviced | tail_number | airline_code |
---|---|---|
true | KK-150 | KK |
false | XB-120 | XB |
true | MT-190 | MT |
true | AA-200 | AA |
true | BN-435 | BN |
true | BN-111 | BN |
See more details here.
Supported in: Batch, Streaming
Unpivot is the opposite operation of pivot. This converts multiple columns into rows, transforming data from a wide format to a long format. To do so it creates two new columns: one containing the original column names as values, and another containing the corresponding data values. All other columns that are not unpivoted are kept as is.
Transform categories: Aggregate, Popular
Type variable bounds: T accepts AnyType
Argument values:
[new_york_miles, london_miles]
Input:
airline | new_york_miles | london_miles |
---|---|---|
foundry airways | 1000 | 6000 |
new air | null | 8000 |
Output:
city | miles | airline |
---|---|---|
new_york_miles | 1000 | foundry airways |
london_miles | 6000 | foundry airways |
new_york_miles | null | new air |
london_miles | 8000 | new air |
See more details here.
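The wide-to-long conversion described above can be sketched as follows. The helper name `unpivot` is hypothetical; the output column names `city` and `miles` are taken from the example.

```python
def unpivot(rows, columns, name_col, value_col):
    """Turn each listed column into its own row holding the column name and value."""
    long_rows = []
    for row in rows:
        # Columns that are not unpivoted are carried over unchanged.
        kept = {c: v for c, v in row.items() if c not in columns}
        for col in columns:
            long_rows.append({name_col: col, value_col: row[col], **kept})
    return long_rows


wide = [
    {"airline": "foundry airways", "new_york_miles": 1000, "london_miles": 6000},
    {"airline": "new air", "new_york_miles": None, "london_miles": 8000},
]
long_format = unpivot(wide, ["new_york_miles", "london_miles"], "city", "miles")
```

Each input row produces one output row per unpivoted column, so two wide rows become four long rows, including the row whose value is null.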
Supported in: Batch, Streaming
Unions a set of datasets together on the superset of their column names, adding nulls when columns are missing.
Transform categories: Join
Argument values:
Inputs:
ri.foundry.main.dataset.a
recently_serviced | tail_number |
---|---|
true | KK-150 |
false | XB-120 |
true | MT-190 |
ri.foundry.main.dataset.b
recently_serviced | tail_number | airline_code |
---|---|---|
true | AA-200 | AA |
true | BN-435 | BN |
true | BN-111 | BN |
Output:
recently_serviced | tail_number | airline_code |
---|---|---|
true | KK-150 | null |
false | XB-120 | null |
true | MT-190 | null |
true | AA-200 | AA |
true | BN-435 | BN |
true | BN-111 | BN |
See more details here.
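A minimal sketch of a union on the superset of column names, assuming each dataset is a non-empty list of dictionaries. The helper name `union_by_name` is hypothetical.

```python
def union_by_name(datasets):
    """Union rows over the superset of columns, filling missing values with None."""
    columns = []
    for rows in datasets:
        for col in rows[0]:
            if col not in columns:  # keep first-seen column order
                columns.append(col)
    return [{c: row.get(c) for c in columns} for rows in datasets for row in rows]


dataset_a = [
    {"recently_serviced": True, "tail_number": "KK-150"},
    {"recently_serviced": False, "tail_number": "XB-120"},
    {"recently_serviced": True, "tail_number": "MT-190"},
]
dataset_b = [
    {"recently_serviced": True, "tail_number": "AA-200", "airline_code": "AA"},
    {"recently_serviced": True, "tail_number": "BN-435", "airline_code": "BN"},
    {"recently_serviced": True, "tail_number": "BN-111", "airline_code": "BN"},
]
superset_union = union_by_name([dataset_a, dataset_b])
```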
Supported in: Batch
Performs the specified aggregations on the input dataset grouped by a set of columns.
Transform categories: Aggregate, Popular
See more details here.