For an orientation to the Expression board, see the guide on using the expression board. This document is a resource for using Contour's rich expression language and can be used as a reference for types, operations and functions.
You can use the following data types in expressions:
Strings are expressed in single quotes:
'This is a string'
This is a string
Escape single quotes within the string by adding a second single quote:
'I''m a string'
I’m a string
Concatenate columns as strings with the ||
operator:
"Gender" || "MaritalStatus"
You can also concatenate columns with constants.
'$' || ("salaryColumn"/1000.0) || 'k'
Integers are non-floating point numbers:
5
Doubles are floating point numbers:
5.7
Boolean variables can be either true
or false
. Boolean variables are case-insensitive.
You can cast strings in the format YYYY-MM or YYYY-MM-DD as dates:
CAST('2016-12-06' AS DATE)
You can also cast strings as timestamps:
CAST('2016-12-06` `01:12:34' AS TIMESTAMP)
When performing operations with timestamps, it is easiest to first cast the timestamp to a long, or to a double if you need millisecond precision. This gives you the number of seconds in UNIX time.
For example, say you have two timestamp columns: “start” and “end.” To determine the number of minutes between the two times, you can use the following expression:
(CAST("end" as LONG) - CAST("start" as LONG)) / 60
.
Casting longs to timestamps assumes that that long values are in seconds. If your data is in milliseconds, divide by 1000 prior to casting.
The following sections give an overview over the operations that you can use in expressions and their precedence:
Add, subtract, multiply, and divide numeric columns (integers or doubles) by other numeric columns or by constants: +
, -
, *
, /
"tipAmount" / "fare"
"diameter" * 3.14
Compare columns of any type against each other and against constants with >
(greater than), >=
(greater than or equal to), <
(less than), <=
(less than or equal to). Results in a column of booleans.
"age" > "averageAge"
"totalDistance" < 5
Check whether a row has the same value in two columns with ==
, or different values with !=
.
"cityOfBirth" == "cityOfResidence"
You can also compare column values with constants.
"Gender" == 'M'
When comparing two strings, lexicographical ordering is used. Note that this can be counterintuitive when comparing strings that contain numbers. For example:
'Alligator' < 'Boat'
=> TRUE because the encoding of 'A' is less than that of 'B''Alliance' < 'Alligator'
=> TRUE because 'a' is less than 'g''Zoo' < 'alpha'
=> TRUE because 'Z' is less than 'a''Zoo' < 'Zoologist'
=> TRUE because if one string is the prefix of the other, the shorter one is considered to be lesser'1' < '2'
=> TRUE because the string character '1' is less than the string character '2''10' < '2'
=> TRUE because its first string character '1' is less than the string character '2'; note that this is a comparison of the strings '10' and '2', not the integers 10 and 2'10' < '20'
=> TRUE because '1' is less than '2'In the Contour expression language, equality comparisons with NULL
have the below behavior.
NULL = 'DATA'
=> FALSENULL != 'DATA'
=> TRUENULL = NULL
=> TRUENULL != NULL
=> FALSEHowever, in Spark SQL, all of these expressions return NULL
. When translating between Contour expressions and Spark SQL, it is important to consider this difference in behavior in order to produce consistent results.
Derive a new column of booleans by evaluating one or more columns with boolean logic.
"Age" >=70 AND "Gender" = 'M'
Casting lets you change the type of a column or expression. You can cast to booleans, integers, doubles, dates, and timestamps.
CAST("startDate" AS DATE)
CAST("startTime" AS TIMESTAMP)
CAST("numParticipants" AS INTEGER)
Case statements let you evaluate multiple possibilities in one expression. Each statement is evaluated in order, and the THEN
statement is executed for the first one to evaluate to true.
Copied!1 2 3 4 5
CASE WHEN "age" < 13 THEN 'Child' WHEN "age" > 19 THEN 'Adult' ELSE 'Teenager' END
Copied!1 2 3 4 5
CASE MaritalStatus WHEN 'S' THEN 'Single' WHEN 'M' THEN 'Married' ELSE 'Unknown' END
Search for a specified pattern in a column. See the documentation on SQL LIKE ↗ to learn more.
Check whether a value IS NULL
or IS NOT NULL
. Returns a boolean.
The precedence of operations within expressions is detailed in the following list, from highest precedence to the lowest. Operations that are on the same line all have the same precedence.
The following is a complete list of functions available in column expressions.
More info and examples on deriving relative date is available in the reference documentation on deriving relative dates.
More info and examples on Array functions is available in the reference documentation on array functions.
null
.When using ROW_NUMBER
, FIRST
, LAST
, ARRAY_AGG
, or ARRAY_AGG_DISTINCT
, in a window function, be careful of nondeterminism. Imagine we are partitioning by column A and ordering by column B. If for the same value of column A, there are multiple rows with the same value of column B, the results of these window functions may be non-deterministic -- they may produce different results given the same input data and logic.
Window functions need an OVER clause; more info and examples in the reference documentation on window functions.
Aggregate functions can be used in aggregation expressions and window functions.
$.field
$['field']