SQL dialect

Foundry SQL is generally a subset of Spark SQL with ANSI compliance.

Quick reference

Common operations

Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 -- Create a table CREATE TABLE `/path/to/table` (id INT, name STRING); -- Insert data INSERT INTO `/path/to/table` VALUES (1, 'foo'), (2, 'bar'); -- Basic query SELECT * FROM `/path/to/table` WHERE id = 1; -- Join tables SELECT a.*, b.name FROM `/path/to/table1` a LEFT JOIN `/path/to/table2` b ON a.id = b.id; -- Aggregate SELECT category, COUNT(*), AVG(price) FROM `/path/to/products` GROUP BY category HAVING COUNT(*) > 10; -- Window function SELECT name, salary, RANK() OVER (ORDER BY salary DESC) as rank FROM `/path/to/employees`; -- CTE (Common Table Expression) WITH summary AS ( SELECT category, SUM(amount) as total FROM `/path/to/sales` GROUP BY category ) SELECT * FROM summary WHERE total > 1000;

Commonly used functions

CategoryFunctions
Aggregationcount(), sum(), avg(), min(), max(), count(DISTINCT col), collect_set()
Stringupper(), lower(), trim(), concat(), concat_ws(), contains(), split(), substr()
Date/Timecurrent_date(), current_timestamp(), date_add(), date_diff(), to_date(), year(), month(), day()
Mathabs(), round(), ceil(), floor(), pow(), sqrt(), mod()
Arrayarray(), array_contains(), array_size(), array_distinct(), explode(), flatten()
ConditionalCASE WHEN ... THEN ... END, coalesce(), regexp_like(), regexp_extract()
WindowROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD()

Table references

Copied!
1 2 3 4 5 6 7 8 -- By path SELECT * FROM `/path/to/table`; -- By RID SELECT * FROM `ri.rid.for.table.1234`; -- With branch SELECT * FROM `/path/to/table`.branch_master;

Key syntax notes

  • Subqueries must be wrapped in parentheses.
  • Created tables are Iceberg by default but can be overridden with USING <format> syntax.
  • Only Iceberg tables support INSERT, UPDATE, and DELETE.

Table creation

CREATE [ OR REPLACE ] [ ( col_name1 col_type1, ... ) ] [ USING table_format ] [ AS select_statement ]

If not specified, the table format will be Iceberg. Valid table formats include:

  • Iceberg: iceberg
  • Parquet: parquet
  • Avro: avro
Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 -- examples CREATE TABLE `/path/to/table` AS SELECT * FROM `/path/to/table`; CREATE TABLE `/path/to/table` USING parquet AS SELECT * FROM `/path/to/table`; CREATE TABLE `/path/to/table` ( id INT, name STRING ); CREATE OR REPLACE TABLE `/path/to/table` (id INT, name STRING);

Table alteration

Iceberg tables support data appends with INSERT, UPDATE, and DELETE statements. Non-Iceberg tables do not currently support table alteration.

Insert

INSERT [ INTO ] table_identifier { VALUES ( { value | NULL } [ , ... ] ) [ , ( ... ) ] | query }
Copied!
1 2 3 4 -- examples INSERT INTO `/path/to/table` VALUES (1, 'foo'), (2, 'bar'); INSERT INTO `/path/to/table` SELECT id, name FROM `/path/to/table2`;

Update

UPDATE table_identifier
SET column_name = value [, ...]
[ WHERE condition ]
Copied!
1 2 3 4 5 6 7 -- examples UPDATE `/path/to/table` SET col = 'new value'; UPDATE `/path/to/table` SET col = 'new value' WHERE col = 'old value';

Delete

DELETE FROM table_identifier WHERE condition
Copied!
1 DELETE FROM `/path/to/table` WHERE col = 'to delete'

Alter metadata

Branches can be created on Iceberg tables with the ALTER TABLE syntax.

ALTER TABLE table_identifier CREATE BRANCH branch_name
Copied!
1 ALTER TABLE `/path/to/table` CREATE BRANCH feature_branch

Query composition

[ WITH with_query [ , ... ] ]
select_statement
  [ { UNION | INTERSECT | EXCEPT } select_statement, ... ]
  [ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] } ]
  [ WINDOW { window_name AS ( window_spec ) [, ...] } ]
  [ LIMIT { ALL | expression } ]

WITH clause (common table expressions)

Where with_query is:

cte_name [ ( column_name [, ...] ) ] AS ( select_statement )
Copied!
1 2 3 4 5 6 7 -- examples WITH sales_summary (cust_id, total_sales, order_count) AS ( SELECT customer_id, SUM(total_amount), COUNT(*) FROM `/data/orders` GROUP BY customer_id ) SELECT * FROM sales_summary WHERE total_sales > 10000;

SELECT statement

Where select_statement is:

SELECT [ ALL | DISTINCT ] { [ named_expression | regex_column_names | * ] [, ...] }
  FROM { from_item [, ...] }
  [ WHERE boolean_expression ]
  [ GROUP BY { expression | ROLLUP(...) | CUBE(...) | GROUPING SETS(...) } [ , ... ] ]
  [ HAVING boolean_expression ]
  [ LIMIT { ALL | expression } [ OFFSET expression ] ]

Named expressions

Where named_expression is one of:

Expression with optional alias:

expression [ AS alias ]
Copied!
1 2 3 4 -- examples SELECT count(*) AS total_count FROM `/path/to/table`; SELECT name, salary * 1.1 AS adjusted_salary FROM `/path/to/employees`;

Window function:

window_function() OVER ( window_spec )
window_function() OVER window_name

Where window_spec is:

[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 -- examples SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) as rank FROM `/path/to/employees`; SELECT category, product_id, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as category_rank FROM `/path/to/products`; -- Using named window SELECT customer_id, order_date, total, ROW_NUMBER() OVER w as order_sequence FROM `/path/to/orders` WINDOW w AS (PARTITION BY customer_id ORDER BY order_date);

FROM clause

Where from_item is one of:

Table reference

Tables can be referenced by filesystem path or resource identifier (RID). Both support optional branch specifications for querying specific Iceberg branches. Branch names that contain special characters need to be escaped with backticks.

{ `/path/to/table`[.branch_<branch name>] | `ri.rid.for.table.1234`[.branch_<branch name>] }
Copied!
1 2 3 4 5 6 7 8 9 10 -- examples SELECT * FROM `/path/to/table`; SELECT * FROM `/path/to/table`.branch_master; SELECT * FROM `/path/to/table` WHERE id = 1; SELECT * FROM `ri.rid.for.table.1234`; SELECT * FROM `ri.rid.for.table`.`branch_abc\xyz`;

Time travel queries to a specific version are possible with the VERSION AS OF syntax using a snapshot ID for Iceberg tables and a transaction RID for Foundry datasets.

Copied!
1 2 3 4 -- Examples SELECT * FROM `/path/to/table` VERSION AS OF `ri.foundry.main.transaction.1234`; SELECT * FROM `/path/to/table` VERSION AS OF 1234567;

Other time travel syntax is not currently supported.

Join

relation
  { CROSS | LEFT [ OUTER ] | RIGHT [ OUTER ] | FULL [ OUTER ] }
  JOIN relation
  ON boolean_expression
Copied!
1 2 3 4 -- examples SELECT * FROM `/path/to/table` LEFT JOIN `/path/to/table2` ON id = id2; SELECT * FROM `/path/to/table` RIGHT OUTER JOIN `/path/to/table2` ON id = id2;

Subquery

( select_statement )
Copied!
1 2 3 4 5 6 7 8 9 -- examples SELECT * FROM (SELECT * FROM `/path/to/table`); SELECT * FROM (SELECT * FROM `/path/to/table`) WHERE id = 1; SELECT * FROM ( VALUES (9001, 1001, 1, DATE '2023-06-01', 'credit_card', 1329.98, 'completed', TIMESTAMP '2023-06-01 11:00:00') ) AS orders(order_id, customer_id, product_id, order_date, payment_method, total, status, created_at);

Subqueries must be wrapped in parentheses. This is enforced in Foundry even though it is not always required in Spark SQL.

Complete query examples

Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 -- Complex query with joins and filtering SELECT DISTINCT o.order_id, c.name, o.total_amount, p.category FROM `/data/orders` o LEFT JOIN `/data/customers` c ON o.customer_id = c.id CROSS JOIN (SELECT product_id, category FROM `/data/products` WHERE category = 'Electronics') p ON o.product_id = p.product_id WHERE o.order_date > DATE '2024-01-01' AND o.total_amount > 1000.0 ORDER BY o.total_amount DESC LIMIT 100; -- Aggregation with HAVING clause SELECT c.name, COUNT(o.order_id) as order_count, SUM(o.total_amount) as total_spent FROM `/data/customers` c LEFT JOIN `/data/orders` o ON c.id = o.customer_id GROUP BY c.name HAVING COUNT(o.order_id) > 0; -- UNION query with multiple joins SELECT emp_id, name, dept, 'Active' AS status FROM `/hr/employees` e LEFT OUTER JOIN `/hr/departments` d ON e.dept_id = d.id WHERE e.active = true UNION SELECT emp_id, name, dept, 'Inactive' AS status FROM (SELECT * FROM `/hr/employees` WHERE termination_date < current_timestamp()) e RIGHT JOIN `/hr/departments` d ON e.dept_id = d.id WHERE e.active = false ORDER BY dept ASC, status DESC NULLS LAST;

Iceberg references

Foundry SQL supports Iceberg references to query metadata. Review the official documentation ↗.

Copied!
1 2 3 4 -- examples SELECT * FROM `/path/to/table/`.files; SELECT * FROM `/path/to/table/`.snapshots;

Functions

Functions are available to transforms data in more complex ways. Refer to the Spark SQL documentation ↗ for detailed information on each function below.

Array functions

FunctionDescriptionExampleFurnaceOntology
array(expr, ...)Create an array with given elementsSELECT array(1, 2, 3)SupportedSupported
array_distinct(array)Remove duplicate values from arraySELECT array_distinct(tags) FROM tableSupportedNot supported
array_intersect(array1, array2)Elements in both arrays without duplicatesSELECT array_intersect(arr1, arr2) FROM tableSupportedNot supported
array_union(array1, array2)Elements in either array without duplicatesSELECT array_union(arr1, arr2) FROM tableSupportedNot supported
array_except(array1, array2)Elements in array1 but not in array2SELECT array_except(arr1, arr2) FROM tableSupportedNot supported
array_join(array, delimiter [, nullReplacement])Concatenate array elements with delimiterSELECT array_join(tags, ',') FROM tableSupportedNot supported
array_max(array)Maximum value in arraySELECT array_max(ARRAY(1, 5, 3))SupportedNot supported
array_min(array)Minimum value in arraySELECT array_min(ARRAY(1, 5, 3))SupportedNot supported
array_position(array, element)Position of first occurrence (1-based, 0 if not found)SELECT array_position(tags, 'item') FROM tableSupportedNot supported
array_remove(array, element)Remove all matching elementsSELECT array_remove(tags, 'old') FROM tableSupportedNot supported
array_size(array)Number of elements in arraySELECT array_size(tags) FROM tableSupportedNot supported
array_contains(array, value)True if array contains valueSELECT array_contains(tags, 'new') FROM tableSupportedNot supported
array_repeat(element, count)Create array with element repeated count timesSELECT array_repeat('x', 5)SupportedNot supported
arrays_overlap(array1, array2)True if arrays have common non-null elementsSELECT arrays_overlap(arr1, arr2) FROM tableSupportedNot supported
flatten(arrayOfArrays)Flatten nested arrays into single arraySELECT flatten(nested_arrays) FROM tableSupportedNot supported
sequence(start, stop, step)Generate sequence of integersSELECT sequence(1, 10, 2)SupportedNot supported
shuffle(array)Random permutation of arraySELECT shuffle(tags) FROM tableSupportedNot supported
slice(array, start, length)Subset of array from start with lengthSELECT slice(tags, 1, 2) FROM tableSupportedNot supported
sort_array(array[, ascendingOrder])Sort array in ascending (default) or descending orderSELECT sort_array(tags) FROM tableSupportedNot supported
get(array, index)Returns element at given index (0-based)SELECT get(tags, 0) FROM tableSupportedNot supported
explode(array)Generates a new row for each element in the arraySELECT explode(tags) FROM tableSupportedNot supported

Map functions

FunctionDescriptionExampleFurnaceOntology
map(key1, value1, key2, value2, ...)Create a map from key-value pairsSELECT map('a', 1, 'b', 2)SupportedNot supported

Struct functions

FunctionDescriptionExampleFurnaceOntology
struct(expr1, expr2, ...)Create a struct from expressionsSELECT struct('red', 'large', 2.0)SupportedNot supported

Date and timestamp functions

FunctionDescriptionExampleFurnaceOntology
current_timestamp()Returns current timestampSELECT current_timestamp()SupportedSupported
current_date()Returns current dateSELECT current_date()SupportedSupported
date_add(expr, num_days)Add days to datedate_add(order_date, 7)SupportedNot supported
date_sub(expr, num_days)Subtract days from datedate_sub(order_date, 7)SupportedNot supported
from_unixtime(expr [, format])Convert Unix timestamp to timestampfrom_unixtime(1609459200)SupportedNot supported
unix_timestamp([expr] [, format])Convert timestamp to Unix timestampunix_timestamp(current_timestamp())SupportedNot supported
day(date)Extract day of month from dateSELECT day(order_date) FROM tableSupportedSupported
hour(timestamp)Extract hour from timestampSELECT hour(order_timestamp) FROM tableSupportedNot supported
minute(timestamp)Extract minute from timestampSELECT minute(order_timestamp) FROM tableSupportedNot supported
month(date)Extract month from dateSELECT month(order_date) FROM tableSupportedSupported
quarter(date)Extract quarter from dateSELECT quarter(order_date) FROM tableSupportedSupported
second(timestamp)Extract second from timestampSELECT second(order_timestamp) FROM tableSupportedNot supported
year(date)Extract year from dateSELECT year(order_date) FROM tableSupportedSupported
to_date(str [, format])Convert string to dateSELECT to_date('2024-01-15')SupportedNot supported
date_diff(date1, date2)Subtract two datesSELECT date_diff('2020-01-01', '2020-01-02')SupportedSupported

Mathematical functions

FunctionDescriptionExampleFurnaceOntology
abs(x)Absolute valueSELECT abs(-5)SupportedSupported
acos(x)Arc cosineSELECT acos(0.5)SupportedNot supported
asin(x)Arc sineSELECT asin(0.5)SupportedNot supported
atan(x)Arc tangentSELECT atan(1)SupportedNot supported
atan2(y, x)Arc tangent of y/xSELECT atan2(1, 1)SupportedNot supported
cbrt(x)Cube rootSELECT cbrt(27)SupportedNot supported
ceil(x)Round up to nearest integerSELECT ceil(3.7)SupportedNot supported
ceiling(x)Round up to nearest integerSELECT ceiling(3.7)SupportedNot supported
cos(x)CosineSELECT cos(pi())SupportedNot supported
cosh(x)Hyperbolic cosineSELECT cosh(0)SupportedNot supported
degrees(x)Convert radians to degreesSELECT degrees(pi())SupportedNot supported
e()Euler's numberSELECT e()SupportedNot supported
exp(x)e raised to power xSELECT exp(1)SupportedNot supported
floor(x)Round down to nearest integerSELECT floor(3.7)SupportedNot supported
ln(x)Natural logarithmSELECT ln(e())SupportedNot supported
log(base, x)Logarithm with specified baseSELECT log(2, 8)SupportedNot supported
log10(x)Base 10 logarithmSELECT log10(100)SupportedNot supported
log2(x)Base 2 logarithmSELECT log2(8)SupportedNot supported
mod(n, m)Modulus (remainder)SELECT mod(10, 3)SupportedNot supported
pi()Pi constantSELECT pi()SupportedNot supported
pow(x, p)x raised to power pSELECT pow(2, 3)SupportedNot supported
power(x, p)x raised to power pSELECT power(2, 3)SupportedNot supported
radians(x)Convert degrees to radiansSELECT radians(180)SupportedNot supported
rand()Random value between 0 and 1SELECT rand()SupportedNot supported
random()Random value between 0 and 1SELECT random()SupportedNot supported
round(x, d)Round to d decimal placesSELECT round(3.7, 1)SupportedNot supported
sign(x)Sign function (-1, 0, 1)SELECT sign(-5)SupportedNot supported
sin(x)SineSELECT sin(pi() / 2)SupportedNot supported
sinh(x)Hyperbolic sineSELECT sinh(0)SupportedNot supported
sqrt(x)Square rootSELECT sqrt(16)SupportedNot supported
tan(x)TangentSELECT tan(pi() / 4)SupportedNot supported
tanh(x)Hyperbolic tangentSELECT tanh(0)SupportedNot supported

String functions

FunctionDescriptionExampleFurnaceOntology
chr(n)Returns character from Unicode code pointSELECT chr(65)SupportedNot supported
char(n)Returns character from code point (alias for chr)SELECT char(65)SupportedNot supported
contains(str, substr)True if string contains substringSELECT contains('hello', 'ell')SupportedNot supported
concat_ws(sep, str1, ...)Concatenate strings with separatorSELECT concat_ws(',', 'a', 'b')SupportedNot supported
concat(str1, ...)Concatenate stringsSELECT concat('a', 'b')SupportedSupported
length(str)Length of string in charactersSELECT length('hello')SupportedSupported
char_length(str)Length of string (alias for length)SELECT char_length('hello')SupportedSupported
character_length(str)Length of string (alias for length)SELECT character_length('hello')SupportedSupported
lower(str)Convert to lowercaseSELECT lower('HELLO')SupportedSupported
upper(str)Convert to uppercaseSELECT upper('hello')SupportedSupported
lpad(str, len, pad)Left pad string to lengthSELECT lpad('hi', 5, 'x')SupportedNot supported
rpad(str, len, pad)Right pad string to lengthSELECT rpad('hi', 5, 'x')SupportedNot supported
ltrim(str)Remove leading whitespaceSELECT ltrim(' hello')SupportedNot supported
rtrim(str)Remove trailing whitespaceSELECT rtrim('hello ')SupportedNot supported
trim(str)Remove leading and trailing whitespaceSELECT trim(' hello ')SupportedSupported
replace(str, search, replace)Replace all occurrencesSELECT replace('hello', 'l', 'x')SupportedSupported
substr(str, pos, len)Extract substringSELECT substr('hello', 2, 3)SupportedSupported
substring(str, pos, len)Extract substringSELECT substring('hello', 2, 3)SupportedSupported
split(str, delimiter)Split string into arraySELECT split('a,b,c', ',')SupportedNot supported
split_part(str, delimiter, index)Get part from split string (1-based)SELECT split_part('a,b,c', ',', 2)SupportedNot supported
instr(str, substr)Find substring position (1-based, 0 if not found)SELECT instr('hello', 'll')SupportedNot supported
position(substr IN str)Find substring position (SQL standard syntax)SELECT position('ll' IN 'hello')SupportedNot supported
startswith(str, prefix)Check if string starts with prefixSELECT startswith('hello', 'he')SupportedNot supported
luhn_check(str)Validate string using Luhn algorithmSELECT luhn_check('79927398713')SupportedNot supported

Conditional functions

FunctionDescriptionExampleFurnaceOntology
coalesce(expr1, expr2, ...)Returns first non-null expressionSELECT coalesce(col, 'default')SupportedNot supported
regexp_like(str, pattern)Test if string matches regex patternSELECT regexp_like('hello123', '[0-9]+')SupportedNot supported
regexp_extract(str, pattern)Extract first substring matching patternSELECT regexp_extract('hello123', '[0-9]+')SupportedSupported
regexp_extract_all(str, pattern)Extract all substrings matching patternSELECT regexp_extract_all('a1b2c3', '[0-9]+')SupportedNot supported
regexp_replace(str, pattern, replacement)Replace all matches with replacement stringSELECT regexp_replace('hello123', '[0-9]+', 'X')SupportedSupported
regexp_count(str, pattern)Count occurrences of pattern in stringSELECT regexp_count('a1b2c3', '[0-9]+')SupportedNot supported

Conversion functions

FunctionDescriptionExampleFurnaceOntology
cast(expr AS type)Convert value to specified typecast(count() AS float)SupportedSupported
try_cast(expr AS type)Convert value to specified type, returns NULL on failuretry_cast('abc' AS int)SupportedNot supported

Aggregate functions

FunctionDescriptionExampleFurnaceOntology
count()Count all rowsSELECT count() FROM tableSupportedSupported
count(DISTINCT col)Count distinct values in columnSELECT count(DISTINCT id) FROM tableSupportedSupported if single property
sum(col)Sum values in columnSELECT sum(cost) FROM tableSupportedSupported
avg(col)Mean average values in columnSELECT avg(weight) FROM tableSupportedSupported
min(col)Minimum value in columnSELECT min(price) FROM tableSupportedSupported
max(col)Maximum value in columnSELECT max(price) FROM tableSupportedSupported
min_by(x, y)Value of x associated with minimum value of ySELECT min_by(name, age) FROM tableSupportedNot supported
max_by(x, y)Value of x associated with maximum value of ySELECT max_by(name, age) FROM tableSupportedNot supported
count_if(condition)Count rows where condition is trueSELECT count_if(age > 18) FROM tableSupportedNot supported
bool_and(condition)True if all values are trueSELECT bool_and(active) FROM tableSupportedNot supported
bool_or(condition)True if any value is trueSELECT bool_or(active) FROM tableSupportedNot supported
bit_and(col)Bitwise AND of all non-null integer valuesSELECT bit_and(flags) FROM tableSupportedNot supported
bit_or(col)Bitwise OR of all non-null integer valuesSELECT bit_or(flags) FROM tableSupportedNot supported
bit_xor(col)Bitwise XOR of all non-null integer valuesSELECT bit_xor(flags) FROM tableSupportedNot supported
every(condition)True if all values are true (alias for bool_and)SELECT every(active) FROM tableSupportedNot supported
stddev(col)Sample standard deviationSELECT stddev(salary) FROM tableSupportedSupported
stddev_pop(col)Population standard deviationSELECT stddev_pop(salary) FROM tableSupportedSupported
stddev_samp(col)Sample standard deviation (alias for stddev)SELECT stddev_samp(salary) FROM tableSupportedSupported
variance(col)Sample varianceSELECT variance(salary) FROM tableSupportedNot supported
var_pop(col)Population varianceSELECT var_pop(salary) FROM tableSupportedNot supported
var_samp(col)Sample varianceSELECT var_samp(salary) FROM tableSupportedNot supported
corr(y, x)Pearson correlation coefficientSELECT corr(sales, ads) FROM tableSupportedNot supported
covar_pop(y, x)Population covarianceSELECT covar_pop(y, x) FROM tableSupportedNot supported
covar_samp(y, x)Sample covarianceSELECT covar_samp(y, x) FROM tableSupportedNot supported
skewness(col)SkewnessSELECT skewness(values) FROM tableSupportedNot supported
regr_intercept(y, x)Linear regression interceptSELECT regr_intercept(y, x) FROM tableSupportedNot supported
regr_slope(y, x)Linear regression slopeSELECT regr_slope(y, x) FROM tableSupportedNot supported
grouping(col)Returns 1 if column is aggregated, 0 otherwise (for use with ROLLUP/CUBE/GROUPING SETS)SELECT grouping(category) FROM table GROUP BY ROLLUP(category)SupportedNot supported
collect_set(col)Returns an array of unique values from columnSELECT collect_set(name) FROM table GROUP BY categorySupportedNot supported

Window functions

FunctionDescriptionExampleFurnaceOntology
ROW_NUMBER()Assigns sequential row number within partitionROW_NUMBER() OVER (ORDER BY col)SupportedSupported
RANK()Rank with gaps for tiesRANK() OVER (ORDER BY salary DESC)SupportedSupported
DENSE_RANK()Rank without gaps for tiesDENSE_RANK() OVER (ORDER BY salary DESC)SupportedSupported
PERCENT_RANK()Percentage ranking of value in groupPERCENT_RANK() OVER (ORDER BY salary)SupportedNot supported
CUME_DIST()Cumulative distribution of valueCUME_DIST() OVER (ORDER BY salary)SupportedNot supported
NTILE(n)Divide rows into n bucketsNTILE(4) OVER (ORDER BY salary)SupportedNot supported
LAG(expr[, offset[, default]])Value from previous rowLAG(salary, 1) OVER (ORDER BY date)SupportedSupported
LEAD(expr[, offset[, default]])Value from next rowLEAD(salary, 1) OVER (ORDER BY date)SupportedSupported
FIRST_VALUE(expr)First value in window frameFIRST_VALUE(salary) OVER (ORDER BY date)SupportedSupported
LAST_VALUE(expr)Last value in window frameLAST_VALUE(salary) OVER (ORDER BY date)SupportedSupported
NTH_VALUE(expr, n)Nth value in window frameNTH_VALUE(salary, 2) OVER (ORDER BY date)SupportedSupported
Aggregate functionsStandard aggregates (count, sum, avg, min, max, stddev, variance, etc.) can be used as window functionsSUM(amount) OVER (PARTITION BY customer_id)SupportedOnly count, sum, min max, avg

Window functions require an OVER clause with optional PARTITION BY and ORDER BY:

Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 -- examples SELECT id, name, ROW_NUMBER() OVER (ORDER BY name) as row_num FROM `/path/to/table`; SELECT category, name, price, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as rank_in_category FROM `/path/to/products`; SELECT order_id, customer_id, order_date, total_amount, SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY order_date) as running_total, AVG(total_amount) OVER (PARTITION BY customer_id) as customer_avg, COUNT(*) OVER (PARTITION BY customer_id) as customer_order_count FROM `/path/to/orders`; SELECT id, amount, ROW_NUMBER() OVER w as sequence_num FROM `/path/to/transactions` WINDOW w AS (PARTITION BY customer_id ORDER BY transaction_date);

Misc functions

FunctionDescriptionExampleFurnaceOntology
zorder()Iceberg Z-order optimization (Iceberg only)CALL zorder('table', 'col1,col2')SupportedNot supported

Geospatial functions

FunctionDescriptionExampleFurnaceOntology
st_point(x, y)Creates a point geometry from X, Y coordinatesSELECT st_point(-122.4, 37.8)SupportedNot supported
st_contains(geom1, geom2)Returns true if geom1 contains geom2WHERE st_contains(boundary, location)SupportedNot supported
st_equals(geom1, geom2)Returns true if geometries are spatially equalSELECT st_equals(point1, point2)SupportedNot supported
st_geomfromgeojson(json)Creates geometry from GeoJSON stringSELECT st_geomfromgeojson('{"type":"Point","coordinates":[0,0]}')SupportedNot supported
st_astext(geom)Returns geometry as Well-Known Text (WKT) stringSELECT st_astext(st_point(0, 0))SupportedNot supported
st_h3cellids(geom, level, fullCover)Returns an array of H3 cells that cover the geometrySELECT st_h3cellids(st_point(0, 0), 8, true)SupportedNot supported

Geometry columns cannot be returned directly. Use st_astext() to convert to WKT format.

Arithmetic operators

OperatorDescriptionExampleFurnaceOntology
+PlusSELECT count() + 5SupportedSupported
-MinusSELECT count() - 1SupportedSupported
*TimesSELECT sum(weight * 2)SupportedSupported
/DivideSELECT (avg(a) + avg(b)) / 2SupportedSupported

Boolean operators

OperatorDescriptionExampleFurnaceOntology
=Equal toWHERE id = 1SupportedSupported
<>, !=Not equal toWHERE status <> 'inactive'SupportedSupported
<Less thanWHERE age < 18SupportedSupported
<=Less than or equal toWHERE price <= 100.0SupportedSupported
>Greater thanWHERE salary > 50000SupportedSupported
>=Greater than or equal toWHERE score >= 90SupportedSupported
ANDLogical ANDWHERE active = true AND verified = trueSupportedSupported
ORLogical ORWHERE status = 'new' OR status = 'pending'SupportedSupported
NOTLogical NOTWHERE NOT deletedSupportedSupported
INValue in listWHERE category IN ('A', 'B', 'C')SupportedSupported
LIKEPattern matchingWHERE name LIKE 'John%'SupportedSupported
EXISTSTrue if subquery returns rowsWHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.id)SupportedSupported

Other expressions

ExpressionDescriptionExampleFurnaceOntology
CASE ... WHENSwitch expressionSELECT CASE WHEN mark >= 70 THEN 'A' WHEN mark >= 60 THEN 'B' ELSE 'C' ENDSupportedSupported
INTERVALTime interval literalSELECT order_date + INTERVAL 30 DAY, WHERE created_at > current_timestamp() - INTERVAL 1 HOURSupportedNot supported

The IF(condition, trueValue, falseValue) function is not supported. Use CASE WHEN condition THEN trueValue ELSE falseValue END instead.

Case-when statements

Copied!
1 2 3 4 CASE WHEN condition THEN result [WHEN condition THEN result] [ELSE result] END

Date/time intervals

Copied!
1 2 3 4 5 6 7 8 9 10 11 INTERVAL { yearMonthIntervalQualifier | dayTimeIntervalQualifier } yearMonthIntervalQualifier { YEAR [TO MONTH] | MONTH } dayTimeIntervalQualifier { DAY [TO { HOUR | MINUTE | SECOND } ] | HOUR [TO { MINUTE | SECOND } ] | MINUTE [TO SECOND] | SECOND }
Copied!
1 2 3 4 5 -- examples INTERVAL '1' YEAR INTERVAL '6' MONTH INTERVAL '25' DAY INTERVAL '6' YEAR TO MONTH

Data types

TypeSpark equivalentFurnaceOntology
BOOLEANBOOLEANSupportedSupported
SHORTSHORTSupportedSupported
INTINTSupportedSupported
LONGLONGSupportedSupported
FLOATFLOATSupportedSupported
DOUBLEDOUBLESupportedSupported
DATEDATESupportedSupported
TIMESTAMPTIMESTAMPSupportedSupported
TIMESTAMP_NTZTIMESTAMP_NTZSupportedSupported
STRINGSTRINGSupportedSupported
BINARYBINARYSupportedNot supported
DECIMALDECIMALSupportedSupported
MAP<key, value>MAP<key, value>SupportedNot supported
ARRAY<element_type>ARRAY<element_type>SupportedSupported
STRUCT<field1_name: field1_type, field2_name: field2_type, …>STRUCT<field1_name: field1_type, field2_name: field2_type, …>SupportedNot supported
Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 CREATE TABLE `/path/to/table` ( col1 BOOLEAN, col2 SHORT, col3 INT, col4 LONG, col5 FLOAT, col6 DOUBLE, col7 DATE, col8 TIMESTAMP, col9 TIMESTAMP_NTZ, col10 STRING, col11 BINARY, col12 DECIMAL(10, 2), col13 MAP<STRING, STRING>, col14 ARRAY<STRING>, col15 STRUCT<col1: STRING, col2: STRING> );