Foundry SQL is generally a subset of Spark SQL with ANSI compliance.
Copied!1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32-- Create a table CREATE TABLE `/path/to/table` (id INT, name STRING); -- Insert data INSERT INTO `/path/to/table` VALUES (1, 'foo'), (2, 'bar'); -- Basic query SELECT * FROM `/path/to/table` WHERE id = 1; -- Join tables SELECT a.*, b.name FROM `/path/to/table1` a LEFT JOIN `/path/to/table2` b ON a.id = b.id; -- Aggregate SELECT category, COUNT(*), AVG(price) FROM `/path/to/products` GROUP BY category HAVING COUNT(*) > 10; -- Window function SELECT name, salary, RANK() OVER (ORDER BY salary DESC) as rank FROM `/path/to/employees`; -- CTE (Common Table Expression) WITH summary AS ( SELECT category, SUM(amount) as total FROM `/path/to/sales` GROUP BY category ) SELECT * FROM summary WHERE total > 1000;
| Category | Functions |
|---|---|
| Aggregation | count(), sum(), avg(), min(), max(), count(DISTINCT col), collect_set() |
| String | upper(), lower(), trim(), concat(), concat_ws(), contains(), split(), substr() |
| Date/Time | current_date(), current_timestamp(), date_add(), date_diff(), to_date(), year(), month(), day() |
| Math | abs(), round(), ceil(), floor(), pow(), sqrt(), mod() |
| Array | array(), array_contains(), array_size(), array_distinct(), explode(), flatten() |
| Conditional | CASE WHEN ... THEN ... END, coalesce(), regexp_like(), regexp_extract() |
| Window | ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD() |
Copied!1 2 3 4 5 6 7 8-- By path SELECT * FROM `/path/to/table`; -- By RID SELECT * FROM `ri.rid.for.table.1234`; -- With branch SELECT * FROM `/path/to/table`.branch_master;
USING <format> syntax.INSERT, UPDATE, and DELETE.CREATE [ OR REPLACE ] [ ( col_name1 col_type1, ... ) ] [ USING table_format ] [ AS select_statement ]
If not specified, the table format will be Iceberg. Valid table formats include:
icebergparquetavroCopied!1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16-- examples CREATE TABLE `/path/to/table` AS SELECT * FROM `/path/to/table`; CREATE TABLE `/path/to/table` USING parquet AS SELECT * FROM `/path/to/table`; CREATE TABLE `/path/to/table` ( id INT, name STRING ); CREATE OR REPLACE TABLE `/path/to/table` (id INT, name STRING);
Iceberg tables support data appends with INSERT, UPDATE, and DELETE statements. Non-Iceberg tables do not currently support table alteration.
INSERT [ INTO ] table_identifier { VALUES ( { value | NULL } [ , ... ] ) [ , ( ... ) ] | query }
Copied!1 2 3 4-- examples INSERT INTO `/path/to/table` VALUES (1, 'foo'), (2, 'bar'); INSERT INTO `/path/to/table` SELECT id, name FROM `/path/to/table2`;
UPDATE table_identifier
SET column_name = value [, ...]
[ WHERE condition ]
Copied!1 2 3 4 5 6 7-- examples UPDATE `/path/to/table` SET col = 'new value'; UPDATE `/path/to/table` SET col = 'new value' WHERE col = 'old value';
DELETE FROM table_identifier WHERE condition
Copied!1DELETE FROM `/path/to/table` WHERE col = 'to delete'
Branches can be created on Iceberg tables with the ALTER TABLE syntax.
ALTER TABLE table_identifier CREATE BRANCH branch_name
Copied!1ALTER TABLE `/path/to/table` CREATE BRANCH feature_branch
[ WITH with_query [ , ... ] ]
select_statement
[ { UNION | INTERSECT | EXCEPT } select_statement, ... ]
[ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] } ]
[ WINDOW { window_name AS ( window_spec ) [, ...] } ]
[ LIMIT { ALL | expression } ]
Where with_query is:
cte_name [ ( column_name [, ...] ) ] AS ( select_statement )
Copied!1 2 3 4 5 6 7-- examples WITH sales_summary (cust_id, total_sales, order_count) AS ( SELECT customer_id, SUM(total_amount), COUNT(*) FROM `/data/orders` GROUP BY customer_id ) SELECT * FROM sales_summary WHERE total_sales > 10000;
Where select_statement is:
SELECT [ ALL | DISTINCT ] { [ named_expression | regex_column_names | * ] [, ...] }
FROM { from_item [, ...] }
[ WHERE boolean_expression ]
[ GROUP BY { expression | ROLLUP(...) | CUBE(...) | GROUPING SETS(...) } [ , ... ] ]
[ HAVING boolean_expression ]
[ LIMIT { ALL | expression } [ OFFSET expression ] ]
Where named_expression is one of:
Expression with optional alias:
expression [ AS alias ]
Copied!1 2 3 4-- examples SELECT count(*) AS total_count FROM `/path/to/table`; SELECT name, salary * 1.1 AS adjusted_salary FROM `/path/to/employees`;
Window function:
window_function() OVER ( window_spec )
window_function() OVER window_name
Where window_spec is:
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
Copied!1 2 3 4 5 6 7 8 9 10 11 12 13-- examples SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) as rank FROM `/path/to/employees`; SELECT category, product_id, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as category_rank FROM `/path/to/products`; -- Using named window SELECT customer_id, order_date, total, ROW_NUMBER() OVER w as order_sequence FROM `/path/to/orders` WINDOW w AS (PARTITION BY customer_id ORDER BY order_date);
Where from_item is one of:
Tables can be referenced by filesystem path or resource identifier (RID). Both support optional branch specifications for querying specific Iceberg branches. Branch names that contain special characters need to be escaped with backticks.
{ `/path/to/table`[.branch_<branch name>] | `ri.rid.for.table.1234`[.branch_<branch name>] }
Copied!1 2 3 4 5 6 7 8 9 10-- examples SELECT * FROM `/path/to/table`; SELECT * FROM `/path/to/table`.branch_master; SELECT * FROM `/path/to/table` WHERE id = 1; SELECT * FROM `ri.rid.for.table.1234`; SELECT * FROM `ri.rid.for.table`.`branch_abc\xyz`;
Time travel queries to a specific version are possible with the VERSION AS OF syntax using a snapshot ID for Iceberg tables and a transaction RID for Foundry datasets.
Copied!1 2 3 4-- Examples SELECT * FROM `/path/to/table` VERSION AS OF `ri.foundry.main.transaction.1234`; SELECT * FROM `/path/to/table` VERSION AS OF 1234567;
Other time travel syntax is not currently supported.
relation
{ CROSS | LEFT [ OUTER ] | RIGHT [ OUTER ] | FULL [ OUTER ] }
JOIN relation
ON boolean_expression
Copied!1 2 3 4-- examples SELECT * FROM `/path/to/table` LEFT JOIN `/path/to/table2` ON id = id2; SELECT * FROM `/path/to/table` RIGHT OUTER JOIN `/path/to/table2` ON id = id2;
( select_statement )
Copied!1 2 3 4 5 6 7 8 9-- examples SELECT * FROM (SELECT * FROM `/path/to/table`); SELECT * FROM (SELECT * FROM `/path/to/table`) WHERE id = 1; SELECT * FROM ( VALUES (9001, 1001, 1, DATE '2023-06-01', 'credit_card', 1329.98, 'completed', TIMESTAMP '2023-06-01 11:00:00') ) AS orders(order_id, customer_id, product_id, order_date, payment_method, total, status, created_at);
Subqueries must be wrapped in parentheses. This is enforced in Foundry even though it is not always required in Spark SQL.
Copied!1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37-- Complex query with joins and filtering SELECT DISTINCT o.order_id, c.name, o.total_amount, p.category FROM `/data/orders` o LEFT JOIN `/data/customers` c ON o.customer_id = c.id CROSS JOIN (SELECT product_id, category FROM `/data/products` WHERE category = 'Electronics') p ON o.product_id = p.product_id WHERE o.order_date > DATE '2024-01-01' AND o.total_amount > 1000.0 ORDER BY o.total_amount DESC LIMIT 100; -- Aggregation with HAVING clause SELECT c.name, COUNT(o.order_id) as order_count, SUM(o.total_amount) as total_spent FROM `/data/customers` c LEFT JOIN `/data/orders` o ON c.id = o.customer_id GROUP BY c.name HAVING COUNT(o.order_id) > 0; -- UNION query with multiple joins SELECT emp_id, name, dept, 'Active' AS status FROM `/hr/employees` e LEFT OUTER JOIN `/hr/departments` d ON e.dept_id = d.id WHERE e.active = true UNION SELECT emp_id, name, dept, 'Inactive' AS status FROM (SELECT * FROM `/hr/employees` WHERE termination_date < current_timestamp()) e RIGHT JOIN `/hr/departments` d ON e.dept_id = d.id WHERE e.active = false ORDER BY dept ASC, status DESC NULLS LAST;
Foundry SQL supports Iceberg references to query metadata. Review the official documentation ↗.
Copied!1 2 3 4-- examples SELECT * FROM `/path/to/table/`.files; SELECT * FROM `/path/to/table/`.snapshots;
Functions are available to transforms data in more complex ways. Refer to the Spark SQL documentation ↗ for detailed information on each function below.
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
array(expr, ...) | Create an array with given elements | SELECT array(1, 2, 3) | Supported | Supported |
array_distinct(array) | Remove duplicate values from array | SELECT array_distinct(tags) FROM table | Supported | Not supported |
array_intersect(array1, array2) | Elements in both arrays without duplicates | SELECT array_intersect(arr1, arr2) FROM table | Supported | Not supported |
array_union(array1, array2) | Elements in either array without duplicates | SELECT array_union(arr1, arr2) FROM table | Supported | Not supported |
array_except(array1, array2) | Elements in array1 but not in array2 | SELECT array_except(arr1, arr2) FROM table | Supported | Not supported |
array_join(array, delimiter [, nullReplacement]) | Concatenate array elements with delimiter | SELECT array_join(tags, ',') FROM table | Supported | Not supported |
array_max(array) | Maximum value in array | SELECT array_max(ARRAY(1, 5, 3)) | Supported | Not supported |
array_min(array) | Minimum value in array | SELECT array_min(ARRAY(1, 5, 3)) | Supported | Not supported |
array_position(array, element) | Position of first occurrence (1-based, 0 if not found) | SELECT array_position(tags, 'item') FROM table | Supported | Not supported |
array_remove(array, element) | Remove all matching elements | SELECT array_remove(tags, 'old') FROM table | Supported | Not supported |
array_size(array) | Number of elements in array | SELECT array_size(tags) FROM table | Supported | Not supported |
array_contains(array, value) | True if array contains value | SELECT array_contains(tags, 'new') FROM table | Supported | Not supported |
array_repeat(element, count) | Create array with element repeated count times | SELECT array_repeat('x', 5) | Supported | Not supported |
arrays_overlap(array1, array2) | True if arrays have common non-null elements | SELECT arrays_overlap(arr1, arr2) FROM table | Supported | Not supported |
flatten(arrayOfArrays) | Flatten nested arrays into single array | SELECT flatten(nested_arrays) FROM table | Supported | Not supported |
sequence(start, stop, step) | Generate sequence of integers | SELECT sequence(1, 10, 2) | Supported | Not supported |
shuffle(array) | Random permutation of array | SELECT shuffle(tags) FROM table | Supported | Not supported |
slice(array, start, length) | Subset of array from start with length | SELECT slice(tags, 1, 2) FROM table | Supported | Not supported |
sort_array(array[, ascendingOrder]) | Sort array in ascending (default) or descending order | SELECT sort_array(tags) FROM table | Supported | Not supported |
get(array, index) | Returns element at given index (0-based) | SELECT get(tags, 0) FROM table | Supported | Not supported |
explode(array) | Generates a new row for each element in the array | SELECT explode(tags) FROM table | Supported | Not supported |
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
map(key1, value1, key2, value2, ...) | Create a map from key-value pairs | SELECT map('a', 1, 'b', 2) | Supported | Not supported |
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
struct(expr1, expr2, ...) | Create a struct from expressions | SELECT struct('red', 'large', 2.0) | Supported | Not supported |
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
current_timestamp() | Returns current timestamp | SELECT current_timestamp() | Supported | Supported |
current_date() | Returns current date | SELECT current_date() | Supported | Supported |
date_add(expr, num_days) | Add days to date | date_add(order_date, 7) | Supported | Not supported |
date_sub(expr, num_days) | Subtract days from date | date_sub(order_date, 7) | Supported | Not supported |
from_unixtime(expr [, format]) | Convert Unix timestamp to timestamp | from_unixtime(1609459200) | Supported | Not supported |
unix_timestamp([expr] [, format]) | Convert timestamp to Unix timestamp | unix_timestamp(current_timestamp()) | Supported | Not supported |
day(date) | Extract day of month from date | SELECT day(order_date) FROM table | Supported | Supported |
hour(timestamp) | Extract hour from timestamp | SELECT hour(order_timestamp) FROM table | Supported | Not supported |
minute(timestamp) | Extract minute from timestamp | SELECT minute(order_timestamp) FROM table | Supported | Not supported |
month(date) | Extract month from date | SELECT month(order_date) FROM table | Supported | Supported |
quarter(date) | Extract quarter from date | SELECT quarter(order_date) FROM table | Supported | Supported |
second(timestamp) | Extract second from timestamp | SELECT second(order_timestamp) FROM table | Supported | Not supported |
year(date) | Extract year from date | SELECT year(order_date) FROM table | Supported | Supported |
to_date(str [, format]) | Convert string to date | SELECT to_date('2024-01-15') | Supported | Not supported |
date_diff(date1, date2) | Subtract two dates | SELECT date_diff('2020-01-01', '2020-01-02') | Supported | Supported |
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
abs(x) | Absolute value | SELECT abs(-5) | Supported | Supported |
acos(x) | Arc cosine | SELECT acos(0.5) | Supported | Not supported |
asin(x) | Arc sine | SELECT asin(0.5) | Supported | Not supported |
atan(x) | Arc tangent | SELECT atan(1) | Supported | Not supported |
atan2(y, x) | Arc tangent of y/x | SELECT atan2(1, 1) | Supported | Not supported |
cbrt(x) | Cube root | SELECT cbrt(27) | Supported | Not supported |
ceil(x) | Round up to nearest integer | SELECT ceil(3.7) | Supported | Not supported |
ceiling(x) | Round up to nearest integer | SELECT ceiling(3.7) | Supported | Not supported |
cos(x) | Cosine | SELECT cos(pi()) | Supported | Not supported |
cosh(x) | Hyperbolic cosine | SELECT cosh(0) | Supported | Not supported |
degrees(x) | Convert radians to degrees | SELECT degrees(pi()) | Supported | Not supported |
e() | Euler's number | SELECT e() | Supported | Not supported |
exp(x) | e raised to power x | SELECT exp(1) | Supported | Not supported |
floor(x) | Round down to nearest integer | SELECT floor(3.7) | Supported | Not supported |
ln(x) | Natural logarithm | SELECT ln(e()) | Supported | Not supported |
log(base, x) | Logarithm with specified base | SELECT log(2, 8) | Supported | Not supported |
log10(x) | Base 10 logarithm | SELECT log10(100) | Supported | Not supported |
log2(x) | Base 2 logarithm | SELECT log2(8) | Supported | Not supported |
mod(n, m) | Modulus (remainder) | SELECT mod(10, 3) | Supported | Not supported |
pi() | Pi constant | SELECT pi() | Supported | Not supported |
pow(x, p) | x raised to power p | SELECT pow(2, 3) | Supported | Not supported |
power(x, p) | x raised to power p | SELECT power(2, 3) | Supported | Not supported |
radians(x) | Convert degrees to radians | SELECT radians(180) | Supported | Not supported |
rand() | Random value between 0 and 1 | SELECT rand() | Supported | Not supported |
random() | Random value between 0 and 1 | SELECT random() | Supported | Not supported |
round(x, d) | Round to d decimal places | SELECT round(3.7, 1) | Supported | Not supported |
sign(x) | Sign function (-1, 0, 1) | SELECT sign(-5) | Supported | Not supported |
sin(x) | Sine | SELECT sin(pi() / 2) | Supported | Not supported |
sinh(x) | Hyperbolic sine | SELECT sinh(0) | Supported | Not supported |
sqrt(x) | Square root | SELECT sqrt(16) | Supported | Not supported |
tan(x) | Tangent | SELECT tan(pi() / 4) | Supported | Not supported |
tanh(x) | Hyperbolic tangent | SELECT tanh(0) | Supported | Not supported |
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
chr(n) | Returns character from Unicode code point | SELECT chr(65) | Supported | Not supported |
char(n) | Returns character from code point (alias for chr) | SELECT char(65) | Supported | Not supported |
contains(str, substr) | True if string contains substring | SELECT contains('hello', 'ell') | Supported | Not supported |
concat_ws(sep, str1, ...) | Concatenate strings with separator | SELECT concat_ws(',', 'a', 'b') | Supported | Not supported |
concat(str1, ...) | Concatenate strings | SELECT concat('a', 'b') | Supported | Supported |
length(str) | Length of string in characters | SELECT length('hello') | Supported | Supported |
char_length(str) | Length of string (alias for length) | SELECT char_length('hello') | Supported | Supported |
character_length(str) | Length of string (alias for length) | SELECT character_length('hello') | Supported | Supported |
lower(str) | Convert to lowercase | SELECT lower('HELLO') | Supported | Supported |
upper(str) | Convert to uppercase | SELECT upper('hello') | Supported | Supported |
lpad(str, len, pad) | Left pad string to length | SELECT lpad('hi', 5, 'x') | Supported | Not supported |
rpad(str, len, pad) | Right pad string to length | SELECT rpad('hi', 5, 'x') | Supported | Not supported |
ltrim(str) | Remove leading whitespace | SELECT ltrim(' hello') | Supported | Not supported |
rtrim(str) | Remove trailing whitespace | SELECT rtrim('hello ') | Supported | Not supported |
trim(str) | Remove leading and trailing whitespace | SELECT trim(' hello ') | Supported | Supported |
replace(str, search, replace) | Replace all occurrences | SELECT replace('hello', 'l', 'x') | Supported | Supported |
substr(str, pos, len) | Extract substring | SELECT substr('hello', 2, 3) | Supported | Supported |
substring(str, pos, len) | Extract substring | SELECT substring('hello', 2, 3) | Supported | Supported |
split(str, delimiter) | Split string into array | SELECT split('a,b,c', ',') | Supported | Not supported |
split_part(str, delimiter, index) | Get part from split string (1-based) | SELECT split_part('a,b,c', ',', 2) | Supported | Not supported |
instr(str, substr) | Find substring position (1-based, 0 if not found) | SELECT instr('hello', 'll') | Supported | Not supported |
position(substr IN str) | Find substring position (SQL standard syntax) | SELECT position('ll' IN 'hello') | Supported | Not supported |
startswith(str, prefix) | Check if string starts with prefix | SELECT startswith('hello', 'he') | Supported | Not supported |
luhn_check(str) | Validate string using Luhn algorithm | SELECT luhn_check('79927398713') | Supported | Not supported |
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
coalesce(expr1, expr2, ...) | Returns first non-null expression | SELECT coalesce(col, 'default') | Supported | Not supported |
regexp_like(str, pattern) | Test if string matches regex pattern | SELECT regexp_like('hello123', '[0-9]+') | Supported | Not supported |
regexp_extract(str, pattern) | Extract first substring matching pattern | SELECT regexp_extract('hello123', '[0-9]+') | Supported | Supported |
regexp_extract_all(str, pattern) | Extract all substrings matching pattern | SELECT regexp_extract_all('a1b2c3', '[0-9]+') | Supported | Not supported |
regexp_replace(str, pattern, replacement) | Replace all matches with replacement string | SELECT regexp_replace('hello123', '[0-9]+', 'X') | Supported | Supported |
regexp_count(str, pattern) | Count occurrences of pattern in string | SELECT regexp_count('a1b2c3', '[0-9]+') | Supported | Not supported |
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
cast(expr AS type) | Convert value to specified type | cast(count() AS float) | Supported | Supported |
try_cast(expr AS type) | Convert value to specified type, returns NULL on failure | try_cast('abc' AS int) | Supported | Not supported |
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
count() | Count all rows | SELECT count() FROM table | Supported | Supported |
count(DISTINCT col) | Count distinct values in column | SELECT count(DISTINCT id) FROM table | Supported | Supported if single property |
sum(col) | Sum values in column | SELECT sum(cost) FROM table | Supported | Supported |
avg(col) | Mean average values in column | SELECT avg(weight) FROM table | Supported | Supported |
min(col) | Minimum value in column | SELECT min(price) FROM table | Supported | Supported |
max(col) | Maximum value in column | SELECT max(price) FROM table | Supported | Supported |
min_by(x, y) | Value of x associated with minimum value of y | SELECT min_by(name, age) FROM table | Supported | Not supported |
max_by(x, y) | Value of x associated with maximum value of y | SELECT max_by(name, age) FROM table | Supported | Not supported |
count_if(condition) | Count rows where condition is true | SELECT count_if(age > 18) FROM table | Supported | Not supported |
bool_and(condition) | True if all values are true | SELECT bool_and(active) FROM table | Supported | Not supported |
bool_or(condition) | True if any value is true | SELECT bool_or(active) FROM table | Supported | Not supported |
bit_and(col) | Bitwise AND of all non-null integer values | SELECT bit_and(flags) FROM table | Supported | Not supported |
bit_or(col) | Bitwise OR of all non-null integer values | SELECT bit_or(flags) FROM table | Supported | Not supported |
bit_xor(col) | Bitwise XOR of all non-null integer values | SELECT bit_xor(flags) FROM table | Supported | Not supported |
every(condition) | True if all values are true (alias for bool_and) | SELECT every(active) FROM table | Supported | Not supported |
stddev(col) | Sample standard deviation | SELECT stddev(salary) FROM table | Supported | Supported |
stddev_pop(col) | Population standard deviation | SELECT stddev_pop(salary) FROM table | Supported | Supported |
stddev_samp(col) | Sample standard deviation (alias for stddev) | SELECT stddev_samp(salary) FROM table | Supported | Supported |
variance(col) | Sample variance | SELECT variance(salary) FROM table | Supported | Not supported |
var_pop(col) | Population variance | SELECT var_pop(salary) FROM table | Supported | Not supported |
var_samp(col) | Sample variance | SELECT var_samp(salary) FROM table | Supported | Not supported |
corr(y, x) | Pearson correlation coefficient | SELECT corr(sales, ads) FROM table | Supported | Not supported |
covar_pop(y, x) | Population covariance | SELECT covar_pop(y, x) FROM table | Supported | Not supported |
covar_samp(y, x) | Sample covariance | SELECT covar_samp(y, x) FROM table | Supported | Not supported |
skewness(col) | Skewness | SELECT skewness(values) FROM table | Supported | Not supported |
regr_intercept(y, x) | Linear regression intercept | SELECT regr_intercept(y, x) FROM table | Supported | Not supported |
regr_slope(y, x) | Linear regression slope | SELECT regr_slope(y, x) FROM table | Supported | Not supported |
grouping(col) | Returns 1 if column is aggregated, 0 otherwise (for use with ROLLUP/CUBE/GROUPING SETS) | SELECT grouping(category) FROM table GROUP BY ROLLUP(category) | Supported | Not supported |
collect_set(col) | Returns an array of unique values from column | SELECT collect_set(name) FROM table GROUP BY category | Supported | Not supported |
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
ROW_NUMBER() | Assigns sequential row number within partition | ROW_NUMBER() OVER (ORDER BY col) | Supported | Supported |
RANK() | Rank with gaps for ties | RANK() OVER (ORDER BY salary DESC) | Supported | Supported |
DENSE_RANK() | Rank without gaps for ties | DENSE_RANK() OVER (ORDER BY salary DESC) | Supported | Supported |
PERCENT_RANK() | Percentage ranking of value in group | PERCENT_RANK() OVER (ORDER BY salary) | Supported | Not supported |
CUME_DIST() | Cumulative distribution of value | CUME_DIST() OVER (ORDER BY salary) | Supported | Not supported |
NTILE(n) | Divide rows into n buckets | NTILE(4) OVER (ORDER BY salary) | Supported | Not supported |
LAG(expr[, offset[, default]]) | Value from previous row | LAG(salary, 1) OVER (ORDER BY date) | Supported | Supported |
LEAD(expr[, offset[, default]]) | Value from next row | LEAD(salary, 1) OVER (ORDER BY date) | Supported | Supported |
FIRST_VALUE(expr) | First value in window frame | FIRST_VALUE(salary) OVER (ORDER BY date) | Supported | Supported |
LAST_VALUE(expr) | Last value in window frame | LAST_VALUE(salary) OVER (ORDER BY date) | Supported | Supported |
NTH_VALUE(expr, n) | Nth value in window frame | NTH_VALUE(salary, 2) OVER (ORDER BY date) | Supported | Supported |
| Aggregate functions | Standard aggregates (count, sum, avg, min, max, stddev, variance, etc.) can be used as window functions | SUM(amount) OVER (PARTITION BY customer_id) | Supported | Only count, sum, min max, avg |
Window functions require an OVER clause with optional PARTITION BY and ORDER BY:
Copied!1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18-- examples SELECT id, name, ROW_NUMBER() OVER (ORDER BY name) as row_num FROM `/path/to/table`; SELECT category, name, price, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as rank_in_category FROM `/path/to/products`; SELECT order_id, customer_id, order_date, total_amount, SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY order_date) as running_total, AVG(total_amount) OVER (PARTITION BY customer_id) as customer_avg, COUNT(*) OVER (PARTITION BY customer_id) as customer_order_count FROM `/path/to/orders`; SELECT id, amount, ROW_NUMBER() OVER w as sequence_num FROM `/path/to/transactions` WINDOW w AS (PARTITION BY customer_id ORDER BY transaction_date);
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
zorder() | Iceberg Z-order optimization (Iceberg only) | CALL zorder('table', 'col1,col2') | Supported | Not supported |
| Function | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
st_point(x, y) | Creates a point geometry from X, Y coordinates | SELECT st_point(-122.4, 37.8) | Supported | Not supported |
st_contains(geom1, geom2) | Returns true if geom1 contains geom2 | WHERE st_contains(boundary, location) | Supported | Not supported |
st_equals(geom1, geom2) | Returns true if geometries are spatially equal | SELECT st_equals(point1, point2) | Supported | Not supported |
st_geomfromgeojson(json) | Creates geometry from GeoJSON string | SELECT st_geomfromgeojson('{"type":"Point","coordinates":[0,0]}') | Supported | Not supported |
st_astext(geom) | Returns geometry as Well-Known Text (WKT) string | SELECT st_astext(st_point(0, 0)) | Supported | Not supported |
st_h3cellids(geom, level, fullCover) | Returns an array of H3 cells that cover the geometry | SELECT st_h3cellids(st_point(0, 0), 8, true) | Supported | Not supported |
Geometry columns cannot be returned directly. Use st_astext() to convert to WKT format.
| Operator | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
+ | Plus | SELECT count() + 5 | Supported | Supported |
- | Minus | SELECT count() - 1 | Supported | Supported |
* | Times | SELECT sum(weight * 2) | Supported | Supported |
/ | Divide | SELECT (avg(a) + avg(b)) / 2 | Supported | Supported |
| Operator | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
= | Equal to | WHERE id = 1 | Supported | Supported |
<>, != | Not equal to | WHERE status <> 'inactive' | Supported | Supported |
< | Less than | WHERE age < 18 | Supported | Supported |
<= | Less than or equal to | WHERE price <= 100.0 | Supported | Supported |
> | Greater than | WHERE salary > 50000 | Supported | Supported |
>= | Greater than or equal to | WHERE score >= 90 | Supported | Supported |
AND | Logical AND | WHERE active = true AND verified = true | Supported | Supported |
OR | Logical OR | WHERE status = 'new' OR status = 'pending' | Supported | Supported |
NOT | Logical NOT | WHERE NOT deleted | Supported | Supported |
IN | Value in list | WHERE category IN ('A', 'B', 'C') | Supported | Supported |
LIKE | Pattern matching | WHERE name LIKE 'John%' | Supported | Supported |
EXISTS | True if subquery returns rows | WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.id) | Supported | Supported |
| Expression | Description | Example | Furnace | Ontology |
|---|---|---|---|---|
CASE ... WHEN | Switch expression | SELECT CASE WHEN mark >= 70 THEN 'A' WHEN mark >= 60 THEN 'B' ELSE 'C' END | Supported | Supported |
INTERVAL | Time interval literal | SELECT order_date + INTERVAL 30 DAY, WHERE created_at > current_timestamp() - INTERVAL 1 HOUR | Supported | Not supported |
The IF(condition, trueValue, falseValue) function is not supported. Use CASE WHEN condition THEN trueValue ELSE falseValue END instead.
Copied!1 2 3 4CASE WHEN condition THEN result [WHEN condition THEN result] [ELSE result] END
Copied!1 2 3 4 5 6 7 8 9 10 11INTERVAL { yearMonthIntervalQualifier | dayTimeIntervalQualifier } yearMonthIntervalQualifier { YEAR [TO MONTH] | MONTH } dayTimeIntervalQualifier { DAY [TO { HOUR | MINUTE | SECOND } ] | HOUR [TO { MINUTE | SECOND } ] | MINUTE [TO SECOND] | SECOND }
Copied!1 2 3 4 5-- examples INTERVAL '1' YEAR INTERVAL '6' MONTH INTERVAL '25' DAY INTERVAL '6' YEAR TO MONTH
| Type | Spark equivalent | Furnace | Ontology |
|---|---|---|---|
BOOLEAN | BOOLEAN | Supported | Supported |
SHORT | SHORT | Supported | Supported |
INT | INT | Supported | Supported |
LONG | LONG | Supported | Supported |
FLOAT | FLOAT | Supported | Supported |
DOUBLE | DOUBLE | Supported | Supported |
DATE | DATE | Supported | Supported |
TIMESTAMP | TIMESTAMP | Supported | Supported |
TIMESTAMP_NTZ | TIMESTAMP_NTZ | Supported | Supported |
STRING | STRING | Supported | Supported |
BINARY | BINARY | Supported | Not supported |
DECIMAL | DECIMAL | Supported | Supported |
MAP<key, value> | MAP<key, value> | Supported | Not supported |
ARRAY<element_type> | ARRAY<element_type> | Supported | Supported |
STRUCT<field1_name: field1_type, field2_name: field2_type, …> | STRUCT<field1_name: field1_type, field2_name: field2_type, …> | Supported | Not supported |
Copied!1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17CREATE TABLE `/path/to/table` ( col1 BOOLEAN, col2 SHORT, col3 INT, col4 LONG, col5 FLOAT, col6 DOUBLE, col7 DATE, col8 TIMESTAMP, col9 TIMESTAMP_NTZ, col10 STRING, col11 BINARY, col12 DECIMAL(10, 2), col13 MAP<STRING, STRING>, col14 ARRAY<STRING>, col15 STRUCT<col1: STRING, col2: STRING> );