Connect Foundry to Snowflake to read and sync data between Snowflake and Foundry.
| Capability | Status |
| --- | --- |
| Exploration | 🟢 Generally available |
| Bulk import | 🟢 Generally available |
| Incremental | 🟢 Generally available |
| Virtual tables | 🟢 Generally available |
| Compute pushdown | 🟢 Generally available |
| Export tasks | 🟡 Sunset |
| Table exports | 🟡 Beta |
Learn more about setting up a connector in Foundry.
| Option | Required? | Description |
| --- | --- | --- |
| Account identifier | Yes | The identifier that precedes `.snowflakecomputing.com`. See Snowflake's official documentation ↗ for more details. |
| Roles | No | The default role used by the connection if the provided credentials have access to multiple roles. |
| Database | Yes | The default database to use once connected. |
| Schema | No | The default schema to use once connected. If not specified, all schemas in scope of the credentials will be available. |
| Warehouse | No* | The virtual warehouse to use once connected. For registered virtual tables, this warehouse is used for any source-side compute. |
| Credentials | Yes | Option 1: Username and password. Provide a username and password. We recommend using service credentials rather than individual user credentials. Option 2: Key pair authentication. Provide a username and private key. See Snowflake's official documentation ↗ for details on configuring key pair authentication. Note that only unencrypted private keys are supported; Foundry will encrypt and store the private key securely. Option 3: External OAuth (OIDC). Follow the displayed source system configuration instructions to set up External OAuth. See Snowflake's official documentation ↗ for details on External OAuth and our documentation for details on how OIDC works with Foundry. For all credential options, ensure that the provided user and role have usage privileges on the target database(s) and schema(s), as well as select privileges on the target table(s). When registering virtual tables, the user and role should also have usage privileges on the warehouse (see the example grants below). |
| Network Connectivity | Yes** | Run `SELECT SYSTEM$ALLOWLIST()` in Snowflake and ensure that at least the entries for `SNOWFLAKE_DEPLOYMENT` and `STAGE` are added as egress policies in Foundry. Refer to the networking section below for more details. |
* Warehouse details are optional for syncing Foundry datasets, but required for registering virtual tables.
** Network egress policies are required for direct connections, but not for agent-based connections.
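As a sketch of the privileges described above, the following Snowflake SQL grants the minimum access to a role used by the connection; the database, schema, warehouse, and role names are hypothetical placeholders.

```sql
-- Role, database, schema, and warehouse names are hypothetical placeholders.
GRANT USAGE ON DATABASE MY_DB TO ROLE FOUNDRY_ROLE;
GRANT USAGE ON SCHEMA MY_DB.MY_SCHEMA TO ROLE FOUNDRY_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA MY_DB.MY_SCHEMA TO ROLE FOUNDRY_ROLE;

-- Required when registering virtual tables, so source-side compute can run.
GRANT USAGE ON WAREHOUSE MY_WAREHOUSE TO ROLE FOUNDRY_ROLE;
```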
To enable direct connections between Snowflake and Foundry, the appropriate egress policies must be added when setting up the source in the Data Connection application.
To identify the hostnames and port numbers of your Snowflake account to be allowlisted, you can run the following command in your Snowflake console. Ensure that at least the entries for `SNOWFLAKE_DEPLOYMENT` and `STAGE` are added as egress policies in Foundry.
```sql
SELECT
    t.VALUE:type::VARCHAR AS type,
    t.VALUE:host::VARCHAR AS host,
    t.VALUE:port AS port
FROM TABLE(FLATTEN(input => PARSE_JSON(SYSTEM$ALLOWLIST()))) AS t;
```
See Snowflake's official documentation ↗ for additional information on identifying hostnames and port numbers to allowlist.
Connections from Foundry to Snowflake normally come from the default public gateway IPs for your environment. However, traffic within the same cloud provider (for example, AWS-AWS or Azure-Azure) may use different routing and require establishing a connection via PrivateLink. See below for the additional setup required per cloud provider, or contact your Palantir representative for additional guidance.
If your Snowflake instance is configured to route internal S3 stage traffic through a VPCE ↗, the Snowflake JDBC driver must be manually configured not to use the custom VPCE domain. Otherwise, the driver will be routed to the custom VPCE domain (which is inaccessible from Foundry's VPC) and connections to URLs of the format `<bucketname>.bucket.vpce-<vpceid>.s3.<region>.vpce.amazonaws.com` will fail.
You can configure this manually by adding a JDBC connection property in the Connection details of your instance, with a key of `S3_STAGE_VPCE_DNS_NAME` and an empty value field (the equivalent of setting it to `null`).
The S3 stage traffic will then be routed through the AWS S3 Gateway Endpoint (`<bucketname>.bucket.s3.<region>.vpce.amazonaws.com`), which maintains private connectivity so traffic is not routed over the public internet.
Review our PrivateLink egress documentation ↗ for more information.
For egress policies that depend on an S3 bucket in the same region as your Foundry instance, ensure you have completed the additional configuration steps detailed in our Amazon S3 bucket policy documentation for the affected bucket(s).
The Snowflake JDBC driver used for the Foundry Snowflake connector may attempt to connect directly to an underlying "internal stage" storage bucket when fetching data. For Snowflake hosted on Azure, because Azure-hosted Foundry enrollments route traffic over Azure service endpoints, network connectivity from Foundry to the underlying stage buckets must be explicitly allowlisted by following the instructions below.
You will need the following information about your Azure-hosted Snowflake warehouse to establish network connectivity to Foundry:

- Use the `SYSTEM$ALLOWLIST` command to get the full list of domains that may be required to successfully connect.
- For the Azure storage bucket returned from the `SYSTEM$ALLOWLIST` command, you will also need to retrieve the storage account identifier. This requires setting the `ENABLE_INTERNAL_STAGES_PRIVATELINK` ↗ parameter to `TRUE` for the account and then calling the `SYSTEM$GET_PRIVATELINK_CONFIG()` ↗ function, which returns a field called `privatelink-internal-stage` containing the Azure storage account resource identifier (see the sketch after this list).
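The following Snowflake SQL is a minimal sketch of how these values can be retrieved; it assumes a role with the privileges required to change account parameters (for example, ACCOUNTADMIN).

```sql
-- List the hostnames and ports that must be allowlisted for this account.
SELECT SYSTEM$ALLOWLIST();

-- Allow the internal stage to be reached over PrivateLink (account-level parameter).
USE ROLE ACCOUNTADMIN;
ALTER ACCOUNT SET ENABLE_INTERNAL_STAGES_PRIVATELINK = TRUE;

-- Returns a JSON object; the privatelink-internal-stage field contains the
-- Azure storage account resource identifier referenced below.
SELECT SYSTEM$GET_PRIVATELINK_CONFIG();
```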
A full Azure Storage account resource identifier will be in the following format:

```
/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}
```
More information on how to find an Azure Storage account resource ID directly in the Azure console can be found in the Azure documentation ↗. Restricting cross-account network traffic using VNET rules and the storage account identifier is in line with Microsoft's published best practices ↗, and should be used for all connections to Azure-hosted Snowflake warehouses from within Azure compute instances.
Now that you have gathered the required information about your Snowflake warehouse, you can create the policies needed to enable Foundry access to your Snowflake data.
1. Create a standard egress policy for the Azure storage internal stage, and attach it to your Snowflake source. Ensure you create egress policies for all domains returned by the `SYSTEM$ALLOWLIST` command, and not just the storage bucket domain.
2. Create an Azure storage policy, pasting in the storage account resource identifier.
3. Navigate back to your Snowflake source in Data Connection and confirm you can explore the source and run syncs.
This section provides additional details around using virtual tables with a Snowflake source. This section is not applicable when syncing to Foundry datasets.
The table below highlights the virtual table capabilities that are supported for Snowflake.
| Capability | Status |
| --- | --- |
| Bulk registration | 🟡 Beta |
| Automatic registration | 🟢 Generally available |
| Table inputs | 🟢 Generally available: tables, views, and materialized views in Code Repositories and Pipeline Builder |
| Table outputs | 🟢 Generally available: tables in Code Repositories and Pipeline Builder |
| Incremental pipelines | 🟢 Generally available: APPEND only [1] |
| Compute pushdown | 🟢 Generally available |
Consult the virtual tables documentation for details on the supported Foundry workflows where Snowflake tables can be used as inputs or outputs.
[1] To enable incremental support for pipelines backed by Snowflake virtual tables, ensure that Change Tracking ↗ and Time Travel ↗ are enabled for the appropriate retention period (an example is sketched after this note). This functionality relies on the CHANGES ↗ clause. The `current` and `added` read modes in Python Transforms are supported; these expose the relevant rows of the change feed based on the `METADATA$ACTION` column. The `METADATA$ACTION`, `METADATA$ISUPDATE`, and `METADATA$ROW_ID` columns will be made available in Python Transforms.
When using virtual tables, remember the following source configuration requirements:

- A warehouse must be specified on the connection; it is used for source-side compute on registered virtual tables.
- The connection's user and role must have usage privileges on that warehouse.

See the Connection Details section above for more details.
Foundry offers the ability to push down compute to Snowflake when using virtual tables. Virtual table inputs leverage the Snowflake Spark connector ↗, which has built-in support for predicate pushdown.
When using Snowflake virtual tables registered to the same source as inputs and outputs to a pipeline, it is possible to fully federate compute to Snowflake. This feature is currently available in Python transforms. See the Python documentation for details on how to push down compute to Snowflake.
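As a minimal sketch (not the definitive pushdown setup, which is covered in the linked Python documentation), the following transform reads a virtual-table-backed input and applies a filter that the Snowflake Spark connector can push down as a predicate. The dataset paths and column name are hypothetical.

```python
from transforms.api import transform_df, Input, Output


@transform_df(
    # Paths are hypothetical placeholders for a Snowflake virtual table input
    # and an output registered on the same source.
    Output("/Project/pipelines/recent_orders"),
    orders=Input("/Project/virtual_tables/orders"),
)
def recent_orders(orders):
    # Simple filters like this can be pushed down to Snowflake as predicates
    # by the Snowflake Spark connector.
    return orders.filter(orders.ORDER_STATUS == "OPEN")
```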
Note that columns of type `array` ↗, `object` ↗, and `variant` ↗ will be parsed by Foundry as type `string`. This is due to the source's variable typing. For example, the Snowflake array `[ 1, 2, 3 ]` would be interpreted by Foundry as the string `"[1,2,3]"`.
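If you need the original structure back in a pipeline, one option is to parse the string with PySpark's `from_json`; this is a sketch assuming a hypothetical column `tags` that holds a Snowflake array of integers.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, IntegerType


def parse_tags(df):
    # The virtual table exposes the Snowflake array as a JSON string,
    # e.g. "[1,2,3]", so parse it back into an array<int> column.
    return df.withColumn("tags", F.from_json(F.col("tags"), ArrayType(IntegerType())))
```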
See Snowflake's official documentation ↗ for more details.