Virtual tables allow you to query tables in supported data platforms without first storing the data in a Foundry dataset.
A virtual table acts as a pointer to a table in a source system outside of Foundry. Virtual tables abstract away the underlying source system and storage formats, enabling you to build workflows that combine data from different source systems seamlessly. Virtual tables can also be combined with datasets stored in Foundry as part of a flexible architecture where data need not be consolidated in one place.
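To make the pointer idea concrete, the following minimal Python sketch contrasts a virtual table with a synced dataset. The virtual table reads from the source system each time it is queried. The synced dataset stores a copy taken at sync time. The `SourceSystem`, `VirtualTable`, and `SyncedDataset` classes are hypothetical illustrations of the concept, not Foundry APIs.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSystem:
    """Hypothetical stand-in for an external platform such as a warehouse or object store."""
    tables: dict[str, list[dict]] = field(default_factory=dict)

    def read(self, locator: str) -> list[dict]:
        # Return copies so callers cannot mutate the source's rows.
        return [dict(row) for row in self.tables.get(locator, [])]


@dataclass(frozen=True)
class VirtualTable:
    """Hypothetical pointer: stores only where the data lives, never the rows themselves."""
    source: SourceSystem
    locator: str

    def read(self) -> list[dict]:
        # Every read goes back to the source, so it reflects the source's current state.
        return self.source.read(self.locator)


@dataclass
class SyncedDataset:
    """Hypothetical copy: rows are materialized once, at sync time."""
    rows: list[dict]

    @classmethod
    def sync_from(cls, source: SourceSystem, locator: str) -> "SyncedDataset":
        return cls(rows=source.read(locator))

    def read(self) -> list[dict]:
        return [dict(row) for row in self.rows]


source = SourceSystem({"sales.orders": [{"id": 1}]})
virtual = VirtualTable(source, "sales.orders")
synced = SyncedDataset.sync_from(source, "sales.orders")

# A new row arrives in the source system after the sync.
source.tables["sales.orders"].append({"id": 2})

print(len(virtual.read()))  # 2: the virtual table sees the new row
print(len(synced.read()))   # 1: the synced copy is unchanged until the next sync
```

The trade-off shown here is the core of the virtual-table model. Reads always reflect the source, at the cost of depending on the source system being reachable at query time.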
A virtual table is defined by:
As with any resource in Foundry, virtual tables are governed by Foundry's security and permissions model and can be opened or used in various Foundry applications.
The following sources support virtual tables. Refer to the source documentation for more details on how to configure the connection as well as the supported capabilities.
Source | Status | Supported Formats | Manual Registration | Automatic Registration |
---|---|---|---|---|
Amazon S3 | 🟢 Generally available | Avro, Delta, Iceberg, Parquet | ✔️ | |
Azure Data Lake Storage Gen2 (Azure Blob Storage) | 🟢 Generally available | Avro, Delta, Iceberg, Parquet | ✔️ | |
BigQuery | 🟢 Generally available | Table, View, Materialized View | ✔️ | ✔️ |
Google Cloud Storage | 🟢 Generally available | Avro, Delta, Iceberg, Parquet | ✔️ | |
Snowflake | 🟢 Generally available | Table, View, Materialized View | ✔️ | ✔️ |
An Iceberg catalog is required to load virtual tables backed by an Apache Iceberg table. To learn more about Iceberg catalogs, see the Apache Iceberg documentation. Virtual tables support different catalog options depending on the source being used. The table below lists the supported catalogs for each source. Refer to the source documentation for more details on how to configure each catalog.
Source | AWS Glue | Object Storage | Unity Catalog |
---|---|---|---|
Amazon S3 | 🟢 Generally available | 🟢 Generally available | 🟢 Generally available |
Azure Data Lake Storage Gen2 (Azure Blob Storage) | 🔴 Not available | 🟢 Generally available | 🟢 Generally available |
Google Cloud Storage | 🔴 Not available | 🟢 Generally available | 🔴 Not available |
Virtual tables are supported as inputs in the following applications and workflows:
Supported application | Supported workflows | Unsupported workflows |
---|---|---|
Data Connection | Configure source, register virtual tables | Agent-based connections |
Contour | Analyze in Contour | Save as dataset |
Ontology | Object creation via Pipeline Builder | Object creation via Ontology Manager |
Data Lineage | View Foundry lineage | |
Pipeline Builder | Input to pipeline, object and dataset outputs, snapshot builds, incremental builds (append-only) | Streaming builds |
Code Repositories | Python transforms, snapshot builds, incremental builds (append-only) | Java transforms, SQL transforms |
Some source types may not support all of these capabilities; refer to the source-specific documentation for details. Learn more about how to configure a source when using virtual tables in Code Repositories.
In general, virtual tables can be used to back most common Foundry workflows by either:
Sources that support virtual tables are set up in the Data Connection application. Select the source you want to use, then navigate to the Virtual tables tab in the source configuration. Follow the source documentation and any requirements listed there for using virtual tables.
All sources support manual registration, which lets you register individual tables from the source system in Foundry. Some sources also support automatic registration, which periodically registers, in a designated project, all tables in the source that the configured credentials can access.
When using manual registration, select Create virtual table, browse the available tables in the source system, and select the individual tables to register. Unless you choose a different location, the tables are registered in the Foundry location configured in the source's connection settings.
When you enable auto-registration, you create a new Foundry project in which virtual tables are created automatically. The folder hierarchy in this project mirrors the structure of the source system and is updated periodically as new tables are created in the source. When source tables are deleted, the corresponding virtual tables are not automatically deleted from the project, but reading them will no longer load any data.
To enable auto-registration, you must have project creation permissions in Foundry.
The project is managed by Foundry, and users cannot manually create or update resources in it. Virtual tables registered in this project can be imported into other projects for use in workflow development.
When you enable auto-registration, you can set permissions and access for the project; the project owner can later manage these from the access sidebar.
When virtual tables are used in Code Repositories, the transforms consuming them automatically obtain network egress based on the egress policies configured on the source. The credentials configured on the source are also made available to these transforms so that they can connect to the source. This behavior is similar to that of External Transforms.
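The auto-registration behavior described above can be modeled as a one-way reconciliation. New source tables are registered, registrations are never deleted, and the folder layout mirrors the source. The sketch below is a hypothetical illustration that assumes tables are identified by a path such as `database/schema/table`. It is not Foundry's implementation, and `reconcile`, `folder_for`, and `ReconcileResult` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ReconcileResult:
    to_register: list[str]  # new source tables that get a virtual table
    orphaned: list[str]     # registered tables whose source table no longer exists


def reconcile(source_tables: set[str], registered: set[str]) -> ReconcileResult:
    """One pass of a hypothetical auto-registration job.

    New tables are registered. Registrations for deleted source tables are
    kept (not deleted), but they no longer point at any data.
    """
    return ReconcileResult(
        to_register=sorted(source_tables - registered),
        orphaned=sorted(registered - source_tables),
    )


def folder_for(table_path: str) -> str:
    """Mirror the source hierarchy: 'db/schema/table' is placed in folder 'db/schema'."""
    return table_path.rsplit("/", 1)[0]


registered = {"analytics/public/orders"}
source_tables = {"analytics/public/customers", "analytics/staging/events"}

result = reconcile(source_tables, registered)
registered |= set(result.to_register)  # the registered set only ever grows

print(result.to_register)  # ['analytics/public/customers', 'analytics/staging/events']
print(result.orphaned)     # ['analytics/public/orders']
print(folder_for("analytics/staging/events"))  # analytics/staging
```

Keeping orphaned registrations rather than deleting them matches the documented behavior. A virtual table whose source table was dropped stays in the project but no longer loads data.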
The following settings must be enabled on the source:
Once a source has been configured and imported into a code repository, virtual tables can be used as inputs to a Python transform in the same way as datasets, using `transforms.api.Input`. Incremental computation uses an API consistent with that for datasets and is supported by a subset of sources. Refer to the source-specific documentation for more information.
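Append-only incremental builds, listed in the supported-workflows table for Pipeline Builder and Code Repositories, process only the rows added since the previous build. The sketch below models that idea with a hypothetical integer offset standing in for whatever mechanism a given source uses to track progress. It does not use or reproduce the `transforms.api` incremental API. `AppendOnlyTable` and `IncrementalJob` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyTable:
    """Hypothetical append-only source table: rows are only ever added at the end."""
    rows: list[int] = field(default_factory=list)


@dataclass
class IncrementalJob:
    """Hypothetical job that tracks how many source rows earlier builds have processed."""
    offset: int = 0
    output: list[int] = field(default_factory=list)

    def build(self, source: AppendOnlyTable) -> int:
        # Append-only guarantees earlier rows are unchanged, so only the tail is new.
        new_rows = source.rows[self.offset:]
        self.output.extend(row * 10 for row in new_rows)  # example transformation
        self.offset += len(new_rows)
        return len(new_rows)


table = AppendOnlyTable([1, 2, 3])
job = IncrementalJob()
print(job.build(table))  # 3: the first build processes everything

table.rows.extend([4, 5])
print(job.build(table))  # 2: the next build processes only the appended rows
print(job.output)        # [10, 20, 30, 40, 50]
```

A snapshot build, by contrast, would recompute the output from all rows on every run. The append-only assumption is what makes skipping already-processed rows safe. If the source could update or delete earlier rows, a stored offset would no longer be enough.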
Whether to use virtual tables or sync data to Foundry datasets depends on your architecture goals and the workflow you need to support. We recommend choosing the integration pattern on a workflow-by-workflow basis; the two approaches can also be used together to complement one another.
Below are some considerations to keep in mind about the potential benefits, drawbacks, and limitations of using virtual tables vs. syncing data to datasets.
Virtual tables provide a number of benefits, including:
Virtual tables may not be the best choice in all circumstances. Some considerations include:
Limitations of virtual tables include:
For queries run directly on virtual tables, compute may be split between Foundry and the source system. The specific behavior depends on the query and the degree of pushdown computation supported by the source system. Refer to the source-specific documentation for more information.
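The following minimal sketch illustrates how a query can be split between the source system and Foundry depending on pushdown support. The `Query` shape, the `source_supports_pushdown` flag, and `run_query` are hypothetical simplifications. Real engines can push down many more operations, and the exact behavior depends on the source.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass
class Query:
    """Hypothetical query shape: a projection plus a row filter."""
    columns: list[str]
    predicate: Callable[[Row], bool]


def project_and_filter(rows: list[Row], query: Query) -> list[Row]:
    return [{c: r[c] for c in query.columns} for r in rows if query.predicate(r)]


def run_query(rows: list[Row], query: Query, source_supports_pushdown: bool) -> tuple[list[Row], int]:
    """Return the query result and the number of rows transferred from the source."""
    if source_supports_pushdown:
        # The source system filters and projects; only the result crosses the network.
        transferred = project_and_filter(rows, query)
        return transferred, len(transferred)
    # No pushdown: all rows are transferred, then filtered and projected in Foundry.
    transferred = list(rows)
    return project_and_filter(transferred, query), len(transferred)


rows = [
    {"region": "EU", "amount": 5},
    {"region": "US", "amount": 7},
    {"region": "EU", "amount": 3},
]
query = Query(columns=["amount"], predicate=lambda r: r["region"] == "EU")

pushed, pushed_transferred = run_query(rows, query, source_supports_pushdown=True)
local, local_transferred = run_query(rows, query, source_supports_pushdown=False)

print(pushed == local)                        # True: same answer either way
print(pushed_transferred, local_transferred)  # 2 3
```

The result is identical in both cases. What changes is where the work happens and how much data moves between the source system and Foundry.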