Virtual tables

Virtual tables allow you to query tables in supported data platforms without first storing the data in a Foundry dataset.

A virtual table acts as a pointer to a table in a source system outside of Foundry. Virtual tables abstract away the underlying source system and storage formats, enabling you to build workflows that combine data from different source systems seamlessly. Virtual tables can also be combined with datasets stored in Foundry as part of a flexible architecture where data need not be consolidated in one place.

[Image: Virtual tables diagram]

A virtual table is defined by:

  • A connection to the source storage system (for example, a source URL or credentials). This connection is established by setting up a source in Foundry's Data Connection application.
  • A locator which identifies the table in the source system (for example, the database, schema, and table name).
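As a rough mental model, and not Foundry's actual data model, the two parts of this definition can be sketched in Python. The class and field names (`TableLocator`, `VirtualTable`, `source_name`) and the example values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableLocator:
    """Identifies a table inside the source system (hypothetical model)."""
    database: str
    schema: str
    table: str

    def qualified_name(self) -> str:
        # Join the three parts into a dotted name such as DB.SCHEMA.TABLE.
        return f"{self.database}.{self.schema}.{self.table}"


@dataclass(frozen=True)
class VirtualTable:
    """A pointer to an external table; no table data is stored here."""
    source_name: str       # the configured source that holds the URL and credentials
    locator: TableLocator  # where the table lives in the source system


orders = VirtualTable(
    source_name="example-warehouse-source",                # hypothetical source name
    locator=TableLocator("ANALYTICS", "SALES", "ORDERS"),  # hypothetical table
)
print(orders.locator.qualified_name())  # ANALYTICS.SALES.ORDERS
```

The point of the sketch is that a virtual table carries only a reference to a connection and a locator, never the rows themselves.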

As with any resource in Foundry, virtual tables are governed by Foundry's security and permissions model and can be opened or used in various Foundry applications.

Supported sources

The following sources support virtual tables. Refer to the source documentation for more details on how to configure the connection as well as the supported capabilities.

Source | Status | Supported Formats | Manual Registration | Automatic Registration
Amazon S3 | 🟢 Generally available | Avro, Delta, Iceberg, Parquet | ✔️ |
Azure Data Lake Storage Gen2 (Azure Blob Storage) | 🟢 Generally available | Avro, Delta, Iceberg, Parquet | ✔️ |
BigQuery | 🟢 Generally available | Table, View, Materialized View | ✔️ | ✔️
Google Cloud Storage | 🟢 Generally available | Avro, Delta, Iceberg, Parquet | ✔️ |
Snowflake | 🟢 Generally available | Table, View, Materialized View | ✔️ | ✔️

Iceberg catalogs

An Iceberg catalog is required to load virtual tables backed by an Apache Iceberg table. To learn more about Iceberg catalogs, see the Apache Iceberg documentation. Virtual tables support different catalog options depending on the source being used. The table below highlights the supported catalogs. Refer to the source documentation for more details on how to configure each catalog.

Source | AWS Glue | Object Storage | Unity Catalog
Amazon S3 | 🟢 Generally available | 🟢 Generally available | 🟢 Generally available
Azure Data Lake Storage Gen2 (Azure Blob Storage) | 🔴 Not available | 🟢 Generally available | 🟢 Generally available
Google Cloud Storage | 🔴 Not available | 🟢 Generally available | 🔴 Not available
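If you need to check catalog support programmatically, for example in a validation script, the catalog support table above can be transcribed into a small lookup. This is only a convenience sketch: the dictionary and the `supported_catalogs` helper are hypothetical names, and the values are a snapshot of the table and may go out of date.

```python
from typing import Dict, List

# Iceberg catalog availability per source, transcribed from the table above.
ICEBERG_CATALOG_SUPPORT: Dict[str, Dict[str, bool]] = {
    "Amazon S3": {
        "AWS Glue": True, "Object Storage": True, "Unity Catalog": True,
    },
    "Azure Data Lake Storage Gen2": {
        "AWS Glue": False, "Object Storage": True, "Unity Catalog": True,
    },
    "Google Cloud Storage": {
        "AWS Glue": False, "Object Storage": True, "Unity Catalog": False,
    },
}


def supported_catalogs(source: str) -> List[str]:
    """Return the catalogs marked as generally available for a source."""
    return [name for name, ok in ICEBERG_CATALOG_SUPPORT[source].items() if ok]


print(supported_catalogs("Google Cloud Storage"))  # ['Object Storage']
```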

Supported Foundry workflows

Virtual tables are supported as inputs in the following applications and workflows:

Supported application | Supported workflow | Not supported
Data Connection | Configure source; Register virtual tables | Agent-based connections
Contour | Analyze in Contour | Save as dataset
Ontology | Object creation via Pipeline Builder | Object creation via Ontology Manager
Data Lineage | View Foundry lineage |
Pipeline Builder | Input to pipeline; Object & dataset output; Snapshot builds; Incremental builds (append-only) | Streaming builds
Code Repositories | Python transforms: Snapshot builds, Incremental builds (append-only) | Java transforms; SQL transforms

Note that some source types may not support all of these capabilities. Refer to the source-specific documentation for more details. To learn how to configure a source for use with virtual tables in Code Repositories, see the Virtual tables in Code Repositories section below.

In general, virtual tables can be used to back most common Foundry workflows by either:

  • Directly interacting with the virtual table as described above, or
  • Creating a transformation pipeline backed by a virtual table that outputs Foundry datasets or objects. These outputs can be used as normal in the platform.

Set up a connection for a virtual table

Sources that support virtual tables are set up in the Data Connection application. Select the source that you want to use, then navigate to the Virtual tables tab in the source configuration. Follow the source documentation and any requirements listed there for using virtual tables.

[Image: Virtual table registration]

Manual vs. auto-registration for virtual tables

All sources support manual registration, which lets you register individual tables from the source system in Foundry. Some sources additionally support automatic registration, which periodically registers, into a designated project, all tables in the source that are accessible to the configured credentials.

When using manual registration, you can select Create virtual table, browse available tables in the source system, and select individual tables to register. Unless you choose a different location, these will be registered into the Foundry location configured in the source's connection settings.

[Image: Virtual table manual registration]

When enabling auto-registration, you create a new Foundry project where virtual tables will be created automatically. The folder hierarchy in this project will mirror the structure of the source system, and be periodically updated as new tables are created in the source. When source tables are deleted, related virtual tables won't be auto-deleted in the project, but accessing them won't load any data.
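The add-but-never-delete behavior described above can be modeled with a few lines of Python. This is an illustrative model only, not Foundry's implementation; the function name `refresh_registrations` and the table paths are hypothetical.

```python
def refresh_registrations(registered, source_tables):
    """Model one periodic auto-registration pass (illustrative only).

    New source tables are added. Tables that have disappeared from the
    source stay registered, matching the note that virtual tables are
    not deleted automatically. Returns (registered_after, dangling),
    where `dangling` holds registered tables that no longer exist in
    the source and would therefore load no data.
    """
    registered_after = set(registered) | set(source_tables)
    dangling = registered_after - set(source_tables)
    return registered_after, dangling


registered = {"SALES/ORDERS", "SALES/CUSTOMERS"}
source_now = {"SALES/ORDERS", "SALES/RETURNS"}  # CUSTOMERS dropped, RETURNS added

after, dangling = refresh_registrations(registered, source_now)
print(sorted(after))     # ['SALES/CUSTOMERS', 'SALES/ORDERS', 'SALES/RETURNS']
print(sorted(dangling))  # ['SALES/CUSTOMERS']
```

In this model, `SALES/CUSTOMERS` remains in the project after the source table is dropped, which is why downstream users may encounter virtual tables that resolve but return no data.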

To enable auto-registration, you must have project creation permissions in Foundry.

[Image: Virtual table auto-registration screen]

The project is managed by Foundry, and users cannot manually create or update resources in it. Virtual tables registered in this project can be imported into other projects for use in workflow development.

When enabling auto-registration, you can set permissions and access for the project. The project owner can later manage these using the access sidebar.

[Image: Resources screenshot showing virtual table project]

Virtual tables in Code Repositories

When virtual tables are used in Code Repositories, the transforms consuming them automatically obtain network egress based on the egress policies configured on the source. The credentials configured on the source are also made available to these transforms so that they can connect to the source. This behavior is similar to that of External Transforms.

The following settings must be enabled on the source:

  1. Code imports: This allows the source to be imported and used in a code repository. Refer to the documentation for this setting for further details and how to enable it.
  2. Export controls: This controls which security markings and organizations will be allowed on inputs to a Python Transform using a virtual table. Refer to the documentation for this setting for further details and how to enable it.

Once a source has been configured and imported into a code repository, virtual tables can be used as inputs to a Python Transform in the same way a dataset would be used, using transforms.api.Input. Incremental computation uses an API consistent with that of datasets and is supported by a subset of sources. Refer to the source-specific documentation for more information.
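A minimal sketch of such a transform is shown below, assuming a Spark-based transform in a repository where the source has already been imported. The resource paths, the `ORDER_DATE` column, and the cutoff date are hypothetical. The `except ImportError` branch is not part of Foundry: it only provides stand-ins so the snippet can be imported outside a code repository, and it does not reproduce the real decorator's behavior.

```python
try:
    # Available inside a Foundry code repository.
    from transforms.api import transform_df, Input, Output
except ImportError:
    # Local stand-ins (assumption): they record paths only, so the sketch
    # can be imported and inspected outside Foundry.
    class _Ref:
        def __init__(self, path):
            self.path = path

    Input = Output = _Ref

    def transform_df(output, **inputs):
        def decorate(fn):
            fn.output, fn.inputs = output, inputs
            return fn
        return decorate


@transform_df(
    Output("/Example/project/datasets/recent_orders"),       # hypothetical output path
    orders=Input("/Example/project/virtual-tables/ORDERS"),  # hypothetical virtual table
)
def recent_orders(orders):
    # `orders` is expected to arrive as a Spark DataFrame, just as a
    # dataset input would. ORDER_DATE and the cutoff are hypothetical.
    return orders.filter(orders["ORDER_DATE"] >= "2024-01-01")
```

The virtual table is referenced by `Input` exactly like a dataset; the output is written as a regular Foundry dataset and can be used as normal downstream.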

Using virtual tables vs. syncing to datasets

The decision to use virtual tables vs. sync to Foundry datasets depends on your architecture goals and the target workflow to be supported. We recommend considering the appropriate integration pattern on a workflow-by-workflow basis. The two approaches can be used in conjunction to complement one another.

Below are some considerations to keep in mind about the potential benefits, drawbacks, and limitations of using virtual tables vs. syncing data to datasets.

Benefits of using virtual tables

Virtual tables provide a number of benefits, including:

  • Reduction of duplicate storage by not storing source data in Foundry. Note that Foundry will still store data for any downstream-created resources, such as datasets and objects that are outputs from Foundry pipelines.
  • Queries can be pushed down to the source system to limit total data transfer. Note that availability of pushdown compute varies by source system and query type.
  • Virtual tables may be especially beneficial for very large tables where duplicative storage costs become material.
  • With virtual tables, data is queried directly upon use, without the need to synchronize data or consider potential for data staleness.
  • Virtual tables give you an additional integration option, which can help align your Foundry implementation with your target architecture patterns.

Drawbacks of using virtual tables

Virtual tables may not be the best choice in all circumstances. Some considerations include:

  • Interactive performance may be slower than working with data stored in Foundry datasets.
  • Compute usage may increase depending on the types of queries being run on the virtual table. For example, a table that is used as an input to a scheduled pipeline may use less compute than a table that is frequently accessed interactively in Contour analyses.
  • Virtual tables do not benefit from Foundry dataset capabilities such as dataset versioning or branching.

Limitations of using virtual tables

Limitations of virtual tables include:

  • Virtual tables are not available for all sources.
  • Virtual tables require a direct connection to the source. Connections using an agent are not supported.
  • Not all Foundry applications and features support using virtual tables as inputs. However, any materialized resources created downstream of virtual tables, such as datasets and object outputs from pipelines, are fully supported across the Foundry ecosystem as usual.

Compute for queries on virtual tables

For queries run directly on virtual tables, compute may be split between Foundry and the source system. The specific behavior depends on the query and the degree of pushdown computation supported by the source system. Refer to the source-specific documentation for more information.
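To make the idea of pushdown concrete, the sketch below uses an in-memory SQLite database as a stand-in for an external source system. It does not show how Foundry splits compute for any particular source; it only illustrates why a filter evaluated in the source transfers fewer rows than a filter applied after fetching everything. The table and column names are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the external source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EU" if i % 4 == 0 else "US", float(i)) for i in range(1000)],
)

# No pushdown: every row is transferred, then filtered by the caller.
all_rows = source.execute("SELECT id, region, amount FROM orders").fetchall()
filtered_locally = [row for row in all_rows if row[1] == "EU"]

# Pushdown: the filter runs in the source, so only matching rows are transferred.
pushed_down = source.execute(
    "SELECT id, region, amount FROM orders WHERE region = ?", ("EU",)
).fetchall()

print(len(all_rows), len(pushed_down))  # 1000 250
assert sorted(filtered_locally) == sorted(pushed_down)
```

Both approaches return the same rows, but the pushed-down query moves a quarter of the data in this example and shifts the filtering work to the source system.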