Virtual tables overview

Virtual tables allow you to query and write to tables in supported data platforms without storing the data in Foundry.

You can interact with virtual tables in Python transforms using the transforms-tables library.

Prerequisites

To interact with virtual tables from a Python transform, you must:

  1. Upgrade your Python repository to the latest version.
  2. Install transforms-tables from the Libraries tab.

API overview

The Pythonic virtual tables API provides the TableInput and TableOutput types for reading from and writing to virtual tables.

from transforms.api import transform
from transforms.tables import TableInput, TableOutput, TableTransformInput, TableTransformOutput


@transform(
    source_table=TableInput("ri.tables.main.table.1234"),
    output_table=TableOutput("ri.tables.main.table.5678"),
)
def compute(source_table: TableTransformInput, output_table: TableTransformOutput):
    ...  # normal transforms API

The tables referenced in a Python transform need not come from the same source, or even the same data platform.
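For illustration, a single transform could read from tables registered against two different connections and write to a third. This is a minimal sketch; the resource identifiers and the platforms named in the comments are placeholders, not real resources:

from transforms.api import transform
from transforms.tables import TableInput, TableOutput, TableTransformInput, TableTransformOutput


@transform(
    # hypothetical table backed by, for example, a Snowflake connection
    customers=TableInput("ri.tables.main.table.1111"),
    # hypothetical table backed by, for example, a BigQuery connection
    orders=TableInput("ri.tables.main.table.2222"),
    output_table=TableOutput("ri.tables.main.table.3333"),
)
def compute(customers: TableTransformInput, orders: TableTransformInput, output_table: TableTransformOutput):
    ...  # combine both inputs with the normal transforms API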

The examples above assume that the tables specified in the transform already exist within your Foundry environment. If they do not, you can configure the output virtual table to be created during checks, as with dataset outputs. This requires additional configuration to specify the source and the location where the table should be stored.

from transforms.api import transform
from transforms.tables import TableInput, TableOutput, TableTransformInput, TableTransformOutput, SnowflakeTable


@transform(
    source_table=TableInput("ri.tables.main.table.1234"),
    output_table=TableOutput(
        "/path/to/new/table",
        # Must specify the Data Connection source you want to create the table in
        # and the table identifier/location
        "ri.magritte..source.1234",
        SnowflakeTable("database", "schema", "table"),
    ),
)
def compute(source_table: TableTransformInput, output_table: TableTransformOutput):
    ...  # normal transforms API

Once the table has been created, the source and table metadata configuration can be removed from the TableOutput to keep the transform concise. After creation, the source and location of a virtual table cannot be changed; modifying either will cause checks to fail.
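As a minimal sketch of the simplified form, assuming the output table already exists and is referenced by its resource identifier (placeholder values, mirroring the first example above):

from transforms.api import transform
from transforms.tables import TableInput, TableOutput, TableTransformInput, TableTransformOutput


@transform(
    source_table=TableInput("ri.tables.main.table.1234"),
    # the source and table metadata are no longer needed once the table exists
    output_table=TableOutput("ri.tables.main.table.5678"),
)
def compute(source_table: TableTransformInput, output_table: TableTransformOutput):
    ...  # normal transforms API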

The available Table subclasses are:

  • BigQueryTable(project: str, dataset: str, table: str)
  • DeltaTable(path: str)
  • FilesTable(path: str, format: FileFormat)
  • IcebergTable(table: str, warehouse_path: str)
  • SnowflakeTable(database: str, schema: str, table: str)

You must use the appropriate class based on the type of source you are connecting to.
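For example, an output created in a BigQuery-backed source would swap in BigQueryTable. The sketch below uses placeholder values throughout; the path, source identifier, and BigQuery project, dataset, and table names are hypothetical:

from transforms.tables import TableOutput, BigQueryTable

output_table = TableOutput(
    "/path/to/new/table",                                   # placeholder Foundry path for the new virtual table
    "ri.magritte..source.5678",                             # placeholder BigQuery Data Connection source
    BigQueryTable("my-project", "my_dataset", "my_table"),  # hypothetical BigQuery coordinates
)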

Compute pushdown

Tables backed by a Snowflake connection can push Foundry-authored transforms down to Snowflake. This is known as compute pushdown, and it allows you to use Foundry's pipeline management, data lineage, and security functionality on top of Snowflake compute.

To use compute pushdown with Snowflake, create a lightweight Python repository and install the most recent version of the transforms-tables library. A Snowpark ↗ session is configured automatically based on the connection details of the Snowflake tables used as inputs and/or outputs of the transform. The data can then be transformed using the Snowpark DataFrame API. For full guidance on the Snowpark API, consult the Snowpark documentation ↗.

An example of a Snowpark transform is shown below:

import snowflake.snowpark as snow
from snowflake.snowpark.functions import col, udf
from snowflake.snowpark.types import StringType
from transforms.api import lightweight, transform
from transforms.tables import (
    SnowflakeTable,
    TableInput,
    TableLightweightInput,
    TableLightweightOutput,
    TableOutput,
)

ID_PREFIX = "CUSTOMER-NO-"


@lightweight
@transform(
    input_table=TableInput("ri.tables.main.table.1234"),
    output_table=TableOutput(
        "ri.tables.main.table.5678",
        "ri.magritte..source.1234",
        SnowflakeTable("DATABASE", "PUBLIC", "CUSTOMERS_CLEANED"),
    ),
)
def compute_in_snowflake(input_table: TableLightweightInput, output_table: TableLightweightOutput):
    """
    With Snowflake tables, you can perform lightweight transforms using the Snowpark APIs.
    All compute for these is pushed down to the underlying Snowflake instance, so this can
    tackle big data workloads.

    In a setup like this, all data must live in the same Snowflake instance and be accessible
    through the same connection.
    """
    # get a Snowpark DataFrame instance
    df: snow.DataFrame = input_table.snowpark().dataframe()
    session: snow.Session = df.session

    # define a UDF to apply to our data
    @udf(session=session, return_type=StringType())
    def fix_id_col(ident: int) -> str:
        """UDF to convert id to string and prepend "CUSTOMER-NO-"."""
        return ID_PREFIX + str(ident)

    # apply the UDF to the ID column
    df = df.with_column("ID", fix_id_col(col("ID")))

    # write back to the new table
    output_table.snowpark().write(df)

The Snowpark API allows data to be converted into a pandas DataFrame. If the scale of your data is small enough, this can be used to bring the data from Snowflake into Foundry lightweight compute. This enables the use of transforms beyond the capabilities of the Snowpark APIs, and allows Snowflake tables to be combined with other Foundry data.

import hashlib

from transforms.api import lightweight, transform
from transforms.tables import (
    SnowflakeTable,
    TableInput,
    TableLightweightInput,
    TableLightweightOutput,
    TableOutput,
)


@lightweight
@transform(
    input_table=TableInput("ri.tables.main.table.1234"),
    output_table=TableOutput(
        "ri.tables.main.table.5678",
        "ri.magritte..source.1234",
        SnowflakeTable("DATABASE", "PUBLIC", "CUSTOMERS_CLEANED_ANON"),
    ),
)
def compute_local(input_table: TableLightweightInput, output_table: TableLightweightOutput):
    """
    Snowpark also supports conversion to pandas DataFrames, meaning that you can use
    lightweight transforms on top of Snowflake tables to conduct in-container compute work.
    You can use this to go beyond the scope of what is supported in Snowpark.
    """
    # get a Snowpark DataFrame instance
    df = input_table.snowpark().dataframe()
    session = df.session

    # convert to pandas
    pd_df = df.to_pandas()

    # create ANON_CODE by hashing the concatenation of CITY, STATE, and ZIP_CODE
    def generate_anon_code(row):
        concatenated = f"{row['CITY']}{row['STATE']}{row['ZIP_CODE']}"
        return hashlib.sha256(concatenated.encode("utf-8")).hexdigest()

    # apply the function to create the ANON_CODE column
    pd_df["ANON_CODE"] = pd_df.apply(generate_anon_code, axis=1)

    # select the ID and ANON_CODE columns
    result_data = pd_df[["ID", "ANON_CODE"]]

    # write back to the new table
    new_df = session.create_dataframe(result_data, schema=["ID", "ANON_CODE"])
    output_table.snowpark().write(new_df)