transforms.api.LightweightOutput

class transforms.api.LightweightOutput(alias, rid, branch=None)

The output object passed to user code at runtime.

Its aim is to mimic a subset of the transforms.api.TransformOutput API, while providing access to the underlying foundry.transforms.Dataset.
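For orientation, a minimal sketch of how this object typically reaches user code, assuming the standard transforms.api decorators for lightweight transforms; the dataset path and all names are hypothetical placeholders.

    # A minimal sketch, assuming the @lightweight and @transform decorators;
    # the output path is a hypothetical placeholder.
    import pandas as pd
    from transforms.api import transform, Output, lightweight

    @lightweight
    @transform(out=Output("/Project/examples/output_dataset"))
    def compute(out):
        # At runtime, `out` is a transforms.api.LightweightOutput
        df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
        out.write_table(df)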

property alias

The alias of the dataset this parameter is associated with.

arrow()

A PyArrow table containing the full view of the dataset.
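A hedged sketch of reading the current view back as Arrow, where out is the hypothetical LightweightOutput from the sketch above.

    # Hypothetical: inspect the dataset's current view as a pyarrow.Table.
    table = out.arrow()
    print(table.num_rows, table.schema)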

property branch

The branch of the dataset this parameter is associated with.

dataframe()

A pandas DataFrame containing the full view of the dataset.

filesystem()

Access the filesystem.

Construct a FoundryDataSidecarFileSystem object for accessing the dataset’s files directly.

pandas()

A pandas DataFrame containing the full view of the dataset.
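dataframe() and pandas() carry the same description; a small illustrative sketch, reusing the hypothetical out from above.

    # Hypothetical: either call reads the dataset's current view into pandas.
    df = out.dataframe()
    print(df.shape)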

path()

Download the dataset’s underlying files and return a path to them.
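A brief sketch of working with the downloaded files; the os.listdir call is illustrative only.

    # Hypothetical: download the dataset's files and inspect them locally.
    import os

    local_path = out.path()
    print(os.listdir(local_path))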

property path_for_object_store_write_table

Returns a virtual object store path to a bucket that will be mapped into the output transaction. This does not point directly at a bucket in cloud storage, but rather at a local S3 proxy that allows query engines to perform asynchronous, optimized I/O against the data.

property path_for_write_table

Return the path for the dataset’s files to be used with write_table.

polars(lazy=False)

A Polars DataFrame or LazyFrame containing the full view of the dataset.

  • Parameters: lazy (bool , optional) – Whether to return a LazyFrame or DataFrame. Defaults to False.
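A hedged sketch of the lazy path, reusing the hypothetical out from above.

    # Hypothetical: lazily scan the current view and collect a row count.
    import polars as pl

    lf = out.polars(lazy=True)  # returns a pl.LazyFrame
    row_count = lf.select(pl.len()).collect()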

put_metadata(column_descriptions=None)

Finalize a dataset after uploading raw Parquet files. This infers a Foundry schema from the uploaded Parquet and uploads it (overwriting any existing schema), and updates column description metadata on the dataset.

This method must be called after one or more Parquet files have been uploaded to the output dataset so that a schema can be inferred. This method will throw an error if it is called before a successful file upload.

  • Parameters: column_descriptions (Dict [str , str ] , optional) – Map of column names to their string descriptions. This map is intersected with the columns of the dataset, and each description must be no longer than 800 characters.
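A hedged sketch of the intended sequence, assuming a query engine stages Parquet directly under out.path_for_write_table before the finalizing call; the DuckDB query and target path are illustrative assumptions, not documented usage.

    # A hedged sketch: stage a Parquet file via a query engine, then finalize.
    # Writing under path_for_write_table is an assumption for illustration.
    import duckdb

    con = duckdb.connect()
    con.sql(
        f"COPY (SELECT 1 AS id, 'a' AS name) "
        f"TO '{out.path_for_write_table}/data.parquet' (FORMAT PARQUET)"
    )
    out.put_metadata(column_descriptions={"id": "Primary key", "name": "Display name"})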

read_unstaged_dataset_as_polars_lazy()

Read the local version of the dataset as a Polars LazyFrame.

This method is used when computing expectations on the dataset. It must run before the dataset is committed, since failed expectations can abort the build.

property rid

The unique resource identifier of the dataset this parameter is associated with.

set_mode(mode)

Set the mode for the output dataset.

  • Parameters: mode (str) –

    The write mode, one of replace, modify, or append. In modify mode, anything written is added to the dataset and may overwrite existing files. In append mode, anything written is added to the dataset and will not overwrite existing files. In replace mode, anything written replaces the dataset's previous contents.

    The write mode cannot be changed after data is written.
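A short sketch of selecting a mode before the first write, reusing the hypothetical out; new_rows_df is a placeholder DataFrame.

    # Hypothetical: choose append mode before anything is written.
    out.set_mode("append")
    out.write_table(new_rows_df)  # placeholder pandas DataFrame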

property transaction_rid

The resource identifier of the transaction on the output dataset.

write_dataframe(df, column_description=None, column_descriptions=None)

Write a DataFrame of any supported type to the dataset.

For compatibility reasons, both column_description and column_descriptions are accepted. However, only one of them can be provided at the same time.

  • Parameters:
    • df (pd.DataFrame , pa.Table , pl.DataFrame , pl.LazyFrame , or pathlib.Path) – The data to upload.
    • column_description (Dict [str , str ] , optional) – Deprecated, use column_descriptions instead.
    • column_descriptions (Dict [str , str ] , optional) – Map of column names to their string descriptions. This map is intersected with the columns of the DataFrame, and each description must be no longer than 800 characters.
  • Returns: None
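A hedged sketch with a Polars frame and a column description, reusing the hypothetical out.

    # Hypothetical: write a Polars DataFrame with a column description.
    import polars as pl

    df = pl.DataFrame({"id": [1, 2], "score": [0.5, 0.9]})
    out.write_dataframe(df, column_descriptions={"score": "Model score in [0, 1]"})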

write_pandas(df, column_description=None, column_descriptions=None)

Write the given pandas.DataFrame to the dataset.

For compatibility reasons, both column_description and column_descriptions are accepted. However, only one of them can be provided at the same time.
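A one-line sketch of the pandas-specific variant, reusing the hypothetical out.

    # Hypothetical: write a pandas DataFrame to the output dataset.
    import pandas as pd

    out.write_pandas(pd.DataFrame({"id": [1, 2, 3]}))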

write_table(df, column_description=None, column_descriptions=None)

Write a pandas DataFrame, Arrow Table, Polars DataFrame, or Polars LazyFrame to a Foundry dataset.

This performs three operations: uploading the df itself to the dataset, inferring a schema and uploading it to the dataset (overwriting any existing schema), and updating column description metadata. To update only the metadata without uploading data, use put_metadata() instead.

For compatibility reasons, both column_description and column_descriptions are accepted. However, only one of them can be provided at the same time.

  • Parameters:
    • df (pd.DataFrame , pa.Table , pl.DataFrame , pl.LazyFrame , duckdb.DuckDBPyRelation , or pathlib.Path , optional) – The data to upload, or None to only infer a schema from data previously written in the transaction.
    • column_description (Dict [str , str ] , optional) – Deprecated, use column_descriptions instead.
    • column_descriptions (Dict [str , str ] , optional) – Map of column names to their string descriptions. This map is intersected with the columns of the DataFrame, and each description must be no longer than 800 characters.
  • Returns: None
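A hedged sketch with an Arrow table, plus the None form for schema-only finalization; both reuse the hypothetical out.

    # Hypothetical: upload a pyarrow.Table with a column description.
    import pyarrow as pa

    table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})
    out.write_table(table, column_descriptions={"id": "Primary key"})

    # Hypothetical: files were already written in this transaction; pass None
    # to infer and attach a schema without uploading new data.
    out.write_table(None)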