Transformstransforms-pythontransforms.apilightweight

transforms.api.lightweight

transforms.api.lightweight(_maybe_function=None, *, cpu_cores=2, memory_mb=None, memory_gb=None, gpu_type=None, container_image=None, container_tag=None, container_shell_command=None)

Deprecated since version 3.85.0:: Use transforms.api.transform.using() instead.

Turn a transform into a lightweight transform.

In order to use this decorator, foundry-transforms-lib-python must be added as a dependency.

A lightweight transform is a transform that runs without Spark, on a single node. Lightweight transforms are faster and more cost-effective for small to medium-sized datasets. Lightweight transforms also provide more methods for accessing datasets; however, they only support a subset of the API of regular transforms, including pandas and the filesystem API. For more information, see the Python transforms documentation ↗.

Parameters:
- cpu_cores (float ↗ , optional) – The number of CPU cores to request for the transform’s container. Can be a fraction.
- memory_mb (float ↗ , optional) – The amount of memory to request for the container, in MB.
- memory_gb (float ↗ , optional) – The amount of memory to request for the container, in GB. The default is 16 GB.
- gpu_type (str ↗ , optional) – The type of GPU to allocate for the transform.
- container_image (str ↗ , optional) – The image to use for the transform’s container.
- container_tag (str ↗ , optional) – The image tag to use for the transform’s container.
- container_shell_command (str ↗ , optional) – The shell command to execute inside the container after it has started. When left unspecified, a default command is generated resulting in executing the decorated transform. The default values is available through the decorated transform’s default_container_entrypoint property.

Notes

Either memory_gb or memory_mb can be specified, but not both.

In case any of container_image, container_tag or container_shell_command is set, both container_image and container_tag must be set. If container_shell_command is not set, a default entrypoint will be used, which will bootstrap a Python environment and execute the user code specified in the transform.

Specifying the container_* arguments is referred to as a bring-your-own-container (BYOC) workflow. In this case, the main guarantees are that all files from the user’s code repository will be available inside $USER_WORKING_DIR/user_code at runtime, and that a Python environment will be available as well.

The container_image must be available from an artifacts-backing repository of the code repository. For more details, refer to the BYOC documentation ↗.

An example of a valid container_image’s Dockerfile is shown below:

Copied!1
2
3
4
5
6
FROM ubuntu:latest

RUN apt update && apt install -y coreutils curl sed

RUN useradd --uid 5001 user
USER 5001

Examples

Copied!1
2
3
4
5
6
7
>>> @lightweight
... @transform(
...     my_input=Input('/input'),
...     my_output=Output('/output')
... )
... def compute_func(my_input, my_output):
...     my_output.write_pandas(my_input.pandas())

Copied!1
2
3
4
5
6
7
8
9
10
>>> @lightweight()
... @transform(
...    my_input=Input('/input'),
...    my_output=Output('/output')
... )
... def compute_func(my_input, my_output):
...     for file in my_input.filesystem().ls():
...         with my_input.filesystem().open(file.path) as f1:
...             with my_output.filesystem().open(file.path, "w") as f2:
...                 f2.write(f1.read())

Copied!1
2
3
4
5
6
7
>>> @lightweight(cpu_cores=8, memory_gb=3.5, gpu_type='NVIDIA_T4')
... @transform_pandas(
...     Output('/output'),
...     my_input=Input('/input')
... )
... def compute_func(my_input):
...     return my_input

Copied!1
2
3
4
5
>>> @lightweight(container_image='my-image', container_tag='0.0.1')
... @transform(my_output=Output('ri...my_output'))
... def run_data_generator_executable(my_output):
...     os.system('$USER_WORKING_DIR/data_generator')
...     my_output.write_table(pd.read_csv('data.csv'))

←

PREVIOUSIntegerParam

NEXTLightweightContext

→