Register the wrapped compute function as a DataFrame transform.
The transform_df decorator is used to construct a Transform object from a compute function that accepts and returns pyspark.sql.DataFrame objects. As with the transform() decorator, the input names become the compute function's parameter names. However, transform_df accepts only a single Output spec, passed as a positional argument. The return value of the compute function is also a DataFrame, and it is automatically written to the single output dataset.
>>> @transform_df(
...     Output('/path/to/output/dataset'),  # An unnamed Output spec
...     first_input=Input('/path/to/first/input/dataset'),
...     second_input=Input('/path/to/second/input/dataset'),
... )
... def my_compute_function(first_input, second_input):
...     # type: (pyspark.sql.DataFrame, pyspark.sql.DataFrame) -> pyspark.sql.DataFrame
...     return first_input.union(second_input)
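For context, here is the same example as a complete, self-contained module sketch. It assumes the decorator and spec classes are imported from the transforms.api package, the standard import path in Foundry's Python transforms library; adjust the path if your project is laid out differently.

from transforms.api import transform_df, Input, Output  # assumed standard import path


@transform_df(
    Output('/path/to/output/dataset'),                    # single, unnamed Output spec
    first_input=Input('/path/to/first/input/dataset'),    # bound to parameter first_input
    second_input=Input('/path/to/second/input/dataset'),  # bound to parameter second_input
)
def my_compute_function(first_input, second_input):
    # Both inputs arrive as pyspark.sql.DataFrame objects; the returned
    # DataFrame is written to the output dataset automatically.
    return first_input.union(second_input)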