Register the wrapped compute function as a DataFrame transform.
The transform_df decorator is used to construct a Transform object from a compute function that accepts and returns pyspark.sql.DataFrame objects. As with the transform() decorator, the input names become the compute function's parameter names. However, transform_df accepts only a single Output spec, passed as a positional argument. The return value of the compute function is also a DataFrame, and it is automatically written to the single output dataset.
>>> @transform_df(
...     Output('/path/to/output/dataset'),  # An unnamed Output spec
...     first_input=Input('/path/to/first/input/dataset'),
...     second_input=Input('/path/to/second/input/dataset'),
... )
... def my_compute_function(first_input, second_input):
...     # type: (pyspark.sql.DataFrame, pyspark.sql.DataFrame) -> pyspark.sql.DataFrame
...     return first_input.union(second_input)
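For context, here is the same example as a complete, self-contained module sketch. It assumes the decorator and spec classes are imported from the transforms.api package, the standard import path in Foundry's Python transforms library; adjust the path if your project is laid out differently.

from transforms.api import transform_df, Input, Output  # assumed standard import path


@transform_df(
    Output('/path/to/output/dataset'),                    # single, unnamed Output spec
    first_input=Input('/path/to/first/input/dataset'),    # bound to parameter first_input
    second_input=Input('/path/to/second/input/dataset'),  # bound to parameter second_input
)
def my_compute_function(first_input, second_input):
    # Both inputs arrive as pyspark.sql.DataFrame objects; the returned
    # DataFrame is written to the output dataset automatically.
    return first_input.union(second_input)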