You may want to apply custom Spark properties to your Transforms jobs. To apply Spark properties to a specific job, reference the relevant Spark profiles in your transformation code, as shown in the language-specific examples below.
You can learn more about the characteristics of the default Spark profiles available in the Spark Profiles Reference section.
Note also the Recommended best practices for adjusting Spark profiles.
Specifying custom Spark profiles is supported in all languages. In all of the cases below, settings are evaluated from left to right. If multiple profiles specify the same setting, the one closer to the end of the list will take precedence.
You can reference the profile1 and profile2 profiles in your Python code by using the configure decorator to wrap your Transform object. This decorator takes in a profile parameter that refers to the list of your custom Transforms profiles:
from transforms.api import configure, transform, Input, Output

@configure(profile=['profile1', 'profile2'])
@transform(
    # your input dataset(s)
    my_input=Input("/path/to/input/dataset"),
    # your output dataset
    my_output=Output("/path/to/output/dataset"),
)
# your data transformation code
def my_compute_function(my_input, my_output):
    my_output.write_dataframe(my_input.dataframe())
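To illustrate the precedence rule described above, here is a minimal sketch assuming two hypothetical profiles, SMALL_EXECUTOR_MEMORY and LARGE_EXECUTOR_MEMORY, that both override spark.executor.memory. Because settings are evaluated from left to right, the value from the profile listed last is the one applied:

from transforms.api import configure, transform, Input, Output

# Hypothetical profiles for illustration only: suppose SMALL_EXECUTOR_MEMORY
# sets spark.executor.memory to 4g and LARGE_EXECUTOR_MEMORY sets it to 16g.
# Profiles are evaluated left to right, so LARGE_EXECUTOR_MEMORY is applied
# last and the job runs with 16g executors.
@configure(profile=['SMALL_EXECUTOR_MEMORY', 'LARGE_EXECUTOR_MEMORY'])
@transform(
    my_input=Input("/path/to/input/dataset"),
    my_output=Output("/path/to/output/dataset"),
)
def my_compute_function(my_input, my_output):
    my_output.write_dataframe(my_input.dataframe())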
Auto-registered transforms can reference the profile1 and profile2 profiles in your Java code by using the TransformProfiles annotation on your compute function. This annotation takes in a parameter that refers to the array of your custom Spark profiles:
import com.palantir.transforms.lang.java.api.Compute;
import com.palantir.transforms.lang.java.api.FoundryInput;
import com.palantir.transforms.lang.java.api.FoundryOutput;
import com.palantir.transforms.lang.java.api.Input;
import com.palantir.transforms.lang.java.api.Output;
import com.palantir.transforms.lang.java.api.TransformProfiles;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

/**
 * This is an example low-level Transform intended for automatic registration.
 */
public final class LowLevel {

    @Compute
    @TransformProfiles({ "profile1", "profile2" })
    public void myComputeFunction(
            @Input("/path/to/input/dataset") FoundryInput myInput,
            @Output("/path/to/output/dataset") FoundryOutput myOutput) {
        Dataset<Row> limited = myInput.asDataFrame().read().limit(10);
        myOutput.getDataFrameWriter(limited).write();
    }
}
Alternatively, if you’re using manual registration, you can use the builder method transformProfiles():
public final class MyPipelineDefiner implements PipelineDefiner {

    @Override
    public void define(Pipeline pipeline) {
        LowLevelTransform lowLevelManualTransform = LowLevelTransform.builder()
            .transformProfiles(ImmutableList.of("profile1", "profile2"))
            // Pass in the compute function to use. Here, "LowLevelManualFunction" corresponds
            // to the class name for a compute function for a low-level Transform.
            .computeFunctionInstance(new LowLevelManualFunction())
            .putParameterToInputAlias("myInput", "/path/to/input/dataset")
            .putParameterToOutputAlias("myOutput", "/path/to/output/dataset")
            .build();
        pipeline.register(lowLevelManualTransform);
    }
}
You can reference the profile1 and profile2 profiles in your SQL code by setting the foundry_transform_profiles property for your table:
CREATE TABLE `/path/to/output`
TBLPROPERTIES (foundry_transform_profiles = 'profile1, profile2')
AS SELECT * FROM `/path/to/input`
Here is another example using alternative SQL syntax:
CREATE TABLE `/path/to/output`
USING foo_bar
OPTIONS (foundry_transform_profiles = 'profile1, profile2')
AS SELECT * FROM `/path/to/input`;
Note that specifying custom Transforms Profiles is not currently supported in ANSI SQL mode.