The instructions below step through a simple Java data transformation. If you are just getting started with data transformation, consider going through the batch pipeline tutorial for Pipeline Builder or Code Repositories first.
Follow these steps to write your first Java transformation:
Create a new Transforms Java repository. Navigate to a Project, select + New > Repository, and select Java under Language template.
Download this sample dataset: titanic.zip. Import this dataset into Foundry.
Navigate to your repository. Your data transformation code goes in myproject/datasets/HighLevelAutoTransform.java. The sample code in this file is commented out, so make sure to uncomment it before moving on.
Update the input dataset by replacing /path/to/input/dataset with the full path to your titanic dataset.
Update the output dataset by replacing /path/to/output/dataset with the full path to your desired output dataset location.
Let’s modify the default transformation code to filter the titanic dataset on the Sex column so that it keeps only female passengers. Update your data transformation code in myComputeFunction:
    @Compute
    // Replace this with the full path to your output dataset.
    @Output("/path/to/output/dataset")
    // Replace this with the full path to your "titanic" dataset.
    public Dataset<Row> myComputeFunction(@Input("/path/to/input/dataset") Dataset<Row> myInput) {
        // Keep only the rows whose "Sex" column equals "female".
        return myInput.filter(myInput.col("Sex").equalTo("female"));
    }
After you successfully commit your changes to your branch, you can open and build your output dataset!
This example defines a high-level Transform that uses automatic registration. For more information about the different types of data transformations supported in Transforms Java as well as an explanation of the template project structure and included files, refer to this documentation.
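To make the row-level behavior of the filter in myComputeFunction concrete, here is a minimal sketch in plain Java with no Spark or Foundry dependencies. It is not the Transforms API: the FemaleFilterSketch class, the Passenger type, and the sample rows are hypothetical and exist only for illustration. It assumes Spark's default comparison semantics, where equalTo on a string column is an exact, case-sensitive match and rows with a null value in that column do not pass the filter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the transform: plain Java, no Spark or Foundry classes.
class FemaleFilterSketch {

    // Hypothetical minimal row type; the real titanic dataset has more columns.
    static final class Passenger {
        final String name;
        final String sex; // may be null, like a missing value in the "Sex" column

        Passenger(String name, String sex) {
            this.name = name;
            this.sex = sex;
        }
    }

    // Mirrors filter(col("Sex").equalTo("female")) under the stated assumptions:
    // exact, case-sensitive match; rows with a null value are dropped.
    static List<Passenger> filterFemale(List<Passenger> rows) {
        return rows.stream()
                .filter(p -> "female".equals(p.sex))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the actual dataset.
        List<Passenger> rows = Arrays.asList(
                new Passenger("Passenger A", "female"),
                new Passenger("Passenger B", "male"),
                new Passenger("Passenger C", "Female"), // different capitalization
                new Passenger("Passenger D", null));    // missing value

        // Only "Passenger A" passes the filter.
        for (Passenger p : filterFemale(rows)) {
            System.out.println(p.name);
        }
    }
}
```

If the values in your Sex column vary in capitalization, one option is to normalize the column before comparing, for example with Spark's functions.lower, so that rows such as "Female" are not silently excluded.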