5C. [Repositories] Multiple Outputs with Data Transforms

5 - Multi-output Transforms

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

You can also specify multiple outputs using the transform() decorator. In this task, you'll create a multi-output transform that filters for just those passengers whose flyer_status is Platinum and then creates separate datasets for flight alerts of each priority (high, medium, low).

Unlike the for loop you saw in the generated transform, this method does not run the entire input through the full logic once per output. You filter to Platinum passengers once, then reuse the filtered dataframe to create each output dataset.
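
To make the contrast concrete, here is a minimal plain-Python sketch of the two shapes of logic, using a list of dictionaries in place of a Spark dataframe. The sample rows and the helper names (split_per_output, split_shared) are hypothetical and exist only for illustration; only the column names flyer_status and priority come from this task. The loop-style function is an assumption about what the earlier generated transform looked like. The sketch shows how the code is organized, not how Spark schedules the work.

```python
# Hypothetical sample rows standing in for the joined flight alerts input.
ROWS = [
    {"alert_id": 1, "flyer_status": "Platinum", "priority": "High"},
    {"alert_id": 2, "flyer_status": "Gold", "priority": "High"},
    {"alert_id": 3, "flyer_status": "Platinum", "priority": "Low"},
    {"alert_id": 4, "flyer_status": "Platinum", "priority": "Medium"},
    {"alert_id": 5, "flyer_status": "Silver", "priority": "Low"},
]
PRIORITIES = ("High", "Medium", "Low")


def split_per_output(rows):
    """Loop-style shape: both filters are applied to the full input for every output."""
    return {
        p: [r for r in rows if r["flyer_status"] == "Platinum" and r["priority"] == p]
        for p in PRIORITIES
    }


def split_shared(rows):
    """Multi-output shape: filter to Platinum once, then reuse that result."""
    platinum = [r for r in rows if r["flyer_status"] == "Platinum"]
    return {p: [r for r in platinum if r["priority"] == p] for p in PRIORITIES}


if __name__ == "__main__":
    # Print the alert IDs that land in each priority bucket.
    for priority, subset in split_shared(ROWS).items():
        print(priority, [r["alert_id"] for r in subset])
```

Both functions return the same buckets; the second states the Platinum filter in only one place, which is the structure the transform in this task follows.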

🔨 Task Instructions

  1. Create a new branch from Master called yourName/feature/multi_output.

  2. Right-click your /output folder in your repository Files and add a new file called flight_alerts_by_priority.py.

  3. Replace the default code in your new Python transform file with the code block below.

    from transforms.api import transform, Input, Output
    
    
    # Pass multiple Output specifications to the transform() decorator to split the input:
    @transform(
        source_df=Input("${flight_alerts_joined_passengers_RID}"),
        high=Output("/${space}/Temporary Training Artifacts/${yourName}/Data Engineering Tutorials/Transform Project: Alert Metrics/data/output/flight_alerts_platinum_high"),
        medium=Output("/${space}/Temporary Training Artifacts/${yourName}/Data Engineering Tutorials/Transform Project: Alert Metrics/data/output/flight_alerts_platinum_medium"),
        low=Output("/${space}/Temporary Training Artifacts/${yourName}/Data Engineering Tutorials/Transform Project: Alert Metrics/data/output/flight_alerts_platinum_low"),
    )
    def alerts_by_priority(source_df, high, medium, low):
    
        # Read the input once, then filter to just those records where the passenger status is "Platinum"
        df = source_df.dataframe()
        platinum_df = df.filter(df.flyer_status == 'Platinum')
    
        # Call the write_dataframe() function on each output to write the dataframe out to a dataset for each filtered priority
        high.write_dataframe(platinum_df.filter(platinum_df.priority == 'High'))
        medium.write_dataframe(platinum_df.filter(platinum_df.priority == 'Medium'))
        low.write_dataframe(platinum_df.filter(platinum_df.priority == 'Low'))
    
  4. Replace the following placeholders in your code:

    • ${space} with your space
    • ${yourName} with the name of your folder under /Temporary Training Artifacts
    • ${flight_alerts_joined_passengers_RID} with the RID of the output dataset produced by flight_alerts_joined_passengers.py.

  5. Click the Preview button. Your preview window will present you with three tabs, one for each output (high, medium, and low).

  6. If the result looks as expected (i.e., filtered to Platinum and the designated priority), then commit and build the code on your branch.

  7. If your build was successful (i.e., datasets materialized as expected), consider updating the input/output paths to RIDs and committing. Remember that this may require a browser refresh to get the "Replace paths with RIDs" link to appear.

  8. Complete the PR process and merge your branch into Master (you may delete your branch after the merge if desired).

  9. Build your code on the Master branch.
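
The check described in step 6 (each output is filtered to Platinum and to its designated priority) can be written down as a small helper. This is only a sketch: it assumes you have a sample of an output's rows available as plain Python dictionaries, and find_violations and the sample rows are hypothetical names and data, not part of the transforms API.

```python
def find_violations(rows, expected_priority, expected_status="Platinum"):
    """Return rows that do not match the expected status and priority.

    An empty list means every row passed both filters.
    """
    return [
        row
        for row in rows
        if row.get("flyer_status") != expected_status
        or row.get("priority") != expected_priority
    ]


# Hypothetical sample of the "high" output; the second row should not be there.
sample_high = [
    {"alert_id": 1, "flyer_status": "Platinum", "priority": "High"},
    {"alert_id": 2, "flyer_status": "Gold", "priority": "High"},
]

bad_rows = find_violations(sample_high, expected_priority="High")
print([row["alert_id"] for row in bad_rows])
```

Running the same helper against samples of the medium and low outputs, with the matching expected_priority, covers all three datasets.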

Read more about multi-output transforms in the Python transforms documentation.