5C. [Repositories] Multiple Outputs with Data Transforms

3 - Add Code for Your “Transformed” Datasets

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

Before adding the transform code that creates multiple dataset outputs, we’ll use PySpark to perform a simple join of the three clean output datasets from your Flight Alerts and Passengers Datasource projects. This is the kind of “pre-work” you’d typically do in a /transformed code folder.
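The join chains two left joins: flight alerts to the passenger_flight_alerts_clean join table on alert_display_name, then to passengers on passenger_id. To show what that does to the rows, here is a plain-Python sketch of left-join semantics (not PySpark and not the Foundry API). The sample rows are hypothetical, and the columns `priority` and `name` are made up for illustration.

```python
# Plain-Python illustration of the two chained left joins (not PySpark).
# Sample rows are hypothetical; "priority" and "name" are made-up columns.


def left_join(left, right, on):
    """Left-join two lists of dicts on column `on`.

    Every left row is kept. A left row matching several right rows is
    repeated once per match; a left row with no match gets None for the
    right-only columns. A None key never matches, as with nulls in Spark.
    Assumes the two sides share no column names other than `on`.
    """
    right_cols = {col for row in right for col in row if col != on}
    result = []
    for lrow in left:
        key = lrow.get(on)
        matches = [r for r in right if key is not None and r.get(on) == key]
        if matches:
            for rrow in matches:
                result.append({**lrow, **rrow})
        else:
            result.append({**lrow, **{col: None for col in right_cols}})
    return result


flight_alerts = [
    {"alert_display_name": "A1", "priority": "high"},
    {"alert_display_name": "A2", "priority": "low"},
]
join_table = [
    {"alert_display_name": "A1", "passenger_id": 1},
    {"alert_display_name": "A1", "passenger_id": 2},
]
passengers = [
    {"passenger_id": 1, "name": "Ada"},
    {"passenger_id": 2, "name": "Grace"},
]

# Same order as the transform: alerts -> join table -> passengers.
joined = left_join(
    left_join(flight_alerts, join_table, on="alert_display_name"),
    passengers,
    on="passenger_id",
)
for row in joined:
    print(row)
```

With this data, alert A1 fans out to two rows (one per linked passenger) and A2 is kept once with passenger_id and name set to None, so the result has three rows. Spark itself does not guarantee row order; the list order here is just Python's.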

🔨 Task Instructions

  1. Right-click the /transformed folder in your repository and add a new file named flight_alerts_joined_passengers.py.

  2. Replace the default code in your new Python transform file with the code block below.

    from transforms.api import transform_df, Input, Output
    
    
    @transform_df(
        Output("/${space}/Temporary Training Artifacts/${yourName}/Data Engineering Tutorials/Transform Project: Alert Metrics/data/transformed/flight_alerts_joined_passengers"),
        flight_alerts_df=Input("${flight_alerts_clean_RID}"),
        passengers_df=Input("${passengers_clean_RID}"),
        join_table_df=Input("${passenger_flight_alerts_clean_RID}"),
    )
    def compute(flight_alerts_df, passengers_df, join_table_df):
    
        # join flight alert data to passenger data by using the passenger_flight_alerts_clean join table
        joined_df = (
            flight_alerts_df
            .join(join_table_df, on='alert_display_name', how='left')
            .join(passengers_df, on='passenger_id', how='left')
        )
    
        return joined_df
    
  3. Replace the following placeholders in your code:

    • ${space} with your space

    • ${yourName} with your /Tutorial Practice Artifacts folder name

    • ${flight_alerts_clean_RID} with the RID of the flight_alerts_clean dataset in your Datasource Project: Flight Alerts project

    • ${passengers_clean_RID} with the RID of the passengers_clean dataset in your Datasource Project: Passengers project

    • ${passenger_flight_alerts_clean_RID} with the RID of the passenger_flight_alerts_clean dataset in your Datasource Project: Passengers project

      • ℹ️ You can use your repository’s Foundry Explorer helper, in the bottom left of your screen, to search for datasets you want to reference in your code. Open the helper, follow the folder path to the desired dataset, and click the dataset to see its folder path or RID in the Details section. To see the RID, click the Show more link to expand the additional details.

  4. Use the Preview button to test the output of your code.

  5. Commit your code to your branch with a reasonable, descriptive message (e.g., “feature: add joined dataset”).

  6. Build your code on your branch and confirm that the build completes successfully.

  7. If the build succeeds, complete the pull request (PR) process and merge your branch into Master.

  8. Build your code on the Master branch.
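In the repository you make the placeholder replacements from step 3 by editing the file directly. As a hypothetical illustration only: the `${...}` placeholders happen to use the same syntax as Python's `string.Template`, so the sketch below fills them from a dictionary and fails loudly with `KeyError` if any value is missing. The values (`example-space`, `jdoe`, and the `ri.example...` strings) are made up and are not real RIDs.

```python
from string import Template

# Hypothetical helper: the tutorial's ${...} placeholders use the same syntax
# as Python's string.Template, so the strings can be filled in and checked.
TEMPLATES = {
    "output": (
        "/${space}/Temporary Training Artifacts/${yourName}/Data Engineering Tutorials/"
        "Transform Project: Alert Metrics/data/transformed/flight_alerts_joined_passengers"
    ),
    "flight_alerts_df": "${flight_alerts_clean_RID}",
    "passengers_df": "${passengers_clean_RID}",
    "join_table_df": "${passenger_flight_alerts_clean_RID}",
}


def fill_placeholders(templates, values):
    """Substitute every ${placeholder}; Template.substitute raises KeyError if one is missing."""
    return {key: Template(text).substitute(values) for key, text in templates.items()}


# Hypothetical values; use your own space, folder name, and dataset RIDs.
values = {
    "space": "example-space",
    "yourName": "jdoe",
    "flight_alerts_clean_RID": "ri.example.flight_alerts_clean",
    "passengers_clean_RID": "ri.example.passengers_clean",
    "passenger_flight_alerts_clean_RID": "ri.example.passenger_flight_alerts_clean",
}

filled = fill_placeholders(TEMPLATES, values)
for key, text in filled.items():
    print(f"{key}: {text}")
```

Using `substitute` rather than `safe_substitute` is deliberate: a forgotten placeholder raises an error instead of silently leaving `${...}` in a path or RID.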