This content is also available at learn.palantir.com and is presented here for accessibility purposes.
Before adding our transform code that creates multiple dataset outputs, we’ll perform a simple join of the three clean output datasets from your flight alerts and passengers datasource projects using PySpark. This is the type of “pre-work” you’d conduct in a /transformed code folder.
Right-click the /transformed folder in your repository and add a new file called flight_alerts_joined_passengers.py.
Replace the default code in your new Python transform file with the code block below.
from transforms.api import transform_df, Input, Output


@transform_df(
    Output("/${space}/Temporary Training Artifacts/${yourName}/Data Engineering Tutorials/Transform Project: Alert Metrics/data/transformed/flight_alerts_joined_passengers"),
    flight_alerts_df=Input("${flight_alerts_clean_RID}"),
    passengers_df=Input("${passengers_clean_RID}"),
    join_table_df=Input("${passenger_flight_alerts_clean_RID}"),
)
def compute(flight_alerts_df, passengers_df, join_table_df):
    # join flight alert data to passenger data by using the passenger_flight_alerts_clean join table
    joined_df = (
        flight_alerts_df
        .join(join_table_df, on='alert_display_name', how='left')
        .join(passengers_df, on='passenger_id', how='left')
    )

    return joined_df
Replace the following placeholders in your code:
${space} with your space
${yourName} with your /Tutorial Practice Artifacts folder name
${flight_alerts_clean_RID} with the RID of the flight_alerts_clean dataset in your Datasource Project: Flight Alerts project
${passengers_clean_RID} with the RID of the passengers_clean dataset in your Datasource Project: Passengers project
${passenger_flight_alerts_clean_RID} with the RID of the passenger_flight_alerts_clean dataset in your Datasource Project: Passengers project
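Because the transform will not resolve its inputs while any ${...} placeholder is left in place, it can help to scan the file before previewing. The following is a minimal sketch of such a check, assuming the placeholders use the ${name} syntax shown above. The helper name find_unreplaced_placeholders and the example RID string are hypothetical and not part of Foundry.

```python
import re
from typing import List

# Matches the tutorial's ${...} placeholder syntax, e.g. ${space} or ${yourName}.
PLACEHOLDER_PATTERN = re.compile(r"\$\{([^}]+)\}")


def find_unreplaced_placeholders(source: str) -> List[str]:
    """Return the distinct placeholder names still present, in first-seen order."""
    seen = []
    for name in PLACEHOLDER_PATTERN.findall(source):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical snippet: one input already points at a (made-up) RID,
# the other still contains a placeholder from the tutorial.
example_source = '''
flight_alerts_df=Input("ri.foundry.main.dataset.example-rid"),
passengers_df=Input("${passengers_clean_RID}"),
'''

remaining = find_unreplaced_placeholders(example_source)
print(remaining)
```

An empty list means every placeholder in the checked text has been replaced.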
Use the Preview button to test the output of your code.
Commit your code to your branch with a reasonable, descriptive message (e.g., “feature: add joined dataset”).
Build your code on your branch and ensure successful completion.
If successful, complete the PR process and merge your branch into Master.
Build your code on the Master branch.
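To reason about what the preview and build output should contain, the two chained left joins in the transform can be sketched in plain Python. This is an illustration of left-join semantics only, not the PySpark implementation: the left_join helper, the sample rows, and the priority and first_name columns are all hypothetical, and the sketch assumes the two sides share no column names other than the join key.

```python
def left_join(left_rows, right_rows, key):
    """Left-join two lists of dicts on `key`, keeping every left row.

    Left rows without a match are kept, with right-hand columns set to None.
    A None key never matches, mirroring how null join keys behave in SQL.
    """
    # Collect the columns that come only from the right side.
    right_only = []
    for row in right_rows:
        for col in row:
            if col != key and col not in right_only:
                right_only.append(col)

    # Index right rows by their key value for quick lookup.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left in left_rows:
        key_value = left.get(key)
        matches = index.get(key_value, []) if key_value is not None else []
        if matches:
            # One output row per matching right row.
            for right in matches:
                joined.append({**left, **{c: right.get(c) for c in right_only}})
        else:
            # No match: keep the left row and fill right columns with None.
            joined.append({**left, **{c: None for c in right_only}})
    return joined


# Hypothetical sample data standing in for the three clean datasets.
flight_alerts = [
    {"alert_display_name": "Alert A", "priority": "High"},
    {"alert_display_name": "Alert B", "priority": "Low"},
]
join_table = [
    {"alert_display_name": "Alert A", "passenger_id": 1},
    {"alert_display_name": "Alert A", "passenger_id": 2},
]
passengers = [
    {"passenger_id": 1, "first_name": "Ada"},
    {"passenger_id": 2, "first_name": "Grace"},
]

# Same join order as the transform: alerts -> join table -> passengers.
joined = left_join(
    left_join(flight_alerts, join_table, "alert_display_name"),
    passengers,
    "passenger_id",
)
for row in joined:
    print(row)
```

In this sample, the alert linked to two passengers produces two rows, while the alert with no linked passengers still appears once with empty passenger columns. The same pattern is worth looking for when you inspect the joined dataset: the row count can exceed the number of flight alerts, and alerts without passengers are not dropped.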