This content is also available at learn.palantir.com and is presented here for accessibility purposes.
A Transform Project typically combines sources and applies additional business logic to produce enriched datasets. Unlike the datasets in the Ontology Project stage, these datasets are generally not meant for broad exposure. In this task, you’ll perform a simple join of the three clean output datasets from your flight alerts and passengers datasource projects. This is the type of “pre-work” you’d conduct in a /transformed code folder.
You’ll start with the flight alerts dataset, which has ~2.5k rows. To enable an eventual join with the passengers data (~10k rows), you’ll first need to use the passenger_flight_alert_clean join table (~132k rows), which contains the primary keys of the flight alerts and passengers tables.
Open the /transformed folder you created in the previous exercise.
Create a new batch pipeline in Pipeline Builder called passengers_flight_alerts_transformed.
Add the following clean datasets you’ve created in this learning path:
flight_alerts_clean
passengers_clean
passenger_flight_alert_clean
Add a transform to flight_alerts_clean that left joins passenger_flight_alert_clean on alert_display_name.
Name your transform Alerts to Join Table.
Create a new transform called Alerts to Passengers that joins your Alerts to Join Table transform to passengers_clean, using passenger_id as the shared key.
Save your pipeline.
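The two joins in the steps above are built visually in Pipeline Builder, but the logic can be sketched in plain Python to make the row flow easier to follow. This is only an illustration, not what Pipeline Builder runs: the left_join helper, the sample rows, and the status and name columns are hypothetical, and the second join is assumed to be a left join because the exercise does not state its type. Only the dataset names and the alert_display_name and passenger_id keys come from the exercise.

```python
from collections import defaultdict


def left_join(left, right, key):
    """Left join two lists of dicts on `key`, keeping unmatched left rows."""
    # Index the right-hand rows by key so each lookup is cheap.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for row in left:
        matches = index.get(row.get(key), [])
        if matches:
            # One output row per match; left-hand values win on name clashes.
            for match in matches:
                joined.append({**match, **row})
        else:
            # No match: keep the left row as-is (the left-join behavior).
            joined.append(dict(row))
    return joined


# Hypothetical sample rows standing in for the three clean datasets.
flight_alerts_clean = [
    {"alert_display_name": "A-1", "status": "open"},
    {"alert_display_name": "A-2", "status": "closed"},
]
passenger_flight_alert_clean = [
    {"alert_display_name": "A-1", "passenger_id": "P-1"},
    {"alert_display_name": "A-1", "passenger_id": "P-2"},
]
passengers_clean = [
    {"passenger_id": "P-1", "name": "Ada"},
    {"passenger_id": "P-2", "name": "Grace"},
]

# Step 1: "Alerts to Join Table" (left join on alert_display_name).
alerts_to_join_table = left_join(
    flight_alerts_clean, passenger_flight_alert_clean, "alert_display_name"
)

# Step 2: "Alerts to Passengers" (join on passenger_id; left join assumed).
alerts_to_passengers = left_join(
    alerts_to_join_table, passengers_clean, "passenger_id"
)

for row in alerts_to_passengers:
    print(row)
```

In this sketch, an alert linked to several passengers produces one output row per passenger, while an alert with no entry in the join table passes through both joins without passenger columns.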