This content is also available at learn.palantir.com and is presented here for accessibility purposes.
After a Datasource Project has generated a set of clean outputs, the next stage in a pipeline — the Transform Project — prepares data to feed into the Ontology layer. These projects import the cleaned datasets from one or more Datasource Projects, join them with lookup datasets to expand values, normalize or de-normalize relationships to create object-centric or time-centric datasets, or aggregate data to create standard, shared metrics.
Up to this point in the Data Engineering training track, you’ve authored code-based data transformations that output a single dataset. Foundry transform APIs provide at least two ways to generate multiple outputs in a single transform file. This is helpful when you want to programmatically break inputs into distinct parts. In this tutorial, you’ll explore one of the available methods for outputting multiple datasets from a single transform as you take your pipeline into the Transform Project phase.
The exercises in this tutorial take the clean outputs from your Datasource Project: Flight Alerts and Datasource Project: Passengers and process them further using a multi-output Python transform. You’ll first generate an intermediate transform that joins the flight alerts data with the passenger data. Then you’ll create a multi-output transform that creates individual datasets of alerts based on passenger country.
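Before you open the exercises, it can help to see the shape of the two steps in plain Python. The sketch below is not the Foundry transforms API and is not the code you will write in the tutorial. It only illustrates the data flow with in-memory rows: a join of alerts to passengers, followed by a split into one output per country. The column names (`alert_id`, `passenger_id`, `country`, `status`), the sample rows, and both function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: the real datasets and their column names
# are assumptions made for this sketch.
flight_alerts = [
    {"alert_id": "A1", "passenger_id": "P1", "status": "open"},
    {"alert_id": "A2", "passenger_id": "P2", "status": "closed"},
    {"alert_id": "A3", "passenger_id": "P1", "status": "open"},
]
passengers = [
    {"passenger_id": "P1", "country": "Canada"},
    {"passenger_id": "P2", "country": "Mexico"},
]


def join_alerts_with_passengers(alerts, passenger_rows):
    """Intermediate step: inner-join alerts to passengers on passenger_id."""
    by_id = {p["passenger_id"]: p for p in passenger_rows}
    return [
        {**alert, **by_id[alert["passenger_id"]]}
        for alert in alerts
        if alert["passenger_id"] in by_id
    ]


def split_by_country(joined_rows):
    """Multi-output step: partition the joined rows into one list per country."""
    outputs = defaultdict(list)
    for row in joined_rows:
        outputs[row["country"]].append(row)
    return dict(outputs)


joined = join_alerts_with_passengers(flight_alerts, passengers)
alerts_by_country = split_by_country(joined)
```

Each key in `alerts_by_country` stands in for one output dataset. In the tutorial, the same idea applies to Foundry datasets rather than Python lists, with one transform file writing several outputs.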