5C. [Repositories] Multiple Outputs with Data Transforms

6 - Exercise Summary

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

✅ What you built

  • A /Transform Project: Alert Metrics project folder provisioned with suggested sub-folders.
  • A new flight_alert_metrics_logic repository.
  • A transformed dataset produced by a simple join of the clean outputs from the datasource stages of your pipeline.
  • A generated transform that programmatically created (8) outputs based on passenger country of origin.
  • A multi-output transform that writes dataframes filtered by passenger flyer_status and alert priority to (3) distinct datasets.
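
The multi-output pattern in the last bullet can be sketched in plain Python. This is a stand-in under stated assumptions, not Foundry code: rows are plain dictionaries rather than Spark dataframes, and the output names and the column values ("Platinum", "High", and so on) are hypothetical. In a repository, the function would instead be registered with the decorators from transforms.api and would write to dataset outputs. The sketch only illustrates that the input is read once and each output receives its own filtered slice.

```python
# Plain-Python stand-in for a multi-output transform (not the Foundry API).
# Output names and column values below are hypothetical.

def split_alerts(rows):
    """Read the input once and return one filtered list per output."""
    outputs = {
        "platinum_high": [],
        "gold_high": [],
        "silver_high": [],
    }
    for row in rows:  # single pass over the input
        if row["priority"] != "High":
            continue
        key = f'{row["flyer_status"].lower()}_high'
        if key in outputs:  # rows matching no output are dropped
            outputs[key].append(row)
    return outputs


sample_rows = [
    {"alert_id": 1, "flyer_status": "Platinum", "priority": "High"},
    {"alert_id": 2, "flyer_status": "Gold", "priority": "Low"},
    {"alert_id": 3, "flyer_status": "Gold", "priority": "High"},
    {"alert_id": 4, "flyer_status": "Silver", "priority": "High"},
    {"alert_id": 5, "flyer_status": "Bronze", "priority": "High"},
]

results = split_alerts(sample_rows)
for name, subset in results.items():
    print(name, [r["alert_id"] for r in subset])
```

Because all three outputs are filled in the same loop, the input is scanned a single time regardless of how many outputs are declared.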

✅ What you learned

  1. A Transform Project typically combines sources and applies additional business logic to produce Ontology-ready datasets. These datasets are usually not meant for general use.

  2. You can use your repository’s Foundry Explorer helper in the bottom left of your screen to search for datasets you want to reference in your code.

  3. Through fallback branches, the Foundry build process will "fall back" to the Master branch of an input if it cannot find a branch corresponding to your current one. You can also define a sequence of fallback branches in your repository's Settings → Branches → Fallback Branches.

  4. In the multi-output transform exercise you just completed, your code reads and processes the input dataset a single time. If you instead want to reuse the same data transformation logic across multiple transform objects, use a generated transform. For example, consider a generated transform if:

    • You have an input dataset with information about various countries, and you have code that filters down the input by country and then calculates statistics.
    • You have multiple input datasets that may contain null values and you want to apply code that removes any nulls.

In these two cases, it would be useful to use the same data transformation code across multiple transforms. Instead of separately defining a transform object for each of your outputs, you can generate transform objects using a for-loop.
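
The for-loop approach can be sketched in plain Python. This is an illustration under stated assumptions rather than Foundry code: the country names, the output path pattern, and the passenger_count statistic are hypothetical, and rows are plain dictionaries. In a repository, each generated function would be wrapped with a transform decorator from transforms.api instead of carrying an output_path attribute.

```python
# Plain-Python stand-in for generated transforms (not the Foundry API).
# Country names, the output path pattern, and the statistic are hypothetical.

def make_country_transform(country):
    """Build one transform-like function bound to a single country."""
    def compute(rows):
        # Shared logic: filter by country, then calculate a statistic.
        subset = [r for r in rows if r["country"] == country]
        return {"country": country, "passenger_count": len(subset)}

    compute.output_path = f"/hypothetical/output/passengers_{country.lower()}"
    return compute


def generate_transforms(countries):
    """Create one transform object per country with a for-loop."""
    transforms = []
    for country in countries:
        # Calling the factory binds `country` now, which avoids Python's
        # late-binding closure pitfall inside loops.
        transforms.append(make_country_transform(country))
    return transforms


TRANSFORMS = generate_transforms(["Canada", "Mexico", "Japan"])

passengers = [
    {"name": "A", "country": "Canada"},
    {"name": "B", "country": "Japan"},
    {"name": "C", "country": "Canada"},
]

stats = [t(passengers) for t in TRANSFORMS]
for t, s in zip(TRANSFORMS, stats):
    print(t.output_path, s)
```

The factory function matters: defining the compute function directly inside the loop without binding the loop variable would leave every generated function filtering on the last country in the list. Generated transform objects may also need to be registered with the project's pipeline explicitly; check the repository's pipeline definition for how transforms are discovered.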