This content is also available at learn.palantir.com and is presented here for accessibility purposes.
Careful management and efficient use of code at every stage of your pipeline will substantially improve maintainability. Consistent dataset and column names make your transform code more approachable to others in your organization and make outputs easier to join with other data assets. When user-defined functions (e.g., for cleaning or formatting) are needed, writing them once and importing them wherever they are used keeps your codebase lean and understandable.
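As an illustration of the "write once, import everywhere" idea, here is a minimal sketch of a shared cleaning helper in plain Python. The function names (`normalize_column_name`, `normalize_columns`) and the snake_case convention are assumptions made for this example, not names or policies taken from the tutorial.

```python
import re


def normalize_column_name(name: str) -> str:
    """Convert an arbitrary column header to lower snake_case (hypothetical helper)."""
    # Replace each run of non-alphanumeric characters with a single underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    # Drop leading/trailing underscores and lowercase the result.
    return cleaned.strip("_").lower()


def normalize_columns(names):
    """Apply the same naming convention to every column name in a list."""
    return [normalize_column_name(n) for n in names]


if __name__ == "__main__":
    print(normalize_columns([" Flight ID ", "Dep-Time (UTC)", "carrier"]))
```

If a helper like this lived in one utility module of your repository, each transform file could import it instead of redefining it, so a change to the naming convention would need to be made in only one place.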
In this tutorial you:
Below is a list of product documentation used in the course of this training:
The preprocessing stage in a pipeline prepares datasets for more substantive, policy-based cleaning steps that will generate datasets that can be used more broadly throughout your organization. The next tutorial will not only move your pipeline into the cleaning phase; it will also introduce new best practices and techniques for transforming data in Foundry.