2. [Repositories] Introduction to Data Transformations > 18. Key Takeaways

18 - Key Takeaways

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

Careful management and efficient use of code at every stage of your pipeline will substantially improve maintainability. Consistent dataset and column names make your transform code more approachable to others in your organization and ensure outputs can more readily be joined with other data assets. When user-defined functions (e.g., for cleaning or formatting) are needed, writing them once and referencing them via import statements keeps your codebase lean and understandable.
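As a concrete sketch of the "write once, import everywhere" pattern: a shared utilities module can hold small formatting helpers that every transform file imports instead of redefining. The module and function names below (`utils.py`, `clean_column_name`) are illustrative assumptions, not names from the tutorial.

```python
# utils.py (hypothetical shared module): define formatting helpers once,
# then import them from individual transform files.

def clean_column_name(name):
    """Normalize a raw column header: trim, lowercase, underscore-separate."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")
```

A transform file would then use `from myproject.utils import clean_column_name` rather than duplicating the logic, so a fix to the helper propagates to every transform that references it.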

In this tutorial you:

  1. Set up a Python code repository and practiced Git workflows.
  2. Created copies of datasets via identity transforms.
  3. Built utility functions for formatting and updating data and referenced those functions in transform files.
  4. Generated raw and processed versions of your source data in preparation for subsequent cleaning.
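The identity transform from step 2 above can be sketched as follows. The Foundry decorator and dataset paths are shown as hedged comments (the paths are placeholders, not paths from this tutorial); the core logic is simply returning the input DataFrame unchanged, which materializes a copy of the source dataset for downstream steps to build on.

```python
# Hypothetical identity transform: copy a source dataset unchanged so that
# later preprocessing steps work from a stable snapshot. In a Foundry code
# repository this function would be wrapped with the transforms API, e.g.:
#
# from transforms.api import transform_df, Input, Output
#
# @transform_df(
#     Output("/Project/datasets/raw/flights_copy"),   # placeholder path
#     Input("/Project/sources/flights"),              # placeholder path
# )
def identity(source_df):
    # An identity transform returns its input untouched; Foundry writes the
    # result to the Output dataset, producing a one-to-one copy.
    return source_df
```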


The preprocessing stage in a pipeline prepares datasets for more substantive, policy-based cleaning steps that will generate datasets that can be used more broadly throughout your organization. The next tutorial will not only move your pipeline into the cleaning phase; it will also introduce new best practices and techniques for transforming data in Foundry.