This content is also available at learn.palantir.com ↗ and is presented here for accessibility purposes.
Always be documenting. Foundry applications and the project structures that support data pipelines provide ample opportunity for you to let your current and future teammates know the relevant facts about your data transformations. Having preprocessed your data, it’s time to clean it and prepare it for use downstream. This means airtight transform syntax as much as it means documenting the scope and logic of every step along the way.
In this tutorial, you’ll engineer a “clean” output for your project to be consumed by downstream pipelines and use cases. The code you’ll implement uses common PySpark features for transforming data inputs, and a significant portion of the tutorial will require you to explore selected documentation entries that expound on PySpark best practices. As a reminder, however, teaching PySpark syntax patterns is outside the scope of this course.