
5 - Defining How Your Schedule Will Build

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

You’ve defined the WHAT and the WHEN of your pipeline, but there are a few more settings that structure HOW builds in this schedule should execute.

🔨 Task Instructions

  1. In the Build scope section of the Scheduler panel, set the dropdown to Project Scoped. In short, this enables the schedule to run with a token scoped to project permissions rather than to the permissions of an individual user.

  2. Open the collapsed ▸ Advanced options section at the bottom of the Scheduler panel.

  3. Select the option to Abort build on failure. You do not want one part of your pipeline to update while another part fails. Halting the entire schedule on a single failure helps prevent that kind of partially updated, lopsided pipeline.

  4. Select the option to Customize the number of attempts for failed jobs. Set the number of retries to 3 and the time between retries to 1 minute. This setting helps builds recover from ephemeral network issues or other “flakiness” that might cause a job to fail temporarily.

    ℹ️ The Force build option in the Advanced options section should only be used for Data Connection ingests. Otherwise, you may end up building datasets that do not need to be built, thereby wasting Spark computation resources.

  5. Click the blue Save button in the bottom right of the Scheduler panel to save your schedule.
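
To make the effect of steps 3 and 4 more concrete, here is a minimal Python sketch of how retry and abort-on-failure settings might interact. It is an illustration only and is not Foundry's implementation or API: the function names `run_with_retries` and `run_schedule` are hypothetical, the jobs run one after another rather than as a dependency graph, and it assumes that "3 retries" means one initial attempt plus up to three further attempts.

```python
import time
from typing import Callable, Dict


def run_with_retries(
    job: Callable[[], None],
    retries: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run a job once, then retry up to `retries` more times if it raises.

    Hypothetical helper: the defaults mirror the tutorial values
    (3 retries, 1 minute between retries).
    """
    for attempt in range(retries + 1):
        try:
            job()
            return True
        except Exception:
            # Wait before the next attempt, but not after the final one.
            if attempt < retries:
                sleep(wait_seconds)
    return False


def run_schedule(
    jobs: Dict[str, Callable[[], None]],
    abort_on_failure: bool = True,
    **retry_kwargs,
) -> Dict[str, str]:
    """Run jobs in order and report a status for each.

    Simplifying assumption: jobs run sequentially. With abort_on_failure
    set, every job after the first exhausted failure is skipped.
    """
    statuses: Dict[str, str] = {}
    aborted = False
    for name, job in jobs.items():
        if aborted:
            statuses[name] = "aborted"
            continue
        if run_with_retries(job, **retry_kwargs):
            statuses[name] = "succeeded"
        else:
            statuses[name] = "failed"
            aborted = abort_on_failure
    return statuses
```

In this sketch, a job that fails on a transient error gets a few more chances before it counts as failed, and a job that exhausts its retries stops everything queued after it, so the remaining work is not built on top of a failed step.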

We’ve touched on some basic schedule configuration settings in this exercise. For more detail, see the Foundry documentation page that describes best practices for scheduling pipelines. Data engineers should consider bookmarking this page when establishing or improving data pipelines.