2 - Creating a Schedule

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

In Foundry, a pipeline Schedule is treated as a first-class artifact with its own resource ID (RID) and permissioning scheme. This task will prompt you to revisit key data transformation and building concepts to contextualize the scheduling process and then create a container for your schedule.

Foundry data engineers must know the granular components of data build execution to effectively construct and maintain data pipelines. Review these critical terms and then read this extended article on the Foundry build process before proceeding. More background on the orchestration of the build process will be covered in the course "Foundry under the Hood" later in the Data Engineering learning path.

🔨 Task Instructions

  1. Open the “Flight Alerts Pipeline” Data Lineage graph you created in the previous exercise. It should be in your /Datasource Project: Flight Alerts/documentation folder.

  2. Select the “calendar” icon (hover text: “Manage schedules”) in the collapsed helper on the right side of the screen to open the Scheduler application.

  3. Select the blue Create new schedule button in the middle of the right-hand panel. Your graph's coloring scheme switches to scheduler mode, and nodes are now colored based on the schedule logic you define. Clicking Exit schedule at the top of your screen closes the Scheduler application and returns the graph to standard node coloring.

  4. At the top of the Scheduler panel, select the New schedule text and change the name to yourName Flight Alerts Pipeline (e.g., Jmeier Flight Alerts Pipeline).

  5. Just below the title, select the “Schedule description...” text to edit the description: “Build schedule for Datasource Project: Flight Alerts.”

    As discussed in the Scheduler documentation, there are three available schedule build types:

    • Single build: Build selected datasets only.
    • Full build: Build selected datasets and all upstream datasets.
    • Connecting build: Build all datasets between the input datasets (excluded) and target datasets (included).

    ℹ️ In general, we recommend avoiding Full Builds and using Connecting Builds whenever possible because of the precision they enable. A connecting build requires that the input datasets and target datasets be connected by a job spec path on the branch on which the schedule is set.

  6. Click the option to Switch to a connecting build near the top of the Scheduler panel.
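To build intuition for what a connecting build selects, the logic can be sketched as a graph traversal: keep every dataset that lies on a dependency path from an input dataset to a target dataset, excluding the inputs themselves and including the targets. The sketch below is purely illustrative (the function name, edge representation, and dataset names are hypothetical, not Foundry's actual implementation or API):

```python
from collections import defaultdict

def connecting_build(edges, inputs, targets):
    """Return the datasets a connecting build would run: nodes on any
    path from an input to a target, excluding inputs, including targets."""
    downstream = defaultdict(set)  # dataset -> datasets built from it
    upstream = defaultdict(set)    # dataset -> datasets it is built from
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)

    def reachable(starts, graph):
        # Depth-first search; includes the start nodes themselves.
        seen, stack = set(starts), list(starts)
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    from_inputs = reachable(inputs, downstream)  # everything downstream of inputs
    to_targets = reachable(targets, upstream)    # everything upstream of targets
    # Intersect the two sets, then drop the inputs (excluded by definition);
    # the targets survive the intersection and stay included.
    return (from_inputs & to_targets) - set(inputs)

# Hypothetical pipeline: raw -> clean -> alerts, with a side branch raw -> stats.
edges = [("raw", "clean"), ("clean", "alerts"), ("raw", "stats")]
print(sorted(connecting_build(edges, inputs=["raw"], targets=["alerts"])))
# -> ['alerts', 'clean']  (raw is excluded; stats is not on a path to the target)
```

Note how the side branch (stats) is left out even though it is downstream of the input: only datasets that both descend from an input and feed a target are built, which is the precision advantage over a full build.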