11 - Setting Schedule Health Checks: Schedule Duration

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

If your build schedule is taking longer than expected, something may need your attention. Among other things, the volume of input data may have increased, or the size or number of data partitions may have grown beyond what your current Spark profile can compute efficiently. There may also be higher-than-average Spark resource contention in your Foundry environment. Whatever the cause, keeping tabs on how long your scheduled build (i.e., one or more jobs that logically build together or in sequence) takes to complete reduces the risk of breaching service level agreements or freshness expectations downstream.

🔨 Task Instructions

  1. In the Health tab of your schedule metrics page, click the Time ▾ dropdown and choose Schedule Duration.
  2. We don’t yet have enough schedule executions for metrics to inform the expected duration, so tick the second checkbox and set the threshold to 1 deviation above the median, defined by the most recent 5 executions.
    • Once you have more schedule runs, you can return here and update the rule to a specific time threshold.
  3. Add the check to your Flight Alerts Schedule check group.
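
To build intuition for the rule configured in step 2, here is a minimal Python sketch of a "1 deviation above the median of the most recent 5 executions" threshold. It is an illustration only, not Foundry's implementation: the function names `duration_threshold` and `is_breach` are hypothetical, and it assumes "deviation" means the population standard deviation of the recent run durations, which this page does not confirm.

```python
from statistics import median, pstdev


def duration_threshold(durations_minutes, window=5, deviations=1.0):
    """Return median + deviations * spread over the most recent `window` runs.

    Assumption: "deviation" is taken to be the population standard
    deviation; the health check may define it differently.
    """
    recent = durations_minutes[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return median(recent) + deviations * pstdev(recent)


def is_breach(latest_minutes, history_minutes, window=5, deviations=1.0):
    """True if the latest run took longer than the computed threshold."""
    return latest_minutes > duration_threshold(history_minutes, window, deviations)


# Hypothetical durations (in minutes) of the five most recent schedule runs.
history = [10, 12, 11, 13, 14]
print(round(duration_threshold(history), 2))  # median 12 + stdev ~1.41 -> 13.41
print(is_breach(15, history))  # True: 15 minutes exceeds the threshold
print(is_breach(13, history))  # False: 13 minutes is within the threshold
```

A relative rule like this adapts as run times drift, which is why it suits a schedule with little history; a fixed time threshold becomes more useful once typical durations are known.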