7 - Install Schema and TSLU Checks Throughout your Pipelines

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

We recommend installing a schema check on all inputs to your pipeline and, if you know the expected update frequency, a Time Since Last Updated (TSLU) check as well. On all outputs (i.e., targets) of your build schedule, include at least a TSLU check and a schema check.
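
To make the two check types concrete, here is a minimal conceptual sketch in Python. It is not Foundry's implementation or API: the function names (schema_matches, tslu_ok), the column lists, and the 24-hour threshold are all hypothetical, chosen only to illustrate what each kind of check asserts about a dataset.

```python
from datetime import datetime, timedelta, timezone


def schema_matches(expected, actual):
    """Return True when both schemas have the same column names and types.

    Each schema is a list of (column_name, column_type) pairs.
    Column order is ignored in this sketch.
    """
    return dict(expected) == dict(actual)


def tslu_ok(last_updated, max_age, now=None):
    """Return True when the dataset was updated within max_age of now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated <= max_age


# Hypothetical example values, for illustration only.
expected = [("flight_id", "string"), ("priority", "string")]
actual = [("priority", "string"), ("flight_id", "string")]
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_updated = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)

schema_passes = schema_matches(expected, actual)
tslu_passes = tslu_ok(last_updated, timedelta(hours=24), now=now)
```

The sketch also shows why the TSLU check is optional on inputs: it needs a max_age threshold, which you can only choose sensibly if you know how often the dataset is expected to update.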

As you proceed, recall that you have three connected units of build, each with their own configured schedules:

  • yourName Flight Alerts Schedule: The build schedule for the jobs in your Datasource Project: Flight Alerts project.
  • yourName Passengers Schedule: The build schedule for the jobs in your Datasource Project: Passengers project.
  • yourName Alert Metrics Schedule: The build schedule for the jobs in your Transform Project: Alert Metrics project.

🔨 Task Instructions

  1. Return to your Flight Alerts Pipeline Data Lineage graph and expand it to include all downstream nodes from flight_alerts_clean, all the way to your generated and multi-output transforms (if you're following the "Builder" path, this means your filtered country and priority datasets). Remember to add all nodes upstream, too.
  2. Add a TSLU check and a schema check to every schedule target, adding each check to its corresponding check group.
  3. Add a schema check to every schedule input, adding each check to its corresponding check group. Be aware that the *_json_raw and *_csv_raw input datasets to your Passengers schedule have no expected schema, due to the nature of the transform. Instead, consider placing the schema checks at the preprocessed stage for each.

If you're following the "Builder" path, you may not have preprocessed Passenger datasets, and you may safely skip this step.

Tips:

  • Mouse over each schedule in the side panel for a visual reminder of which nodes are inputs and targets of each schedule.
  • Apply checks to multiple nodes at the same time by using Shift + drag to select multiple nodes on the Data Lineage graph, right-clicking, and choosing Add health check...
  • If a dataset is a target of one build schedule and an input to another, there’s no need (or ability!) to apply the schema or TSLU check twice.
  • Add each check to its corresponding check group.
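
The rule of thumb in this task (a schema check on every input, a schema check plus a TSLU check on every target, and no duplicate checks on a dataset that plays both roles) can be sketched as a small planning helper. This is only an illustration for reasoning about which checks go where. The plan_checks function, the schedule layout, and every dataset name other than flight_alerts_clean are hypothetical, and the assignment of flight_alerts_clean as a target of one schedule and an input to another is assumed for the example. In practice, the checks are added through the Data Lineage interface as described above.

```python
def plan_checks(schedules):
    """Map each dataset name to the set of check types it should carry.

    schedules maps a schedule name to a dict with "inputs" and "targets"
    lists. Using a set per dataset means a dataset that is a target of one
    schedule and an input to another still gets each check only once.
    """
    plan = {}
    for schedule in schedules.values():
        for dataset in schedule["inputs"]:
            plan.setdefault(dataset, set()).add("schema")
        for dataset in schedule["targets"]:
            plan.setdefault(dataset, set()).update({"schema", "tslu"})
    return plan


# Hypothetical schedule layout, for illustration only.
schedules = {
    "Flight Alerts Schedule": {
        "inputs": ["flight_alerts_raw"],
        "targets": ["flight_alerts_clean"],
    },
    "Alert Metrics Schedule": {
        "inputs": ["flight_alerts_clean"],
        "targets": ["alert_metrics_output"],
    },
}

plan = plan_checks(schedules)
for dataset, checks in sorted(plan.items()):
    print(dataset, sorted(checks))
```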