4 - Add a Schema Check from the Data Lineage Application

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

Over the last seven tutorials, you’ve constructed an interlocking set of pipelines connected through input/output relationships. We’re going to start by focusing on the Datasource Project: Flight Alerts pipeline and applying an important health check that evaluates the schema of your schedule inputs and outputs.

Often, schedule targets are used as inputs to other data transforms, Contour analyses, or Ontology objects, all of which expect a specific schema. We therefore recommend implementing a schema check on the inputs to and targets of your scheduled builds so you can be notified of potentially disruptive schema changes.

🔨 Task Instructions

  1. Open your Flight Alerts Pipeline in your Datasource Project: Flight Alerts project folder.
  2. Click the Manage schedules icon on the right of your Data Lineage application screen and click on your saved schedule for this pipeline: yourName Flight Alerts Schedule.
    • The graph shows you that flight_alerts_clean is the target of your scheduled build and that the three datasets marked as input triggers are the inputs. When monitoring a pipeline, you'll configure health checks for the inputs to and targets of your builds as well as on the schedule itself.
  3. Right-click on flight_alerts_clean and choose Add health check... from the menu of options. This opens a health check selector directly in Data Lineage, which is a convenient way to quickly add a check to one or more selected datasets.
  4. Scroll to the bottom of the list of checks and choose Schema from the Schema category. This opens the schema health check configuration window.
  5. Locate and click the Edit severity link and change the severity to critical. Not all schema changes need be “critical,” but in our case, there’s a downstream dependency that uses this dataset as an input and will fail if the schema changes.
  6. The current check comparative allowance, EXACT_MATCH_ORDERED_COLUMNS, will pass if and only if the number, order, and types of the columns are unchanged. Assume we are less concerned with column order or additive changes, as long as no existing columns are removed and no existing data types change. Change the comparative allowance to COLUMN_ADDITIONS_ALLOWED, which requires your existing column names and types (but not their order) to remain unchanged while allowing additional columns if needed.
  7. Click the Add check group link and select your Flight Alerts Schedule group.
  8. Add a note to your check: “Dataset used as input to Transform Project: Alert Metrics.”
  9. Save your check.
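
To make the difference between the two comparative allowances from step 6 concrete, here is a minimal Python sketch of the comparison logic as the step describes it. It is an illustration only, not Foundry's implementation: the function names, the (name, type) tuple representation of a schema, and the example columns are all hypothetical.

```python
from typing import List, Tuple

# A schema is modeled here as an ordered list of (column name, data type) pairs.
# This representation is an assumption made for illustration.
Column = Tuple[str, str]


def exact_match_ordered_columns(expected: List[Column], actual: List[Column]) -> bool:
    """Pass only if the number, order, names, and types of the columns are unchanged."""
    return list(expected) == list(actual)


def column_additions_allowed(expected: List[Column], actual: List[Column]) -> bool:
    """Pass if every expected column still exists with the same type.

    Column order is ignored and extra columns in `actual` are tolerated.
    """
    actual_types = dict(actual)
    return all(actual_types.get(name) == dtype for name, dtype in expected)


# Hypothetical columns; the real flight_alerts_clean schema may differ.
expected = [("alert_id", "string"), ("flight_id", "string"), ("priority", "integer")]

# Columns reordered and one column added.
reordered_with_extra = [
    ("flight_id", "string"),
    ("alert_id", "string"),
    ("priority", "integer"),
    ("assignee", "string"),
]

# The type of an existing column changed.
type_changed = [("alert_id", "string"), ("flight_id", "string"), ("priority", "string")]

# An existing column was removed.
column_removed = [("alert_id", "string"), ("flight_id", "string")]

print(exact_match_ordered_columns(expected, reordered_with_extra))  # False
print(column_additions_allowed(expected, reordered_with_extra))     # True
print(column_additions_allowed(expected, type_changed))             # False
print(column_additions_allowed(expected, column_removed))           # False
```

Under these assumptions, a reordered schema with one new column fails the strict check but passes the relaxed one, while a removed column or a changed type fails both. That matches the intent of step 6: tolerate additive changes, but still be alerted to changes that would break the downstream input.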