7. [Repositories] Configuring Data Expectations

1 - About this Course

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

Data Health checks run after a build has completed, using a variety of backend processes depending on the check type. Because they run separately from the transform code, after the build or job has finished, they cannot be used to fail a build. In other words, if you install a primary key uniqueness health check, you will only be notified of the failure, and undesirable data may continue to propagate downstream.

Foundry’s Data Expectations library, by contrast, can be invoked in your transform to create health checks that (a) cause the job to fail if unmet, (b) offer more granularity than the out-of-the-box data health checks, (c) are under configuration management in your repository, and (d) add a layer of documentation in your code about the expected shape and size of the data. So, if your encoded primary key data expectation fails, your job will fail and unexpected data will not propagate downstream. What’s more, encoded expectations will show up in the Data Health app alongside any standard ones you configure.
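To make the contrast concrete, here is a minimal plain-Python sketch of the idea behind an encoded expectation: the check runs inside the transform, so a violation raises an error and the output is never produced. It deliberately does not use the Data Expectations library itself; the names `ExpectationFailed`, `expect_primary_key`, `clean_orders`, and the `order_id` and `status` columns are all hypothetical and chosen only for illustration.

```python
class ExpectationFailed(Exception):
    """Raised when an encoded expectation is not met (hypothetical name)."""


def expect_primary_key(rows, key):
    """Fail if any value of `key` is missing or appears more than once."""
    seen = set()
    for row in rows:
        value = row.get(key)
        if value is None or value in seen:
            raise ExpectationFailed(f"primary key violated on {key!r}: {value!r}")
        seen.add(value)


def clean_orders(rows):
    """Hypothetical transform: normalize rows, then enforce the expectation."""
    output = [{**row, "status": row["status"].lower()} for row in rows]
    # The check runs inside the job, so a violation stops the output
    # from being returned and nothing propagates downstream.
    expect_primary_key(output, "order_id")
    return output


good = [{"order_id": 1, "status": "OPEN"}, {"order_id": 2, "status": "CLOSED"}]
bad = good + [{"order_id": 2, "status": "OPEN"}]  # duplicate order_id

result = clean_orders(good)  # passes the expectation

try:
    clean_orders(bad)
    job_failed = False
except ExpectationFailed:
    job_failed = True  # the "job" fails instead of emitting bad data
```

A post-build health check would correspond to calling `expect_primary_key` only after `clean_orders` had already returned and its output had been consumed: the failure would still be reported, but too late to stop the duplicate row from moving downstream.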

⚠️ Course prerequisites

  • DATAENG 06: Monitoring Data Pipeline Health: If you have not completed the previous course in this track, do so now.

Outcomes

In many cases, data health checks like the ones you applied in the previous tutorial will be sufficient to monitor your pipelines. A full monitoring and protection program, however, should also take advantage of the Data Expectations framework for greater granularity and control. In this brief tutorial, you’ll add encoded data checks to a few of your data transforms and view them in the Data Health application.

📖 Learning Objectives

  • Understand when and how to apply data expectations checks.

💪 Foundry Skills

  • Apply a Data Expectations check to an existing code repository.
  • View expectations checks in the Data Health app.