
6 - Using Contour for Data Validation

This content is also available at learn.palantir.com ↗ and is presented here for accessibility purposes.

📖 Task Introduction

Testing proposed data or type changes is critical to minimizing downstream errors, and Foundry provides a number of methods for data validation. Examples include:

  • Using visuals and statistics at the bottom of the dataset application to ensure there are no null values in a column (a code analogue of this and the next check appears after this list).
  • Using the Preparation application to prototype the impact of type changes (e.g., from integer to double).
  • Using Code Workbook to test how changes to Python code impact downstream visualizations.
  • Using Contour to quickly dissect columns and rows or to prototype joins.
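
For readers who prefer to prototype these checks in code, a minimal PySpark sketch of the first two items above (a null-value check and a type-change prototype) follows. It is illustrative only: the toy DataFrame, the column names, and the integer-to-double cast are assumptions made for the example, not part of this exercise.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Toy data standing in for a real dataset -- the column names are illustrative only.
df = spark.createDataFrame(
    [(1, "DELAY", 10), (2, "CANCEL", None), (3, "DELAY", 7)],
    ["alert_id", "category", "minutes_delayed"],
)

# Check 1: confirm whether a column contains null values.
null_rows = df.filter(F.col("minutes_delayed").isNull()).count()
print(f"rows with null minutes_delayed: {null_rows}")

# Check 2: prototype a type change (integer -> double) and inspect the resulting schema.
df_cast = df.withColumn("minutes_delayed", F.col("minutes_delayed").cast("double"))
df_cast.printSchema()
```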

In this exercise, you’ll use Contour to verify that the alert_display_name column in your clean flight alerts dataset is a suitable primary key and—critically—that the key is unique. In reality, there are a number of ways you could conduct this quick validation, but this method will also give you the opportunity to save a Contour analysis in the /analysis folder of your datasource project.
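
Since there are indeed several ways to run this validation, one code alternative is sketched below as a hedged aside: compare the total row count with the distinct, non-null count of the candidate key. It assumes the clean flight alerts data is already loaded as a PySpark DataFrame named df; that variable and the surrounding setup are assumptions for the sketch, not part of the exercise.

```python
from pyspark.sql import functions as F

# Assumes `df` holds the clean flight alerts data and `alert_display_name` is the candidate key.
total_rows = df.count()
distinct_keys = df.select("alert_display_name").distinct().count()
null_keys = df.filter(F.col("alert_display_name").isNull()).count()

# A suitable primary key has exactly one distinct, non-null value per row.
is_valid_key = (total_rows == distinct_keys) and (null_keys == 0)
print(f"rows={total_rows}, distinct={distinct_keys}, nulls={null_keys}, valid_key={is_valid_key}")
```

The Contour histogram you build in the steps below answers the same question visually, which is why this exercise uses it instead.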

🔨 Task Instructions

  1. Ensure your flight_alerts_clean dataset has been successfully built on your branch. If it has, consider clicking the option to “Replace paths with RIDs.” You may need to refresh your browser for your repository to present the option to you. If you elect to replace the paths with the RIDs, you will need to commit your code again with a message like “refactor: update output path to use RID.”
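
    For context, "Replace paths with RIDs" swaps the dataset references in your transform from project paths to stable resource identifiers; the transform logic itself does not change. A hedged sketch of what that might look like in a Python transform is shown below. The path and RID values are placeholders for illustration, not the ones in your repository.

```python
from transforms.api import transform_df, Input, Output

# Before the swap, the references use project paths (placeholder paths shown here):
#
# @transform_df(
#     Output("/Your Org/Datasource Project: Flight Alerts/datasets/flight_alerts_clean"),
#     source_df=Input("/Your Org/Datasource Project: Flight Alerts/datasets/flight_alerts_raw"),
# )

# After the swap, the same references use resource identifiers (placeholder RIDs shown here):
@transform_df(
    Output("ri.foundry.main.dataset.00000000-0000-0000-0000-000000000000"),
    source_df=Input("ri.foundry.main.dataset.11111111-1111-1111-1111-111111111111"),
)
def compute(source_df):
    # The transform body is unaffected by the path-to-RID change.
    return source_df
```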

  2. Open the output dataset (flight_alerts_clean) by either:

    • Ctrl+clicking on the dataset name on line 6 of your transform code.
    • Opening the Foundry Explorer helper tab in the bottom left of your screen, selecting the Output dataset link on the left side of the helper window, and then Ctrl+clicking the dataset name in the Details panel of the helper.

    The Foundry Explorer helper is a file navigation interface that lets you quickly browse all files and folders.

  3. With your dataset open in the dataset application, check immediately under the dataset name in the top left to ensure you are on your feature branch as shown below.

  4. Click the blue Analyze button in the top right of the dataset preview to open the data in Contour.

    ℹ️ Contour is a helpful debugging and sense-checking tool during pipeline development, and it is often faster than other available methods.

  5. If you are immediately prompted for a save location, save the analysis into your .../Datasource Project: Flight Alert/analysis folder as "Flight Alerts Primary Key Analysis."

    If you are unable to save it there, navigate to your .../Datasource Project: Flight Alert/analysis folder, create a new Analysis (using the green ➕ New button in the top right of the screen) titled "Flight Alerts Primary Key Analysis," and choose flight_alerts_clean as the starting dataset.

  6. When the analysis opens, notice that your starting board, which lists your starting dataset, indicates that you are operating on your branch of the flight_alerts_clean dataset.

  7. Add a histogram board. In the Y-AXIS column dropdown, select alert_display_name and use the default X-AXIS aggregate of Count.

  8. Select Compute in the bottom right of the histogram configuration window.

    The histogram orders values by count in descending order, so if the top row has a count of 1, every value in the column appears exactly once and the key is unique. (A PySpark equivalent of this check is sketched after this step.)

    Later in this track, you will learn to enforce column value uniqueness.
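
    If you ever want to reproduce this check outside Contour, a rough PySpark equivalent of the histogram's logic is sketched below. It assumes the clean dataset is available as a DataFrame named df; that assumption is not part of the exercise.

```python
from pyspark.sql import functions as F

# Group by the candidate key and count occurrences, mirroring the histogram board.
key_counts = (
    df.groupBy("alert_display_name")
      .count()
      .orderBy(F.desc("count"))
)

# If the largest count is 1, no key value appears more than once.
max_count = key_counts.first()["count"]
print("alert_display_name is unique" if max_count == 1 else "duplicate key values found")
```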

  9. If you were not previously prompted to save your analysis, do so now by using the instructions in step 5 above.

    ℹ️ Refer to this location for a completed example Contour analysis if needed: .../Foundry Training & Resources/Example Projects/[Datasource] Flight Alerts/analysis/Flight Alerts PK Analysis