6 - Preprocessing Logic: Flight Alerts

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

Now that your pipeline artifact is created, it’s time to use a few Pipeline Builder transforms to correct the formatting issues identified. Let’s begin by addressing the flight_alerts data, which has the following problems:

  • Column names appear in various formats and should be standardized to “snake case” (e.g., from flightDate to flight_date).
  • The category column values need to be normalized (stray whitespace removed and capitalization made consistent).
  • The flightDate column should be cast to a date.
  • The priority and status columns should be cast from integer to string. Although the values are integers, it is best practice to store a value as an integer only when it will be used in mathematical operations.
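
To make the "snake case" target concrete, here is a minimal Python sketch of the kind of renaming involved. It is only an illustration: the function name to_snake_case is hypothetical, and Pipeline Builder's Normalize Column Names transform may handle edge cases differently.

```python
import re

def to_snake_case(name: str) -> str:
    """Convert a column name such as 'flightDate' or 'Flight Date' to 'flight_date'."""
    # Insert an underscore at each lowercase-or-digit to uppercase boundary.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse any run of non-alphanumeric characters into a single underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    # Drop leading/trailing underscores and lowercase the result.
    return name.strip("_").lower()

# Hypothetical column names in mixed formats.
renamed = [to_snake_case(c) for c in ["flightDate", "Flight Date", "priority"]]
```

Names that are already in snake case, such as flight_date, pass through unchanged.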

🔨 Task Instructions

  1. Add a transform to the flight_alerts_raw node in your pipeline.

  2. In the upper left corner of the application, name your transform Preprocess flight_alerts.

  3. Apply the following transforms:

    • Normalize Column Names
    • Trim whitespace (apply to category)
    • Title case (apply to category)
    • Cast flight_date to date using M/d/yy as the format.
    • Cast priority and status to string.

  4. Use the dataset preview window at the bottom of the application to confirm that the data issues listed in the task introduction have been addressed.

  5. Click the ⊕ Add pipeline output button on the right-hand side of the screen.

  6. Name your output flight_alerts_preprocessed.

  7. Save your pipeline.
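
For readers who find it easier to reason about these transforms in code, the value-level transforms from step 3 (trim, title case, and the two casts) can be sketched in plain Python. This is an approximation rather than how Pipeline Builder implements them: the function preprocess_alert and the sample row are hypothetical, the column names are assumed to be normalized already, and Python's title-casing rules may differ from the Title case transform in edge cases.

```python
from datetime import date, datetime

def preprocess_alert(row: dict) -> dict:
    """Apply the value-level cleanup steps to one row with normalized column names."""
    cleaned = dict(row)
    # Trim whitespace, then title-case the category value.
    cleaned["category"] = row["category"].strip().title()
    # M/d/yy (e.g. 3/7/23) corresponds to %m/%d/%y; strptime accepts unpadded month and day.
    cleaned["flight_date"] = datetime.strptime(row["flight_date"], "%m/%d/%y").date()
    # Store priority and status as strings, since no arithmetic is done on them.
    cleaned["priority"] = str(row["priority"])
    cleaned["status"] = str(row["status"])
    return cleaned

# Hypothetical sample row; the real dataset values may differ.
sample = {"category": "  weather delay ", "flight_date": "3/7/23", "priority": 2, "status": 1}
result = preprocess_alert(sample)
```

Comparing a row before and after this function mirrors the check in the dataset preview window: category is trimmed and title-cased, flight_date is a true date, and priority and status are strings.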