This content is also available at learn.palantir.com and is presented here for accessibility purposes.
📖 Task Introduction
Now that your pipeline artifact is created, it’s time to use a few Pipeline Builder transforms to correct the formatting issues identified. Let’s begin by addressing the flight_alerts data, which has the following problems:
Column names appear in various formats and should be standardized to “snake case” (e.g., from flightDate to flight_date).
The category column values need to be normalized.
The flightDate column should be cast to a date.
The priority and status columns should be cast from integer to string. Although the values are integers, it is a best practice to use integer types only when mathematical operations will be involved.
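To make the "snake case" target concrete, here is a minimal Python sketch of the kind of renaming involved. It is only an illustration: the function name to_snake_case is hypothetical, and the exact rules that Pipeline Builder's Normalize Column Names transform applies may differ from the ones assumed here.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a column name such as 'flightDate' to 'flight_date'.

    Assumed rules (not taken from Pipeline Builder itself):
    split camelCase boundaries, collapse non-alphanumeric runs
    into a single underscore, and lowercase the result.
    """
    # Insert an underscore at each lowercase/digit -> uppercase boundary.
    with_breaks = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Replace spaces, hyphens, and other separators with one underscore.
    collapsed = re.sub(r"[^0-9A-Za-z]+", "_", with_breaks)
    return collapsed.strip("_").lower()


print(to_snake_case("flightDate"))
print(to_snake_case("Flight Date"))
```

Both calls print flight_date under these assumed rules; a name that is already in snake case, such as category, passes through unchanged.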
🔨 Task Instructions
Add a transform to the flight_alerts_raw node in your pipeline.
In the upper left corner of the application, name your transform Preprocess flight_alerts.
Apply the following transforms:
Normalize Column Names
Trim whitespace (apply to category)
Title case (apply to category)
Cast flight_date to date using M/d/yy as the format.
Cast priority and status to string.
Use the dataset preview window at the bottom of the application to confirm the data issues in the task introduction are indeed addressed.
Click the ⊕ Add pipeline output button on the right-hand side of the screen.
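For readers who want to reason about what these transforms do to a single row, the following Python sketch mimics the trim, title case, and cast steps outside of Pipeline Builder. It is an approximation under stated assumptions: preprocess_alert and the sample row are hypothetical, the column names are assumed to be already normalized, and the Python format string %m/%d/%y is used as a stand-in for the M/d/yy pattern.

```python
from datetime import date, datetime


def preprocess_alert(row: dict) -> dict:
    """Apply the trim, title case, and cast steps to one row.

    Assumes the keys are already in snake case.
    """
    return {
        **row,
        # Trim whitespace, then title-case the category value.
        "category": row["category"].strip().title(),
        # Parse a date such as '3/7/23' (assumed equivalent of M/d/yy).
        "flight_date": datetime.strptime(row["flight_date"], "%m/%d/%y").date(),
        # Cast the integer codes to strings.
        "priority": str(row["priority"]),
        "status": str(row["status"]),
    }


# Hypothetical sample row, not taken from the real flight_alerts data.
sample = {
    "category": "  mechanical issue ",
    "flight_date": "3/7/23",
    "priority": 2,
    "status": 1,
}
cleaned = preprocess_alert(sample)
print(cleaned)
```

In the cleaned row, category becomes "Mechanical Issue", flight_date becomes a date value for 7 March 2023, and priority and status become the strings "2" and "1". These are the same properties to look for in the dataset preview.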