This content is also available at learn.palantir.com and is presented here for accessibility purposes.
📖 Task Introduction
Your passengers_raw data could use a few cleaning steps before you make it available for broader use in your organization:
- Converting the dob column to a date type
- Removing unnecessary columns left over from the JSON parsing process
- Normalizing the flyer_status column
🔨 Task Instructions
1. You’re currently in your .../data/raw/ folder. Proceed to .../data/clean/.
2. Add a new pipeline artifact called passengers_datasource_clean.
3. Import the two datasets you just output to your .../raw/ folder (passengers_raw and passenger_flight_alerts_raw).
4. passenger_flight_alerts_raw doesn't need any cleaning. Create an output for it called passenger_flight_alerts_clean.
5. Add a transform step after passengers_raw with the following logic (see the images below if you need help):
   - Use concatenate strings to prepend "19" to the year in the dob column, then CAST dob to a date type
   - Drop the _error and _file columns
   - Apply the Title Case transform to the flyer_status column
For Step 5, first use concatenate strings to prepend "19" to the year in the dob column, as pictured in the first screenshot. This prepares the column for the CAST and the other cleanup pictured in the second screenshot.
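[Screenshot 5a: concatenate strings prepending "19" to the dob year]
[Screenshot 5b: the CAST and other cleanup steps]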
6. Name your transform node Clean Passengers.
7. Create an output from your transform called passengers_clean.
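In this task, the Clean Passengers logic is configured in Pipeline Builder rather than written as code. To make the order of operations concrete, here is a plain-Python sketch of what the node does to each row. It is a hedged illustration, not how Foundry executes the transform. The row-as-dict representation, the helper names clean_dob and clean_passenger, the sample values, and the "YY-MM-DD" dob format are all assumptions, so check the actual dob values in passengers_raw.

```python
from datetime import date
from typing import Optional

# Columns left over from JSON parsing that the task drops.
DROP_COLUMNS = {"_error", "_file"}


def clean_dob(raw_dob: Optional[str]) -> Optional[date]:
    """Concatenate "19" in front of the value, then cast it to a date.

    Assumes (hypothetically) that raw_dob looks like "65-12-22", i.e. a
    two-digit year first, so "19" + raw_dob is the ISO date "1965-12-22".
    """
    if raw_dob is None:
        return None  # this sketch passes missing values through unchanged
    return date.fromisoformat("19" + raw_dob)


def clean_passenger(row: dict) -> dict:
    """Return a cleaned copy of one passengers_raw row; the input is not modified."""
    # Drop the JSON-parsing leftovers.
    cleaned = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
    # Concatenate strings, then CAST to a date.
    cleaned["dob"] = clean_dob(cleaned.get("dob"))
    # Title Case: "GOLD" and "gold" both become "Gold".
    status = cleaned.get("flyer_status")
    cleaned["flyer_status"] = status.title() if status is not None else None
    return cleaned


if __name__ == "__main__":
    # Hypothetical sample row; passenger_id and all values are made up.
    raw_row = {
        "passenger_id": 1,
        "dob": "65-12-22",
        "flyer_status": "GOLD",
        "_error": None,
        "_file": "part-0.json",
    }
    print(clean_passenger(raw_row))
    # {'passenger_id': 1, 'dob': datetime.date(1965, 12, 22), 'flyer_status': 'Gold'}
```

The concatenation has to happen before the cast: a two-digit year is not a valid four-digit date string, so casting first would fail. Prepending "19" also assumes every passenger was born in the 1900s, because a two-digit year cannot distinguish 1905 from 2005. Confirm that this assumption holds for your data.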