5. [Builder] Transforms Project - 4. Create a cleaned output

4 - Create a cleaned output

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

Your passengers_raw data needs a few cleaning steps before you make it available for broader use in your organization:

  • Converting the dob column to a date type
  • Removing unnecessary columns left over from the JSON parsing process
  • Normalizing the flyer_status column

🔨 Task Instructions

  1. You’re currently in your .../data/raw/ folder. Proceed to .../data/clean/.

  2. Add a new pipeline artifact called passengers_datasource_clean.

  3. Import the two datasets you just output to your .../raw/ folder: passengers_raw and passenger_flight_alerts_raw.

  4. passenger_flight_alerts_raw doesn't need any cleaning. Create an output for it called passenger_flight_alerts_clean.

  5. Add a transform step after passengers_raw with the following logic (see images below for assistance if needed):

    • Use concatenate strings to prepend "19" to the year in the dob column, then CAST dob to a date type
    • Drop the _error and _file columns
    • Apply the Title Case transform for the flyer_status column

    For Step 5, first clean the dob column by using concatenate strings to prepend "19" to the year (first screenshot, 5a). This prepares the column for the CAST and the other cleanup steps (second screenshot, 5b).

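    5a: [Screenshot: concatenate strings prepending "19" to the dob year]

    5b: [Screenshot: CAST of dob to a date type and the remaining cleanup]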

  6. Name your transform node Clean Passengers.

  7. Create an output from your transform called passengers_clean.

  8. Color the nodes on the graph as desired.

  9. Save and deploy your pipeline.
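
For readers who find it easier to follow the logic in code, the Clean Passengers transform from Step 5 can be sketched in plain Python. This is only an illustration and not what Pipeline Builder runs. The sample rows, the passenger_id column, the clean_passenger helper, and the two-digit-year-first dob format ("yy-MM-dd") are all assumptions made for the example; check your own passengers_raw data for the actual format.

```python
from datetime import datetime

# Hypothetical sample rows; the real passengers_raw schema and values may differ.
# Assumption: dob is a string that starts with a two-digit year, e.g. "85-03-14".
RAW_ROWS = [
    {"passenger_id": "p1", "dob": "85-03-14", "flyer_status": "gold",
     "_error": None, "_file": "a.json"},
    {"passenger_id": "p2", "dob": "62-11-02", "flyer_status": "PLATINUM",
     "_error": None, "_file": "a.json"},
]

# Columns left over from the JSON parsing process.
DROPPED_COLUMNS = {"_error", "_file"}


def clean_passenger(row: dict) -> dict:
    """Apply the three cleaning steps to one row and return a new dict."""
    # Drop the _error and _file columns.
    cleaned = {k: v for k, v in row.items() if k not in DROPPED_COLUMNS}
    # Prepend "19" to the two-digit year, then cast the string to a date.
    cleaned["dob"] = datetime.strptime("19" + row["dob"], "%Y-%m-%d").date()
    # Normalize flyer_status to title case.
    cleaned["flyer_status"] = row["flyer_status"].title()
    return cleaned


passengers_clean = [clean_passenger(row) for row in RAW_ROWS]
```

The order inside the helper mirrors the task: the string concatenation has to happen before the cast, because a two-digit year alone would not parse as a four-digit-year date.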