3 - Simulate your Datasource

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

Each stage in your project may need multiple batch pipelines built with Pipeline Builder, where you develop and maintain data transformations in a structured setting. Since this tutorial does not actually connect to an external source, you will simulate one by copying three raw datasets into your Datasource project.

🔨 Task Instructions

  1. Navigate to the Datasource project folder that you created in the previous tutorial, e.g., .../Temporary Training Artifacts/${yourName}/Data Engineering Tutorials/Datasource Project: Flight Alerts/.

  2. If you don't yet have a /data or /datasets folder in that location (either name will do), create one. In that folder, create the following sub-folders:

    • /raw
    • /clean
    • /preprocessed
  3. Open the /raw folder.

  4. Create a new pipeline by clicking on the green ➕ New ▾ button in the top right of your screen and choosing Pipeline from the dropdown list of artifacts.

  5. Create a batch pipeline and name it flight_alerts_datasource.

  6. Using the Add datasets button, add the following datasets, each of which is located in /Foundry Training and Resources/Example Projects/[Datasource] Flight Alerts/datasets/raw/:

    • flight_alerts_raw
    • status_mapping_raw
    • priority_mapping_raw
  7. Create three outputs in your pipeline, one for each dataset you added in step 6. For each output, use the input schema unchanged.

  8. Consider coloring the input and output datasets distinctly using the “color nodes” option from the legend and labeling them accordingly. Use the clickable image below as a reference.

  9. Save and Deploy your pipeline to build the output datasets.
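
For readers who find it easier to reason about the pipeline in code, steps 6 and 7 amount to a pass-through copy: each input dataset is written to an output with the same schema and no transformation. The sketch below illustrates that idea in plain Python. It does not use any Foundry or Pipeline Builder API; the Dataset class, the copy_as_output and simulate_datasource functions, the placeholder column, and the choice to give each output the same name as its input are all assumptions made for illustration. Only the three dataset names come from the steps above.

```python
from dataclasses import dataclass

# Dataset names from step 6. Everything else in this sketch is hypothetical.
RAW_INPUTS = ["flight_alerts_raw", "status_mapping_raw", "priority_mapping_raw"]


@dataclass(frozen=True)
class Dataset:
    """A toy stand-in for a dataset: a name, a schema, and some rows."""
    name: str
    schema: tuple  # ordered (column_name, type_name) pairs
    rows: tuple    # row tuples, aligned with the schema


def copy_as_output(source: Dataset) -> Dataset:
    """Pass-through copy: the output reuses the input schema and rows unchanged."""
    # Assumption: the output keeps the input's name.
    return Dataset(name=source.name, schema=source.schema, rows=source.rows)


def simulate_datasource(inputs):
    """Build one output per input, keyed by dataset name."""
    return {ds.name: copy_as_output(ds) for ds in inputs}


if __name__ == "__main__":
    # Placeholder schema and row; the real columns are not listed in this tutorial.
    placeholder_inputs = [
        Dataset(name, (("example_column", "string"),), (("example_value",),))
        for name in RAW_INPUTS
    ]
    outputs = simulate_datasource(placeholder_inputs)
    for name, ds in outputs.items():
        print(name, ds.schema)
```

Because nothing is transformed, the outputs behave like freshly landed raw data, which is what lets later pipelines treat them as if they came from an external source.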
