5C. [Repositories] Multiple Outputs with Data Transforms

2 - Create Your Folder Structure and Repository

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

A Transform Project typically combines sources and applies additional business logic to produce enriched datasets. These datasets are generally not intended for broad consumption; the datasets produced in the Ontology Project stage are. This task walks you through implementing the recommended high-level folder structure for your own Transform Project.

🔨 Task Instructions

  1. In your .../Temporary Training Artifacts/yourName/Data Engineering Tutorials/ folder, create a new folder called Transform Project: Alert Metrics.

  2. Add the following folders inside that top-level project folder:

    • /data
    • /documentation
    • /analysis
  3. Create a new Python transform code repository named flight_alert_metrics_logic.

  4. Create a new branch from master called yourName/feature/join_data.

  5. In your repository’s /datasets folder, create two new sub-folders: transformed and output. This will place the datasets output by your transforms into the folder structure recommended in the documentation.

    • In short, the output folder holds the final products of the Transform Project, while any intermediate datasets needed to produce them belong in the transformed folder.
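One way to keep every transform output consistent with the transformed/output split is to build output paths from a single helper. The sketch below is plain Python, not a Foundry API: DATASETS_ROOT, dataset_path, and the example dataset names are hypothetical, and the root is a placeholder because the tutorial elides the full folder path. In a Foundry repository you would substitute your real project path and pass the resulting string to an Output(...) declaration from transforms.api.

```python
from pathlib import PurePosixPath

# Hypothetical placeholder root; replace with the real location of your
# datasets folder (the tutorial elides the full path).
DATASETS_ROOT = PurePosixPath("datasets")

# The two sub-folders created in step 5.
STAGES = ("transformed", "output")


def dataset_path(stage: str, name: str, root: PurePosixPath = DATASETS_ROOT) -> str:
    """Return the path for a dataset in the given stage folder.

    'transformed' holds intermediate (pre-work) datasets; 'output' holds the
    final products of the Transform Project.
    """
    if stage not in STAGES:
        raise ValueError(f"stage must be one of {STAGES}, got {stage!r}")
    if not name or "/" in name:
        raise ValueError("name must be a single, non-empty path segment")
    return str(root / stage / name)


if __name__ == "__main__":
    # Hypothetical dataset names for work on the join_data branch.
    print(dataset_path("transformed", "flights_joined_alerts"))
    # prints "datasets/transformed/flights_joined_alerts"
    print(dataset_path("output", "flight_alert_metrics"))
    # prints "datasets/output/flight_alert_metrics"
```

Centralizing the stage names this way means a typo such as "ouput" raises an error immediately instead of silently placing a dataset outside the recommended folders.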