10 - Take Stock of Your Pipeline

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

You’ve got a multi-stage pipeline with three distinct schedules constructed for your two datasource projects and one transform project. Let’s bring them all to the Data Lineage graph and visualize their logic.

🔨 Task Instructions

  1. Open the Alert Metrics Pipeline Data Lineage graph.

  2. Expand all ancestor datasets by selecting all “clean” nodes on the left, right-clicking them, and choosing Expand nodes... from the menu. Then click the << button in the Expand parents window.

  3. Arrange the nodes aesthetically as desired.

  4. Click the Manage schedules button on the right side of your screen.

  5. Mouse over each schedule to quickly visualize the input/output relationship between the three schedules.

  6. Change the Node color options to Schedule count. Each node on the graph should belong to exactly one schedule. If any node belongs to multiple schedules, this coloring option will reveal it, and you should then correct it.

  7. Change the Node color options to Out-of-date. Because of the way you’ve been building your pipeline, you’ll notice that some datasets are more recent than others. In practice, this unevenness would be corrected once the furthest-upstream datasets update.

  8. Try a few of the other node coloring options, including:

    • Repository
    • Folder
    • Time last built
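
The Schedule count check from step 6 can also be expressed as a small piece of logic. The sketch below is plain Python, not a Foundry API: the function names, schedule names, and dataset names are all hypothetical, and it assumes you have written down by hand which datasets each schedule builds. It only illustrates the rule that every dataset should be built by exactly one schedule.

```python
from collections import defaultdict


def schedule_counts(schedules):
    """Map each dataset to the sorted list of schedules that build it."""
    owners = defaultdict(list)
    for schedule_name, datasets in schedules.items():
        for dataset in datasets:
            owners[dataset].append(schedule_name)
    return {dataset: sorted(names) for dataset, names in owners.items()}


def multiply_scheduled(schedules):
    """Return only the datasets that are built by more than one schedule."""
    return {
        dataset: names
        for dataset, names in schedule_counts(schedules).items()
        if len(names) > 1
    }


# Hypothetical schedules and datasets, for illustration only.
example_schedules = {
    "datasource_a": {"raw_a", "clean_a"},
    "datasource_b": {"raw_b", "clean_b"},
    "transform": {"metrics", "clean_b"},  # clean_b is deliberately duplicated
}

conflicts = multiply_scheduled(example_schedules)
print(conflicts)
```

An empty result corresponds to a graph in which every node is colored as having a single schedule. Any dataset that appears in the result is one you would want to remove from all but one schedule.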