
2 - Intro to Data Pipelines

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

A data pipeline is a flow of data from one or more source systems through intermediate datasets, ultimately producing high-quality, curated datasets that can be structured into the Ontology or serve as the foundation for machine learning and analytical workflows.

In this exercise, we’ll review the basic stages of the pipeline development process. A data pipeline lifecycle typically involves the following distinct activities:

  1. Agree on the desired output(s)
  2. Determine the source data needed to support the output(s)
  3. Define the pipeline scope and service level agreements (SLAs)
  4. Map the pipeline stages and create the associated project structure
  5. Test, build, and optimize your transforms (see the sketch after this list)
  6. Apply schedule and dataset health checks
  7. Create a pipeline schedule
  8. Maintain your pipeline
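
For step 5, pipeline logic in Foundry is typically written as Python transforms in a Code Repository. The minimal sketch below shows the general shape of such a transform: it reads one upstream dataset, normalizes column names, and filters out rows missing a key. The dataset paths, the `alert_id` column, and the function name are hypothetical placeholders, not part of this course's project.

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output


# Dataset paths below are hypothetical placeholders.
@transform_df(
    Output("/Course/Pipeline/clean/flight_alerts_clean"),
    raw=Input("/Course/Pipeline/raw/flight_alerts_raw"),
)
def clean_flight_alerts(raw):
    # Lowercase and trim column names for consistency downstream.
    renamed = raw.select(
        [F.col(c).alias(c.strip().lower()) for c in raw.columns]
    )
    # Drop rows that are missing the primary key.
    return renamed.filter(F.col("alert_id").isNotNull())
```

Transforms like this one are the building blocks of a pipeline: each reads upstream datasets and writes a single output, and chaining them produces the intermediate and curated datasets described above. Health checks and schedules (steps 6 and 7) are then attached to the resulting datasets and builds.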