Overview

To create a pipeline, you need datasets. After you add datasets to a pipeline, you can clean, transform, and combine them with other datasets, then deploy the results for further use, often as part of the Foundry Ontology.

Pipeline Builder supports both structured and semi-structured datasets.

Structured datasets consist of files that contain tabular data in open-source formats, along with metadata about the columns in the dataset. This column metadata is stored alongside the dataset as a schema.
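As a purely conceptual illustration, the sketch below models a structured dataset as a set of files holding rows, plus a schema that lists each column's name and type. It is not Foundry's storage format or API: `Schema`, `StructuredDataset`, and `add_file` are hypothetical names, and Python types stand in for real column types. The point it shows is that the schema travels with the data and rejects rows that do not fit it.

```python
from dataclasses import dataclass, field


# Hypothetical model for illustration only; not a Foundry API.
@dataclass
class Schema:
    columns: list[tuple[str, type]]  # (column name, Python type)

    def names(self) -> list[str]:
        return [name for name, _ in self.columns]


@dataclass
class StructuredDataset:
    schema: Schema
    files: list[list[tuple]] = field(default_factory=list)  # each file is a list of rows

    def add_file(self, rows: list[tuple]) -> None:
        # Validate every row against the schema before accepting the file,
        # so a single bad row rejects the whole file.
        for row in rows:
            if len(row) != len(self.schema.columns):
                raise ValueError(
                    f"expected {len(self.schema.columns)} values, got {len(row)}"
                )
            for value, (name, typ) in zip(row, self.schema.columns):
                if value is not None and not isinstance(value, typ):
                    raise TypeError(
                        f"column {name!r} expects {typ.__name__}, "
                        f"got {type(value).__name__}"
                    )
        self.files.append(rows)

    def rows(self):
        # Yield every row across all files as a dict keyed by column name.
        for file_rows in self.files:
            for row in file_rows:
                yield dict(zip(self.schema.names(), row))


ds = StructuredDataset(Schema([("id", int), ("name", str)]))
ds.add_file([(1, "alpha"), (2, "beta")])
print(list(ds.rows()))  # [{'id': 1, 'name': 'alpha'}, {'id': 2, 'name': 'beta'}]
```

Keeping validation in `add_file` means every row already stored is known to match the schema, which is what lets downstream steps rely on column names and types.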

Semi-structured datasets include XML, JSON, and CSV files. You can use parsing transform functions to convert these files into tabular form and benefit from schema safety. Learn how to transform data in your pipeline.
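In Pipeline Builder, parsing is configured through transform functions rather than written as code. The sketch below is only a hypothetical illustration of what "converting semi-structured data into tabular form with schema safety" means. It assumes JSON input with one object per line, and the schema, `coerce`, and `parse_json_lines` are invented for the example. Each record is mapped onto a fixed set of typed columns, missing fields become nulls, and records that do not fit are reported instead of silently producing mistyped rows.

```python
import json

# Hypothetical target schema: (column name, Python type).
SCHEMA = [("id", int), ("city", str), ("temp_c", float)]


def coerce(name, value, typ):
    if value is None:
        return None  # missing or null fields become nulls
    if type(value) is typ:
        return value
    if typ is float and type(value) is int:
        return float(value)  # widen JSON integers into float columns
    raise TypeError(f"column {name!r} expected {typ.__name__}, got {type(value).__name__}")


def parse_json_lines(text, schema):
    """Return (rows, errors); errors holds (line number, message) pairs."""
    rows, errors = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)  # JSONDecodeError is a ValueError
            if not isinstance(record, dict):
                raise TypeError("expected a JSON object")
            row = tuple(coerce(name, record.get(name), typ) for name, typ in schema)
        except (TypeError, ValueError) as exc:
            errors.append((lineno, str(exc)))
            continue
        rows.append(row)
    return rows, errors


raw = "\n".join([
    '{"id": 1, "city": "Oslo", "temp_c": -3}',
    '{"id": "two", "city": "Lima", "temp_c": 18.5}',
    '{"id": 3, "city": "Pune"}',
])
rows, errors = parse_json_lines(raw, SCHEMA)
print(rows)    # [(1, 'Oslo', -3.0), (3, 'Pune', None)]
print(errors)  # [(2, "column 'id' expected int, got str")]
```

Collecting errors rather than stopping at the first bad record is one design choice among several; a stricter pipeline might fail the whole build instead.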

The first step toward defining a workflow in Pipeline Builder is to add one or more datasets to your workspace. The following documentation explains how to add datasets and change input computation modes. To learn more about datasets in Foundry, see data integration.