2 - Defining Tabular Data

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

A Foundry dataset is a collection of columns, rows, schema, and values built with user-defined logic. When that logic runs, it executes one of several transaction types to produce a tabular structure common to most data platforms in the industry.

Behind the scenes, Foundry partitions datasets into smaller files and stores them in the backing file system. When a dataset build runs (for example, on a schedule), it assembles the inputs (partitioned dataset files in the backing file system) and executes the user-defined logic on them to generate the output.
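The build flow described above can be pictured with a small, self-contained Python sketch. It is only an illustration of the idea and does not use any Foundry API: the CSV part files, the directory layout, and the names `write_partitions`, `read_partitions`, `user_logic`, and `build` are all hypothetical stand-ins for the backing file system and the user-defined logic.

```python
import csv
import tempfile
from pathlib import Path


def write_partitions(directory, rows, fieldnames, rows_per_file=2):
    """Split rows across several small CSV files, mimicking partitioned storage."""
    directory.mkdir(parents=True, exist_ok=True)
    for start in range(0, len(rows), rows_per_file):
        path = directory / f"part-{start // rows_per_file:05d}.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows[start:start + rows_per_file])


def read_partitions(directory):
    """Assemble one logical table from every part file in a directory."""
    rows = []
    for path in sorted(directory.glob("part-*.csv")):
        with path.open(newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


def user_logic(orders, customers):
    """Hypothetical user-defined logic: attach a customer name to each order."""
    names = {c["customer_id"]: c["name"] for c in customers}
    return [
        {
            "order_id": o["order_id"],
            "customer": names.get(o["customer_id"], "unknown"),
            "amount": o["amount"],
        }
        for o in orders
    ]


def build(input_a, input_b, output_dir):
    """Assemble the inputs, run the logic, and write the output as part files."""
    output_rows = user_logic(read_partitions(input_a), read_partitions(input_b))
    write_partitions(output_dir, output_rows, ["order_id", "customer", "amount"])
    return output_rows


# Illustrative data only: two input datasets stored as part files.
root = Path(tempfile.mkdtemp())
write_partitions(
    root / "dataset_a",
    [
        {"order_id": 1, "customer_id": "c1", "amount": 30},
        {"order_id": 2, "customer_id": "c2", "amount": 45},
        {"order_id": 3, "customer_id": "c1", "amount": 12},
    ],
    ["order_id", "customer_id", "amount"],
)
write_partitions(
    root / "dataset_b",
    [
        {"customer_id": "c1", "name": "Ada"},
        {"customer_id": "c2", "name": "Grace"},
    ],
    ["customer_id", "name"],
)

result = build(root / "dataset_a", root / "dataset_b", root / "output")
```

In this sketch, each call to `build` plays the role of one dataset build: the inputs are read from their part files, the logic runs over the assembled rows, and the output is itself written back as part files. Values come back as strings because they are read from CSV; a real platform would apply the dataset schema instead.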

Take a few moments to read this overview of Foundry dataset anatomy. The rest of the tutorial will assume familiarity with these terms and concepts.

The architecture of distributed data differs from that of standard relational database tables, but most of it is abstracted away in Foundry's analytical applications. A general grasp of how datasets are constructed in Foundry can, however, help you optimize the performance of your analyses and better understand how datasets stay up to date.

Review the image below for an example of how a dataset in Foundry uses a transaction and multiple input datasets.

Architecture flowchart showing how Dataset A and Dataset B are affected by a transaction to create output Dataset files.