Dataset Preview

The Dataset view consists of the following main components:

Dataset header
Information panel
Additional dataset views
Data Preview
Dataset Actions

Dataset header

The page header identifies the selected dataset and shows basic information: its name, display name (if one exists), location, and the selected branch. It also provides file-related operations such as sharing, moving, and renaming.

Information panel

The information panel shows metadata about the dataset and offers some basic administrative operations. It’s divided into three sections:

About - Provides information on the dataset, including: when it was created and last updated; the users who created and last updated it; the size of the table; the tools and input datasets used to create the data; tags; and more. An Edit schema view, available under the Updated by section, infers a schema for CSV and JSON files. There, you can also apply additional parsing options to drop jagged rows, change the encoding, or add columns such as file path, row byte offset, import timestamp, or row number. For other file types, schema edits can be made in the Schema section under the Details tab.
Columns - Provides information on the columns in the dataset, including: data type; description; and statistics (percentage of null values, distributions, and samples).
Schedules - Shows schedules that affect the dataset (see Schedules documentation for more information).
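
To make the parsing options mentioned under About more concrete, the following is a minimal Python sketch of what dropping jagged rows and adding file path and row number columns could look like for a CSV file. It is not Foundry's implementation: the function `parse_csv`, the column names `_file_path` and `_row_number`, and the definition of a jagged row as one whose field count differs from the header's are all assumptions made for illustration.

```python
import csv
import io


def parse_csv(text, file_path, drop_jagged_rows=True):
    """Parse CSV text into dict rows, optionally dropping jagged rows.

    Assumption: a "jagged" row is one whose field count differs from the
    header's. Each kept row gains two illustrative extra columns,
    _file_path and _row_number (hypothetical names).
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for row_number, fields in enumerate(reader, start=1):
        if len(fields) != len(header):
            if drop_jagged_rows:
                continue
            # Keep the row: pad with None or truncate to fit the header.
            fields = (fields + [None] * len(header))[: len(header)]
        record = dict(zip(header, fields))
        record["_file_path"] = file_path
        record["_row_number"] = row_number
        rows.append(record)
    return rows


# Rows 2 and 3 are jagged: one has too few fields, the other too many.
sample = "id,name\n1,alice\n2\n3,carol,extra\n4,dave\n"
parsed = parse_csv(sample, file_path="uploads/people.csv")
```

With the default setting, only the first and fourth data rows survive, and `_row_number` keeps their original positions in the file.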

Additional dataset views

History tab

The history view provides historical job (build) information. A Summary view on the right-hand side of the page shows aggregated information on job statuses over time.

The left panel lists jobs with their statuses and durations. Selecting a job opens a detailed Job view on the right-hand side, including: job progress, job spec, build logs, files, and the resulting schema.

History of streaming datasets

In streaming datasets, the History tab will only appear when the view is set to Archive. The History tab will show the archive transactions alongside the streaming jobs.

Details

The details view provides additional technical information about the dataset, as well as some administrative operations:

Schema - Provides full information on the table schema (column specifications) and allows editing the schema (if applicable).
Files - Shows the list of files that make up the dataset and allows downloading them.
Job spec - Shows the job specification, which contains the information needed to build the dataset.
Syncs - Surfaces the status and details of data syncs to different databases. For some sync types, additional settings can be applied.
Custom metadata - Allows adding custom fields of information to the dataset. The fields added in this section are displayed in the information panel of the main Preview page.
Resource usage metrics - Provides graphs and information on disk and Spark usage of the dataset over time.
Last run details (only for streams) - Shows detailed information about the latest stream run.

Stream (only for streaming datasets)

When the dataset is a streaming dataset, the Stream tab will show current and historical information on the streaming jobs. By changing the time period, you can explore the logs and details of jobs that streamed the dataset during that time.

Health tab

The Health tab provides tools to monitor data health.

Streaming datasets

In streaming datasets, the Health tab will only appear when the view is set to Archive. The checks will then refer to the archive dataset rather than the stream.

Compare

Use the Compare tab to compare two different datasets. Click on the tab and select a dataset to compare with. The Compare tab can be used in several ways:

Compare two separate datasets to understand their differences
Compare a dataset to an older transaction of the same dataset to see how it changed over time
Compare the master version of a dataset to a different branch to see how merging that branch will affect the dataset
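
As a rough illustration of what a row-level comparison between two dataset versions involves, the sketch below treats each dataset as a multiset of rows and reports the rows present on only one side. This is an assumption-laden toy, not how the Compare tab works internally; the function `compare_rows` and the sample data are hypothetical.

```python
from collections import Counter


def compare_rows(left, right):
    """Return (rows only in left, rows only in right).

    Datasets are treated as multisets of hashable row tuples, so
    duplicate rows are counted rather than collapsed.
    """
    left_counts = Counter(left)
    right_counts = Counter(right)
    only_left = list((left_counts - right_counts).elements())
    only_right = list((right_counts - left_counts).elements())
    return only_left, only_right


# Hypothetical example: the master version versus a branch of the same dataset.
master = [(1, "alice"), (2, "bob"), (3, "carol")]
branch = [(1, "alice"), (2, "robert"), (3, "carol"), (4, "dave")]
removed, added = compare_rows(master, branch)
```

Here `removed` holds the rows that merging the branch would take away from master, and `added` holds the rows it would introduce; a changed row appears once in each.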

Streaming datasets

In streaming datasets, the Compare tab will only appear when the view is set to Archive. You will then be able to compare the archive dataset with other non-streaming datasets.

Data Preview

The dataset preview table shows a sample of the data and allows light interaction with the full dataset. Use the preview table to understand the structure of the data and to quickly explore the values in the dataset.

By default, the preview table will show a limited sample of the data; the exact number of rows is displayed in the preview table header. However, any action taken on the data, such as filtering or sorting, will apply to the full dataset and increase the preview sample size. Depending on the number of rows, you may not see the entire dataset in the preview.
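
The difference between viewing a sample and acting on the full dataset can be sketched in a few lines of Python. The snippet assumes a hypothetical `preview` helper and a made-up row limit; it only models the ordering of operations (filter and sort over all rows first, then truncate for display) and does not model the increase in sample size described above.

```python
PREVIEW_LIMIT = 3  # hypothetical limit; the real value is shown in the table header


def preview(rows, limit=PREVIEW_LIMIT, predicate=None, sort_key=None):
    """Filter and sort over ALL rows, then truncate to the preview size."""
    result = rows if predicate is None else [r for r in rows if predicate(r)]
    if sort_key is not None:
        result = sorted(result, key=sort_key)
    return result[:limit]


# Ten illustrative rows; only the first three fit in the default preview.
full = [{"id": i, "score": (i * 7) % 10} for i in range(10)]

default_view = preview(full)
filtered_view = preview(full, predicate=lambda r: r["score"] >= 7)
```

The filtered view can contain rows that were never part of the default sample, because the filter runs against every row rather than against the rows already on screen.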

The preview table provides several useful capabilities:

Sort, filter, and generate charts over a column’s data from the column’s menu
Exclude the selected value from the preview, or include only that value, by clicking on an individual cell
Report and view issues on individual columns
Search for specific column names

Streaming data preview

Streaming data preview provides a small sample of recently streamed rows. It will update automatically when set to Live updates. Sorting, filtering and charting are only available when the page is set to Archive, and will represent only the state of the archive dataset.

Upload files manually

In Dataset Preview, you can upload files of the following types directly into a dataset: .csv, .tsv, .xls, .xlsm, and .xlsx.

For .csv and .tsv files, Foundry will attempt to infer the schema of the new file. If the filename and schema of the new file are identical to a previous upload, you can update data in the existing dataset. If the filename is different from previous uploads, you can append data to an existing dataset.
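
The update-versus-append rule above can be written as a small decision function. This is a sketch of the rule as stated, not Foundry's upload logic: `upload_mode` and its return values are hypothetical, and the case of a matching filename with a different schema is not covered by the text, so the sketch labels it explicitly instead of guessing.

```python
def upload_mode(filename, schema, previous_uploads):
    """Decide how a new .csv/.tsv upload relates to earlier uploads.

    previous_uploads maps each earlier filename to its schema
    (here, a list of column names).
    """
    if filename not in previous_uploads:
        # The filename differs from all previous uploads: data can be appended.
        return "append"
    if previous_uploads[filename] == schema:
        # Same filename and same schema: existing data can be updated.
        return "update"
    # Same filename, different schema: not described in the text above.
    return "unspecified"


earlier = {"sales_2023.csv": ["date", "region", "amount"]}

mode_same = upload_mode("sales_2023.csv", ["date", "region", "amount"], earlier)
mode_new = upload_mode("sales_2024.csv", ["date", "region", "amount"], earlier)
```

In this example `mode_same` is `"update"` and `mode_new` is `"append"`.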

The following steps apply to uploading all file types:

Navigate to your preferred folder and create a dataset.

Menu showing the options when searching for "dataset" after clicking the +New button.

Drag-and-drop the file into the dataset preview window.

Dataset Actions

The Actions menu provides quick access to Foundry tools and operations that allow you to analyze, explore, transform, and manage the data. Some actions, such as Analyze (in Contour) and Build, are also surfaced outside the Actions menu for quicker access.
