The Dataset view consists of the following main components:
The header of the page identifies the selected dataset and provides basic information such as its name, display name (if one exists), location, and selected branch. It also supports file-related operations such as sharing, moving, and renaming.
The information panel shows information on the dataset (metadata) and offers some basic administrative operations. It is divided into three sections:
The history view provides historical job (build) information. A Summary view on the right-hand side of the page shows aggregated information on job statuses over time.
In the left panel, a list of jobs appears with their statuses and durations. When you select a job, a detailed Job view appears on the right-hand side showing job progress, the job spec, build logs, files, and the resulting schema.
In streaming datasets, the History tab will only appear when the view is set to Archive. The History tab will show the archive transactions alongside the streaming jobs.
The details view provides additional technical information about the dataset, as well as some administrative operations:
When the dataset is a streaming dataset, the Stream tab will show current and historical information on the streaming jobs. By changing the time period, you can explore the logs and details of jobs that streamed the dataset during that time.
The Health tab provides tools to monitor data health.
In streaming datasets, the Health tab will only appear when the view is set to Archive. The checks will then refer to the archive dataset rather than the stream.
Use the Compare tab to compare two different datasets. Click on the tab and select a dataset to compare with. The Compare tab can be used in several ways:
In streaming datasets, the Compare tab will only appear when the view is set to Archive. You will then be able to compare the archive dataset with other non-streaming datasets.
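The streaming rules for the History, Health, and Compare tabs can be summarized in a short sketch. This is only an illustration of the behavior described above, not Foundry code: the function name and its two boolean parameters are hypothetical, and the sketch says nothing about tabs whose streaming behavior is not stated here.

```python
from typing import Tuple

# Tabs that, per the text above, appear for streaming datasets
# only when the view is set to Archive.
ARCHIVE_ONLY_TABS: Tuple[str, ...] = ("History", "Health", "Compare")


def archive_only_tabs_shown(is_streaming: bool, archive_view: bool) -> Tuple[str, ...]:
    """Return which of the History, Health, and Compare tabs are shown.

    Hypothetical helper: for a streaming dataset outside the Archive view,
    none of the three tabs appear; otherwise all three do.
    """
    if is_streaming and not archive_view:
        return ()
    return ARCHIVE_ONLY_TABS
```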
The dataset preview table shows a sample of the data and allows light interaction with the full dataset. Use the preview table to understand the structure of the data and to quickly explore the values in the dataset.
By default, the preview table will show a limited sample of the data; the exact number of rows is displayed in the preview table header. However, any action taken on the data, such as filtering or sorting, will apply to the full dataset and increase the preview sample size. Depending on the number of rows, you may not see the entire dataset in the preview.
The preview table provides several useful capabilities:
Streaming data preview provides a small sample of recently streamed rows. It will update automatically when set to Live updates. Sorting, filtering and charting are only available when the page is set to Archive, and will represent only the state of the archive dataset.
In Dataset Preview, you can upload files of the following types directly into a dataset: .csv, .tsv, .xls, .xlsm, and .xlsx.
For .csv and .tsv files, Foundry will attempt to infer the schema of the new file. If the filename and schema of the new file are identical to a previous upload, you can update data in the existing dataset. If the filename is different from previous uploads, you can append data to an existing dataset.
The following steps apply to uploading all filetypes:
The Actions menu provides quick access to Foundry tools and operations that allow you to analyze, explore, transform, and manage the data. Some actions, such as Analyze (in Contour) and Build, are also surfaced outside the Actions menu for quick access.