Exploration and analysis in Contour are performed through the use of boards in series. Some boards create charts or perform calculations, while others are used to manipulate your dataset by filtering, removing columns, and so on.
The following table summarizes the board types described on this page.
Board | Description | Visualize | Filter Rows | Aggregate | Manipulate Columns | Remove Duplicates |
---|---|---|---|---|---|---|
Summary | Reports the row and column counts for your table. | Yes | No | No | No | No |
Filter | Filter your dataset by numeric, text, or date and time values. | No | Yes | No | No | Yes |
Expression | Use the expression language to derive new columns or perform complex filtering. | No | Yes | No | Yes | No |
Table | View a portion of raw data, explore schemas and calculate data coverage metrics. | Yes | No | No | No | No |
Histogram | Create a histogram of your data and filter to specific groups. | Yes | Yes | Yes | Yes, via the Pivot option | No |
Distribution | Create a distribution plot of your data. | Yes | Yes | No | No | No |
Time series | Create a chart with date/time on the x-axis and filter to specific groups. | Yes | Yes | No | No | No |
Edit columns | Combine, duplicate, remove, rename, or split columns. | No | No | No | Yes | No |
Transform data | Obfuscate data, find and replace values, or parse dates. | No | No | No | Yes | No |
Chart | Create customizable, multi-layered charts. | Yes | Yes | Yes | No | No |
Grid | Create a matrix of two categorical columns. Cells can be filtered and are displayed as a heatmap. | Yes | Yes | No | No | No |
Heatmap | View a heatmap based on coordinate data. | Yes | Yes | No | No | No |
Pivot table | Create a pivot table for one or more metrics. | Yes | Yes | Yes | Yes, via the Pivot option | No |
Column editor | Derive new columns or remove unnecessary columns. | No | No | No | Yes | Yes |
Multi-column editor | Rename, remove, or reorder columns, or remove duplicate rows in the data. | No | No | No | Yes | Yes |
Enrich | Enrich data with another dataset and return columns from both datasets. | No | No | No | Yes | Yes |
Link | Join to another dataset and return the matching records of that dataset. | No | No | No | Yes | Yes |
Set math | Keep, add, or remove rows based on an external dataset. | No | Yes | No | No | No |
Join | Perform curated joins. | No | Yes | No | No | No |
Export | Export your final filtered set of observations to CSV or XLS. | No | No | No | No | No |
Reorder columns | Reorder the columns in your table. | No | No | No | No | No |
Macro | Apply templatized transformations to your path. | No | No | No | No | No |
Sort | Sort the rows of data based on one or more columns. | No | No | No | No | No |
Calculation | Display multiple aggregate calculations. | Yes | No | Yes | No | No |
Unpivot | Reshape your data by turning some columns into rows. | No | No | No | Yes | No |
The summary board displays the number of rows and columns in your table at the current location in the path.
If you have not filtered the data at all, this is the number of rows in your starting set. If you have applied filters (for example, by adding a histogram and selecting certain bars), this is the number of rows remaining after the filters.
The purpose of the filter board is to apply customizable filters to your dataset. Although you can also apply filters in other boards (such as the distribution and histogram boards), the filter board lets you build more complex filters involving multiple variables in one place.
Using a list in the filter board is akin to a WHERE IN (x, y, z) clause in SQL. Contour can handle lists of thousands of items in the filter board. However, large lists will tax the browser, and lists that are too large will likely cause the browser to fail. In these cases, import the list into Contour as a separate set and implement the filter with a link or set math board. Learn how to use the link or set math boards.
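To illustrate the two approaches, here is a minimal sketch using Python's built-in sqlite3 module with made-up orders data (the table and column names are hypothetical). An inline IN list and a semi-join against a separately stored list return the same rows; the second form is roughly analogous to filtering against an imported list with a link or set math board. This is an analogy in SQLite, not a description of how Contour executes filters.

```python
import sqlite3

# In-memory database with a small, hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol"), (4, "alice"), (5, "dave")],
)

# Option 1: an inline list, like a filter board list (WHERE ... IN (x, y, z)).
in_list = conn.execute(
    "SELECT order_id FROM orders WHERE customer IN ('alice', 'carol') ORDER BY order_id"
).fetchall()

# Option 2: the list stored as its own table and applied with a semi-join.
conn.execute("CREATE TABLE wanted (customer TEXT)")
conn.executemany("INSERT INTO wanted VALUES (?)", [("alice",), ("carol",)])
semi_join = conn.execute(
    "SELECT order_id FROM orders "
    "WHERE EXISTS (SELECT 1 FROM wanted w WHERE w.customer = orders.customer) "
    "ORDER BY order_id"
).fetchall()

print(in_list)    # [(1,), (3,), (4,)]
print(semi_join)  # [(1,), (3,), (4,)]
```

With the list in its own table, its size no longer affects the query text, which is the property that matters when the list grows to many thousands of items.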
Click Add filter, choose a column to filter, then choose a filter type from the dropdown. Based on the column you selected, Contour will select an appropriate category of filter (for example, number for columns of numeric values).
In some text filters, you can use wildcards: * can be replaced by multiple characters, and ? can be replaced by a single character.
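A minimal sketch of how such wildcards can be interpreted, assuming they translate to regular expressions with `*` as any run of characters (including an empty one) and `?` as exactly one character. The function names are hypothetical and this is not Contour's implementation; it separates a term that may match anywhere in a value from one that must match the whole value.

```python
import re

def wildcard_to_regex(term: str) -> str:
    """Translate a wildcard term into a regex: '*' -> '.*', '?' -> '.'.

    Every other character is escaped so it matches literally.
    """
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def matches_anywhere(term: str, value: str) -> bool:
    # The pattern may match any substring of the value.
    return re.search(wildcard_to_regex(term), value) is not None

def matches_whole(term: str, value: str) -> bool:
    # The pattern must match the entire value.
    return re.fullmatch(wildcard_to_regex(term), value) is not None

print(matches_anywhere("h?l*o", "hi hello hi"))  # True
print(matches_whole("hello", "hi hello hi"))     # False
```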
In a "matches" (regular expression) text filter, you can input your regular expression directly (no quotes or string indicators necessary).
To add another filter, simply click Add filter again. You can choose to match all filters or any filter. To remove a filter, click the trash button next to the filter. Click Save to apply your filters.
The text filter currently offers the following options:

- Use ? to indicate a single-character wildcard, or * to indicate a multi-character wildcard. For example, a term of h?l*o will match “hi hello hi” or “hi halqqqqqo hi”.
- When the term must match the entire value, hello would match “hello”, but NOT “hi hello hi”, and h?l*o will match “hello” or “halqqqqqo”.

In addition to its visual tools like the histogram and chart, Contour also offers an expression board that lets you work with Contour’s rich expression language to derive new columns from your data, perform complex filtering, or perform complex aggregations.
Column names are case-sensitive. Additionally, when selecting a column, you may write the column name with or without double quotes. For example, year("birthdate_col") is equivalent to year(birthdate_col). For consistency, column names in this documentation are written with double quotes.
The table board shows a snapshot of your dataset in tabular format. Note that only the first 1,000 rows (the default limit) of the dataset are displayed. This limit exists to prevent browser performance issues and is generally not configurable.
The table board is useful for spot-checking your data to make sure it looks as you expect. You can interact with the table: drag-and-drop the columns to reorder them or choose from the dropdown on each column. These formatting changes to the table do not change the underlying data (if you view only a subset of the columns, all columns still exist in the underlying data).
To move multiple columns at once, select the columns while holding down the Shift key. You can also use the Configure panel to modify multiple columns at once.
You can add conditional formatting to the table board by clicking the column dropdown.
Then, use the dialog to add rules for a given column. Conditionally formatted cells will appear with text and background of the selected color. Rules are not supported for Date columns.
You can add the table board at any point in your path to get a quick preview of the data at that moment, or you can switch from path view to the table panel.
The table panel makes the table (not boards) the focus, so you can see how the data changes as you add each board. This can be especially helpful when writing expressions.
You can switch to the table panel by clicking Table in the upper right. Click the button again or click Hide table to return to path view.
The table panel does not support conditional formatting.
The histogram board aggregates the distinct values in a given column and displays the results as a bar chart.
For example, the following histogram calculates the average length of a taxi ride by which New York neighborhood it began in.
Note that only the top ten bars are displayed. To display more bars, click + Show More. You can display up to 50 values at once. If there are more than 50 values, use the dropdown to navigate to other parts of the range.
The histogram board is a visualization of a SQL GROUP BY clause.
The above example histogram is equivalent to the following SQL query:
```sql
SELECT start_neighborhood, mean(trip_time_in_secs)
FROM <table name>
GROUP BY start_neighborhood
```
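The same GROUP BY can be run end to end with Python's built-in sqlite3 module. This is a sketch with hypothetical trip data: SQLite has no mean() aggregate, so it uses the standard AVG() instead, and the ORDER BY mirrors the histogram's default sort by the aggregate in descending order.

```python
import sqlite3

# Hypothetical sample of taxi trips; column names follow the SQL above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (start_neighborhood TEXT, trip_time_in_secs REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Midtown", 600), ("Midtown", 900), ("Harlem", 1200), ("SoHo", 300), ("SoHo", 500)],
)

# One output row per distinct start_neighborhood, with the mean trip time.
rows = conn.execute(
    "SELECT start_neighborhood, AVG(trip_time_in_secs) AS avg_secs "
    "FROM trips GROUP BY start_neighborhood ORDER BY avg_secs DESC"
).fetchall()
print(rows)  # [('Harlem', 1200.0), ('Midtown', 750.0), ('SoHo', 400.0)]
```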
The Approx. Median aggregate is approximate. Contour calls the percentile_approx function with percentage value 0.5 and the default accuracy.
When you click Switch to Pivoted Data, any boards you add after the histogram will use the aggregated data computed in the table, rather than the original dataset.
The new dataset will include the column you selected for Y-Axis in the original histogram configuration, as well as a column for the aggregate. For example:
The histogram defaults to sorting by the aggregate in descending order. For very large histograms, sorting is performed on the 1,000 highest values of the aggregate.
You can use the dropdowns to change to sort by the Y-Axis column values instead, or to change the sort direction.
Select data on the histogram to filter the dataset for future boards.
Selection modes:
Then choose Keep to filter to only the selected values, or Remove to keep only the non-selected values.
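Keep and Remove are complementary filters on the selected values. A minimal Python sketch, where apply_selection and the sample rows are hypothetical:

```python
def apply_selection(rows, column, selected, mode="keep"):
    """Filter rows (dicts) by whether row[column] is in the selected set.

    mode="keep" returns only rows with a selected value;
    mode="remove" returns only rows whose value was not selected.
    """
    if mode not in ("keep", "remove"):
        raise ValueError("mode must be 'keep' or 'remove'")
    keep = mode == "keep"
    return [row for row in rows if (row[column] in selected) == keep]

trips = [
    {"id": 1, "start": "Midtown"},
    {"id": 2, "start": "Harlem"},
    {"id": 3, "start": "SoHo"},
    {"id": 4, "start": "Midtown"},
]
kept = apply_selection(trips, "start", {"Midtown", "SoHo"}, mode="keep")
removed = apply_selection(trips, "start", {"Midtown", "SoHo"}, mode="remove")
print([r["id"] for r in kept])     # [1, 3, 4]
print([r["id"] for r in removed])  # [2]
```

Every row lands in exactly one of the two results, which is a quick way to sanity-check a selection.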
The distribution board displays the distribution of a numerical variable for an aggregate metric.
The distribution board is similar to the histogram, but it displays aggregated data based on ranges of values, rather than specific values. For example, the following distribution displays data about customers’ ages. Ages are divided into ten ranges (or “buckets”).
To calculate the distribution board, Contour first finds the minimum and maximum of the X-Axis and creates a function to calculate the buckets. The distribution is then approximately equivalent to the following SQL query:
```sql
SELECT X_AXIS_BUCKET_FUNCTION([x-axis-column]), <AGGREGATE_METRIC>([aggregate-column])
FROM <PARENT_BOARD>
GROUP BY X_AXIS_BUCKET_FUNCTION([x-axis-column])
```
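This section does not specify what X_AXIS_BUCKET_FUNCTION computes. As an illustration only, the sketch below assumes ten equal-width buckets between the minimum and maximum, with the maximum placed in the last bucket, and counts rows per bucket (a COUNT aggregate). The helper name and the ages are hypothetical.

```python
from collections import Counter

def make_bucket_function(values, num_buckets=10):
    """Return a function mapping a value to a bucket index in [0, num_buckets).

    Assumes equal-width buckets between min(values) and max(values);
    the maximum itself is placed in the last bucket.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets

    def bucket(x):
        if width == 0:  # all values identical: everything lands in bucket 0
            return 0
        return min(int((x - lo) / width), num_buckets - 1)

    return bucket

ages = [18, 22, 25, 31, 35, 38, 44, 52, 67, 78]
bucket = make_bucket_function(ages)  # min 18, max 78, bucket width 6.0
# Equivalent of GROUP BY bucket(age) with a COUNT aggregate.
counts = Counter(bucket(a) for a in ages)
print(sorted(counts.items()))
# [(0, 2), (1, 1), (2, 2), (3, 1), (4, 1), (5, 1), (8, 1), (9, 1)]
```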
The Approx. Median aggregate is approximate. Contour calls the percentile_approx function with percentage value 0.5 and the default accuracy.
To select a range to filter by, click-and-drag your desired interval on the chart.
You can then adjust the interval more finely in the editable board footer.
You can choose to Keep the values in the selected interval, or Remove those values, keeping only non-selected values. To clear your selection, click the Clear button (x).
The time series board allows you to group data by time intervals and calculate aggregate metrics on that data.
For example, given a dataset with personal information about customers, the following time series board computes the number of people born in each year.
You can further specify a column to use as the series. For the above example, you could choose to use gender as the series. The time series board will then display one line for each value in the series column: in this case, F (female) or M (male).
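Conceptually, adding a series changes the grouping key from the time bucket alone to the pair (time bucket, series value). A minimal Python sketch of that idea with hypothetical customer records, bucketing by year and counting rows:

```python
from collections import Counter
from datetime import date

# Hypothetical customer records: (birth date, gender).
customers = [
    (date(1980, 3, 1), "F"),
    (date(1980, 7, 9), "M"),
    (date(1980, 11, 20), "F"),
    (date(1981, 1, 5), "M"),
    (date(1981, 6, 30), "M"),
]

# Without a series: one count per year (a single line).
per_year = Counter(born.year for born, _ in customers)

# With gender as the series: one count per (year, gender), i.e. one line per gender.
per_year_and_series = Counter((born.year, gender) for born, gender in customers)

print(sorted(per_year.items()))             # [(1980, 3), (1981, 2)]
print(sorted(per_year_and_series.items()))  # [((1980, 'F'), 2), ((1980, 'M'), 1), ((1981, 'M'), 2)]
```

In this sketch, a combination with no rows (such as 1981 and F) simply has no entry rather than a zero.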
Note that the time series performs its aggregates over the entire dataset, and reduces the output to the first 1,000 values when displaying it.
The Approx. Median aggregate is approximate. Contour calls the percentile_approx function with percentage value 0.5 and the default accuracy.
You can select a date range on the time series to filter the dataset for future boards. Click the selection icon, then click-and-drag your desired interval. (You can adjust the interval more finely in the editable board footer.) To clear your selection, click the clear icon.
Choose Keep from the dropdown to filter to only the selected values, or choose Remove to keep only the non-selected values.
You can edit columns in Contour with the following boards:
You can transform data in a column using the following boards:
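SHA-1 is not considered fully secure: values hashed with SHA-1 can often be recovered (for example, by hashing every candidate from a small or predictable set of inputs), and SHA-1 is vulnerable to collision attacks. Therefore, it should NOT be used for data compliance purposes.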
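One reason SHA-1 output should not be treated as protected: when the original values come from a small or predictable set, anyone can hash every candidate and compare the results. A minimal sketch with Python's hashlib, using a hypothetical 4-digit code; it does not reflect how Contour's obfuscation options are configured.

```python
import hashlib
from typing import Optional

def sha1_hex(value: str) -> str:
    return hashlib.sha1(value.encode("utf-8")).hexdigest()

# A hypothetical 4-digit code "obfuscated" by hashing it with SHA-1.
hashed = sha1_hex("4821")

def recover(target_hash: str) -> Optional[str]:
    # Only 10,000 candidates exist, so hashing every one of them is instant.
    for n in range(10_000):
        candidate = f"{n:04d}"
        if sha1_hex(candidate) == target_hash:
            return candidate
    return None

print(recover(hashed))  # 4821
```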
K-anonymization lets you choose a threshold value (k) to apply to a dataset, ensuring that there are at least k instances with the same set of sensitive information. This reduces the risk of re-identification (even if there is no personally identifiable information). This process is done by “suppressing” specific fields that would potentially help with the re-identification of the data.

The appropriate k-value for your use case is determined by context. Organizations typically set their own policies for k-values based on the context of the analysis and the statistical risk of re-identification. Example policies include those of the National Center for Education Statistics and the U.S. Department of Health & Human Services. At a minimum, the k-value should always be greater than 1 and less than the total number of rows in the dataset.
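Checking whether a table already satisfies a given k comes down to counting rows per combination of values in the chosen columns (the quasi-identifiers). A minimal Python sketch, with hypothetical rows and a hypothetical helper, that reports every combination shared by fewer than k rows; it only checks the condition and does not perform any suppression.

```python
from collections import Counter

def violating_groups(rows, quasi_identifiers, k):
    """Return the quasi-identifier combinations shared by fewer than k rows.

    The table is k-anonymous with respect to these columns when this is empty.
    """
    counts = Counter(tuple(row[c] for c in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}

rows = [
    {"zip": "021**", "age_band": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age_band": "30-39", "diagnosis": "flu"},
    {"zip": "945**", "age_band": "40-49", "diagnosis": "flu"},
]
print(violating_groups(rows, ["zip", "age_band"], k=2))
# {('945**', '40-49'): 1}
```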
Using the k-anonymize function, the board asks for the columns to k-anonymize, the target k-value, the suppression strategies, and what to do with rows that do not meet the k-value after suppression.
Columns: Represents “quasi-identifiers” or attributes that can be linked with external data to uniquely identify an individual.
k-value: Represents the threshold value k, such that there are at least k instances with the same set of sensitive information.
Strategies: Represents how data should be suppressed and in what order. You can set the given order of operations to reach the indicated k-value. For each column listed, you can choose amongst different strategies that would be applied to data to meet the k-value:
By default, suppressed values are replaced with ***, but this can be replaced with a user-provided value.

Rows that do not meet the k-value post-suppression: If some rows do not meet the k-value threshold and cannot be suppressed to reach a count of at least k, the following options are available:

One option replaces the values in these rows with ***; this is particularly relevant if you want to retain the same count of rows.

Find and replace text within a column, or find empty or null cells. This board supports columns of String or Numeric types.
Parse dates from strings.
The Contour Chart board allows you to build custom charts for analyzing your data.
Choose a chart type for the main chart layer, then configure the x and y axes. Currently the Chart board offers the following types of charts:
- Bar
- Horizontal Bar
- Line
- Scatter
- Heat Grid
- Pie
Segment by
For chart types other than heat grid and pie, you can also choose to segment the data into series.
Sorting
Expand the Options section to change how your chart data is sorted. You may order chart data by values in the main layers:
The following example sorts a bar chart by the number of gold medals received by a country in the Olympics:
Data can be sorted in ascending or descending order. Overlay plot values cannot be used to order chart data.
Formatting
Use the Format tab to configure the chart. You can change the X- and Y-axis titles, formatting of the axes, legend positioning, series sorting, and series colors.
Adding Overlays
You can add overlay plots by clicking + Add Overlay. For example, you might want to overlay a line chart on top of a bar chart.
When you add an overlay, you can choose whether the chart should use the data in the current path or from a different dataset.
Plotting data from a different dataset does not join that dataset with the working set. To join datasets, you should use the Join board.
Note that only the main chart layer is part of the data path. The other layers are solely for presentation purposes. In other words, making a selection or otherwise manipulating the data on an overlay layer will not affect the data downstream in your path.
You can plot your chart layers on separate y axes if the values of the individual layers are not related, or if the data range or plot scale is significantly different.
Bucket selection
You can choose how to bucket data points when configuring Group by columns (e.g. on the x-axis) and Segment by columns. Only numeric, date, or time columns can be bucketed. For example, if you create a bar chart and select a date column for the x-axis Group by column and choose bucket type Year, the resulting chart will have a bar for every year. The available bucket types are listed below.
Numeric column bucket types:
Date and time column bucket types:
If the bucket selection does not fit within the result limit, the most granular option that does fit will be applied so that data is not dropped. Read Result limit for more information.
Contour limits the number of data points it displays in the browser. Practically speaking, Contour cannot display more data points than there are pixels on the screen. To produce accurate charts without dropping any data, the Chart board will rebucket the chart configuration to the most granular bucket selection that fits under the result limit.
The result limit is set by your Palantir administrator and defaults to 1000 points. Rebucketing will occur for numeric, date, or time columns.
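A minimal sketch of the rebucketing rule for a date column, assuming a simplified list of bucket types ordered from most to least granular (Day, Month, Quarter, Year; Contour's actual list may differ). Starting from the requested bucket, it returns the first bucket type whose number of distinct buckets fits under the result limit, which defaults to 1,000 here to match the default described above. The helper and the data are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical bucket types, most to least granular; each maps a date to its bucket key.
BUCKETS = [
    ("Day", lambda d: d),
    ("Month", lambda d: (d.year, d.month)),
    ("Quarter", lambda d: (d.year, (d.month - 1) // 3 + 1)),
    ("Year", lambda d: d.year),
]

def choose_bucket(dates, requested, result_limit=1000):
    """Return the requested bucket type if its distinct bucket count fits under
    result_limit; otherwise the most granular coarser bucket type that fits."""
    names = [name for name, _ in BUCKETS]
    for name, key in BUCKETS[names.index(requested):]:
        if len({key(d) for d in dates}) <= result_limit:
            return name
    raise ValueError("no bucket type fits under the result limit")

# 1,500 consecutive days from 2020-01-01 to 2024-02-08: 1,500 days but only 50 months.
dates = [date(2020, 1, 1) + timedelta(days=i) for i in range(1500)]
print(choose_bucket(dates, "Day"))  # Month
```

Moving to a coarser bucket keeps every row in some bucket, which is why rebucketing avoids dropping data.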
To illustrate rebucketing, consider the following example:
Select data on the chart to filter the dataset for future boards. Use Ctrl+Click or Cmd+Click for multi-select.
You can pan and zoom on charts to more easily see the data. Hovering over a bar or point on the chart also displays a tooltip highlighting what you’re looking at.
The grid board is similar to the histogram, but the grid board aggregates data by two columns rather than one, displaying a heat grid chart of the results. (For more than two columns, you can use a pivot table.) For example, the following grid compares education level to yearly income:
The grid board is a visualization of an aggregate query, similar to the histogram and pivot table boards. A grid is approximately equivalent to the following SQL query:
```sql
SELECT [x-axis-column], [y-axis-column], <AGGREGATE_METRIC>([aggregate-column])
FROM <PARENT_BOARD>
GROUP BY [x-axis-column], [y-axis-column]
```
The Approx. Median aggregate is approximate. Contour calls the percentile_approx function with percentage value 0.5 and the default accuracy.
Select one or more cells on the grid to filter the dataset for future boards. Click a cell again to deselect it.
Choose Keep to filter to only the selected values, or choose Remove to keep only the non-selected values.
The heatmap board displays geocoded data on a map, color-coded to represent the values.
The Approx. Median aggregate is approximate. Contour calls the percentile_approx function with percentage value 0.5 and the default accuracy.
You can draw a circle on the heatmap to select all rows whose geo data lies within that radius.
Click the selection icon, then click-and-drag to draw a circle on the map.
Choose to Keep the values in the selected radius, or Remove those values, keeping only non-selected values.
To clear your selection and remove the filter, click outside of the circle on the map.
The pivot table board allows you to quickly compute multiple aggregate values of your data across multiple dimensions. The result of this computation is sampled and therefore what is displayed in the table may be incomplete. This sampling is described in further detail below.
Given a dataset with demographic information about customers, the following pivot table computes how many customers (by age) are married females, married males, single females or single males.
To prevent slow front-end and back-end performance, the number of rows to calculate is limited. The limit is 1,000 rows in most environments and is generally not configurable.
Let's assume that, as in the above screenshot, you have Pivot Table row aggregates of PERIOD and PRACTICE, and a column aggregate of POSTCODE. For each combination, you want to get the row count and the max value of the column NAME. If the limit in your environment is the default value of 1,000, at most 1,000 complete rows will be calculated. Each row is guaranteed to be complete, but some rows may not be present.
When you sort a column in your pivot table, sorting is performed on the preview rather than on the entire dataset. To sort your entire dataset, use the sort board. See Sort for more information.
In order to interact with the entirety of pivoted data, use the Switch to pivoted data option on the board, which will transition your Contour analysis to the fully-computed pivoted data for all boards beneath the pivot table board. Alternatively, you can attempt to avoid the cell limit by further filtering your data upstream of the pivot table.
When specifying a column aggregate, the values in the column must be case-insensitively unique. For example, if column "Borough" contains values "Brooklyn" and "brooklyn", and you specify "Borough" as a column aggregate, the pivot table calculation will fail. Consider casting all values to a consistent case to avoid this issue.
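A quick way to see whether a column would trip this rule is to group its values by a case-folded form and look for groups with more than one spelling. A minimal Python sketch with a hypothetical helper and the Borough values from the example:

```python
from collections import defaultdict

def case_collisions(values):
    """Group values that differ only by case; return groups with >1 spelling."""
    spellings = defaultdict(set)
    for v in values:
        spellings[v.casefold()].add(v)
    return {key: sorted(group) for key, group in spellings.items() if len(group) > 1}

boroughs = ["Brooklyn", "brooklyn", "Queens", "Queens", "Bronx"]
print(case_collisions(boroughs))  # {'brooklyn': ['Brooklyn', 'brooklyn']}

# Casting every value to one case before pivoting removes the collision.
normalized = [b.lower() for b in boroughs]
print(case_collisions(normalized))  # {}
```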
The Approx. Median aggregate is approximate. Contour calls the percentile_approx function with percentage value 0.5 and the default accuracy.
You can drag and drop between Columns, Rows and Aggregates.
You can specify multiple aggregates in a single pivot table. Each aggregate will be calculated for each combination of rows and columns you select.
Grand totals can also be calculated for rows, columns, or both. Grand totals are calculated by performing the aggregate over the entire dataset. For example, the grand total of Unique Count is the number of unique values across the entire dataset, and the grand total of Mean is the mean of the entire dataset.
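Because grand totals are computed over the whole dataset, they generally differ from combining the per-group results. A minimal Python sketch with hypothetical data shows this for Mean and Unique Count:

```python
from statistics import mean

# Hypothetical rows: (group, value). Group sizes differ on purpose.
rows = [("A", 10), ("A", 20), ("A", 30), ("B", 100)]

groups = {}
for g, v in rows:
    groups.setdefault(g, []).append(v)

per_group_mean = {g: mean(vs) for g, vs in groups.items()}  # {'A': 20, 'B': 100}
grand_total_mean = mean(v for _, v in rows)                 # (10+20+30+100)/4 = 40
mean_of_means = mean(per_group_mean.values())               # (20+100)/2 = 60, not the grand total

# Unique Count: a value that appears in several groups counts once in the grand total.
visits = [("A", "red"), ("A", "blue"), ("B", "red")]
unique_a = len({c for g, c in visits if g == "A"})  # 2
unique_b = len({c for g, c in visits if g == "B"})  # 1
unique_total = len({c for _, c in visits})          # 2, not 2 + 1
```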
When you click Pivot (switch to aggregated data), any boards you add after the pivot table will use the aggregated data computed in the table, rather than the original dataset.
The new dataset will include the columns you grouped by in the pivot table configuration, as well as the computed aggregates. For example:
The column editor board allows you to easily remove columns from your dataset and derive new columns. Subsequent boards will consume the set of columns you choose to keep.
You can perform binary operations on existing columns in your dataset to create new derived columns, or parse columns of strings into number- or date-formatted columns.
SQL Equivalent
Derived columns are equivalent to using operators in SQL or Spark. For example, the following derives a column for Income per person:
```sql
SELECT
  [Household Members],
  [Marital Status],
  [Income Column] / [Household Members] AS [Income per person]
FROM [Table Name]
```
To remove columns, select Show existing columns and select the name of any column you want to remove. You can add back a column by selecting it again. If you want to delete many columns, you can also select Remove All and then select any columns you want to retain.
You can remove duplicate rows using the Remove duplicate rows option in the column editor board.
SQL Equivalent
Removing columns via the column editor board is equivalent to selecting column names in SQL. For example, given a table that has 5 columns, A-E, the following removes columns D and E:
```sql
SELECT columnA, columnB, columnC
FROM <tableName>
```
The multi-column editor board allows you to reorder, rename and remove columns from your data, and remove duplicate rows. Subsequent boards will consume the set of columns you choose to keep.
The left side of the board shows All Columns, while the right side shows Kept Columns. In the Kept Columns section, you can choose to rename or reorder the kept columns, or use the bulk rename functionality.
SQL Equivalent
Reordering, renaming, and removing columns is equivalent to selecting column names in SQL. For example, given a table that has 5 columns, A-E, the following code removes columns D and E, and renames A to A_1:
```sql
SELECT columnA AS columnA_1, columnB, columnC
FROM <tableName>
```
The enrich board lets you join your current working dataset to another dataset, and merge the matching results into your data.
Learn how to use the enrich board.
The link board lets you join to another dataset and return the matching records of that dataset. This differs from the set math keep only operation in that it returns columns from the linked (right) table only.
Learn how to use the link board.
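The difference from a keep only filter is which table's rows and columns come back. A sketch with Python's built-in sqlite3 module and hypothetical customers and orders tables, assuming the link returns each matching right-table record once (modeled here as a semi-join; how Contour handles duplicate keys is not described in this section):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Ben"), (3, "Cy")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 3, 15.0), (13, 9, 5.0)],
)

# Link-style: matching records of the linked (right) table, with its columns only.
linked = conn.execute(
    "SELECT o.* FROM orders o "
    "WHERE o.customer_id IN (SELECT customer_id FROM customers) ORDER BY o.order_id"
).fetchall()

# Keep-only-style: the current (left) table, filtered to rows that have a match.
kept = conn.execute(
    "SELECT c.* FROM customers c "
    "WHERE c.customer_id IN (SELECT customer_id FROM orders) ORDER BY c.customer_id"
).fetchall()

print(linked)  # [(10, 1, 25.0), (11, 1, 40.0), (12, 3, 15.0)]
print(kept)    # [(1, 'Ann'), (3, 'Cy')]
```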
The set math board lets you alter your current dataset based on another set. You can filter the dataset to keep only data that exists in the other dataset (keep only); append data from another dataset (add); or remove data based on the results of another dataset (remove).
Learn how to use the set math board.
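The three set math operations map onto familiar SQL patterns. A sketch using Python's built-in sqlite3 module with two hypothetical single-key tables; it assumes rows are matched on the id column and that add appends rows like UNION ALL (keeping duplicates), which may differ from Contour's exact behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE current_set (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE other_set (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO current_set VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO other_set VALUES (?, ?)", [(2, "b"), (4, "d")])

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

# Keep only: rows of the current set whose key also appears in the other set.
keep_only = ids("SELECT id FROM current_set WHERE id IN (SELECT id FROM other_set) ORDER BY id")
# Add: append the other set's rows to the current set.
add = ids("SELECT id FROM current_set UNION ALL SELECT id FROM other_set ORDER BY id")
# Remove: rows of the current set whose key does not appear in the other set.
# (NOT IN returns no rows if other_set.id contains NULL; NOT EXISTS avoids that.)
remove = ids("SELECT id FROM current_set WHERE id NOT IN (SELECT id FROM other_set) ORDER BY id")

print(keep_only)  # [2]
print(add)        # [1, 2, 2, 3, 4]
print(remove)     # [1, 3]
```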
The join board presents you with suggested join templates curated by your Palantir administrator. If you would like to add or modify suggested joins, contact your administrator.
Learn how to use the join board.
The export board allows you to download your analytical set as a CSV or XLS file.
Choose csv or xls from the dropdown, then click Export. After the board finishes its operations on the server, you are given the option to customize the filename. Then click Download <#> records to download the file.
The reorder columns board lets you drag and drop the columns in your table into a different order.
The macro board lets you apply a previously created macro to your path.
The sort board lets you sort all of the data in the dataset. Note that this sort is limited to the analysis and does not persist to the saved dataset. Downstream operations such as a join or removing duplicate rows may not preserve the sort order, so it is recommended to perform such operations before the sort.
The calculation board lets you display multiple aggregate calculations on your data in the form of cards or lists. The available aggregate metrics are: Unique Count, Min, Max, Sum, Mean, Median, Standard Deviation, and Variance.
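The listed metrics correspond to standard aggregates. A minimal sketch with Python's statistics module and hypothetical values; it uses the sample (n - 1) forms of standard deviation and variance, which is an assumption, since this page does not say which form Contour uses:

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

calculations = {
    "Unique Count": len(set(values)),        # 5 distinct values
    "Min": min(values),                      # 2
    "Max": max(values),                      # 9
    "Sum": sum(values),                      # 40
    "Mean": statistics.mean(values),         # 5
    "Median": statistics.median(values),     # 4.5, the mean of the two middle values
    # Assumed sample forms; statistics.pstdev / pvariance give the population forms.
    "Standard Deviation": statistics.stdev(values),
    "Variance": statistics.variance(values),  # 32 / 7
}
```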
The calculation board can either be formatted as a card or as a list.
The card format has additional formatting options for horizontal or vertical direction and metric sizes.
Lastly, each calculation can have conditional formatting based on a set of specified rules (conditions). This means that font color and background color can change based on whether a condition is met.
The unpivot board allows you to reshape your data by turning some columns into rows. The columns that you select will be reformatted into two new columns: a header column (containing the original column names) and a value column (containing the original data values).
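A minimal Python sketch of unpivoting, assuming the columns you do not select are carried through unchanged on every output row; the helper and the header and value column names are hypothetical:

```python
def unpivot(rows, id_columns, value_columns, header_name="header", value_name="value"):
    """Turn each selected column of each row into its own output row.

    id_columns are copied unchanged; every column in value_columns becomes a
    (header, value) pair, giving len(rows) * len(value_columns) output rows.
    """
    out = []
    for row in rows:
        for col in value_columns:
            new_row = {c: row[c] for c in id_columns}
            new_row[header_name] = col      # the original column name
            new_row[value_name] = row[col]  # the original data value
            out.append(new_row)
    return out

sales = [
    {"store": "North", "q1": 100, "q2": 120},
    {"store": "South", "q1": 90, "q2": 95},
]
long_rows = unpivot(sales, ["store"], ["q1", "q2"])
for r in long_rows:
    print(r)
# {'store': 'North', 'header': 'q1', 'value': 100}
# {'store': 'North', 'header': 'q2', 'value': 120}
# {'store': 'South', 'header': 'q1', 'value': 90}
# {'store': 'South', 'header': 'q2', 'value': 95}
```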