Materializations

Up-to-date data is critical to many Foundry workflows. Ontology users can create materializations of indexed Ontology data that contain the latest state of each object instance, combining data from input datasources with user edits.
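As a purely conceptual illustration of "latest state", the sketch below applies a list of user edits on top of datasource rows. The edit format (`op`, `key`, `values`) and the function name `materialize` are hypothetical and chosen only for this example; Foundry's actual edit representation and conflict handling are not described in this document.

```python
from typing import Any

Row = dict[str, Any]


def materialize(datasource_rows: list[Row], edits: list[dict], primary_key: str) -> list[Row]:
    """Return the latest state of each object: datasource rows with edits applied in order.

    The edit shape used here is an assumption made for illustration only.
    """
    # Start from the datasource state, keyed by primary key.
    state = {row[primary_key]: dict(row) for row in datasource_rows}
    for edit in edits:
        key = edit["key"]
        if edit["op"] == "delete":
            state.pop(key, None)
        elif edit["op"] == "create":
            state[key] = {primary_key: key, **edit["values"]}
        elif edit["op"] == "modify" and key in state:
            state[key].update(edit["values"])
    # Sort by primary key so the output order is deterministic.
    return sorted(state.values(), key=lambda r: r[primary_key])
```

The point of the sketch is only that a materialization reflects both sources of truth at once: rows that were never edited pass through unchanged, while edited, created, and deleted objects reflect the user edits.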

Use cases for materializations

The two main use cases for materializations are:

  • Building downstream Foundry pipelines that require the latest state of each object instance including user edits.
  • Enabling downloads of Ontology data containing the latest state of all object instances for an object type.

We recommend orchestrating bulk downloads in Foundry by creating materialized datasets and initiating the downloads through existing download workflows for other Foundry datasets, such as data exports and exports through Foundry Transforms.

Create a materialized dataset

Navigate to the Materializations tab by toggling the Edits configuration in the Datasources tab in the Ontology Manager. On the Materializations tab, you can create materialized object datasets or object restricted views with various configurations, depending on the input datasource types.

Materializations landing page

Comparison of writeback datasets and materialized datasets

In Object Storage V1 (Phonograph), writeback datasets are the equivalent of materialized datasets. Writeback datasets are required in OSv1 to enable user edits on an object type or a many-to-many link type with a join table.

Object Storage V2 does not require materialized datasets to enable user edits. Instead, users can enable user edits for an object type by toggling the Edits configuration in the Datasources tab in the Ontology Manager. Materializations are therefore optional in OSv2: users only need to create them for the two main use cases described above. OSv2 also allows multiple materialized datasets to be created for an object type, for example when users want to materialize only a subset of its properties.

There are other behavior differences between OSv1 writeback datasets and OSv2 materialized datasets, described below.

Build schedules in writeback and materialized datasets

Object Storage V1 (Phonograph) writeback datasets and Object Storage V2 materialized datasets handle build schedules differently.

  • In OSv1, there is no mechanism to trigger builds for writeback datasets when there are new user edits. Instead, users can create schedules for building their writeback datasets as often as they want. When there is no new data, these builds are automatically aborted to avoid using any additional compute. If no schedule is set up and the writeback dataset is not being built, the data in the writeback dataset may not be an accurate representation of the Ontology.
  • OSv2 offers two modes, each suited to a different use case:
    • To have user edits reflected in the materialized datasets as soon as edits are applied, users can enable automatic propagation of user edits. This mode propagates user edits to the configured materialized datasets automatically (with a latency of a few minutes). This may incur additional cost as more frequent builds may occur depending on the frequency of new user edits.
    • If the latency of user edit propagation to materialized datasets is not critical, users can reduce costs by configuring periodic builds. In this mode, materialized datasets are rebuilt whenever the input datasources have new data or every 6 hours.

Creating a new output dataset

Existing output datasets
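The periodic build mode described above can be summarized as a simple decision rule: rebuild when the input datasources have new data, or when 6 hours have passed since the last build. The sketch below is an illustrative model only; the function and parameter names are hypothetical and do not correspond to a Foundry API.

```python
from datetime import datetime, timedelta

# Interval taken from the description of periodic builds above.
PERIODIC_INTERVAL = timedelta(hours=6)


def should_rebuild(
    has_new_input_data: bool,
    last_build: datetime,
    now: datetime,
    interval: timedelta = PERIODIC_INTERVAL,
) -> bool:
    """Model of periodic mode: rebuild on new input data or once the interval has elapsed."""
    return has_new_input_data or (now - last_build) >= interval
```

Under this model, a user edit made shortly after a build may not appear in the materialized dataset until the next input update or the next periodic build, which is the trade-off against the lower-latency automatic propagation mode.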

Retention of writeback and materialized datasets

Retention works differently for writeback datasets and materialized datasets.

  • In OSv1, the writeback dataset acts like a regular dataset: it can be assigned retention policies configured within the platform. This enables users to look back at historical snapshots of the object type state if the writeback dataset is built regularly.

  • In OSv2, materialized datasets are subject to a retention policy that is not customizable. Historical transactions are continuously deleted, and only the latest snapshot is guaranteed to be available. Users who need to keep historical snapshots of object type states must set up a downstream transform to capture them.
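One way to think about such a downstream transform is as an append-only history: each run stamps the latest materialized rows with a snapshot date and appends them to what was captured before. The sketch below uses plain Python lists to stay self-contained; the `snapshot_date` column and the `append_snapshot` helper are hypothetical, and in practice this logic would live in a Foundry transform that reads the materialized dataset and writes to its own output.

```python
from datetime import date
from typing import Any

Row = dict[str, Any]


def append_snapshot(history: list[Row], latest_rows: list[Row], snapshot_date: date) -> list[Row]:
    """Append the latest rows, stamped with snapshot_date, to the accumulated history.

    If the date is already present in the history, the history is returned unchanged,
    so re-running on the same day does not duplicate rows.
    """
    if any(row["snapshot_date"] == snapshot_date for row in history):
        return history
    stamped = [{**row, "snapshot_date": snapshot_date} for row in latest_rows]
    return history + stamped
```

Because the history is kept in a separate output, it is no longer tied to the non-customizable retention applied to the materialized dataset itself.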

Dataset schema in writeback and materialized datasets

Object Storage V1 (Phonograph) writeback datasets and Object Storage V2 materialized datasets relate to input datasource schemas differently.

  • In OSv1, the schema of the input datasource is copied and used as the schema of the writeback dataset.
  • OSv2 changes this behavior to increase the legibility of the Foundry Ontology. Since users are materializing data from the Ontology, the schema of a materialized dataset is derived from the Ontology definitions rather than from the backing datasource configuration. Specifically, the API name of each property is used as the corresponding column name in the materialized dataset. Contact your Palantir representative if you want to continue using the schema of the input datasource while migrating from OSv1 to OSv2 (for example, to guarantee backward compatibility for existing writeback datasets).
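For downstream logic that was written against the input datasource's column names, one possible workaround is to rename columns explicitly after reading the materialized dataset. The sketch below assumes a hand-maintained mapping from property API names to the older column names; the names `flightNumber` and `flight_number` are invented for illustration.

```python
from typing import Any

Row = dict[str, Any]


def rename_columns(rows: list[Row], api_name_to_column: dict[str, str]) -> list[Row]:
    """Rename keys according to the mapping; columns not in the mapping are left as-is."""
    return [
        {api_name_to_column.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


# Hypothetical mapping from property API names to legacy datasource column names.
LEGACY_COLUMNS = {"flightNumber": "flight_number", "departureTime": "departure_time"}
```

Keeping the mapping in one place makes the schema difference between the two storage versions explicit instead of scattering renames through downstream code.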

Columns prefixed with __ (for example, __is_deleted and __patch_offset) in the materialized dataset are metadata columns used by Foundry for deduplication purposes and do not represent any information on the state of the object type. These columns may be renamed or removed in future releases without prior warning and should not be used in production workflows.
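A defensive pattern for downstream consumers is to drop every `__`-prefixed column as the first step, so that later logic cannot come to depend on them. A minimal sketch in plain Python follows; the helper name `drop_metadata_columns` is hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def drop_metadata_columns(rows: list[Row]) -> list[Row]:
    """Return copies of the rows without any column whose name starts with '__'."""
    return [
        {name: value for name, value in row.items() if not name.startswith("__")}
        for row in rows
    ]
```

Filtering by prefix rather than by an explicit list of names means the step keeps working if the metadata columns are renamed or removed.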

Restricted views in writeback and materialized datasets

Object Storage V1 (Phonograph) does not allow materializing restricted views for object types that are granularly permissioned using restricted views as an input datasource. Users can only materialize writeback datasets that contain all the rows from the backing dataset of the restricted view input datasource. Users are then responsible for properly securing access to the writeback dataset based on their access restrictions.

In Object Storage V2, users can configure either regular datasets or restricted views as materialized resources for object types that are granularly permissioned using restricted views as an input datasource, as shown below.

Materialized resource type selection

In the case of an object type having multiple input datasources, users can configure their materialized datasets by selecting which input datasources they would like to materialize data from. If an input datasource is not selected, object type properties mapped from that input datasource will not be reflected in the materialized dataset. If some of the input datasources are restricted views, users have two options:

  • Users can select one of the restricted view resources to materialize as a restricted view. An example configuration is shown below.

Materialized restricted views

  • Users can select multiple input datasources, but in that case they can only materialize Ontology data as a Foundry dataset. This limitation exists because different restricted view input datasources can have different policy configurations, and restricted views do not currently support column-level policies. An example configuration is shown below.

Materialized datasets with RV source
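The selection rules above can be summarized as a small validation function. This is a sketch of the rules as described in this section, not a Foundry API: the `InputDatasource` type and the output type labels are hypothetical, and the case of a single regular dataset input yielding only a dataset output is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputDatasource:
    name: str
    is_restricted_view: bool


def allowed_output_types(selected: list[InputDatasource]) -> set[str]:
    """Return the materialized resource types allowed for the selected input datasources."""
    if not selected:
        raise ValueError("select at least one input datasource")
    # A single restricted view input can be materialized as a restricted view
    # (or as a regular dataset).
    if len(selected) == 1 and selected[0].is_restricted_view:
        return {"dataset", "restricted_view"}
    # Multiple inputs may carry different policies, so only a dataset is possible.
    # Assumption: a single regular dataset input also yields only a dataset.
    return {"dataset"}
```

When the output is a regular dataset built from restricted view inputs, access to that dataset is not governed by the restricted view policies, so it should be secured separately.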