Transforms preview

The Palantir extension for Visual Studio Code allows you to preview Python transforms directly from your local Visual Studio Code environment or a VS Code Workspace in the Palantir platform. This capability enables rapid testing of transforms without the need to exit the code editor. Currently, this feature is only available for Python transforms.

Initiate a preview

You can start a preview within your local Visual Studio Code environment or an in-platform VS Code Workspace in the following four ways:

  • Select the Run preview command from the Command Palette.

Find the "Run preview" command in the Command Palette.

  • Select the Run preview icon from the toolbar.

The "Run preview" icon, which is a fast-forward symbol, from the VS Code toolbar.

  • Select Preview above the transform.

The "Preview" option above an example Python transform.

  • Open the Preview panel and select the Preview button next to the code filename.

The "Preview" button in the VS Code "Preview" panel.

Preview process

The extension provides two methods for previewing transforms:

1. Remote preview with a Code Assist workspace (local use of Palantir extension only)

Using Palantir Code Assist with the Palantir extension is the standard method for previewing during local development.

Upon cloning a Python transforms repository and opening a Python file, the Palantir extension will establish a connection to a remote Code Assist workspace, just as it would for an in-platform code repository environment.

A Code Assist workspace setting up in a local VS Code environment.

Once Code Assist is ready, the extension will identify all transforms present in the open files.

After selecting a transform for preview, the preview will execute on the remote Code Assist workspace.

2. Local preview

Local preview is the default previewing method for users working within an in-platform VS Code Workspace. It is also the default previewing method in local development if a platform administrator has enabled local preview.

For local development, local preview must be enabled by your platform administrator on the Code Repositories Control Panel page.

Upon opening a Palantir repository, the extension will configure the environment. Once the environment is set up and transforms are detected, you will be able to execute previews locally. To do this, the extension downloads portions of datasets and temporarily stores them on your machine, provided you have the appropriate permissions for the data.

Comparing Preview modes

Code Assist preview (used for remote preview) and Preview Engine preview (used for local preview) use different execution models. Code Assist preview uses a preview version of the transforms library, which is a close re-implementation of the actual transforms library used during in-platform builds. This results in broader feature support at the cost of precision. Subtle implementation differences between the preview and build versions of the transforms library can lead to non-intuitive and sometimes misleading preview results.

Preview Engine, on the other hand, uses the original transforms library to execute the user code. As a result, the user code should barely be able to tell that it is running in preview mode, which yields higher accuracy and performance. The main drawback is that support must be added separately for each library primitive at a lower level of the architecture, so fewer features are supported at the time of writing.

Sample-less vs. sampled dataset loading in Preview Engine

Preview Engine features a sample-less dataset loading option. To understand why this matters, consider the sampled input loading method used by both Code Assist preview and Preview Engine preview. When an input dataset is requested, a subset of the dataset is downloaded to disk before the preview runs. The subset is uniformly sampled from the input, and the number of rows can be configured by the user, with a default of 10,000. In some use cases, this sampling is adequate and does not introduce statistical bias. However, for certain transformations, such as narrow filters or joins between multiple inputs, the result can be deceptively small: sampled rows are less likely to contain values that match a narrow filter expression, and for join expressions the likelihood of finding matching rows decreases exponentially with the number of joined inputs.
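The effect of sampling on narrow filters and joins can be estimated with some simple arithmetic. The sketch below is an illustrative model only, not part of the extension: the function name and its parameters are hypothetical, and it assumes that every input has the same row count, that inputs are sampled uniformly and independently, and that each result row needs exactly one specific row from each joined input.

```python
from fractions import Fraction


def expected_preview_rows(full_result_rows, rows_per_input, sample_rows, joined_inputs=1):
    """Expected number of result rows that survive in a sampled preview.

    Hypothetical helper for illustration. A result row survives only if the
    row it depends on was sampled from every one of the joined inputs.
    """
    # Probability that any single row of an input ends up in the sample.
    keep = Fraction(min(sample_rows, rows_per_input), rows_per_input)
    # Independent sampling per input: probabilities multiply.
    return full_result_rows * keep ** joined_inputs


# Inputs with 1,000,000 rows each and the default sample size of 10,000 rows,
# so each row is kept with probability 1/100.
print(expected_preview_rows(500, 1_000_000, 10_000))           # narrow filter with 500 true matches -> 5
print(expected_preview_rows(1_000_000, 1_000_000, 10_000, 2))  # one-to-one join of 2 inputs -> 100
print(expected_preview_rows(1_000_000, 1_000_000, 10_000, 3))  # one-to-one join of 3 inputs -> 1
```

Under these assumptions, each additional joined input multiplies the expected preview output by another factor of 1/100, which is why a sampled join preview can look nearly empty even when the full build would produce many rows.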

With sample-less dataset loading, no pre-sampling takes place. Instead, Preview Engine relies on modern data processing engines, such as Spark or Polars, to push down predicates to the data-source level and only download the chunks of the dataset that are most likely to match the query. This means that filters or other narrowing expressions used anywhere within the transform code may be eligible to be pushed down, resulting in fully accurate preview results without much additional compute time.

Some pipelines cannot take full advantage of predicate push-down, for example, pipelines that do not contain filter expressions. In these cases, the pipeline's author can introduce some conditional filter expressions in their code to speed up their preview runs during development.
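One way to add such a conditional filter is a small helper that narrows a DataFrame only when a development switch is turned on. This is a minimal sketch under stated assumptions: the helper name and the PREVIEW_DEV_FILTER environment variable are hypothetical and are not settings provided by the Palantir extension, and whether the filter is actually pushed down depends on the engine and the expression used.

```python
import os

# Hypothetical switch for this sketch; not a setting of the Palantir extension.
DEV_FILTER_ENV_VAR = "PREVIEW_DEV_FILTER"


def with_dev_filter(df, predicate, enabled=None):
    """Return df.filter(predicate) while developing, otherwise df unchanged.

    Works with any DataFrame-like object that exposes a .filter() method,
    which both Spark and Polars DataFrames do.
    """
    if enabled is None:
        # Fall back to the (assumed) environment variable when no flag is passed.
        enabled = os.environ.get(DEV_FILTER_ENV_VAR) == "1"
    return df.filter(predicate) if enabled else df
```

Inside a transform, the helper would wrap an input before the rest of the logic runs, with a predicate on a column that suits the pipeline. Because the switch defaults to off, the filter does not affect the output unless it is explicitly enabled during development.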

There is one more caveat to keep in mind when deciding to go with sampled or sample-less dataset loading for a given input. Sampled inputs are cached locally, on disk, while caching is not supported for sample-less loading. This means if an input is not used in join expressions and the statistical properties of filters applied to this input are less relevant to the pipeline preview's correctness, sampled dataset loading is the better choice for speedier previews. In all other cases, sample-less dataset loading should be preferred.
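The rule of thumb above can be written down as a short decision function. The function name, its parameters, and the example inputs are hypothetical and only restate the guidance in this section; the extension does not make this choice automatically.

```python
def preferred_loading_mode(used_in_join: bool, filter_statistics_matter: bool) -> str:
    """Suggest a loading mode for a single input, following the guidance above."""
    if not used_in_join and not filter_statistics_matter:
        # Sampled inputs are cached on disk, so repeated previews are faster.
        return "sampled"
    # Joins and accuracy-sensitive filters benefit from predicate push-down.
    return "sample-less"


# Hypothetical inputs, described by hand as (used_in_join, filter_statistics_matter).
inputs = {
    "lookup_table": (False, False),
    "orders": (True, False),
    "events": (False, True),
}
for name, (joined, stats_matter) in inputs.items():
    print(name, "->", preferred_loading_mode(joined, stats_matter))
```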

Supported features by different preview methods

The following table shows the current support matrix of the different preview executors. Code Repositories preview is used not only in Code Repositories but also in the Remote preview mode of the Visual Studio Code extension. When previewing in Local mode, users can choose between the Full dataset (the same as sample-less) and Sampled dataset loading modes.

| Feature | Code Repositories (Code Assist) | Sample-less preview (Preview Engine) | Sampled preview (Preview Engine) |
|---|---|---|---|
| Debugging | Supported | Supported | Supported |
| Foundry datasets | Both tabular (with schema) and raw files | Only tabular datasets | Both tabular (with schema) and raw files |
| Transform generators | Supported | Supported | Supported |
| Lightweight transforms | Supported | Not supported | Supported |
| Views & Object materializations | Supported | Not supported | Not supported |
| Incrementality | Ignored | Not supported | Not supported |
| External transforms | Both sources and egress policies | Sources supported in Code Workspaces | Sources supported in Code Workspaces |
| Media sets | Supported | Not supported | Not supported |
| Models | Supported | Not supported | Not supported |
| Cipher | Supported | Not supported | Not supported |
| Language models | Supported | Not supported | Not supported |
| Data Expectations | Supported | Not supported | Not supported |
| Virtual tables | Supported | Not supported | Not supported |
| Spark sidecars | Not supported | Not supported | Not supported |

External transforms in Code Workspaces enforce strict export controls. The Code Workspaces application maintains a historical record of a workspace's inputs, so previous inputs that contain additional security markings may stop a preview due to marking violations. Additionally, the application accounts for all previously incorporated container markings when a workspace computes its marking security checks and export controls to avoid the inappropriate exposure of marked data.

If a workspace contains markings that are incompatible with an external transform, restart the workspace without checkpoints to clear tracked markings. Review the external transforms documentation for additional information.

In both local development and VS Code Workspaces, if sample-less dataset loading is used for a transform's preview but the transform also makes use of features that sample-less loading does not support, the preview will fall back to sampled dataset loading. The fallback triggers a warning, which is also recorded in the extension's logs.
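The fallback behavior can be modeled as a lookup against the sample-less column of the support table above. The sketch below is an illustrative model, not the extension's implementation: the function name and the feature labels are assumptions chosen to mirror the table rows, with "Raw file datasets" standing in for the non-tabular half of the Foundry datasets row.

```python
# Features the support table lists as unavailable for sample-less preview.
SAMPLE_LESS_UNSUPPORTED = {
    "Raw file datasets",
    "Lightweight transforms",
    "Views & Object materializations",
    "Incrementality",
    "Media sets",
    "Models",
    "Cipher",
    "Language models",
    "Data Expectations",
    "Virtual tables",
    "Spark sidecars",
}


def resolve_loading_mode(requested, features_used):
    """Return (effective_mode, blocking_features) for a preview request.

    Hypothetical helper: a sample-less request falls back to sampled loading
    when the transform uses any feature that sample-less loading lacks.
    """
    if requested != "sample-less":
        return requested, []
    blocking = sorted(set(features_used) & SAMPLE_LESS_UNSUPPORTED)
    if blocking:
        return "sampled", blocking
    return "sample-less", []


mode, blocking = resolve_loading_mode("sample-less", ["Debugging", "Lightweight transforms"])
print(mode, blocking)  # sampled ['Lightweight transforms']
```

Note that falling back to sampled loading only helps where the sampled column of the table lists the feature as supported, as it does for lightweight transforms and raw files.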