Transforms preview

The Palantir extension for Visual Studio Code allows you to preview Python transforms directly from your local Visual Studio Code environment or a VS Code Workspace in the Palantir platform. This capability enables rapid testing of transforms without the need to exit the code editor. Currently, this feature is only available for Python transforms.

Initiate a preview

You can start a preview within your local Visual Studio Code environment or an in-platform VS Code Workspace in the following four ways:

  • Select the Run preview command from the Command Palette.

Find the "Run preview" command in the Command Palette.

  • Select the Run preview icon from the toolbar.

The "Run preview" icon, which is a fast-forward symbol, from the VS Code toolbar.

  • Select Preview above the transform.

The "Preview" option above an example Python transform.

  • Open the Preview panel and select the Preview button next to the code filename.

The "Preview" button in the VS Code "Preview" panel.

Preview process

The Palantir extension for Visual Studio Code runs local preview using the Preview Engine. The Preview Engine downloads parts of datasets and temporarily stores them on the user's machine, provided the user has the appropriate permissions for the data.

To use preview during local development, local preview must be enabled by your platform administrator from the Code Repositories settings page in Control Panel.

Upon opening a Palantir repository, the extension will configure the environment. Once the environment is set up and transforms are detected, you will be able to execute previews locally.

Inside Code Repositories, preview runs through Code Assist rather than the Preview Engine. The following sections compare the two preview modes.

Comparing Preview modes

Code Assist preview and Preview Engine preview use different execution models. Code Assist preview uses a preview version of the transforms library, which is a close re-implementation of the actual transforms library used during in-platform builds. This results in broader feature support at the cost of precision. Subtle implementation differences between the preview and build versions of the transforms library can lead to non-intuitive and sometimes misleading preview results.

Preview Engine, on the other hand, uses the original transforms library to execute the user code. As a result, the underlying code should barely be able to tell that the transformation is running in preview mode, which yields higher accuracy and performance. The main drawback is that support has to be added for each library primitive lower down in the architecture, so fewer features are supported at the time of writing.

Sample-less vs. sampled dataset loading in Preview Engine

Preview Engine features a sample-less dataset loading option. To understand its importance, consider the sampled input loading used by both Code Assist preview and Preview Engine preview. When an input dataset is requested, a subset of the input dataset is downloaded to disk before the preview actually runs. The subset is uniformly sampled from the input, and the number of rows can be configured by the user, with a default of 10,000. In some use cases, this sampling is adequate and does not introduce statistical bias. However, for certain transformations, such as narrow filters or joins between multiple inputs, the result can be deceptively short: rows matching a filter expression are less likely to survive sampling, and rows matching a join expression are exponentially less likely to survive (in the number of joined inputs).
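The shrinkage of joins under sampling can be made concrete with a small simulation. The sketch below is an illustration only: the dataset sizes, the one-to-one integer keys, and the helper names are assumptions, and it does not reflect how either preview executor is implemented.

```python
import random


def uniform_sample(rows, n, seed):
    """Draw n rows uniformly at random, without replacement."""
    return random.Random(seed).sample(rows, n)


def inner_join_count(left_keys, right_keys):
    """Count inner-join matches, assuming keys are unique on both sides."""
    return len(set(left_keys) & set(right_keys))


def expected_retained_fraction(sample_fraction, joined_inputs):
    """Expected share of full-join matches that survive when every input is
    sampled independently at the same fraction (one-to-one keys assumed)."""
    return sample_fraction ** joined_inputs


# Two hypothetical inputs with 100,000 rows each, joined on a shared key.
full_left = list(range(100_000))
full_right = list(range(100_000))

# Each input is sampled independently down to 10,000 rows (10%).
sampled_left = uniform_sample(full_left, 10_000, seed=1)
sampled_right = uniform_sample(full_right, 10_000, seed=2)

full_matches = inner_join_count(full_left, full_right)
sampled_matches = inner_join_count(sampled_left, sampled_right)

print("matches on full inputs:   ", full_matches)
print("matches on sampled inputs:", sampled_matches)
print("expected retained share:  ", expected_retained_fraction(0.1, 2))
```

With two inputs each sampled at 10%, only about 1% of the true join matches are expected to survive; adding a third sampled input would reduce that to roughly 0.1%.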

With sample-less dataset loading, no pre-sampling takes place. Instead, Preview Engine relies on modern data processing engines, such as Spark or Polars, to push down predicates to the data-source level and only download the chunks of the dataset that are most likely to match the query. This means that filters or other narrowing expressions used anywhere within the transform code may be eligible to be pushed down, resulting in fully accurate preview results without much extra computational time incurred.
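The idea behind predicate push-down can be sketched without any particular engine. In the example below, each chunk of a dataset carries minimum and maximum statistics for a column, and a range filter skips every chunk whose statistics rule out a match. The `Chunk` class and both functions are hypothetical names for illustration; Spark and Polars implement push-down internally and far more generally than this.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A slice of a column together with its min/max statistics."""
    min_value: int
    max_value: int
    rows: list


def make_chunks(values, chunk_size):
    """Split values into fixed-size chunks and record per-chunk statistics."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append(Chunk(min(part), max(part), part))
    return chunks


def scan_with_pushdown(chunks, lower, upper):
    """Return rows with lower <= value <= upper and the number of chunks read.

    Chunks whose statistics cannot overlap the requested range are skipped,
    which stands in for "not downloaded" in a real data source.
    """
    matched = []
    chunks_read = 0
    for chunk in chunks:
        if chunk.max_value < lower or chunk.min_value > upper:
            continue  # statistics prove that no row in this chunk can match
        chunks_read += 1
        matched.extend(v for v in chunk.rows if lower <= v <= upper)
    return matched, chunks_read


chunks = make_chunks(list(range(1000)), chunk_size=100)
rows, chunks_read = scan_with_pushdown(chunks, lower=250, upper=260)
print(f"read {chunks_read} of {len(chunks)} chunks, matched {len(rows)} rows")
```

Because the filter is evaluated against the full dataset's statistics rather than a pre-drawn sample, every matching row is returned while most of the data is never read.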

Some pipelines cannot take full advantage of predicate push-down, for example, pipelines that do not contain filter expressions. In these cases, the pipeline's author can introduce some conditional filter expressions in their code to speed up their preview runs during development.
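One possible shape for such a conditional filter is sketched below in plain Python. The environment variable `DEV_PREVIEW_FILTER`, the `region` column, and both function names are hypothetical and are not provided by the extension or the transforms library. In a real transform the narrowing step would be written as a DataFrame filter expression so that the engine has something to push down.

```python
import os


def preview_filter_enabled(env=None):
    """Check a hypothetical development-only switch (not an extension feature)."""
    env = os.environ if env is None else env
    return env.get("DEV_PREVIEW_FILTER", "0") == "1"


def compute(rows, narrow):
    """Toy transform body: optionally narrow the input, then derive a column."""
    if narrow:
        # Development-only filter so the preview has less data to load.
        rows = [r for r in rows if r["region"] == "EU"]
    return [{**r, "amount_cents": r["amount"] * 100} for r in rows]


rows = [
    {"region": "EU", "amount": 3},
    {"region": "US", "amount": 5},
    {"region": "EU", "amount": 7},
]

narrowed = compute(rows, narrow=True)
complete = compute(rows, narrow=False)
print(len(narrowed), "rows with the filter,", len(complete), "rows without")
```

Keeping the switch off by default means the filter only takes effect when a developer opts in, so the logic that runs in a build is unchanged.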

There is one more caveat to keep in mind when choosing between sampled and sample-less dataset loading for a given input. Sampled inputs are cached locally on disk, while caching is not supported for sample-less loading. This means that if an input is not used in join expressions, and the statistical properties of the filters applied to it matter little to the correctness of the pipeline's preview, sampled dataset loading is the better choice for speedier previews. In all other cases, sample-less dataset loading should be preferred.
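The rule of thumb above can be written as a small decision helper. This is only a restatement of the guidance in code; the function and parameter names are hypothetical, and the choice is made by the user per input rather than computed by the extension.

```python
def choose_loading_mode(used_in_join: bool, filter_statistics_matter: bool) -> str:
    """Suggest a loading mode for one input, following the guidance above."""
    if not used_in_join and not filter_statistics_matter:
        # Sampled inputs are cached on disk, so repeated previews are faster.
        return "sampled"
    # Joins and statistically sensitive filters need the unsampled data.
    return "sample-less"


print(choose_loading_mode(used_in_join=False, filter_statistics_matter=False))
print(choose_loading_mode(used_in_join=True, filter_statistics_matter=False))
```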

Supported features by different preview methods

The following table shows the current support matrix of the different preview executors. Code Repositories preview (Code Assist) is used not only in Code Repositories but also in the Remote preview mode of the Visual Studio Code extension. When previewing in Local mode, users can choose between the Full dataset (which is the same as sample-less) and Sampled dataset loading modes.

| Feature | Code Repositories (Code Assist) | Sample-less preview (Preview Engine) | Sampled preview (Preview Engine) |
| Debugging | Supported | Supported | Supported |
| Foundry datasets | Both tabular (with schema) and raw files | Only tabular datasets | Both tabular (with schema) and raw files |
| Transform generators | Supported | Supported | Supported |
| Data expectations | Spark and lightweight transforms | Supported for Spark transforms | Supported for Spark transforms |
| Lightweight transforms | Supported | Supported for Parquet datasets | Supported |
| Views and object materializations | Supported | Not supported | Supported |
| Incrementality | Supported | Supported | Supported |
| External transforms | Both sources and egress policies | Sources supported in Code Workspaces | Sources supported in Code Workspaces |
| Media sets | Supported | Not supported | Not supported |
| Models | Supported | Not supported | Not supported |
| Cipher | Supported | Not supported | Not supported |
| Language models | Supported | Not supported | Not supported |
| Virtual tables | Supported | Not supported | Not supported |
| Spark sidecars | Not supported | Not supported | Not supported |

External transforms in Code Workspaces enforce strict export controls. The Code Workspaces application maintains a historical record of a workspace's inputs, so previous inputs that contain additional security markings may stop a preview due to marking violations. Additionally, the application accounts for all previously incorporated container markings when a workspace computes its marking security checks and export controls to avoid the inappropriate exposure of marked data.

If a workspace contains markings that are incompatible with an external transform, restart the workspace without checkpoints to clear tracked markings. Review the external transforms documentation for additional information.

In both local development and VS Code Workspaces, if sample-less dataset loading is selected for a transformation's preview but the transformation also uses features that sample-less loading does not support, the preview falls back to sampled dataset loading. This fallback generates a warning that can be viewed in the extension's logs.
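The fallback behavior can be pictured with the following sketch. It is an assumption-laden illustration, not the extension's implementation: the feature identifiers, the `SAMPLE_LESS_GAPS` set (derived from the rows of the support matrix where sampled preview is supported but sample-less preview is not), the logger name, and the warning text are all hypothetical.

```python
import logging

logger = logging.getLogger("preview_sketch")  # hypothetical logger name

# Hypothetical identifiers for features that sampled preview supports but
# sample-less preview does not, based on the support matrix.
SAMPLE_LESS_GAPS = frozenset({"raw_files", "views", "object_materializations"})


def resolve_loading_mode(requested, features_used):
    """Return the loading mode a preview would effectively run with."""
    if requested != "sample-less":
        return requested
    blocking = sorted(SAMPLE_LESS_GAPS & set(features_used))
    if blocking:
        logger.warning(
            "Falling back to sampled dataset loading; not supported with "
            "sample-less loading: %s",
            ", ".join(blocking),
        )
        return "sampled"
    return "sample-less"


print(resolve_loading_mode("sample-less", {"views", "incrementality"}))
print(resolve_loading_mode("sample-less", {"incrementality"}))
```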