The Palantir extension for Visual Studio Code allows you to preview Python transforms directly from your local Visual Studio Code environment or a VS Code Workspace in the Palantir platform. This capability enables rapid testing of transforms without the need to exit the code editor. Currently, this feature is only available for Python transforms.
You can start a preview within your local Visual Studio Code environment or an in-platform VS Code Workspace in the following four ways:
The extension provides two methods for previewing transforms:
Using Palantir Code Assist with the Palantir extension is the standard method for previewing during local development.
Upon cloning a Python transforms repository and opening a Python file, the Palantir extension will establish a connection to a remote Code Assist workspace, just as it would for an in-platform code repository environment.
Once Code Assist is ready, the extension will identify all transforms present in the open files.
After selecting a transform for preview, the preview will execute on the remote Code Assist workspace.
Using a local preview is the default previewing method for users working within an in-platform VS Code Workspace. It is also the default previewing method in local development if a stack administrator has enabled local preview.
For local development, local preview must be enabled by your platform administrator on the Code Repositories Control Panel page.
Upon opening a Palantir repository, the extension will configure the environment. Once the environment is set up and transforms are detected, you will be able to execute previews locally. To do this, the extension downloads and temporarily stores portions of datasets to your machine as long as you have the appropriate permissions for the data.
Code Assist preview and Preview Engine preview use different execution models. Code Assist preview uses a preview version of the `transforms` library, which is a close re-implementation of the actual `transforms` library used during in-platform builds. This results in broader feature support at the cost of precision. There are subtle implementation differences between the preview and build versions of the `transforms` library which can lead to non-intuitive and sometimes misleading preview results.
On the other hand, Preview Engine uses the original `transforms` library to execute the user code. As a result, the code under preview can barely tell that it is running in preview mode, which gives higher accuracy and performance. The main drawback is that support has to be added for each library primitive lower down the architecture, so fewer features are supported at the time of writing.
Preview Engine features a sample-less dataset loading option. To understand its importance, consider how inputs are loaded in Code Assist preview and in sampled Preview Engine preview. When an input dataset is requested, a subset of that dataset is downloaded to disk before the preview runs. The subset is uniformly sampled from the input, and the number of rows can be configured by the user, with a default of 10,000. In some use cases, this sampling is adequate and does not introduce statistical bias. However, for certain transformations, such as narrow filters or joins between multiple inputs, the result can be deceptively short: sampled rows are less likely to match a filter expression, and the likelihood of a match across a join drops exponentially with the number of joined inputs.
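The effect of uniform sampling on joins can be illustrated with a small simulation. This is a minimal sketch in plain Python, not code that runs in either preview mode; the key counts, seeds, and helper names are assumptions chosen for illustration.

```python
import random


def uniform_sample(rows, n, seed):
    """Return up to n rows drawn uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


def inner_join_count(left_keys, right_keys):
    """Count the rows produced by an inner join on unique keys."""
    return len(set(left_keys) & set(right_keys))


# Two hypothetical inputs that share the same 1,000,000 unique keys.
keys = list(range(1_000_000))
full_matches = inner_join_count(keys, keys)  # every key matches

# Each input is sampled independently down to 10,000 rows.
left_sample = uniform_sample(keys, 10_000, seed=1)
right_sample = uniform_sample(keys, 10_000, seed=2)
sampled_matches = inner_join_count(left_sample, right_sample)

# A sampled left key is present in the right sample with probability
# 10_000 / 1_000_000 = 1%, so about 100 matches are expected on average.
expected_matches = 10_000 * 10_000 / 1_000_000

print(full_matches, sampled_matches, expected_matches)
```

Although every key matches in the full data, the join over two independent samples keeps only about 1% of the sampled rows. Each additional sampled input multiplies the survival probability by a similar factor.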
With sample-less dataset loading, no pre-sampling takes place. Instead, Preview Engine relies on modern data processing engines, such as Spark or Polars, to push down predicates to the data-source level and only download the chunks of the dataset that are most likely to match the query. This means that filters or other narrowing expressions used anywhere within the transform code may be eligible to be pushed down, resulting in fully accurate preview results without much extra computational time incurred.
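The idea behind predicate push-down can be sketched as follows. This is a conceptual illustration only, assuming each stored chunk carries minimum and maximum statistics for a column (as Parquet row groups do); it does not describe how Preview Engine, Spark, or Polars implement it, and `Chunk` and `scan_with_pushdown` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    min_value: int
    max_value: int
    rows: list


def scan_with_pushdown(chunks, lower, upper):
    """Read only the chunks whose [min, max] range can overlap [lower, upper]."""
    chunks_read = 0
    matches = []
    for chunk in chunks:
        # The statistics prove that no row in this chunk can match: skip it.
        if chunk.max_value < lower or chunk.min_value > upper:
            continue
        chunks_read += 1
        matches.extend(v for v in chunk.rows if lower <= v <= upper)
    return matches, chunks_read


# Ten hypothetical chunks of 100 values each: 0-99, 100-199, and so on.
chunks = [
    Chunk(start, start + 99, list(range(start, start + 100)))
    for start in range(0, 1000, 100)
]

matches, chunks_read = scan_with_pushdown(chunks, 250, 260)
print(len(matches), chunks_read)
```

Only the chunk covering 200-299 is read, yet the result contains every matching row. That is why a pushed-down filter can be fully accurate without downloading the whole dataset.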
Some pipelines cannot take full advantage of predicate push-down, for example, pipelines that do not contain filter expressions. In these cases, the pipeline's author can introduce some conditional filter expressions in their code to speed up their preview runs during development.
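One way to add such a conditional filter is a small helper that applies a narrowing predicate only when a development-time switch is on. This is a minimal sketch: the `PREVIEW_NARROW` environment variable and the `narrow_for_preview` helper are hypothetical and are not part of the Palantir extension or the transforms library.

```python
import os


def narrow_for_preview(df, predicate, enabled=None):
    """Apply `predicate` only when the development-time switch is on.

    `df` is any object with a `.filter(predicate)` method, such as a
    PySpark or Polars DataFrame. PREVIEW_NARROW is a hypothetical
    environment variable used here as the switch.
    """
    if enabled is None:
        enabled = os.environ.get("PREVIEW_NARROW") == "1"
    # Leave the DataFrame untouched when the switch is off, so that
    # builds are not affected by the development-time filter.
    return df.filter(predicate) if enabled else df
```

Inside a transform, the helper would wrap an input before the rest of the logic runs, for example `narrow_for_preview(orders, orders["region"] == "EMEA")` for a hypothetical PySpark DataFrame named `orders`. Take care that the switch is never enabled for real builds.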
There is one more caveat to keep in mind when deciding to go with sampled or sample-less dataset loading for a given input. Sampled inputs are cached locally, on disk, while caching is not supported for sample-less loading. This means if an input is not used in join expressions and the statistical properties of filters applied to this input are less relevant to the pipeline preview's correctness, sampled dataset loading is the better choice for speedier previews. In all other cases, sample-less dataset loading should be preferred.
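The rule of thumb above can be written down as a small decision helper. The function and parameter names are hypothetical; this is a sketch of the reasoning, not a setting exposed by the extension.

```python
def choose_loading_mode(used_in_join: bool, filter_accuracy_matters: bool) -> str:
    """Suggest a loading mode for a single input dataset."""
    # Sampled inputs are cached on disk, so they give faster repeat previews,
    # but only when sampling cannot distort joins or important filters.
    if not used_in_join and not filter_accuracy_matters:
        return "sampled"
    return "sample-less"


print(choose_loading_mode(used_in_join=False, filter_accuracy_matters=False))
print(choose_loading_mode(used_in_join=True, filter_accuracy_matters=False))
```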
The following table shows the current support matrix of the different preview executors. Code Repositories preview is used not only in Code Repositories but also in the Remote preview mode of the Visual Studio Code extension. When previewing in Local mode, users can choose between the `Full dataset` (which is the same as sample-less) and `Sampled` dataset loading modes.
Feature | Code Repositories (Code Assist) | Sample-less preview (Preview Engine) | Sampled preview (Preview Engine) |
---|---|---|---|
Debugging | Supported | Supported | Supported |
Foundry datasets | Both tabular (with schema) and raw files | Only tabular datasets | Both tabular (with schema) and raw files |
Transform generators | Supported | Supported | Supported |
Lightweight transforms | Supported | Not supported | Supported |
Views & Object materializations | Supported | Not supported | Not supported |
Incrementality | Ignored | Not supported | Not supported |
External transforms | Both sources and egress policies | Sources supported in Code Workspaces | Sources supported in Code Workspaces |
Media sets | Supported | Not supported | Not supported |
Models | Supported | Not supported | Not supported |
Cipher | Supported | Not supported | Not supported |
Language models | Supported | Not supported | Not supported |
Data Expectations | Supported | Not supported | Not supported |
Virtual tables | Supported | Not supported | Not supported |
Spark sidecars | Not supported | Not supported | Not supported |
In both local development and VS Code Workspaces, if sample-less dataset loading is used for a transformation's preview but the transformation also makes use of unsupported features, the preview will fall back to sampled dataset loading. This behavior triggers a warning that can be seen in the extension's logs as well.
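The fallback rule can be modeled with a lookup against the sample-less column of the table above. This is an illustrative sketch only: the feature labels, the function name, and the warning text are hypothetical, and the sketch does not check whether sampled preview itself supports a feature.

```python
# Features the table lists as unavailable in sample-less preview.
SAMPLE_LESS_UNSUPPORTED = {
    "raw files",
    "lightweight transforms",
    "views & object materializations",
    "incrementality",
    "media sets",
    "models",
    "cipher",
    "language models",
    "data expectations",
    "virtual tables",
    "spark sidecars",
}


def resolve_loading_mode(requested, features_used):
    """Return (effective_mode, warning) for a requested loading mode."""
    blocking = sorted(
        f for f in features_used if f.lower() in SAMPLE_LESS_UNSUPPORTED
    )
    if requested == "sample-less" and blocking:
        warning = (
            "Falling back to sampled dataset loading; unsupported: "
            + ", ".join(blocking)
        )
        return "sampled", warning
    return requested, None


mode, warning = resolve_loading_mode("sample-less", ["Media sets", "Debugging"])
print(mode, warning)
```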