Project structure

Default bootstrapped repository

Here is the standard structure for a bootstrapped Transforms Python repository:

transforms-python
├── conda_recipe
│   └── meta.yaml
└── src
    ├── myproject
    │   ├── __init__.py
    │   ├── datasets
    │   │   ├── __init__.py
    │   │   └── examples.py
    │   └── pipeline.py
    ├── setup.cfg
    └── setup.py

There are also additional files inside the repository that can be viewed by going to the Settings cog in the File Explorer tab in Code Repositories and selecting Show hidden files and folders. In almost all cases, these hidden files should not be edited; Palantir does not provide support for repositories with custom changes to these hidden files.

Make sure you go through the getting started guide before reading on. This page also assumes that you are using the default project structure included in a bootstrapped Transforms Python repository.

The sections below describe each of these files in more detail.

Repository upgrade file changes

When you create the repository for the first time, it is bootstrapped with the default contents of the latest Transforms Python template version at that time. During subsequent repository upgrades, files in the repository are upgraded to align with the contents of the most recent Transforms Python template version. Custom changes to these files may be overwritten during an upgrade to ensure consistency. We do not support custom changes to these files, as they can lead to unexpected behavior.

The following files will not be overwritten by a repository upgrade:

  • Default files in the conda_recipe and src folders
  • Inner and outer build.gradle files

The following files will be merged with the newest Python template file during a repository upgrade. In the case of any common keys, the Python template's version is chosen:

  • gradle.properties
  • versions.properties

The remaining files will be overwritten by the upgrade to match those of the newest Python template version.

pipeline.py

In this file, you define your project’s Pipeline, which is a registry of the Transform objects associated with your data transformations. Here is the default src/myproject/pipeline.py file:

    from transforms.api import Pipeline
    from myproject import datasets

    my_pipeline = Pipeline()
    my_pipeline.discover_transforms(datasets)

Note that the default pipeline.py file uses automatic registration to add your Transform objects to your project’s Pipeline. Automatic registration discovers all Transform objects in your project’s datasets package. Thus, if you re-structure your project such that your transformation logic is not contained within the datasets folder, make sure to update your src/myproject/pipeline.py file appropriately.
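For example, if you moved your transformation logic into a hypothetical transformations package (the package name here is an illustrative assumption), your src/myproject/pipeline.py might look like the following sketch:

    from transforms.api import Pipeline

    # Hypothetical package: transformation logic moved from myproject/datasets
    # to myproject/transformations
    from myproject import transformations

    my_pipeline = Pipeline()
    my_pipeline.discover_transforms(transformations)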

Alternatively, you can explicitly add each of your Transform objects to your project’s Pipeline using manual registration. Unless you have a workflow that requires you to explicitly add each Transform object to your Pipeline, it’s recommended to use automatic registration. For more information about Pipeline objects, refer to the section describing Pipelines.
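For reference, here is a minimal sketch of manual registration, assuming a Transform named my_compute_function defined in examples.py:

    from transforms.api import Pipeline
    from myproject.datasets.examples import my_compute_function

    my_pipeline = Pipeline()
    # Explicitly register each Transform object instead of discovering them
    my_pipeline.add_transforms(my_compute_function)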

setup.py

In this file, you define a transforms.pipelines entry point named root that is associated with your project's Pipeline; this allows Transforms Python to discover your project's Pipeline. Here is the default src/setup.py file:

    import os
    from setuptools import find_packages, setup

    setup(
        name=os.environ['PKG_NAME'],
        version=os.environ['PKG_VERSION'],
        description='Python data transformation project',

        # Modify the author for this project
        author='{{REPOSITORY_ORG_NAME}}',

        packages=find_packages(exclude=['contrib', 'docs', 'test']),

        # Instead, specify your dependencies in conda_recipe/meta.yaml
        install_requires=[],

        entry_points={
            'transforms.pipelines': [
                'root = myproject.pipeline:my_pipeline'
            ]
        }
    )

If you modify the default project structure, you may need to modify the content in your src/setup.py file. For more information, refer to the section describing the transforms.pipelines entry point.
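For example, if you renamed the myproject package to a hypothetical analytics package, the entry point in src/setup.py would need to reference the new module path. A sketch of the updated entry_points argument:

    entry_points={
        'transforms.pipelines': [
            # Hypothetical: the package was renamed from myproject to analytics
            'root = analytics.pipeline:my_pipeline'
        ]
    }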

examples.py

This file contains your data transformation code. Here is the default src/myproject/datasets/examples.py file:

    """
    from transforms.api import transform_df, Input, Output
    from myproject.datasets import utils


    @transform_df(
        Output("/path/to/output/dataset"),
        my_input=Input("/path/to/input/dataset"),
    )
    def my_compute_function(my_input):
        return utils.identity(my_input)
    """

After uncommenting the sample code, you can replace /path/to/input/dataset and /path/to/output/dataset with the full paths to your input and output datasets, respectively. If your data transformation relies on multiple datasets, you can provide additional input datasets. You must also update my_compute_function to contain the code that transforms your input dataset(s) into your output dataset. Also, keep in mind that a single Python file supports the creation of multiple output datasets.
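For example, a transformation that joins two input datasets might look like the following sketch; the dataset paths and the customer_id join key are hypothetical:

    from transforms.api import transform_df, Input, Output


    @transform_df(
        Output("/path/to/output/dataset"),
        customers=Input("/path/to/customers/dataset"),  # hypothetical path
        orders=Input("/path/to/orders/dataset"),  # hypothetical path
    )
    def my_compute_function(customers, orders):
        # Join the two input DataFrames on a shared key (hypothetical column)
        return customers.join(orders, on="customer_id", how="inner")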

Note that the sample code uses the DataFrame transform decorator. Alternatively, you can use:

  • The transform decorator, if you are writing a data transformation that depends on access to files rather than to a DataFrame, or
  • The Pandas transform decorator, if you are working exclusively with the Pandas library and your input data can fit into memory.

Both alternatives are sketched below.
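Here is a rough sketch of both alternatives, assuming hypothetical dataset paths:

    from transforms.api import transform, transform_pandas, Input, Output


    # With the transform decorator, inputs and outputs are passed as objects
    # that expose file access in addition to DataFrames
    @transform(
        my_output=Output("/path/to/output/dataset"),
        my_input=Input("/path/to/input/dataset"),
    )
    def my_file_based_transform(my_input, my_output):
        my_output.write_dataframe(my_input.dataframe())


    # With the Pandas transform decorator, the input arrives as a
    # pandas DataFrame and the returned DataFrame is written to the output
    @transform_pandas(
        Output("/path/to/another/output/dataset"),
        my_input=Input("/path/to/input/dataset"),
    )
    def my_pandas_transform(my_input):
        return my_input.dropna()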

For more information about creating Transform objects, which describe your input and output datasets as well as your transformation logic, refer to the section describing Transforms.

meta.yaml

A conda build recipe is a directory containing all the metadata and scripts required to build a conda ↗ package. One of the files in the build recipe is meta.yaml, which contains all of the metadata. For more information about the structure of this file, refer to the conda documentation on the meta.yaml file ↗. Here is the default conda_recipe/meta.yaml file:

    # If you need to modify the runtime requirements for your package,
    # update the 'requirements.run' section in this file

    package:
      name: "{{ PACKAGE_NAME }}"
      version: "{{ PACKAGE_VERSION }}"

    source:
      path: ../src

    requirements:
      # Tools required to build the package. These packages are run on the build system and include
      # things such as revision control systems (Git, SVN), make tools (GNU make, Autotool, CMake) and
      # compilers (real cross, pseudo-cross, or native when not cross-compiling), and any source pre-processors.
      # https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#build
      build:
        - python 3.9.*
        - setuptools

      # Packages required to run the package. These are the dependencies that are installed automatically
      # whenever the package is installed.
      # https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#run
      run:
        - python 3.9.*
        - transforms {{ PYTHON_TRANSFORMS_VERSION }}
        - transforms-expectations
        - transforms-verbs

    build:
      script: python setup.py install --single-version-externally-managed --record=record.txt

If your Transforms Python project requires any additional build dependencies, you can use the package tab to discover available packages and automatically add them to your meta.yaml file, as described in the documentation on sharing Python libraries. This will automatically detect the channel that provides the package you are trying to import and add it as a backing repository.

It is also possible to manually update the "requirements" section in this file. However, doing so is strongly discouraged: you run the risk of requesting packages and versions that are not available, which will cause Checks to fail on your repository. For any dependencies that you add, make sure that the packages they in turn require are also available.

Note that it is unlikely you will need to modify sections other than "requirements".

Supported Python 3 versions

Palantir supports active versions of Python, adhering to the Python Software Foundation's end-of-life schedule. Refer to the Python version support page for more details.

Example usage:

    requirements:
      build:
        - python 3.9.*
        - setuptools

      # Any extra packages required to run your package.
      run:
        - python 3.9.*
        - transforms {{ PYTHON_TRANSFORMS_VERSION }}
  • Ensure that the Python dependencies in the build and run sections are identical; a version mismatch between the two can lead to undesired outcomes and failures.
  • Ranges such as python >=3.9 or python >3.9,<=3.10.11 are not supported for Python versions.

Pinning run-time versions

If your transforms require a specific library version, and you want to add it manually rather than using the recommended package tab, you can specify the version alongside the library name in the requirements block. Below is an example:

    requirements:
      run:
        # The below pins an explicit version
        - mylibrary 1.0.1

        # The below specifies a maximum version (version equal or lower):
        - scipy <=1.4.0

Note:

  • No space is allowed after the operator; for example, scipy <= 1.4.0 will fail CI checks.
  • The operator >= for versions is not yet supported in Foundry.

Using pip-managed dependencies

If your transform requires a specific library that is not available through Conda but is available when installed using pip ↗, you can declare it in an additional pip section. The dependency will be installed on top of your Conda environment. Below is an example of adding a pip dependency:

    requirements:
      build:
        - python 3.9.*
        - setuptools

      run:
        - python 3.9.*
        - transforms {{ PYTHON_TRANSFORMS_VERSION }}

      pip:
        - pypicloud

Note:

  • Dependencies added to the pip section are installed on top of the Conda environment that is derived from the packages in the run section. Therefore, removing the run or build sections would cause failures.
  • The pip section can only be used in Transforms Python repositories, and cannot be used in Python libraries.