Here is the standard structure for a bootstrapped Transforms Python repository:
transforms-python
├── conda_recipe
│ └── meta.yaml
└── src
├── myproject
│ ├── __init__.py
│ ├── datasets
│ │ ├── __init__.py
│ │ └── examples.py
│ └── pipeline.py
├── setup.cfg
└── setup.py
There are also additional files inside the repository that can be viewed by going to the Settings cog in the File Explorer tab in Code Repositories and selecting Show hidden files and folders. In almost all cases, these hidden files should not be edited; Palantir does not provide support for repositories with custom changes to these hidden files.
You can learn more about the following files below:
- pipeline.py
- setup.py
- examples.py
- meta.yaml
Make sure you go through the getting started guide before reading on. Also, this page assumes that you’re using the default project structure that’s included in a bootstrapped Transforms Python repository.
When you create the repository for the first time, it is bootstrapped with the default contents of the latest Transforms Python template version at that time. During subsequent repository upgrades, files in the repository are upgraded to align with the contents of the most recent Transforms Python template version. Custom user changes to these files may be overwritten during a template upgrade to ensure consistency. We do not support custom changes to these files, as they can lead to unexpected behavior.
The following files will not be overwritten by a repository upgrade:
- conda_recipe and src folders
- build.gradle files

The following files will be merged with the newest Python template file during a repository upgrade; in the case of any common keys, the Python template's version is chosen:
- gradle.properties
- versions.properties

The remaining files will be overwritten by the upgrade to match the files of the newest Python template version.
pipeline.py
In this file, you define your project's Pipeline, which is a registry of the Transform objects associated with your data transformations. Here is the default src/myproject/pipeline.py file:
```python
from transforms.api import Pipeline
from myproject import datasets

my_pipeline = Pipeline()
my_pipeline.discover_transforms(datasets)
```
Note that the default pipeline.py file uses automatic registration to add your Transform objects to your project's Pipeline. Automatic registration discovers all Transform objects in your project's datasets package. Thus, if you restructure your project such that your transformation logic is not contained within the datasets folder, make sure to update your src/myproject/pipeline.py file appropriately.
Alternatively, you can explicitly add each of your Transform objects to your project’s Pipeline using manual registration. Unless you have a workflow that requires you to explicitly add each Transform object to your Pipeline, it’s recommended to use automatic registration. For more information about Pipeline objects, refer to the section describing Pipelines.
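For illustration, here is a minimal manual-registration sketch; it assumes your Transform object is the my_compute_function defined in examples.py (adjust the import to match wherever your Transform actually lives):

```python
from transforms.api import Pipeline

# Hypothetical import: point this at the module that defines your Transform object
from myproject.datasets.examples import my_compute_function

my_pipeline = Pipeline()
# Explicitly register the Transform; anything not listed here
# is not part of the Pipeline
my_pipeline.add_transforms(my_compute_function)
```

Manual registration makes the registry explicit, at the cost of having to update this file for every new Transform you add.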
setup.py
In this file, you define a transforms.pipeline entry point named root that is associated with your project's Pipeline; this allows Transforms Python to discover your project's Pipeline. Here is the default src/setup.py file:
```python
import os
from setuptools import find_packages, setup

setup(
    name=os.environ['PKG_NAME'],
    version=os.environ['PKG_VERSION'],
    description='Python data transformation project',

    # Modify the author for this project
    author='{{REPOSITORY_ORG_NAME}}',

    packages=find_packages(exclude=['contrib', 'docs', 'test']),

    # Instead, specify your dependencies in conda_recipe/meta.yml
    install_requires=[],

    entry_points={
        'transforms.pipelines': [
            'root = myproject.pipeline:my_pipeline'
        ]
    }
)
```
If you modify the default project structure, you may need to modify the content in your src/setup.py file. For more information, refer to the section describing the transforms.pipeline entry point.
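For example, suppose you moved your Pipeline out of myproject into a package named mypipelines (the package name and module path below are made up for illustration); the entry point must then reference the new module path:

```python
import os
from setuptools import find_packages, setup

setup(
    name=os.environ['PKG_NAME'],
    version=os.environ['PKG_VERSION'],
    packages=find_packages(exclude=['contrib', 'docs', 'test']),
    entry_points={
        'transforms.pipelines': [
            # 'root' must point at the module and attribute that hold your Pipeline
            'root = mypipelines.pipeline:my_pipeline'
        ]
    }
)
```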
examples.py
This file contains your data transformation code. Here is the default src/myproject/datasets/examples.py file:
```python
"""
from transforms.api import transform_df, Input, Output
from myproject.datasets import utils


@transform_df(
    Output("/path/to/output/dataset"),
    my_input=Input("/path/to/input/dataset"),
)
def my_compute_function(my_input):
    return utils.identity(my_input)
"""
```
After un-commenting the sample code, you can replace /path/to/input/dataset and /path/to/output/dataset with the full paths to your input and output datasets, respectively. If your data transformation relies on multiple datasets, you can provide additional input datasets. You must also update my_compute_function to contain the code that transforms your input dataset(s) into your output dataset. Also, keep in mind that a single Python file supports the creation of multiple output datasets.
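As a sketch of that last point, one file can define several decorated functions, each of which is registered as its own Transform; the dataset paths and the value column below are placeholders:

```python
from transforms.api import transform_df, Input, Output


@transform_df(
    Output("/path/to/output/dataset_a"),
    source=Input("/path/to/input/dataset"),
)
def compute_positive(source):
    # Keep only rows with a positive 'value' column (hypothetical schema)
    return source.filter(source.value > 0)


@transform_df(
    Output("/path/to/output/dataset_b"),
    source=Input("/path/to/input/dataset"),
)
def compute_non_positive(source):
    # Write the remaining rows to a second output dataset
    return source.filter(source.value <= 0)
```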
Note that the sample code uses the DataFrame transform decorator (transform_df). Alternatively, you can use the Pandas transform decorator (transform_pandas) or the lower-level transform decorator (transform).
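For instance, here is a sketch using the lower-level transform decorator, which hands your function TransformInput and TransformOutput objects rather than DataFrames, so you read and write explicitly (paths are placeholders):

```python
from transforms.api import transform, Input, Output


@transform(
    my_output=Output("/path/to/output/dataset"),
    my_input=Input("/path/to/input/dataset"),
)
def my_compute_function(my_input, my_output):
    # Read the input as a Spark DataFrame, then write it back out unchanged
    my_output.write_dataframe(my_input.dataframe())
```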
For more information about creating Transform objects, which describe your input and output datasets as well as your transformation logic, refer to the section describing Transforms.
meta.yaml
A conda build recipe is a directory containing all the metadata and scripts required to build a conda ↗ package. One of the files in the build recipe is meta.yaml; this file contains all the metadata. For more information about the structure of this file, refer to the conda documentation on the meta.yaml file ↗. Here is the default conda_recipe/meta.yaml file:
```yaml
# If you need to modify the runtime requirements for your package,
# update the 'requirements.run' section in this file

package:
  name: "{{ PACKAGE_NAME }}"
  version: "{{ PACKAGE_VERSION }}"

source:
  path: ../src

requirements:
  # Tools required to build the package. These packages are run on the build system and include
  # things such as revision control systems (Git, SVN), make tools (GNU make, Autotool, CMake) and
  # compilers (real cross, pseudo-cross, or native when not cross-compiling), and any source pre-processors.
  # https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#build
  build:
    - python 3.9.*
    - setuptools

  # Packages required to run the package. These are the dependencies that are installed automatically
  # whenever the package is installed.
  # https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#run
  run:
    - python 3.9.*
    - transforms {{ PYTHON_TRANSFORMS_VERSION }}
    - transforms-expectations
    - transforms-verbs

build:
  script: python setup.py install --single-version-externally-managed --record=record.txt
```
If your Transforms Python project requires any additional build dependencies, you can use the package tab to discover available packages and automatically add them to your meta.yaml file, as described in the documentation on sharing Python libraries. This step automatically detects the channel that produces the package you are trying to import and adds it as a backing repository.
It is also possible to manually update the "requirements" section in this file. However, doing so is strongly discouraged, as you run the risk of requesting packages and versions that are not available, which will subsequently cause Checks to fail on your repository. For any dependency that you add, make sure that its own required packages are also available.
Note that it is unlikely you will need to modify sections other than "requirements".
Palantir supports active versions of Python, adhering to the Python Software Foundation's end-of-life schedule. Refer to the Python version support page for more details.
Example usage:
```yaml
requirements:
  build:
    - python 3.9.*
    - setuptools

  # Any extra packages required to run your package.
  run:
    - python 3.9.*
    - transforms {{ PYTHON_TRANSFORMS_VERSION }}
```
Note: Version ranges such as python >=3.9 or python >3.9,<=3.10.11 are not supported for Python versions.

If your transforms require a specific library version to be pinned, and you want to add this manually rather than using the recommended package tab, you can explicitly specify the version alongside the library name in the requirements block. Below is an example pinning:
```yaml
requirements:
  run:
    # The below pins an explicit version
    - mylibrary 1.0.1
    # The below specifies a maximum version (version equal or lower):
    - scipy <=1.4.0
```
Note: Spacing matters; scipy <= 1.4.0 (with a space between the operator and the version number) will fail CI checks, and >= for versions is not yet supported in Foundry.

If your transform requires a specific library that is not available through Conda but is available when installed using pip ↗, you can declare it in the additional pip section. The dependency will be installed on top of your Conda environment. Below is an example of adding a pip dependency:
```yaml
requirements:
  build:
    - python 3.9.*
    - setuptools

  run:
    - python 3.9.*
    - transforms {{ PYTHON_TRANSFORMS_VERSION }}

  pip:
    - pypicloud
```
Note: Packages in the pip section are installed on top of the Conda environment that is derived from the packages in the run section; removing the run or build sections would therefore cause failures. Also, the pip section can only be used in Transforms Python repositories, and cannot be used in Python libraries.