Getting started

Beta

The Compute Modules feature is in a beta state and may not be available on your enrollments.

You can also view this documentation in the platform within the Compute Modules application for an efficient developer experience.

To get started with compute modules, you can use your preferred developer environment. In a few minutes, you will be able to create and deploy a compute module and test it in Foundry.

In Foundry, choose a folder and select + New > Compute Module, then follow the steps in the dialog to start with an empty compute module-backed function or pipeline. Follow the documentation below for next steps depending on your execution mode or, for a more seamless experience, select the Documentation tab within your compute module to follow along with in-platform guidance.

Build a compute module-backed function

In the following sections, we will use the open-source Python library. If you prefer to create your own client or implement your compute module in a language not supported by the SDKs, review the documentation on how to implement a custom compute module client.

Prerequisites:

Write the compute module on your local machine

  1. Begin by creating a new directory for your compute module.
  2. Create a file called Dockerfile in the directory.
  3. Copy and paste the following into the Dockerfile:
# Change the platform based on your Foundry resource queue
FROM --platform=linux/amd64 python:latest

COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src .

# USER is required to be non-root and numeric for running compute modules in Foundry
USER 5000
CMD ["python", "app.py"]
  4. Create a new file called requirements.txt. This file specifies dependencies for our Python application. Copy and paste the following into the file:
foundry-compute-modules
  5. Create a new subdirectory called src. This is where we will store our Python application.
  6. Inside the src directory, create a file called app.py.
  7. Your directory should now look like this:
MyComputeModule
├── Dockerfile
├── requirements.txt
└── src
    └── app.py
  8. Inside app.py, copy and paste the following code:
from compute_modules.annotations import function

@function
def add(context, event):
    return str(event['x'] + event['y'])

@function
def hello(context, event):
    return 'Hello ' + event['name']

Understand your function code

When working with compute module functions, your function always receives two parameters: a context object and an event object, in that order.

Context object: A Python dict containing metadata and credentials that your function may need, such as user tokens and source credentials. For example, if your function needs to call the OSDK to get an Ontology object, the context object includes the token that the user needs to access that Ontology object.

Event object: A Python dict containing the data that your function will process. It includes all parameters passed to the function, such as x and y in the add function, and name in the hello function.
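
As a rough illustration of how the two parameters are used, the sketch below calls an undecorated copy of the add function with hand-built dictionaries. It does not involve the compute modules library or Foundry at all. The empty context and the sample payload are made up for local experimentation; the real context contents depend on your configuration.

```python
# Plain-Python stand-in for the add handler in app.py, without the @function
# decorator, so it can be exercised locally with no Foundry connection.

def add(context, event):
    # event carries the caller's parameters; context is not needed here
    return str(event["x"] + event["y"])


# Hypothetical inputs: an empty context and a hand-written event payload.
sample_context = {}
sample_event = {"x": 1, "y": 2}

result = add(sample_context, sample_event)
print(result)  # prints 3
```

Calling the handler this way is only a quick sanity check of the function body before you build the container.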

If you use static typing for the event or return object, the library converts the payload or result into that statically typed object. Review the documentation on automatic function schema inference for more information.
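
To make the idea of a typed event concrete, here is a minimal sketch that performs the dict-to-object conversion by hand using a dataclass. The AddPayload class and the manual conversion are illustrative assumptions; the library's own conversion and the set of types it supports are described in the schema inference documentation and may differ from this.

```python
from dataclasses import dataclass


@dataclass
class AddPayload:
    # Hypothetical typed shape for the add function's parameters
    x: int
    y: int


def add(context, event: AddPayload) -> int:
    # With a typed event, fields are attributes rather than dict keys
    return event.x + event.y


# Manual stand-in for the conversion step: build the typed object from the
# raw dict payload, then call the handler with it.
raw_event = {"x": 1, "y": 2}
typed_event = AddPayload(**raw_event)
total = add({}, typed_event)
print(total)  # prints 3
```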

The function result is sent over the wire as a JSON blob, so make sure that the value your function returns can be serialized to JSON.
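
One way to check a return value before deploying is to pass it through Python's standard json module. This is a sketch under the assumption that the standard encoder is a reasonable proxy; the library's serializer may accept more types than json.dumps does. The helper name is_json_serializable is made up for this example.

```python
import json
from datetime import date


def is_json_serializable(value):
    """Return True if the standard json encoder accepts the value."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


ok_result = {"total": 3, "tags": ["a", "b"]}
bad_result = {"created": date(2024, 1, 1)}  # date is not a JSON type

print(is_json_serializable(ok_result))   # prints True
print(is_json_serializable(bad_result))  # prints False

# A common fix is to convert such values to strings before returning them.
fixed_result = {"created": bad_result["created"].isoformat()}
print(json.dumps(fixed_result))  # prints {"created": "2024-01-01"}
```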

Create your first container

Now, you can publish your code to Foundry using an Artifact repository, which will be used to store your Docker images.

  1. Create or select an Artifact repository to publish your code to Foundry. To do this, navigate to the Documentation tab of your compute module and find the in-platform section that corresponds to this page: Build a compute module-backed function > Create your first container. There, select Create or select repository.
  2. On the next page, select the dropdown menu to choose Publish to DOCKER and follow the instructions on the page to push your code.
  3. In the Configure tab of your compute module, select Add Container. Then, select your Artifact repository and the image you pushed.
  4. Select Update configuration to save your edits.
  5. Once the configuration is updated, you can start the compute module from the Overview page, test it using the bottom Query panel, and view the logs.

Build a compute module-backed pipeline

Compute modules can operate as a connector between inputs and outputs of a data pipeline in a containerized environment. In this example, you will build a simple use case with streaming datasets as inputs and outputs to the compute module, define a function that doubles the input data, and write it to the output dataset. You will use notional data to simulate a working data pipeline.

Prerequisites

Write the compute module on your local machine

  1. Create a new directory for your compute module.
  2. Create a file called Dockerfile in the directory.
  3. Copy and paste the following into the Dockerfile:
# Change the platform based on your Foundry resource queue
FROM --platform=linux/amd64 python:latest

COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src .

# USER is required to be non-root and numeric for running compute modules in Foundry
USER 5000
CMD ["python", "app.py"]
  4. Create a new file called requirements.txt. Store the dependencies for your Python application in this file. For example:
requests == 2.31.0
  5. Create a new subdirectory called src. This is where you will put your Python application.
  6. Inside the src directory, create a file called app.py.
  7. Your directory should now look like the following:
MyComputeModule
├── Dockerfile
├── requirements.txt
└── src
    └── app.py
  8. Import the following modules in app.py:
import os
import json
import time
import requests
  9. Inside app.py, get the bearer token for input and output access:
# BUILD2_TOKEN holds the path of a file that contains the bearer token
with open(os.environ['BUILD2_TOKEN']) as f:
    bearer_token = f.read()
  10. Inside app.py, get input and output information:
# RESOURCE_ALIAS_MAP holds the path of a JSON file describing inputs and outputs
with open(os.environ['RESOURCE_ALIAS_MAP']) as f:
    resource_alias_map = json.load(f)

# Replace each key with the identifier you gave that input or output in the config
input_info = resource_alias_map['identifier you put in the config']
output_info = resource_alias_map['identifier you put in the config']

# Fall back to the master branch when no branch is set
input_rid = input_info['rid']
input_branch = input_info['branch'] or "master"
output_rid = output_info['rid']
output_branch = output_info['branch'] or "master"
  11. Inside app.py, interact with inputs and outputs and perform computations. For example:
FOUNDRY_URL = "yourenrollment.palantirfoundry.com"

def get_stream_latest_records():
    # Read the latest records from the input stream
    url = f"https://{FOUNDRY_URL}/stream-proxy/api/streams/{input_rid}/branches/{input_branch}/records"
    response = requests.get(url, headers={"Authorization": f"Bearer {bearer_token}"})
    return response.json()

def process_record(record):
    # Assume input stream has schema 'x': Integer
    x = record['value']['x']
    # Assume output stream has schema 'twice_x': Integer
    return {'twice_x': x * 2}

def put_record_to_stream(record):
    # Write one processed record to the output stream
    url = f"https://{FOUNDRY_URL}/stream-proxy/api/streams/{output_rid}/branches/{output_branch}/jsonRecord"
    requests.post(url, json=record, headers={"Authorization": f"Bearer {bearer_token}"})
  12. Inside app.py, run your code as an autonomous task. For example:
# Poll the input stream once a minute and forward the doubled values
while True:
    records = get_stream_latest_records()
    processed_records = [process_record(record) for record in records]
    for record in processed_records:
        put_record_to_stream(record)
    time.sleep(60)

Deploy your compute module

  1. Build a Docker image and publish it to the Artifact repository.
  2. Finally, deploy the compute module using pipeline execution mode after selecting the relevant container image.

You can now view the results streamed live in the output dataset.

Understand your pipeline code

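To interact with inputs and outputs, Foundry provides your container with a bearer token and input/output information. In the code above, both are read from the files whose paths are stored in the BUILD2_TOKEN and RESOURCE_ALIAS_MAP environment variables.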
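
The sketch below wraps those two reads in small helpers and exercises them against temporary files, so the lookup logic can be tried without a running container. The helper names, the aliases my-input and my-output, the fake RIDs, and the exact shape of the alias map beyond its rid and branch keys are all assumptions made for illustration. Stripping whitespace from the token is also a choice made here, not something the original snippet does.

```python
import json
import os
import tempfile


def read_token(env=os.environ):
    # BUILD2_TOKEN holds the path of a file that contains the bearer token.
    with open(env["BUILD2_TOKEN"]) as f:
        return f.read().strip()  # drop a trailing newline, if any


def load_alias_map(env=os.environ):
    # RESOURCE_ALIAS_MAP holds the path of a JSON file keyed by identifier.
    with open(env["RESOURCE_ALIAS_MAP"]) as f:
        return json.load(f)


def resolve_resource(alias_map, alias, default_branch="master"):
    # Return (rid, branch), falling back to the default branch when unset.
    info = alias_map[alias]
    return info["rid"], info.get("branch") or default_branch


# Local dry run with made-up files standing in for what the container sees.
with tempfile.TemporaryDirectory() as tmp:
    token_path = os.path.join(tmp, "token")
    map_path = os.path.join(tmp, "alias_map.json")
    with open(token_path, "w") as f:
        f.write("fake-token\n")
    with open(map_path, "w") as f:
        json.dump(
            {
                "my-input": {"rid": "ri.example.input", "branch": None},
                "my-output": {"rid": "ri.example.output", "branch": "develop"},
            },
            f,
        )
    fake_env = {"BUILD2_TOKEN": token_path, "RESOURCE_ALIAS_MAP": map_path}
    token = read_token(fake_env)
    alias_map = load_alias_map(fake_env)

input_rid, input_branch = resolve_resource(alias_map, "my-input")
output_rid, output_branch = resolve_resource(alias_map, "my-output")
print(input_rid, input_branch)    # prints ri.example.input master
print(output_rid, output_branch)  # prints ri.example.output develop
```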

You can then write code to interact with the inputs and outputs and perform computations. The code snippets provide a simple example of pipelining two stream datasets:

  • It reads the latest records from the input stream dataset using the bearer token and input info by calling the stream-proxy service.
  • It then performs computations (in the above example, doubling the data). The data format depends on your own input data.
  • Next, it writes results to the output stream dataset using the bearer token and output info.
  • Finally, because a compute module in pipeline mode cannot be queried, the loop at the end of the script runs the pipeline autonomously as soon as the container starts.
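
The read, compute, and write steps above can be sketched with the I/O passed in as arguments, which makes one iteration testable locally with in-memory stand-ins. The names run_once and run_forever, and the choice to log and continue after a failed iteration, are assumptions made for this example rather than part of the original code.

```python
import time


def process_record(record):
    # Same transformation as in app.py: double the input value
    return {"twice_x": record["value"]["x"] * 2}


def run_once(fetch, publish):
    """Fetch records, transform each one, publish it; return the count."""
    written = 0
    for record in fetch():
        publish(process_record(record))
        written += 1
    return written


def run_forever(fetch, publish, interval_seconds=60):
    # Not called in this sketch; in a container this would be the entry point.
    while True:
        try:
            run_once(fetch, publish)
        except Exception as exc:
            # Assumed policy: log and retry on the next tick
            print(f"pipeline iteration failed: {exc}")
        time.sleep(interval_seconds)


# Local dry run: a list stands in for the input stream, another for the output.
fake_input = [{"value": {"x": 1}}, {"value": {"x": 21}}]
fake_output = []
count = run_once(lambda: fake_input, fake_output.append)
print(count, fake_output)  # prints 2 [{'twice_x': 2}, {'twice_x': 42}]
```

In a deployed module, the functions that call the stream-proxy service would be passed as fetch and publish in place of the in-memory stand-ins.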
