Usage types

Resource Transparency reports can be viewed in the Resource Management App, where users can see a breakdown of compute and storage resources consumed by Projects and Ontologies.

Foundry compute

Foundry is a platform that runs computational work on top of data. This work is measured using Foundry compute-seconds. Compute-seconds represent a unit of computational work in the platform and are used by both batch (long-running) and interactive (ad-hoc) applications. These compute-seconds can be driven by a variety of factors, including the number of parallelized compute executors, the size of the compute executors, the size of the data being worked on, and the complexity of the computational job.

Many compute frameworks in Foundry operate in a "parallel" manner, which means multiple executors are working on the same job at the same time. Parallelization significantly speeds up the execution of most jobs but uses more compute-seconds per unit time.

Wall-clock time

An important term to define is wall-clock time. Wall-clock time, also known as elapsed real time, is the actual time taken for a process from start to point of measurement, as measured by a clock on the wall or a stopwatch. In other words, wall-clock time is the difference (in seconds) between the time at which a task started and the time at which the task finishes. It is important to note that many Foundry compute-seconds can be used per wall-clock second and that different job types use compute-seconds at different rates, depending on their configuration.

A useful analogy for wall-clock time versus Foundry compute-seconds is the concept of human-hours. Two people who each work an 8-hour day produce 16 human-hours of work, even though only 8 hours of wall-clock time elapsed.
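The analogy can be written out as a small calculation. This is a simplified sketch: the function name `work_units` is hypothetical, and it assumes every worker is busy for the whole wall-clock window. It ignores executor size and memory, both of which affect real compute-second usage.

```python
def work_units(num_workers: int, wall_clock_units: float) -> float:
    """Total work done when every worker is busy for the whole wall-clock window."""
    return num_workers * wall_clock_units


# Two people working an 8-hour day: 16 human-hours in 8 wall-clock hours.
print(work_units(2, 8))  # 16
```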

Parallelized batch compute

Parallelized Batch compute represents queries or jobs that run in a "batch" capacity, meaning they are triggered to run in the background on a certain scheduled cadence or ad-hoc basis. Batch compute jobs do not consume any compute when they are not being run. Foundry will automatically allocate computational resources to these jobs as soon as they are triggered. Compute usage is metered as soon as the resources are provisioned and until they are relinquished from the job.

To provide insight into how compute resources are used across the platform, vCPU and memory usage is measured for individual jobs and reported on a dataset and object level.

Currently, the following batch compute jobs are monitored:

Parallelized interactive compute

Interactive compute represents queries that are evaluated in real-time, usually as part of an interactive user session. To provide fast responses, interactive compute systems maintain always-on idle compute, which means interactive queries are more expensive than batch evaluation.

Interactive usage is reported for each query: a query consumes its fair share of the backend application where it was scheduled. That usage is then rolled up to the Project level, much like batch compute.

Currently, queries from the following applications are included in interactive compute usage:

Parallelized continuous compute

Continuous compute is used by always-on processing jobs that are continuously available to receive messages and process them using arbitrary logic.

Continuous compute is measured for the length of time that the job is ready to receive messages and perform work.

Currently, usage from the following applications is included in continuous compute:

The ability to create a continuous compute job is not available on all Foundry environments. Contact your Palantir representative for more information if your use case requires it.

Units of measurement in parallelized compute

For parallel processing compute, we generate compute-seconds by measuring two metrics: core-seconds and memory-to-core ratio.

Parallelized core-seconds

Core-seconds reflect the number of vCPU cores used for the duration of a job. For example, 2 cores used for 3 seconds results in 6 core-seconds. The duration of a job is the time between submitting the job and the job reporting completion. This includes spin-up time and job cleanup time.

To determine how many cores a given job used, the properties of the job are inspected. Specifically, the number of executors, the number of vCPU cores per executor, and the number of vCPU cores for the driver are taken into consideration.

A common parallelized compute engine in the platform is Spark. Some users may only interact with Spark Configuration Service profiles, which provide pre-determined Spark configurations in “sizes”. Usually, these properties are specified in the job’s Spark profile or are set to system defaults.

For example:

spark.driver.cores = 3
spark.executor.cores = 2

In this example, the total core-seconds can be calculated in the following way:

core_seconds = (num_driver_cores + num_executor_cores * num_executors) * job_duration_in_seconds
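The formula above can be sketched in Python. The driver and executor core counts come from the example configuration. The executor count (4) and the job duration (60 seconds) are not given in the example and are hypothetical values chosen only for illustration.

```python
def core_seconds(num_driver_cores: int,
                 num_executor_cores: int,
                 num_executors: int,
                 job_duration_in_seconds: float) -> float:
    """Core-seconds for a job, based on allocated (not utilized) vCPU cores."""
    total_cores = num_driver_cores + num_executor_cores * num_executors
    return total_cores * job_duration_in_seconds


# spark.driver.cores = 3 and spark.executor.cores = 2 are from the example above.
# num_executors = 4 and a 60-second duration are assumed values.
print(core_seconds(3, 2, 4, 60))  # (3 + 2 * 4) * 60 = 660
```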

Usage is based on allocated vCPU cores, not on the utilization of those cores. We recommend requesting only the necessary resources for your jobs to avoid over-allocation.

Memory-to-core ratio

Given that live computation hosts have a fixed memory-to-core ratio, we must consider how many GiB of memory are used per core. Let's say we have a host with four cores and 32GiB of memory. On this host, we could schedule four jobs, each one of them requesting one core and 8GiB of memory. However, if one of these jobs requests more memory, 16GiB, other jobs cannot take advantage of the additional cores as there is insufficient memory. This means that one of the remaining jobs will require additional capacity. As a result, the ratio of memory-to-cores is a key part of the compute-second computation.

In Foundry, the default memory-to-core ratio is 7.5 GiB per core.
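The four-core host example can be illustrated with a small placement sketch. The function `jobs_that_fit` and its first-come placement rule are assumptions made for illustration. They are not a description of how Foundry actually schedules jobs.

```python
def jobs_that_fit(host_cores, host_memory_gib, requests):
    """Place (cores, memory_gib) requests in order on a single host.

    Returns (jobs_placed, free_cores, free_memory_gib).
    """
    free_cores, free_memory = host_cores, host_memory_gib
    placed = 0
    for cores, memory in requests:
        if cores <= free_cores and memory <= free_memory:
            free_cores -= cores
            free_memory -= memory
            placed += 1
    return placed, free_cores, free_memory


# Four jobs of 1 core / 8GiB fill a 4-core, 32GiB host exactly.
print(jobs_that_fit(4, 32, [(1, 8)] * 4))  # (4, 0, 0)

# If one job asks for 16GiB, only three jobs fit and one core is stranded.
print(jobs_that_fit(4, 32, [(1, 16), (1, 8), (1, 8), (1, 8)]))  # (3, 1, 0)
```

In the second case, the stranded core cannot be used because no memory is left, which is why memory requests are factored into compute-seconds.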

Parallelized core-seconds to compute-seconds

Foundry compute-seconds reflect both the number of vCPU cores and the amount of memory that was reserved for a job. Compute-seconds combine core-seconds with the amount of memory reserved.

In summary, we calculate compute-seconds by taking the maximum of two factors:

  • Cores used per task, and the
  • Memory-to-core ratio of the executor of the task.

This can be expressed with the following expression: max(num_vcpu, gib_ram / 7.5)

Consider the example below with the following characteristics:

  • Two Executors, each with one core and 12GiB RAM
  • Total wall-clock computation time is 5 seconds

vcpu_per_executor = 1
ram_per_executor = 12
num_executors = 2
num_seconds = 5

default_memory_to_core_ratio = 7.5
job_memory_multiplier = ram_per_executor / default_memory_to_core_ratio
job_memory_multiplier = 12 / 7.5 = 1.6

job_core_seconds = vcpu_per_executor * num_executors * num_seconds
job_core_seconds = 1 * 2 * 5 = 10

job_compute_seconds = max(vcpu_per_executor, job_memory_multiplier) * num_executors * num_seconds
job_compute_seconds = max(1 vCPU, 1.6 memory multiplier) * 2 executors * 5 sec
job_compute_seconds = 16 compute-seconds

We can see that while the job only used 10 core-seconds, it used 16 compute-seconds total due to the outsized memory request.
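The same calculation can be sketched as a runnable Python function. The function and parameter names are hypothetical. Like the example above, the sketch counts executors only and leaves out the driver.

```python
DEFAULT_MEMORY_TO_CORE_RATIO = 7.5  # GiB per core, the default stated above


def compute_seconds(vcpu_per_executor: float,
                    ram_gib_per_executor: float,
                    num_executors: int,
                    num_seconds: float,
                    memory_to_core_ratio: float = DEFAULT_MEMORY_TO_CORE_RATIO) -> float:
    """Compute-seconds: the larger of cores and memory-equivalent cores, over time."""
    memory_multiplier = ram_gib_per_executor / memory_to_core_ratio
    return max(vcpu_per_executor, memory_multiplier) * num_executors * num_seconds


# Two executors, each with 1 core and 12GiB RAM, running for 5 seconds.
print(round(compute_seconds(1, 12, 2, 5), 6))  # 16.0

# With 7.5GiB per executor the memory multiplier is 1.0, so cores decide the result.
print(round(compute_seconds(1, 7.5, 2, 5), 6))  # 10.0
```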

Query compute-seconds

In Foundry, there are various indexed stores that can be queried in an on-demand manner. Each of these indexed stores uses compute-seconds when executing their queries. For documentation on how queries use compute-seconds, refer to the following documentation.

Ontology volume

Foundry's Ontology and indexed data formats provide tools for fast, organization-centric queries and actions. These backing systems store the data in formats that are significantly more flexible for ad-hoc operational and analytical use cases. Ontology volume is measured in gigabyte-months.

Ontology Volume usage is tracked in the following systems:

  • Ontology objects (v1 & v2)
  • Postgate (Postgres interface, not available in all Foundry configurations)
  • Lime (legacy document store without Ontology mappings)

The size of the synced dataset may be different than the size in Foundry. This is because each system uses different layouts or compression to store and serve data.

Foundry storage

Foundry storage measures the general purpose data stored in the non-Ontology transformation layers in Foundry. Disk usage is measured in gigabyte-months.

Each dataset’s storage usage is calculated individually. Dataset branches and previous transactions (and views) impact how much disk space a single dataset consumes. When files are removed from a view with a DELETE transaction, the files are not removed from the underlying storage and thus continue to accrue storage costs. The total disk usage is calculated in two steps:

  • Looking at all the transactions that were ever committed or aborted on a dataset and summing up the size of the underlying files that were added.
  • Subtracting all the transactions that were deleted using Retention to get the live disk space used.

The only way to reduce size is to use Retention to clean up unnecessary transactions. Committing a DELETE transaction or updating branches does not reduce storage used.
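The two-step calculation can be sketched as follows. The `Transaction` class and its fields are hypothetical, introduced only to model the description above. They do not reflect Foundry's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    added_bytes: int            # size of the underlying files this transaction added
    removed_by_retention: bool  # True if Retention has deleted this transaction


def live_disk_usage(transactions):
    """Sum the files ever added, then subtract what Retention has cleaned up."""
    total_added = sum(t.added_bytes for t in transactions)
    reclaimed = sum(t.added_bytes for t in transactions if t.removed_by_retention)
    return total_added - reclaimed


history = [
    Transaction(added_bytes=100, removed_by_retention=False),
    Transaction(added_bytes=50, removed_by_retention=True),
    # A DELETE transaction adds no files and frees nothing on its own.
    Transaction(added_bytes=0, removed_by_retention=False),
]
print(live_disk_usage(history))  # 100
```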

Ontology volume usage attribution

Ontology volume usage is primarily attributed to the project of each object's input datasource. Foundry resources and objects appear side-by-side when viewing the granular usage details for any project as shown below.

Note: When usage is attributed to a Workshop application with embedded modules, it will account for any usage that occurs in its embedded modules.

Figure: Usage by resources and objects

Some objects are unattributable to a single project; for example, an object may have multiple input datasources that span multiple projects. In these cases, usage is attributed to the Ontology itself as below.

Figure: Usage by resources and objects
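The attribution rule described above can be sketched as a simple function. The function name, its inputs, and the treatment of an object with no known input datasource are assumptions for illustration only.

```python
def attribute_usage(datasource_projects, ontology_name):
    """Return the project that receives an object's usage, or the Ontology.

    datasource_projects: the project of each input datasource of the object.
    """
    projects = set(datasource_projects)
    if len(projects) == 1:
        # All input datasources live in one project.
        return next(iter(projects))
    # Datasources span multiple projects (or none are known): use the Ontology.
    return ontology_name


print(attribute_usage(["Project A", "Project A"], "Ontology"))  # Project A
print(attribute_usage(["Project A", "Project B"], "Ontology"))  # Ontology
```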

In general, objects accrue the following types of usage:

  • Foundry Compute captures compute used to index datasets to object types; in other words, the cost of syncing the Ontology.
  • Ontology Volume captures the size of the indexes of all object types.
  • Foundry Storage is empty for objects.

Usage units

Compute-second

All computational work done by all Foundry products is expressed as compute-seconds. In the Foundry platform, a compute-second is not a measurement of time, but rather a unit of work that the platform executes. The compute-second is the atomic unit of work in the platform, meaning it is the minimum granularity at which compute is measured in Foundry. See the table below for details on how each Foundry product type uses compute-seconds.

Gigabyte-month

All storage usage by all Foundry products is expressed as gigabyte-months, which is a measure of allocated storage over time. A 1GB data file consumes 1 GB-month of usage per month.

The storage volume is calculated hourly, and the gigabyte-months value is calculated from the total hourly measurements in that monthly period. For example, for a month with 30 days:

Days 0-3      - 0GB volume
Day 4, 06:00  - 3GB volume (3GB added)
Days 5-10     - 3GB volume (no change from day 4)
Day 11, 00:00 - 6GB volume (3GB added)
Days 11-20    - 6GB volume (no change)
Day 21, 00:00 - 3GB volume (3GB deleted)
Days 21-30    - 3GB volume (no change)

Total:
(0GB * 4 days + 3GB * (18hrs/24) days + 3GB * 6 days + 6GB * 10 days + 3GB * 10 days) / 30 days
   = 3.675 gigabyte-months of usage
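The total above can be reproduced with a short sketch. The function name and the (volume, duration) input shape are hypothetical. The source states that volume is measured hourly, whereas this sketch works with durations in days for brevity.

```python
def gigabyte_months(segments, days_in_month):
    """segments: list of (volume_gb, duration_in_days) pairs within one month."""
    return sum(volume * days for volume, days in segments) / days_in_month


# The same terms as the worked total above.
segments = [
    (0, 4),        # 0GB for 4 days
    (3, 18 / 24),  # 3GB for the last 18 hours of day 4
    (3, 6),        # 3GB for 6 days
    (6, 10),       # 6GB for 10 days
    (3, 10),       # 3GB for 10 days
]
print(round(gigabyte_months(segments, 30), 3))  # 3.675
```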

Since the number of days in a month varies, the gigabyte-months generated per day by the same volume of storage will change per month. For example:

90GB stored for 1 day in a month with 30 days will consume:
(90GB * 1 day) / 30 days = 3 gigabyte-months

90GB stored for 1 day in a month with 31 days will consume:
(90GB * 1 day) / 31 days = 2.90 gigabyte-months

This means that, when viewing storage usage for a dataset of unchanging size, the gigabyte-months consumed by day or week will have some fluctuation; the gigabyte-months consumed for the whole month will not fluctuate.
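A minimal sketch of this effect, using Python's standard `calendar` module to look up the number of days in a month. The function name and the choice of April and May 2024 are illustrative assumptions.

```python
import calendar


def daily_gigabyte_months(volume_gb, year, month):
    """Gigabyte-months generated by one day of storing volume_gb."""
    days_in_month = calendar.monthrange(year, month)[1]
    return volume_gb / days_in_month


# The per-day value depends on the length of the month.
print(round(daily_gigabyte_months(90, 2024, 4), 2))  # April, 30 days: 3.0
print(round(daily_gigabyte_months(90, 2024, 5), 2))  # May, 31 days: 2.9

# Over the whole month, the total is the same: 90 gigabyte-months.
print(round(daily_gigabyte_months(90, 2024, 4) * 30, 6))  # 90.0
print(round(daily_gigabyte_months(90, 2024, 5) * 31, 6))  # 90.0
```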

List of Foundry applications and associated usage

Data transformation

| Foundry application | Foundry compute | Foundry Ontology volume | Foundry storage |
|---|---|---|---|
| Code Repositories (Python, Java, SQL, GPU, Mesa) | Yes | No | Yes |
| Streaming repositories | Yes | No | No |
| Pipeline Builder | Yes | No | Yes |
| Preparation | Yes | No | Yes |
| Data Connection (Agent-based) | No | No | Yes |
| Data Connection (cloud ingest) | Yes | No | Yes |
| Data Health | Yes | No | No |
| Dataset projections | Yes | No | No |
| Object indexing (Phonograph2) | Yes | Yes | No |
| Time series indexing | Yes | No | No |
| Recipes | Yes | No | No |

Analytics

| Foundry application | Foundry compute | Foundry Ontology volume | Foundry storage |
|---|---|---|---|
| Code Workbook: Spark | Yes | No | Yes |
| Code Workbook: GPU | Yes | No | Yes |
| Contour analysis | Yes | No | No |
| Contour builds and dashboards | Yes | No | Yes |
| Reports | Yes (from other applications) | No | No |
| Restricted Views | Yes | No | No |
| Notepad | Yes (from other applications) | No | No |
| Fusion | No | Yes | Yes (writeback) |

Model and AI integration

| Foundry application | Foundry compute | Foundry Ontology volume | Foundry storage |
|---|---|---|---|
| Foundry ML batch | Yes | No | Yes |
| Foundry ML live | Yes | No | No |

Ontology and application building

| Foundry application | Foundry compute | Foundry Ontology volume | Foundry storage |
|---|---|---|---|
| Ontology objects | Yes | Yes | Yes (export) |
| Ontology relationship tables | Yes | Yes | Yes (export) |
| Ontology Actions | Yes | Yes (writeback) | No |
| Direct Object Storage V1 indices | Yes | Yes | Yes (export) |
| Postgres indices | Yes | Yes | No |
| Direct Lime indices | Yes | Yes | No |
| Foundry Rules | Yes | Yes | Yes |

Notes:

Yes (writeback) refers to the process of storing user edits or user-created objects to the object set in Foundry.

Yes (export) refers to the process of storing user edits to the designated writeback dataset in Foundry.

Yes (from other applications) refers to the usage generated by other embedded Foundry applications, such as a Contour board embedded in a Notepad document.

Compute usage with AIP

AIP compute usage involves large language models (LLMs). Fundamentally, LLMs take text as an input and respond with text as an output. The amount of text input and output is measured in tokens. Compute usage for LLMs is measured in compute-seconds per some number of tokens. Different models may have different rates for compute usage, as described below.

Tokens in AIP

Tokens are defined as words (or parts of words) that constitute a countable unit for the underlying LLM. Different model providers have distinct definitions for what constitutes a token; see, for instance, the definitions published by OpenAI and Anthropic. On average, tokens are around 4 characters long, with a character being a single letter or punctuation mark.
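As a rough illustration of the 4-characters-per-token average, the sketch below estimates a token count from text length. This is only an approximation under that stated average. The function name is hypothetical, and actual counts come from each provider's tokenizer and will differ.

```python
AVERAGE_CHARS_PER_TOKEN = 4  # the rough average stated above


def estimate_tokens(text: str) -> int:
    """Approximate token count from character count. Not a real tokenizer."""
    return round(len(text) / AVERAGE_CHARS_PER_TOKEN)


sentence = "The quick brown fox jumps over the lazy dog."
print(len(sentence))              # 44 characters
print(estimate_tokens(sentence))  # 11 (an estimate only)
```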

In AIP, tokens are consumed by applications that send prompts to and receive responses from LLMs. Each of these prompts and responses consists of a measurable number of tokens. These tokens can be sent to multiple LLM providers; due to differences between providers, these tokens are converted into compute-seconds to match the price of the underlying model provider.

All applications that provide LLM-backed capabilities consume tokens when being used. See the following list for the set of applications that may use tokens when you interact with their LLM-backed capabilities.

  • AIP Assist
  • AIP Logic
  • AIP Error Enhancer
  • AIP Code Assist
  • Workshop LLM-backed tools
  • Quiver LLM-backed tools
  • Pipeline Builder LLM-backed tools
  • Direct calls to the Language Model Service (including both Python and TypeScript libraries)

Measuring compute with AIP

If you have an enterprise contract with Palantir, contact your Palantir representative before proceeding with compute usage calculations. The following section only details default compute-second multipliers for tokens.

The following table lays out the compute-second rates for the available models. These prices are in compute-seconds per 10,000 tokens. Note that input and output tokens are priced differently by model providers, meaning their compute-second equivalent will also be different.

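| Model | Foundry cloud provider | Foundry region | Compute-seconds per 10k input tokens | Compute-seconds per 10k output tokens |
|---|---|---|---|---|
| GPT-3.5T | AWS | North America | 25.2 | 33.6 |
| GPT-3.5T | AWS | EU / UK | 21.3 | 28.4 |
| GPT-3.5T | AWS | South America / APAC / Middle East | 17.3 | 23.1 |
| GPT-3.5T 16k | AWS | North America | 50.4 | 67.2 |
| GPT-3.5T 16k | AWS | EU / UK | 42.6 | 56.9 |
| GPT-3.5T 16k | AWS | South America / APAC / Middle East | 34.7 | 46.2 |
| GPT-4 | AWS | North America | 504 | 1010 |
| GPT-4 | AWS | EU / UK | 426 | 853 |
| GPT-4 | AWS | South America / APAC / Middle East | 347 | 693 |
| GPT-4 32k | AWS | North America | 1010 | 2020 |
| GPT-4 32k | AWS | EU / UK | 853 | 1710 |
| GPT-4 32k | AWS | South America / APAC / Middle East | 693 | 1390 |
| GPT-4 Turbo | AWS | North America | 168 | 504 |
| GPT-4 Turbo | AWS | EU / UK | 142 | 426 |
| GPT-4 Turbo | AWS | South America / APAC / Middle East | 116 | 347 |
| GPT-4 Vision | AWS | North America | 168 | 504 |
| GPT-4 Vision | AWS | EU / UK | 142 | 426 |
| GPT-4 Vision | AWS | South America / APAC / Middle East | 116 | 347 |
| GPT-4o | AWS | North America | 43 | 172 |
| GPT-4o | AWS | EU / UK | 36 | 145 |
| GPT-4o | AWS | South America / APAC / Middle East | 30 | 118 |
| GPT-4o mini | AWS | North America | 2.6 | 10.3 |
| GPT-4o mini | AWS | EU / UK | 2.2 | 8.7 |
| GPT-4o mini | AWS | South America / APAC / Middle East | 1.8 | 7.1 |
| Anthropic Claude 2 | AWS | North America | 137 | 412 |
| Anthropic Claude 2 | AWS | EU / UK | 116 | 349 |
| Anthropic Claude 2 | AWS | South America / APAC / Middle East | 95 | 284 |
| Anthropic Claude 3 | AWS | North America | 52 | 258 |
| Anthropic Claude 3 | AWS | EU / UK | 44 | 218 |
| Anthropic Claude 3 | AWS | South America / APAC / Middle East | 35 | 177 |
| Anthropic Claude 3 Haiku | AWS | North America | 4.3 | 21.5 |
| Anthropic Claude 3 Haiku | AWS | EU / UK | 3.6 | 18.2 |
| Anthropic Claude 3 Haiku | AWS | South America / APAC / Middle East | 3.0 | 14.8 |
| Anthropic Claude 3.5 Sonnet | AWS | North America | 52 | 258 |
| Anthropic Claude 3.5 Sonnet | AWS | EU / UK | 44 | 218 |
| Anthropic Claude 3.5 Sonnet | AWS | South America / APAC / Middle East | 35 | 177 |
| ada embedding | AWS | North America | 1.68 | N/A |
| ada embedding | AWS | EU / UK | 1.42 | N/A |
| ada embedding | AWS | South America / APAC / Middle East | 1.16 | N/A |
| text-embedding-3-large | AWS | North America | 2.24 | N/A |
| text-embedding-3-large | AWS | EU / UK | 1.89 | N/A |
| text-embedding-3-large | AWS | South America / APAC / Middle East | 1.54 | N/A |
| text-embedding-3-small | AWS | North America | 0.34 | N/A |
| text-embedding-3-small | AWS | EU / UK | 0.29 | N/A |
| text-embedding-3-small | AWS | South America / APAC / Middle East | 0.24 | N/A |
| Mistral 7B | AWS | North America | 32 | 82 |
| Mistral 7B | AWS | EU / UK | 27 | 69 |
| Mistral 7B | AWS | South America / APAC / Middle East | 22 | 56 |
| Mixtral 8X7B | AWS | North America | 96 | 287 |
| Mixtral 8X7B | AWS | EU / UK | 81 | 243 |
| Mixtral 8X7B | AWS | South America / APAC / Middle East | 66 | 198 |
| Llama 2_13B | AWS | North America | 144 | 478 |
| Llama 2_13B | AWS | EU / UK | 122 | 405 |
| Llama 2_13B | AWS | South America / APAC / Middle East | 99 | 329 |
| Llama 2_70B | AWS | North America | 144 | 478 |
| Llama 2_70B | AWS | EU / UK | 122 | 405 |
| Llama 2_70B | AWS | South America / APAC / Middle East | 99 | 329 |
| Llama 3_8B | AWS | North America | 144 | 478 |
| Llama 3_8B | AWS | EU / UK | 122 | 405 |
| Llama 3_8B | AWS | South America / APAC / Middle East | 99 | 329 |
| Llama 3_70B | AWS | North America | 144 | 478 |
| Llama 3_70B | AWS | EU / UK | 122 | 405 |
| Llama 3_70B | AWS | South America / APAC / Middle East | 99 | 329 |
| Llama 3.1_8B | AWS | North America | 158 | 525 |
| Llama 3.1_8B | AWS | EU / UK | 133 | 444 |
| Llama 3.1_8B | AWS | South America / APAC / Middle East | 108 | 361 |
| Llama 3.1_70B | AWS | North America | 158 | 525 |
| Llama 3.1_70B | AWS | EU / UK | 133 | 444 |
| Llama 3.1_70B | AWS | South America / APAC / Middle East | 108 | 361 |
| Snowflake Arctic Embed | AWS | North America | 38 | 38 |
| Snowflake Arctic Embed | AWS | EU / UK | 32 | 32 |
| Snowflake Arctic Embed | AWS | South America / APAC / Middle East | 26 | 26 |
| Gemini 1.5 Flash | AWS | North America | 1.3 | 5.2 |
| Gemini 1.5 Flash | AWS | EU / UK | 1.1 | 4.4 |
| Gemini 1.5 Flash | AWS | South America / APAC / Middle East | 0.9 | 3.5 |
| Gemini 1.5 Pro | AWS | North America | 21 | 86 |
| Gemini 1.5 Pro | AWS | EU / UK | 18 | 73 |
| Gemini 1.5 Pro | AWS | South America / APAC / Middle East | 15 | 59 |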

AIP routes text directly to backing LLMs which run the tokenization themselves. The size of the text will dictate the amount of compute that is used by the backing model to serve the response.

It's also important to understand tokenization. Take the following example sentence that is sent to the GPT-4 model.

The quick brown fox jumps over the lazy dog.

This sentence contains 44 characters and will tokenize in the following way, with a | character separating each token:

The| quick| brown| fox| jumps| over| the| lazy| dog|.

This sentence therefore contains 10 tokens and will use the following number of compute-seconds:

compute-seconds = 10 tokens * 504 compute-seconds / 10,000 tokens
compute-seconds = 10 * 504 / 10,000
compute-seconds = 0.504
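The conversion above can be sketched as a small helper. The function name is hypothetical, and the rate of 504 compute-seconds per 10,000 tokens is the GPT-4 figure used in the calculation above.

```python
def token_compute_seconds(num_tokens: int, rate_per_10k_tokens: float) -> float:
    """Convert a token count to compute-seconds at a per-10,000-token rate."""
    return num_tokens * rate_per_10k_tokens / 10_000


# 10 tokens at 504 compute-seconds per 10,000 tokens.
print(token_compute_seconds(10, 504))  # 0.504
```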

Understanding drivers of compute usage with AIP

Usage of compute-seconds resulting from LLM tokens is attached directly to the individual application resource that requests the usage. For example, if you use AIP to automatically explain a pipeline in Pipeline Builder, the compute-seconds used by the LLM to generate that explanation will be attributed to that specific pipeline. This is true across the platform; keeping this in mind will help you track where you are using tokens.

In some cases, compute usage is not attributable to a single resource in the platform; examples include AIP Assist and AIP Error Enhancer, among others. When usage is not attributable to a single resource, the tokens will be attributed to the user folder initiating the use of tokens.

We recommend staying aware of the tokens that are sent to LLMs on your behalf. Generally, the more information that you include when using LLMs, the more compute-seconds will be used. For example, the following scenarios describe different ways of using compute-seconds.

  • In Pipeline Builder, you can ask AIP to explain your transformation nodes; the number of selected nodes affects the number of tokens used by the LLM to generate a response, and thus compute-second usage. This is because as the number of nodes increases, so does the amount of text the LLM must process regarding the configuration of those nodes.
  • In AIP Assist, asking the LLM to generate large blocks of code requires more output tokens. Shorter responses use fewer tokens and thus less compute.
  • In AIP Logic, sending large amounts of text with your prompts requires more tokens and thus more compute-seconds.
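To see how prompt size and response size each contribute, the sketch below adds input and output usage for a single request. The function name and the token counts are hypothetical. The rates (43 input and 172 output compute-seconds per 10,000 tokens) are the GPT-4o North America values as read from the rate table in this document, and your contract may specify different rates.

```python
def request_compute_seconds(input_tokens: int,
                            output_tokens: int,
                            input_rate_per_10k: float,
                            output_rate_per_10k: float) -> float:
    """Compute-seconds for one LLM request, pricing input and output separately."""
    input_cost = input_tokens * input_rate_per_10k / 10_000
    output_cost = output_tokens * output_rate_per_10k / 10_000
    return input_cost + output_cost


# A 2,000-token prompt with a 500-token response (assumed sizes).
print(round(request_compute_seconds(2_000, 500, 43, 172), 2))  # 17.2

# Doubling the prompt size raises only the input portion.
print(round(request_compute_seconds(4_000, 500, 43, 172), 2))  # 25.8
```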