LLM capacity is a limited resource at the industry level, and all providers (Azure, OpenAI, AWS Bedrock, Google Cloud Vertex, etc.) limit the maximum capacity available per account. Palantir AIP consequently follows the market-level constraint set introduced by LLM providers. The standard unit of measure across the industry is tokens per minute (TPM) and requests per minute (RPM).
Palantir has set a certain maximum capacity for each enrollment, referred to as “enrollment-level rate limits”. This capacity is measured per model using TPM and RPM, and includes all models of all providers enabled on your enrollment, including GPT, Claude, Gemini, Llama, Mixtral and more. In this way, each model has a separate, independent capacity not affected by the usage of other models.
By default, all customers are on the medium tier, which is large enough to build prototypes and scale to a few use cases, even with hundreds of users and large datasets, including millions of documents for example.
Additionally, AIP offers the option to upgrade the enrollment capacity from the medium tier to a large or XL tier if you require additional capacity. If you are constantly hitting enrollment rate limits blocking you from expanding your AIP usage, or if you expect you will increase the volume of your pipelines or total number of users, contact Palantir Support.
Enrollment limits are now displayed on the AIP rate limits tab in the Resource Management application, along with the enrollment tier.
AIP offers enough capacity to build very large scale workflows with enrollment tiers, particularly the XL tier. These tiers have provided enough capacity for hundreds of Palantir customers using LLMs at scale, and we continue to increase these limits.
Enrollment administrators can navigate to the AIP rate limits page in the Resource Management application to configure the maximum percent of TPM and RPM that all resources within a given Project can utilize at every given minute combined, per model.
This means that you have the flexibility to maximize LLM utilization for production use cases in case of ambitious use cases in AIP, and limit or disallow experimental projects from saturating the entire enrollment capacity.
By default, all Projects are given a specific limit at which to operate. An admin can create additional Project limits, define which Projects are included in each limit, and what percent of enrollment capacity can be used.
Generally, AIP prioritizes interactive requests over pipelines with batch requests. Interactive queries are defined as any real-time interaction that a user has with an LLM, such as AIP Assist, Workshop, Agent Studio, preview in the AIP Logic LLM board, and preview in the Pipeline Builder LLM node. Batch queries are defined as a large set of requests sent without a user expecting an immediate response, for example Transforms pipelines, Pipeline Builder, Automate (for Logic).
This principle currently guarantees that 20% of capacity at the enrollment and Project level will always be reserved for interactive queries. This means that for a 100,000 TPM capacity for a certain model, only a maximum of 80,000 TPM can be used for pipelines at any given minute, while at least 20,000 TPM (and up to 100,000 TPM) is available for interactive queries.
Consider the following example: