Architecture

Palantir AIP (Artificial Intelligence Platform) is designed to scale across all types of end users, the world’s most demanding data-driven workloads, and myriad infrastructure substrates. To achieve this, the underlying service mesh operates atop a set of software-defined principles that are enforced by Palantir Apollo ↗.

As AIP's scope and capability set have expanded and become mission-critical for many institutions, we have needed to ensure that:

  • Each of the hundreds of services within the platform runs in a highly-available, redundant configuration. Beyond core backend services, this also includes front-end application services, analytics tools, application builders, and each constituent service used by each type of user.
  • All service upgrades are performed in a zero-downtime capacity, with granular monitoring that informs how upgrade strategies are deployed, monitored, and potentially rolled back. With Apollo as the backbone for all service orchestration fleet-wide, the level of secure automation far exceeds what is possible through manual or bespoke operations.
  • Auto-scaling across both the core services and the associated compute mesh leverages a consistent containerization paradigm. This is achieved through the Rubix ↗ engine, which underpins all of the platform's autoscaling infrastructure — and works in lockstep with the Apollo delivery platform.
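The monitored, zero-downtime upgrade strategy described above can be sketched as a simple control loop. The sketch below is illustrative only — the thresholds, traffic steps, and `observe_error_rate` callback are hypothetical, not Apollo's actual API — but it captures the core idea: shift traffic to the new version in stages, watch a health metric at each stage, and roll back automatically if the metric degrades.

```python
# Illustrative sketch of a health-gated rolling upgrade (hypothetical; not
# Apollo's actual interface). The new version receives traffic in increasing
# steps; if the observed error rate at any step exceeds the threshold, the
# rollout stops and is rolled back.

ERROR_RATE_THRESHOLD = 0.01  # hypothetical: abort if >1% of requests fail
TRAFFIC_STEPS = [0.05, 0.25, 0.50, 1.00]  # fraction of traffic on new version

def rollout(observe_error_rate):
    """Advance traffic step by step, gated on a health metric."""
    for step in TRAFFIC_STEPS:
        rate = observe_error_rate(step)
        if rate > ERROR_RATE_THRESHOLD:
            # Health regressed: revert all traffic to the prior version.
            return {"status": "rolled_back", "failed_at": step, "error_rate": rate}
    return {"status": "completed", "traffic": 1.0}

# A healthy upgrade completes; one that degrades mid-rollout is rolled back.
healthy = rollout(lambda step: 0.001)
degraded = rollout(lambda step: 0.002 if step < 0.5 else 0.08)
```

In a real delivery platform the "error rate" would be one of many monitored signals (latency, saturation, crash loops), but the gate-then-advance structure is the same.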


The service mesh, jointly powered by Rubix and Apollo, contains modular capabilities that integrate into existing enterprise architectures. Additionally, the platform's architecture maximizes future flexibility to ensure that customers continuously benefit from the latest technologies, whether developed by Palantir or the open-source community:

  • The storage architecture is not tied to any particular underlying paradigm. The platform makes use of several storage technologies at different tiers of the architecture. This includes blob storage (or HDFS), horizontally scalable key/value stores, horizontally scalable relational databases, and multi-modal time series subsystems, among many others.

  • The compute architecture is not tied to any particular underlying infrastructure. Different workloads at different tiers of the platform leverage specific runtimes, with flexibility engineered in at every level. Common runtimes for data integration include Apache Spark and Apache Flink, but external transformation engines can be used if desired. Palantir-authored engines power the Ontology and other capability sets that do not cleanly map to existing compute modalities.

  • We work hard to ensure that the most popular open languages are securely and consistently available within code-driven paradigms. This includes Python, SQL, and Java for data transformation; Python and R for machine learning workflows; and TypeScript and JavaScript for defining both workflows and frontend applications.

  • Security and Lineage are core to every operation in AIP, and are consistently enforced at every tier of the platform’s architecture. This ensures that no single service (or end user) is responsible for enforcing the enterprise’s existing security policies, or implementing the “bookkeeping” required to maintain provenance. From data to decision, the highly-available core services are designed to apply, enforce, and track governance policies that are configured, synchronized, and/or inherited.
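The security-and-lineage principle above — every derived artifact inherits governance policy and provenance automatically, rather than relying on any one service or user to enforce it — can be illustrated with a minimal data model. The `Dataset` class, `derive` function, and marking names below are hypothetical, not AIP's actual implementation:

```python
# Illustrative sketch of inherited governance markings (hypothetical model,
# not AIP's implementation). Each dataset carries security markings and a
# provenance record; a derived dataset inherits the union of its inputs'
# markings and records its parents, so policy "bookkeeping" happens by
# construction rather than by convention.

from dataclasses import dataclass

@dataclass(frozen=True)
class Dataset:
    name: str
    markings: frozenset = frozenset()  # e.g. {"PII", "EXPORT_CONTROLLED"}
    parents: tuple = ()                # provenance: upstream dataset names

def derive(name, *inputs):
    """Create a derived dataset that inherits markings and lineage."""
    inherited = frozenset().union(*(d.markings for d in inputs))
    return Dataset(name, inherited, tuple(d.name for d in inputs))

customers = Dataset("customers", frozenset({"PII"}))
orders = Dataset("orders", frozenset({"FINANCE"}))
report = derive("customer_order_report", customers, orders)
# report.markings == {"PII", "FINANCE"}; report.parents == ("customers", "orders")
```

Because inheritance is applied at every derivation step, a query against any dataset can be checked against the full set of accumulated markings, and its provenance can be traced back to the original sources.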