Initial setup overview

This guide will walk you through the process of connecting your organization's data to Foundry.

Before starting, note that this first step toward connecting your organizational data to Foundry is fundamentally a networking task. The initial setup is best done by someone who is familiar with network engineering and knows the organization's network topology and configuration, such as firewall rules.

Conceptual overview

Connecting data to Foundry requires that the following three components are installed or configured, in this order:

  1. Connection: Required to access a data source.
    • Agent: Connect through Palantir software running within your network; required to access private networks and on-premises data sources.
    • Direct connection: Connect to a data source over the Internet; preferred when connecting over a public network.
  2. Source / Connector: Used to access data external to Foundry.
  3. Sync: Ingest data into Foundry or export data from Foundry.

An Agent is a piece of Palantir software that runs within your organization’s network. The Agent functions as a secure intermediary between your organization’s data sources and your Foundry instance. An Agent connection is required to access sources running on private networks or on-premises systems. A single running Agent can support multiple sources and syncs.

Learn more about Agent architecture.

A direct connection is a connection to a data source that is accessible over the Internet, such as a REST API, an SFTP server, or an Azure storage account. You can configure a direct connection to avoid setting up an Agent while still receiving excellent uptime and performance. Direct connections require connection credentials and a network egress policy for your enrollment.

A source, or connector, is any external data system that you connect to Foundry. For example, a source could be a PostgreSQL database, an S3 bucket, a filesystem on a Linux server, an SAP instance, or a REST API on the Internet. A configured source is required to establish any syncs to Foundry, and data must be synced from the source into a dataset before it can be used in Foundry.

A sync reads specific data from a source and ingests it into Foundry. For example, if you have a PostgreSQL database source that contains multiple tables, you might configure a sync to ingest one specific table into Foundry. Once a sync has successfully run, the result in Foundry will be a dataset to use across all of Foundry's data pipelining, model development, and analytical tools.
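The dependency order described above (a connection, then a source, then a sync) can be sketched as a small data model. This is only an illustration of how the concepts relate, not Foundry's API: the class names, field names, and example values (`agent-1`, `orders-db`, `public.orders`, `/example/orders`) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ConnectionKind(Enum):
    AGENT = "agent"    # software running inside your organization's network
    DIRECT = "direct"  # connection made over the Internet


@dataclass(frozen=True)
class Connection:
    name: str
    kind: ConnectionKind


@dataclass(frozen=True)
class Source:
    # A source cannot exist without a connection to reach it.
    name: str
    connection: Connection


@dataclass(frozen=True)
class Sync:
    # A sync cannot exist without a configured source.
    source: Source
    selection: str       # hypothetical: the specific data to read, e.g. one table
    output_dataset: str  # hypothetical: where the resulting dataset lands


def describe(sync: Sync) -> str:
    """Return a one-line summary of the full chain behind a sync."""
    src = sync.source
    return (
        f"{sync.selection} from {src.name} "
        f"via {src.connection.kind.value} connection {src.connection.name} "
        f"-> {sync.output_dataset}"
    )


if __name__ == "__main__":
    agent = Connection("agent-1", ConnectionKind.AGENT)
    database = Source("orders-db", agent)
    sync = Sync(database, "public.orders", "/example/orders")
    print(describe(sync))
```

Because each object takes its prerequisite as a constructor argument, a sync cannot be built before its source, nor a source before its connection, which mirrors the setup order listed above.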

Roles and workflows

Most Foundry users will never need to set up a new Agent themselves. Agent setup requires an IT-focused skill set, though the same Agent can be reused to support multiple sources and syncs. Some organizations can operate long-term with the Agents set up during the first week of a Foundry deployment. New Agents are needed only to reach data that your existing Agents cannot access (due to network segmentation or data scale, for example) or to provide high availability.

The table below summarizes the configuration frequency and skill set needed to maintain the resources used to connect to data:

Resource | Frequency of configuration | Typical user role | Knowledge required
Agent | Rare | IT / Network Engineer | Network and firewall policies; Linux VMs; SSH
Source | Occasional | IT / Network Engineer; Data Engineer | Debugging network access; credential management
Sync | Frequent | Data Engineer; Data Scientist | Writing SQL queries; managing files

High availability

We recommend setting up redundant hardware to establish a high availability (HA) architecture. High availability increases resiliency and allows no-downtime maintenance during operating hours.

Foundry offers HA at the source level: if a source is assigned to multiple Agents, Foundry will dispatch ingestions to one of the healthy Agents. We strongly recommend configuring Agents in a high availability setup when you first create a source; adding extra Agents to an existing source requires re-entering the credentials for that source.
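As a rough illustration of source-level HA, the sketch below picks one healthy Agent from those assigned to a source. It is not Foundry's dispatch logic: the text above does not say how a healthy Agent is chosen, so the random choice here is an assumption, and the `Agent` class and `pick_agent` function are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    healthy: bool


def pick_agent(assigned, rng=random):
    """Pick one healthy Agent from those assigned to a source.

    Assumption: any healthy Agent is equally acceptable, so one is chosen
    at random. Raises RuntimeError when no assigned Agent is healthy.
    """
    healthy = [agent for agent in assigned if agent.healthy]
    if not healthy:
        raise RuntimeError("no healthy Agent is assigned to this source")
    return rng.choice(healthy)


if __name__ == "__main__":
    pair = [Agent("agent-1", healthy=False), Agent("agent-2", healthy=True)]
    print(pick_agent(pair).name)
```

With a single assigned Agent, any outage of that Agent leaves the list of healthy candidates empty, which is the situation a second Agent is meant to prevent.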

The following best practices are recommended when setting up high availability:

  • Always install Agents in pairs, on similar hardware.
  • Give each Agent in a pair similar names, such as agent-1 and agent-2.
  • Systematically assign both Agents in a pair to every source.
  • Configure non-overlapping upgrade windows for the two Agents in a pair. Upgrade windows should fall on business days and allow sufficient soak time between them. Doing so ensures that any unexpected issue with an update is contained to a single Agent and can be detected by operators or administrators.
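The upgrade-window practice above can be checked mechanically. The following is a minimal sketch, assuming each window is a weekday plus a start and end time on that same day, and that business days are Monday through Friday; the `UpgradeWindow` class and both functions are hypothetical and do not reflect how Foundry stores upgrade windows.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class UpgradeWindow:
    weekday: int  # 0 = Monday ... 6 = Sunday, as in datetime.date.weekday()
    start: time
    end: time     # assumed to be later than start, on the same day


def overlaps(a: UpgradeWindow, b: UpgradeWindow) -> bool:
    """Two windows overlap if they share a weekday and their time ranges intersect."""
    return a.weekday == b.weekday and a.start < b.end and b.start < a.end


def validate_pair(a: UpgradeWindow, b: UpgradeWindow) -> list:
    """Return a list of problems with the windows of an Agent pair (empty if none)."""
    problems = []
    if overlaps(a, b):
        problems.append("upgrade windows overlap")
    for label, window in (("first", a), ("second", b)):
        # Assumption: business days are Monday (0) through Friday (4).
        if window.weekday > 4:
            problems.append(f"{label} window is not on a business day")
    return problems


if __name__ == "__main__":
    tuesday = UpgradeWindow(1, time(10, 0), time(12, 0))
    thursday = UpgradeWindow(3, time(10, 0), time(12, 0))
    print(validate_pair(tuesday, thursday))
```

Placing the two windows on different days, as in the usage example, also leaves time between upgrades for issues on the first Agent to surface before the second is updated.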

Next steps

To use direct connections to access data sources over the Internet, like a public REST API or an S3 bucket, start with the direct connection setup.

To connect to a data source that exists within your organization's network, start with Agent setup.