This guide walks you through connecting your organization's data to Foundry.
Before starting, note that the first step in connecting your organizational data to Foundry is fundamentally a networking task. The initial setup is best done by someone who is familiar with network engineering and knows the organization's network topology and configuration, such as its firewall rules.
Connecting data to Foundry requires three components, installed or configured in this order: an Agent or direct connection, a source, and a sync.
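Because this first step is largely about network reachability, a quick TCP check from the machine that will host the Agent can surface firewall problems early. The sketch below is a minimal, hypothetical helper: `can_reach` is an illustrative name and not part of Foundry, and it only confirms that a TCP connection can be opened, not that credentials or the application protocol work.

```python
import socket


def can_reach(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_reach("db.internal.example", 5432)` would test the default PostgreSQL port on a made-up internal hostname; substitute the host and port of your own data source.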
An Agent is a piece of Palantir software that runs within your organization’s network. The Agent functions as a secure intermediary between your organization’s data sources and your Foundry instance. An Agent connection is required to access sources running on private networks or on-premises systems. A single running Agent can support multiple sources and syncs.
Learn more about Agent architecture.
A direct connection is a connection to a data source that is accessible over the Internet, such as a REST API, an SFTP server, or an Azure storage account. You can configure a direct connection to avoid setting up an Agent while still receiving excellent uptime and performance. A direct connection requires connection credentials and a network egress policy for your enrollment.
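The choice between the two connection types described above comes down to where the data source lives. The sketch below encodes that rule of thumb; `ConnectionType` and `choose_connection` are hypothetical names used only for illustration, not Foundry APIs.

```python
from enum import Enum


class ConnectionType(Enum):
    AGENT = "agent"
    DIRECT = "direct connection"


def choose_connection(reachable_over_internet: bool) -> ConnectionType:
    """Suggest a connection type based on where the data source lives."""
    if reachable_over_internet:
        # Internet-accessible sources (for example a public REST API)
        # can use a direct connection and skip Agent setup.
        return ConnectionType.DIRECT
    # Sources on private networks or on-premises systems need an Agent.
    return ConnectionType.AGENT
```

Under this sketch, a database that is only reachable inside your organization's network maps to `ConnectionType.AGENT`, while an Internet-facing SFTP server maps to `ConnectionType.DIRECT`.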
A source, or connector, is any external data system that you connect to Foundry. For example, a source could be a PostgreSQL database, an S3 bucket, a filesystem on a Linux server, an SAP instance, or a REST API on the Internet. A configured source is required to establish any syncs to Foundry, and data must be synced from the source into a dataset before it can be used in Foundry.
A sync reads specific data from a source and ingests it into Foundry. For example, if you have a PostgreSQL database source that contains multiple tables, you might configure a sync to ingest one specific table into Foundry. Once a sync has successfully run, the result in Foundry will be a dataset to use across all of Foundry's data pipelining, model development, and analytical tools.
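To make the idea of syncing one table out of many concrete, here is a conceptual sketch. It assumes an in-memory SQLite database as a stand-in for the PostgreSQL source, and `sync_table` is a hypothetical helper; this is not how Foundry implements syncs, only an illustration of reading one chosen table into a dataset-like result.

```python
import sqlite3


def sync_table(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Read every row of one table and return it as a list of dicts."""
    # Only accept names of tables that actually exist, so the name can be
    # interpolated into the query safely.
    known = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in known:
        raise ValueError(f"unknown table: {table}")
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in source with two tables; only "orders" is synced.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
source.execute("INSERT INTO customers VALUES (1, 'Ada')")

orders_dataset = sync_table(source, "orders")
```

The `customers` table stays in the source untouched; a second sync would be needed to bring it across.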
Most Foundry users will never need to set up a new Agent themselves. Agent setup requires an IT-focused skill set, though the same Agent can be reused to support multiple sources and syncs. Some organizations can operate long-term with Agents set up during the first week of a Foundry deployment. New Agents are only needed to access data that your existing Agents cannot access (due to network segmentation or data scale, for example) or to set up an additional Agent to allow for high availability.
The table below summarizes how often each resource needed for connecting to data is configured, and the skill set involved:
Resource | Frequency of configuration | Typical user role | Knowledge required |
---|---|---|---|
Agent | Rare | IT / Network Engineer | Network and firewall policies; Linux VMs; SSH |
Source | Occasional | IT / Network Engineer; Data Engineer | Debugging network access; credential management |
Sync | Frequent | Data Engineer; Data Scientist | Writing SQL queries; managing files |
We recommend setting up redundant hardware to establish a high availability (HA) architecture. High availability increases resiliency and allows no-downtime maintenance during operating hours.
Foundry offers HA at the source level, meaning that if a source is assigned to multiple Agents, Foundry will dispatch ingestions to one of the healthy Agents. We strongly recommend configuring Agents in a high availability setup at the start of source creation; adding extra Agents to a created source requires re-entering the credentials for that source.
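Source-level high availability can be pictured as choosing among the Agents assigned to a source. The sketch below is a simplified model under stated assumptions: the `Agent` class and `pick_agent` function are hypothetical, and random selection among healthy Agents is an assumption, since the text above does not specify how Foundry chooses between them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    healthy: bool


def pick_agent(assigned_agents: list[Agent]) -> Agent:
    """Pick one healthy Agent from those assigned to a source."""
    healthy = [agent for agent in assigned_agents if agent.healthy]
    if not healthy:
        raise RuntimeError("no healthy Agent is assigned to this source")
    # Assumption: any healthy Agent is acceptable, so choose at random.
    return random.choice(healthy)
```

With two Agents assigned and one of them down for maintenance, `pick_agent` always returns the other, which is why a second Agent allows no-downtime maintenance.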
The following best practices are recommended when setting up high availability:
agent-1 and agent-2.
To use direct connections to access data sources over the Internet, like a public REST API or an S3 bucket, start with the direct connection setup.
To connect to a data source that exists within your organization's network, start with Agent setup.