This page describes the core concepts used throughout Data Connection.
A Source represents a single connection, including any configuration necessary to specify the target system and the credentials required to successfully authenticate. Sources must be configured with a particular runtime depending on the networking between the Palantir platform and the target system. The runtime also defines where any capabilities used with the source will be run.
A source is set up based on a particular connector (also referred to as a source type). A broad range of connectors are available in the Palantir platform, designed to support the most common data systems across organizations. Depending on the connector and runtime selected, different capabilities may be available.
For systems without a dedicated connector, the generic connector or REST API source may be used with code-based connectivity options such as external transforms, external functions, and compute modules.
A credential is a secret value that is required to access a particular system; that is, credentials are used for authentication. Credentials can be passwords, tokens, API keys, or other secret values. In the Palantir platform, all credentials are encrypted and stored securely. Depending on the runtime, secrets may be stored locally on a Data Connection agent, or directly in the platform.
Some sources are able to authenticate without storing any secrets, such as when using OpenID Connect, outbound applications, or a cloud identity.
Sources must be configured with a runtime. The runtime defines the networking configuration and where capabilities are executed.
Palantir allows you to use three different runtimes to connect to your systems. In general, if the system to which you're connecting can accept inbound connections from the network where your Foundry instance is hosted, you should use a direct connection runtime. If this is not possible, and the source type you are using supports an agent proxy, this is the preferred agent-based option. If neither of the other runtimes is available, you should fall back to using an agent worker.
Not all runtimes are available for all source types.
Runtime option | Networking | Capability execution |
---|---|---|
Direct connection [recommended] | The target system must allow direct inbound traffic from Palantir; for standard Foundry instances, this usually means allowing inbound traffic from the standard egress IP addresses viewable in Control Panel or the Data Connection application. | Capabilities are executed in Foundry. |
Agent proxy | An agent installed on your infrastructure is used to reverse proxy traffic to systems that are not reachable through a direct connection. | Capabilities are executed in Foundry. |
Agent worker | An agent installed on your infrastructure is used to run jobs that interact with your target system and separately push or pull data from Foundry. | Capabilities are executed on a customer-provided Linux host. |
Direct connections enable users to connect to data sources accessible over the Internet without needing to set up an agent. This is the preferred source connection method if the data source is accessible over the Internet; it avoids the operational overhead of setting up and maintaining agents and offers high uptime and performance. Learn how to set up a direct connection.
When using direct connections with a Foundry instance hosted on-premises, the target system must be reachable from the network where your Foundry instance is running. If this is not the case, you must use one of the agent-based runtime options.
An agent is a piece of software provided by Palantir that runs on a host within your network. The agent connects to your source systems, and can also communicate with Foundry. An agent is required in order to use the agent proxy and agent worker runtimes. The same agent may be used as either an agent proxy or an agent worker; which role it plays is determined when the agent is used with a particular source.
Learn more about how to set up an agent by following this tutorial.
The agent proxy runtime is used to connect to data sources not accessible over the Internet. The agent acts as an inverting network proxy, forwarding network traffic originating in Foundry into the network where the agent is deployed, and relaying traffic back to Foundry. This allows capabilities in Foundry to work almost exactly the same as when using a direct connection but without requiring you to allow inbound network traffic to your systems originating from Foundry's IP addresses.
For high availability, multiple agents can be configured with non-overlapping maintenance windows to ensure there is always an active agent to proxy connections to the target systems reachable through the agent proxy. Learn how to set up an agent proxy runtime.
The agent worker runtime is used to connect to data sources not accessible over the Internet. An agent worker should only be used when the desired connector does not support the agent proxy runtime. Agent worker runtimes are associated with one or more agents that store the source configuration and credentials locally in an encrypted format and run source capabilities on the agent itself. Learn how to set up a source with an agent worker runtime.
Sources may support a variety of capabilities, where each capability represents some functionality that can run over the source connection. A wide range of capabilities is supported for bringing data into Foundry, pushing data out of Foundry, virtualizing data stored outside of Foundry, and making interactive requests to other systems.
A summary of available capabilities is included in the following table. For more information about capabilities supported for a specific connector, refer to that connector's documentation page.
Capability | Description |
---|---|
Batch syncs | Sync data from an external source to a dataset. |
Streaming syncs | Sync data from an external message queue to a stream. |
Change data capture (CDC) syncs | Sync data from a database to a stream with CDC metadata. |
Media syncs | Sync data from an external source to a media set. |
HyperAuto | Sync an entire system automatically. |
File exports | Push data as files from a dataset to an external system. |
Table exports | Push data with a schema from a dataset to an external database. |
Streaming exports | Push data from a stream to an external message queue. |
Webhooks | Make structured requests to an external system interactively. |
Virtual tables | Register data from an external data warehouse to use as a virtual table. |
Virtual media | Register unstructured media from an external system as a media set. |
Exploration | Interactively explore the data and schema of an external system before using other capabilities. |
Use in code | Use a source in code to extend or customize any functionality not covered by the point-and-click-configurable capabilities listed above. |
Additional capabilities are being developed, and capability coverage is regularly updated in the documentation for specific connectors.
Supported capabilities for specific connectors are also displayed on the new source page in the Data Connection application. It is possible to search both by connector name and by capability. The example below shows the results of a search for sources that support a "virtual" option.
Batch syncs read data from an external system and write it into a Foundry dataset. A batch sync defines what data should be read and which dataset to output into in Foundry. Batch syncs can be configured to sync data incrementally and allow syncing data both with and without a corresponding schema. Learn how to set up a sync.
In general, there are two main types of batch syncs: file-based syncs, which read raw files from the source system and write them into the output dataset, and table-based syncs, which read rows from a source that exposes a tabular schema, such as a database.
Streaming syncs provide the ability to stream data from systems that provide low latency data feeds. Data is delivered into a streaming dataset. Some examples of systems that support streaming syncs include Kafka, Amazon Kinesis, and Google Pub/Sub.
Learn more about streaming syncs.
Change data capture (CDC) syncs are similar to streaming syncs, with additional changelog metadata automatically propagated to the streaming dataset where data is delivered. This type of sync is normally used for databases that support some form of low-latency replication. Learn more about change data capture syncs.
Media syncs allow importing media data into a media set. Media sets provide better tooling than standard datasets for ingesting, transforming, and consuming media data throughout Foundry. When dealing with PDFs, images, videos, and other media, we recommend using media sets over datasets. Learn more about media syncs.
HyperAuto is a specialized capability that can dynamically discover the schema of your SAP system and automate syncs, pipelines, and creation of a corresponding ontology within Foundry. HyperAuto is currently only supported for SAP. Learn more about HyperAuto.
File exports are the opposite of file batch syncs. When performing a file export, the files underlying a Foundry dataset are written as-is to a filesystem location in the target system. Learn more about file exports.
Table exports are the opposite of table batch syncs. When performing a table export, data is exported as rows from a Foundry dataset with a schema, which are then written to a table in the target system. Learn more about table exports.
Streaming exports are the opposite of streaming syncs. When doing a streaming export, data is exported from a Foundry stream, and records are written to the specified streaming queue or topic in the target system. Learn more about streaming exports.
Webhooks represent a request to a source system outside of Foundry. Webhook requests can be flexibly defined in Data Connection to enable a broad range of connections to external systems. Learn more about webhooks.
Virtual tables represent the ability to register tabular data from an external system into a virtual table resource in Foundry.
In addition to registering individual virtual tables, this capability also allows for dynamic discovery and automatic registration of all tables found in an external system.
Learn more about virtual tables.
Virtual media works similarly to media syncs, allowing media from an external system to be used in a media set but without copying the data into Foundry. Instead, media files contained in an external system can be registered as virtual media items in a specific media set.
Learn more about virtual media.
The interactive exploration capability allows you to see what data is contained in an external system before performing syncs, exports, or other capabilities that interact with that system.
Exploration is most commonly used to check that a connection is working as intended and that the correct permissions and credentials are being used to connect.
The ability to use connections from code is intended to allow developers to extend and customize connections from Foundry to other systems. Palantir's general principle is that anything possible in the platform using dedicated connectors and point-and-click configuration options should also be achievable by writing custom code. At any point, developers should be able to switch to code-based connectivity for more granular control over the functionality or performance of workflows that perform external connections.
Any connector may be used in code; in most cases, we recommend using either the REST API source or generic connector when connecting from code.
Use in code option | Description |
---|---|
External transforms | External transforms allow transforms written in Python to communicate with external systems. External transforms are a code-based alternative for file batch syncs, file exports, table batch syncs, table exports, and media syncs. External transforms may also be used to register data into virtual media sets and virtual tables. |
External functions (webhooks) | External functions written in TypeScript support importing a source in order to invoke existing webhooks defined on that source. This allows existing webhook calls to be wrapped in custom TypeScript logic and error handling. |
External functions (direct) | External functions now allow direct calls to external systems, using fetch for TypeScript and requests for Python. External functions are a code-based alternative for webhooks. External functions with direct external calls are not yet generally available. |
Compute modules | Compute modules allow for long-running compute and writing connections in arbitrary languages. Compute modules may be used as a code-based alternative for streaming syncs, streaming exports, change data capture syncs, and webhooks. Using sources in compute modules is not yet generally available. |
External models | External models currently do not support importing sources. Instead, you must use network egress policies directly. |
Code workspaces | Code workspaces currently do not support importing sources. Instead, you must use network egress policies directly. |
Code workbooks | Code workbooks currently do not support external connections. |
Not all source configurations allow usage of the Use in code capability. Agent worker connections are not supported, and some credential types such as cloud identity, outbound application, and OIDC may not currently be used from code.
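To make the external transforms option more concrete, below is a minimal sketch of a Python external transform that reads from a REST API source and writes the response into a dataset. It is illustrative only: the source RID, output path, and endpoint path are placeholders, and the exact imports and helper methods shown (such as `get_https_connection`) are assumptions about the external transforms API that may differ by platform version. Refer to the external transforms documentation for the authoritative API.

```python
# Minimal sketch of an external transform (assumed API; RIDs, paths, and endpoint are placeholders).
# The source must allow the Use in code capability and export its credentials for use in code.
from transforms.api import transform, Output
from transforms.external.systems import external_systems, Source


@external_systems(
    rest_source=Source("ri.magritte..source.example"),  # placeholder source RID
)
@transform(output=Output("/Example/Project/datasets/api_response"))
def compute(rest_source, output):
    # For a REST API source, an HTTP client preconfigured with the source's
    # base URL and credentials can typically be obtained from the source object.
    connection = rest_source.get_https_connection()
    client = connection.get_client()

    # Call a hypothetical endpoint on the external system.
    response = client.get(connection.url + "/api/v1/records")
    response.raise_for_status()

    # Write the raw response into the output dataset as a single file.
    with output.filesystem().open("records.json", "w") as f:
        f.write(response.text)
```

The same pattern applies in the other direction for code-based exports: read files or rows from a transform input and push them to the external system using the client obtained from the source.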
Data Connection also includes a variety of other concepts relevant for specific workflows. Some concepts were previously used and are now sunset, but are retained here for reference.
Historically, the term Sync was used in a generic way to refer to bringing data into Foundry. Syncs are now separated into the more specific capabilities listed above. More details are available for each capability, such as batch syncs, streaming syncs, change data capture syncs, media syncs, and so on.
The plugin framework used to implement connectors allows custom extensions called tasks. Tasks represent a unit of functionality configured by providing YAML and implemented in Java as part of the Data Connection plugin. Palantir has stopped developing new tasks, and all officially supported capabilities have migrated away from using tasks.
Tasks are currently considered Sunset according to the product life cycle. Tasks will eventually be fully deprecated, and we strongly recommend using code-based connectivity options anywhere tasks may have previously been required.