Agent worker

To connect to systems within your network that cannot accept inbound network traffic from Foundry, you can use an intermediary agent either as an agent worker or agent proxy. This page describes the configuration options available for the agent worker runtime and assumes you are already familiar with data connection agents.

You should only use an agent worker runtime if your target system cannot accept inbound network traffic from Foundry, and the connector does not support the agent proxy runtime.

Set up an agent

You must set up an agent to use the agent worker runtime. When configuring a source, you can select the Agent worker option and choose at least one available agent. Capabilities run over that source connection will be executed as jobs on the agent worker host.

Capability execution

When using an agent worker runtime, capabilities are executed by running Java code on the agent host directly. These Java processes may pull data from your systems and push data up to Foundry (as with batch syncs), or pull data from Foundry and push to your systems (as with exports).

This execution model comes with some downsides including:

  • Potential classpath conflicts on the agent.
    • This is particularly relevant for custom JDBC workflows, since custom JARs may conflict with dependencies that ship from Foundry and run on the agent classpath.
  • Contention between jobs running on the agent.
    • Jobs may use up to the entire memory and disk space allocated to the agent process, which can prevent other jobs from starting or cause them to crash.
    • This is particularly problematic for webhooks run with an agent worker runtime. In this scenario, we strongly recommend a dedicated agent for webhook executions so long-running syncs do not prevent short-running webhooks from executing.
  • Lack of support for some capabilities.
    • Capabilities such as virtual tables and virtual media are incompatible with the agent worker runtime, since these require synchronous connections directly from Foundry and cannot run as jobs on the agent.

Memory allocation and usage

Agent memory is one of the key factors determining the performance of capabilities executed using an agent worker runtime.

The primary settings available for agent memory are:

  • JVM heap: Configured on the agent settings page, this setting indicates how much memory the agent should allocate on startup. From the operating system's perspective, this memory is reserved by the agent process, so it must be less than the available memory on the host machine. The default JVM heap value is 1 GB.
  • Host memory: Based on the specification of the machine where the agent is installed. We recommend at least 16 GB of memory.

Actual memory usage observed on the host will vary based on the workload currently being executed by the agent worker, as well as any other processes running on the same host.

When observing and monitoring memory usage for an agent used as an agent worker, there are two primary metrics:

  • OS physical memory usage: The actual total memory usage on the agent host, including the agent process but also any other processes running on the same host. This may go beyond the allocated JVM heap size for the agent process, and up to the full physical memory available.
  • Agent memory usage: The memory usage of the agent process itself, which will never exceed the configured JVM heap size.
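
If you have shell access to the agent host, you can spot-check both metrics directly. The commands below are a minimal sketch for a Linux host; the process filter is an assumption, since the name and owner of the agent process depend on your installation.

$ free -h                            # OS physical memory usage on the host
$ ps -o pid,rss,etime,args -C java   # resident memory of Java processes, which includes (and may exceed) the agent's heap usage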

For information on monitoring memory usage on agents, review agent metrics and health monitoring.

Load balancing

When using an agent worker runtime, multiple agents may be assigned to a single source connection. Each job executes a specific capability configured on an assigned source connection and runs on one of the agents that is available at the time the job starts.

Jobs are assigned to the agent with the largest available bandwidth. The bandwidth is calculated as:

(Maximum concurrent syncs) - (currently running batch syncs) = bandwidth

Maximum concurrent syncs defaults to 16 and is configurable under Agent settings. The Maximum concurrent syncs quota is enforced across all capabilities and all assigned sources, meaning that any run of any capability on any source, including legacy data connection tasks, consumes one unit of the available concurrent sync quota. For example, an agent with the default maximum of 16 concurrent syncs and 12 batch syncs currently running has a bandwidth of 4. If the bandwidth is zero on all available agents, or if the assigned agent already has more jobs running than the maximum concurrent syncs allows (counting all capabilities, not only batch syncs), jobs will be queued.

Only batch syncs are considered when calculating bandwidth. This means that other capabilities running on the agent will be ignored for the purposes of allocating additional jobs. If your primary agent workloads are streaming syncs, change data capture syncs, exports, or other capabilities, you may see unexpected behavior when allocating jobs in a multi-agent setup.

There are no guarantees that jobs will be distributed evenly across multiple available agents with the same bandwidth value.

In general, we do not recommend using multiple agents as a way to load balance a larger workload than could be successfully run on a single agent. The primary intended use of multiple agents is to allow for agents being taken offline for maintenance. For optimal performance and reliability, we recommend that each agent in a multi-agent setup should be able to handle the full set of capabilities configured on the assigned source connection(s).

Direct vs data proxy upload strategy

Agent worker runtimes support two options to specify how data from batch syncs should be uploaded to the Palantir platform:

Data proxy mode

In data proxy mode, data is uploaded through the data proxy service using the public Foundry API. This is the same API gateway used when calling the Foundry API to read and write datasets.

Agents configured to use data proxy mode will contain the following in the agent configuration YAML:

destinations:
  foundry:
    uploadStrategy:
      type: data-proxy

Direct mode [Sunset]

Direct mode is not available on new agents or on enrollments set up after June 2024. Data proxy mode is the default and only option supported for new agents. Agents previously configured to use direct mode will continue to be supported as long as the public IPs of the host where the agent is installed do not change.

In direct mode, data is uploaded directly to the underlying storage buckets in the Foundry data catalog. While this can improve performance, it requires custom network configuration by Palantir support and is not available on our latest cloud infrastructure.

Agents configured to use direct mode will contain the following in the agent configuration YAML:

destinations:
  foundry:
    uploadStrategy:
      type: direct

Custom JDBC drivers

Information on how to add custom JDBC drivers to an agent can be found in the documentation for the JDBC (Custom) connector. Drivers must be signed by Palantir and added directly to the agent to work with the agent worker runtime.

For agent proxy and direct connection runtimes, custom drivers are added directly in the JDBC (Custom) connector user interface as explained in our documentation. These drivers do not need to be signed by Palantir.

Credentials

One unique aspect of the agent worker runtime is that credentials are never stored in Foundry. Instead, at the time when credentials are input in the Data Connection user interface, they are encrypted with the public key of each agent assigned to the source. The encrypted credentials are stored on each respective agent.

This means that the following caveats and restrictions apply to credential configuration when using the agent worker runtime:

  • If the set of agents assigned to the source changes, credentials must be re-entered in Data Connection.
  • If an agent is reprovisioned using a fresh download link, credentials will not be automatically transferred and must be re-entered in Data Connection.

More information on moving agents between directories and hosts is covered in the agent configuration reference documentation, including instructions for retaining encrypted credentials when moving an existing agent directory.

Certificates

Agents communicate with both Foundry and your internal network. This means that agents need to have the correct certificates in their truststores for these connections to be established.

There are two situations that may require additional certificates to be configured on an agent:

Certificates to allow agent communication with Foundry

Certificate requirements for agents to communicate with Foundry are covered in the agent configuration documentation and are required whether the agent will be used as an agent proxy or agent worker.

Certificates to allow agent communication with your systems

When an agent is used as an agent worker, additional certificates may be required for Java processes running on the agent to successfully communicate with your systems. New certificates may need to be added for each new source connection, and these certificates should be updated if they expire or are rotated.

When required certificates are missing, you will see errors like the following when attempting to use a source capability such as exploration:

Wrapped by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException:
PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException:
unable to find valid certification path to requested target

Follow these instructions to add additional certificates for connecting to specific source systems.
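
The exact certificate locations and persistence steps depend on your installation and are covered in the linked instructions. As an illustration only, importing a source system's CA certificate into a JKS truststore with keytool looks like the following; the alias, certificate file, and truststore path are placeholders:

$ keytool -importcert -trustcacerts \
    -alias <source-ca-alias> \
    -file <source-ca-certificate.pem> \
    -keystore <path_to_agent_trustStore.jks>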

Switch source from agents to direct connection

If an agent-based source reaches out to a system that is accessible over the Internet, it should be migrated to a direct connection runtime. Follow the steps below to perform this migration.

  1. Navigate to Connection settings > Connection details and select Switch to direct cloud connection.
  2. Start the migration by following the instructions in the walkthrough dialog.
  3. Choose a representative agent. This should be a healthy agent that can supply source secrets and drivers, if required.
  4. (Optional) Configure a driver.
  5. Add egress policies.

To create new egress policies, you must have access to the workflow titled Manage network egress configuration in Control Panel, which is granted to the Information Security Officer role.

  6. Select Migrate to complete this process.

Troubleshooting

This section describes situations that may occur during the migration, as well as suggested resolution steps. The migration is reversible.

Could not resolve type id as a subtype of 'com.palantir.magritte.api.Source'

Suggested resolution:

This error occurs when a dependency required for the source cannot be found. Ensure that you have configured all certificates, proxies, and drivers required for the source, then retry the migration.

UnknownHostException

Suggested resolution:

  • Ensure that correct egress policies are assigned to the source.
  • Confirm that Foundry is able to access the endpoint that is throwing the exception.

Driver class not found

Suggested resolution: Confirm that the correct driver is uploaded to the JDBC source.

PKIX path building failed

Suggested resolution: Ensure that correct certificates are added to the source.

Add a private key

If the system you are connecting to requires mutual TLS (mTLS), you must manually add a private key to the agent.

The default bootstrapper keystore and truststore are regenerated any time the agent is restarted, and any changes made to the default keystore will be overwritten on restart. The instructions below explain how to override the default keystore to point at a custom keystore in a different location on the agent host, and how to modify this custom keystore to add your private key.

  1. Copy the default bootstrapper keystore to a separate location on the agent host. Run the following commands as the same user that runs the agent on the host. You may name the folder security or choose another name.

    $ mkdir /home/<username>/security
    $ cp <bootvisor_root>/var/data/processes/<bootstrapper_dir>/var/conf/keyStore.jks /home/<username>/security/
  2. Import the keys from the customer-provided keystore into the copied agent keystore using the Java keytool command line tool. If this tool is not already installed, find it in the bin directory of the JDK that is bundled with the agent.

    $ keytool -importkeystore -srckeystore <CUSTOM_KEYSTORE.jks> -destkeystore /home/<username>/security/keyStore.jks
    Importing keystore CUSTOM_KEYSTORE.jks to keyStore.jks...
    Enter destination keystore password: keystore
    Enter source keystore password:
    • You can verify that the key/keys were added to the copied keystore using the keytool -list command:

      $ keytool -list -keystore /home/<username>/security/keyStore.jks
      Enter keystore password:
      Keystore type: jks
      Keystore provider: SUN

      Your keystore contains 2 entries

      <CUSTOM_KEY>, 15-Dec-2022, PrivateKeyEntry,
      Certificate fingerprint (SHA-256): A5:B5:2F:1B:39:D3:DA:47:8B:6E:6A:DA:72:4B:0B:43:C7:2C:89:CD:0D:9D:03:B2:3F:35:7A:D4:7C:D3:3D:51
      server, 15-Dec-2022, PrivateKeyEntry,
      Certificate fingerprint (SHA-256): DB:82:66:E8:09:43:30:9D:EF:0A:41:63:72:0C:2A:8D:F0:8A:C1:25:F7:89:B1:A3:6E:6F:C6:C5:2C:17:CB:B2
  3. Use the keytool -keypasswd command to update the imported key password. The agent keystore requires that both the key and keystore passwords match.

    $ keytool -keypasswd -alias <CUSTOM_KEY> -new keystore -keystore /home/<username>/security/keyStore.jks
    Enter keystore password:
  4. In Data Connection, navigate to the agent, then open the Agent settings tab. In the Manage Configuration section, select Advanced, choose the Agent tab, and update the keyStore to point to the newly copied keystore. Then, add keyStorePassword and set it to the appropriate value (keystore, by default).

    security:
      keyStore: /home/<username>/security/keyStore.jks
      keyStorePassword: keystore
      trustStore: var/conf/trustStore.jks
      ...

  5. Finally, choose the Explorer tab and update both the keyStorePath and keyStorePassword. Save the new configuration.

    security:
      keyStorePath: /home/<username>/security/keyStore.jks
      keyStorePassword: keystore
      trustStorePath: var/conf/trustStore.jks
      ...
  6. Restart the agent.

Note that the field is named keyStore when configuring in the Agent tab and keyStorePath in the Explorer tab. No changes are required to the Bootstrapper configuration.