The Agent Manager is referred to as a "bootvisor" on the server where it is installed.
This page contains information on how to configure agent logs, describes several common issues with agent configuration, and provides debugging guidance.
The steps described must be taken after SSHing into the host where the agent has been installed.
Before exploring additional troubleshooting topics, we recommend first checking ./var/diagnostic/launch.yml to confirm the agent successfully connected to Foundry. If the connection was unsuccessful, follow the instructions described in the enhancedMessage field.
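If you want a quick way to pull that information from the command line, the following is a minimal sketch, assuming the relative path above and that launch.yml is a plain YAML file containing the enhancedMessage field as described:
# Print the diagnostic file, then surface any enhancedMessage guidance
# (path and field name as described above; adjust to your install location)
cat ./var/diagnostic/launch.yml
grep -i -A 2 'enhancedMessage' ./var/diagnostic/launch.yml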
Common issues with agent configuration
To confirm the host can reach Foundry, run:
curl -s https://<your domain name>/magritte-coordinator/api/ping > /dev/null && echo pass || echo fail
from the host where the agent is installed.
If the connection succeeds, the command prints pass as output. In this case you should check whether a proxy is configured, for example by running echo $http_proxy on the command line of a Unix-based machine.
If the connection fails, you may instead see an error such as curl: (6) Could not resolve host: .... In this instance, it is likely there is something blocking the connection (e.g. a firewall or a proxy), and you should contact your Palantir representative.
Check the contents of the <agent-manager-install-location>/var/log/startup.log file.
If you see the following error: Caused by: java.net.BindException: {} Address already in use, it means there is a process already running on the port to which the Agent Manager is trying to bind.
You can find which port the Agent Manager is trying to bind to by inspecting the <agent-manager-directory>/var/conf/install.yml file and looking for a port parameter (e.g. port: 1234 - here 1234 is the port). Note that if there is no port parameter defined, the Agent Manager will use the default port 7032.
To identify the process occupying that port, run:
ps aux | grep $(lsof -i:<PORT> | awk 'NR>1 {print $2}' | sort -n | uniq)
where <PORT> is the port to which the Agent Manager is trying to bind.
If the output of this command contains com.palantir.magritte.bootvisor.BootvisorApplication, it means another Agent Manager is already running.
To fix the BindException error, you will need to find a new port for the Agent Manager that isn't currently being used.
You can check whether a port is available by running lsof -i :<PORT>, where <PORT> is the chosen port number; if the command produces no output, the port is free.
Once you have found an available port, you will need to add (or update) the port parameter in the configuration stored at <agent-manager-directory>/var/conf/install.yml.
Below is an example Agent Manager configuration snippet with the port set to 7032:
...
port: 7032
auto-start-agent: true
Once you have saved the above configuration, restart the Agent Manager by running <agent-manager-root>/service/bin/init.sh stop && <agent-manager-root>/service/bin/init.sh start.
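To confirm the restart picked up the new port, a quick check is sketched below, assuming port 7032 from the example above and that lsof is available on the host:
# Verify something is now listening on the configured port and review recent startup output
lsof -i :7032
tail -n 50 <agent-manager-install-location>/var/log/startup.log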
Check the contents of the <agent-manager-directory>/var/data/processes/<latest-bootstrapper-directory>/var/log/startup.log file.
If you see the following error: Caused by: java.net.BindException: {} Address already in use, it means there is a process already running on the port to which the Bootstrapper is trying to bind.
You can find which port the Bootstrapper is trying to bind to by looking for a port parameter in its configuration (for example, port: 1234 - here 1234 is the port). Note that the default port for the Bootstrapper is 7002.
To identify the process occupying that port, run:
ps aux | grep $(lsof -i:$PORT | awk 'NR>1 {print $2}' | sort -n | uniq)
where $PORT is the port to which the Bootstrapper is trying to bind.
If the output of this command contains com.palantir.magritte.bootstrapper.MagritteBootstrapperApplication, it means another Bootstrapper is already running.
To fix the BindException error, you will need to find a new port for the Bootstrapper that isn't currently being used.
You can check whether a port is available by running lsof -i :<PORT>, where <PORT> is the chosen port number.
Once you have found an available port, you will need to set the port parameter in the Bootstrapper's configuration. This can be done by navigating to the agent overview page in the Data Connection application. From there, select the advanced configuration button and navigate to the "Bootstrapper" tab.
Below is an example Bootstrapper configuration snippet with the port set to 7002:
server:
  adminConnectors:
    ...
    port: 7002 # This is the port value
Once you have updated the configuration, you will need to save your changes and restart the agent for them to take effect.
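After the agent restarts, you can optionally confirm the Bootstrapper bound to the new port from the agent host. A minimal sketch, assuming port 7002 from the example above:
# Verify that a process is listening on the configured Bootstrapper port
lsof -i :7002
# Then confirm it is the Bootstrapper by checking its command line (use the PID shown by lsof)
ps -fp <PID>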
More often than not, this is caused by another "ghost" instance of the agent running that you need to find and shut down.
To find and terminate old processes, follow the steps below:
1. Stop the Agent Manager by running <agent-manager-install-location>/service/bin/init.sh stop.
2. Check the <agent-manager-install-location>/var/data/processes/index.json file.
3. Run for folder in $(ls -d <agent-manager-root>/var/data/processes/*/); do $folder/service/bin/init.sh stop; done to shut down the old processes.
4. Restart the Agent Manager (<agent-manager-install-location>/service/bin/init.sh start).
Manually starting agents on the host where they are installed (as opposed to through Data Connection) can lead to the creation of "ghost" processes.
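To see whether any agent-related Java processes remain running on the host before or after these steps, one possible check (based on the com.palantir.magritte class names mentioned in this page) is:
# List any running Agent Manager, Bootstrapper, agent, or Explorer JVMs
ps aux | grep -i 'com.palantir.magritte' | grep -v grep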
Often when the agent process shows as "unhealthy" it is because it has crashed or been shut down by either the operating system or another piece of software such as an antivirus.
There are multiple reasons why the operating system might have shut down the process, but the most common is that the operating system did not have enough memory to run it, which is referred to as being OOM (Out Of Memory) killed.
To check if any of the agent or Explorer subprocesses were OOM killed by the operating system, you can run the following command:
grep "exited with return code 137" -r <agent-manager-directory> --include=*.log
This will search all the log files within the Agent Manager directory for entries containing 'exited with return code 137' (return code 137 signifies a process was OOM killed).
The following example output produced by the above command shows the agent subprocess being OOM killed:
./var/data/processes/bootstrapper~<>/var/log/magritte-bootstrapper.log:ERROR [timestamp] com.palantir.magritte.bootstrapper.ProcessMonitor: magritte-agent exited with return code 137
If you see output similar to this, you should follow the steps below on tuning heap sizes.
You can also check the operating system logs for OOM kill entries by running the following command:
dmesg -T | egrep -i 'killed process'
This command will search the kernel ring buffer for 'killed process' log entries, which indicate a process was OOM killed.
Actual log entries of OOM killed processes will look like the following:
[timestamp] Out of memory: Killed process 9423 (java) total-vm:2928192kB, anon-rss:108604kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1232kB oom_score_adj:0
Entries for processes other than (java) can be ignored as they are not related to your agent.
Before you change any heap allocations, you should first determine how much memory is available on the host and how much memory is assigned to each of the agent subprocesses.
To check available memory, run free -h. On a 6 GB system, the output might look something like this:
              total        used        free      shared  buff/cache   available
Mem:          5.8Gi       961Mi       2.8Gi       9.0Mi       2.1Gi       4.6Gi
Swap:         1.0Gi          0B       1.0Gi
In the output produced by the free command, the available column shows how much memory can be used for starting new applications. To determine how much memory can be allocated to the agent, we recommend that you stop the agent and run free -h while the system is under normal to high load. The available value will tell you the maximum amount of memory you can devote to all agent processes combined. We recommend that you leave a buffer of approximately 2-4 GB, if possible, to account for other processes on the system needing more memory, as well as off-heap memory usage by the agent processes. Note that not all versions of free show the available column, so you may need to check the documentation for the version on your system to find the equivalent information.
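As a rough sizing sketch under the guidance above (assuming a version of free that reports the available column), you can subtract a buffer from the available figure to estimate the budget for all agent processes combined; the 2 GB buffer below is only an example:
# Print available memory in MB and the remaining budget after a 2 GB buffer
avail_mb=$(free -m | awk '/^Mem:/ {print $7}')
echo "available: ${avail_mb} MB; approximate budget for agent processes: $((avail_mb - 2048)) MB"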
Next, determine how much memory is assigned to each of the following subprocesses: Agent Manager, Bootstrapper, agent, and Explorer.
To find out how much memory is assigned to the agent and Explorer subprocesses, navigate to the agent configuration page within Data Connection, choose the advanced configuration button, and select the "Bootstrapper" tab. From there you will see that each of the subprocesses has its own configuration block; within each block there is a jvmHeapSize parameter which defines how much memory is allocated to the associated process.
By default, the Bootstrapper subprocess is assigned 512 MB of memory. This can be confirmed by first navigating to the <agent-manager-directory>/var/data/processes/ directory; from there, run ls -lrt to find the most recently created bootstrapper~<uuid> directory. Once in the most recently created bootstrapper~<uuid> directory, inspect the contents of the ./var/conf/launcher-custom.yml file. Here, the Xmx value is the amount of memory assigned to the Bootstrapper.
By default, the Agent Manager subprocess is also assigned 512 MB of memory. This can be confirmed by inspecting the contents of the <agent-manager-directory>/var/conf/launcher-custom.yml file. Here, the Xmx value is the amount of memory assigned to the Agent Manager.
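If you prefer to check both values from the shell, the following sketch follows the paths described above (replace the <agent-manager-directory> placeholder with your actual install directory):
# Heap size of the most recently created Bootstrapper process
latest=$(ls -dt <agent-manager-directory>/var/data/processes/bootstrapper~*/ | head -n 1)
grep -i 'Xmx' "${latest}var/conf/launcher-custom.yml"
# Heap size of the Agent Manager
grep -i 'Xmx' <agent-manager-directory>/var/conf/launcher-custom.yml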
Agents installed on Windows machines do not use the launcher-custom.yml
files and thus, by default, Java will allocate both the Agent Manager and Bootstrapper processes 25% of the total memory available to the system. To fix this you will need to set the Agent Manager and Bootstrapper heap sizes manually, which can be done by following the steps below:
1. Set JAVA_HOME: setx -m JAVA_HOME "{BOOTVISOR_INSTALL_DIR}\jdk\{JDK_VERSION}-win_x64\"
2. Set the Agent Manager heap size: setx -m MAGRITTE_BOOTVISOR_WIN_OPTS "-Xmx512M -Xms512M"
3. Set the Bootstrapper heap size: setx -m MAGRITTE_BOOTSTRAPPER_OPTS "-Xmx512M -Xms512M"
4. Restart the Agent Manager by running .\service\bin\magritte-bootvisor-win
Once you have determined how much memory the host has available and how much memory is assigned to each of the above subprocesses, you should then decide whether to decrease the amount of memory allocated to those processes or to increase the amount of memory available to the host.
Whether or not you can safely decrease the amount of memory used by the agent processes will depend on your agent settings (for example, the maximum number of concurrent syncs and file upload parallelism), the types of data being synced, and the typical load on the agent. Decreasing the heap size makes it less likely that the OS will kill the process but more likely that the java process will run out of heap space. You may need to test different values to find what works. Contact your Palantir representative if you need assistance tuning this value.
To decrease the amount of memory allocated to one (or multiple) of the subprocesses, return to the "Bootstrapper" tab of the agent's advanced configuration (described above) and lower the jvmHeapSize parameter for each of the relevant subprocesses. For example:
agent:
  ...
  jvmHeapSize: 3g # This is the JVM heap size value
Default heap allocations
By default, an agent requires approximately 3 GB of memory across its subprocesses.
Java processes also use some amount of off-heap memory; thus, we recommend you ensure at least 4 GB is left free for them.
There are two main causes of failed agent downloads: network connection issues and expired links.
If you can connect to Foundry but are getting an invalid tar.gz file or an error message on the download, you may have an expired or invalidated link.
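One way to check whether a downloaded archive is actually a valid gzip tarball is sketched below; the agent.tar.gz filename here is only an example, so use the name of the file you downloaded:
# A corrupt or expired-link download typically fails this listing step
tar -tzf agent.tar.gz > /dev/null && echo "archive looks valid" || echo "archive is invalid or truncated"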
A user must be an editor of a Project to create an agent in that Project, but must be an owner of the Project to administer the agents within that Project. That means that a user may create an agent and then be unable to generate download links or perform other administrative tasks on the agent. For more on agent permissions, review the guidance in our permissions reference documentation.
TLSv1.0 and TLSv1.1 are not supported by Palantir as they are outdated and insecure protocols. Amazon Corretto builds of the OpenJDK used by Data Connection agents explicitly disable TLSv1.0 and TLSv1.1 by default under the jdk.tls.disabledAlgorithms security property in the java.security file.
Attempts to connect to a data source that exclusively supports TLSv1.0 or TLSv1.1 will fail with various errors, including Error: The server selected protocol version TLS10 is not accepted by client preferences.
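If you are unsure which TLS versions a data source accepts, one possible way to probe it from the agent host is sketched below; it assumes openssl is installed, that your build of openssl still supports the older protocol flags, and that you replace the host and port placeholders with those of your source:
# Each command attempts a handshake with a single TLS version; a successful handshake prints the negotiated protocol
openssl s_client -connect <source-host>:<port> -tls1_2 < /dev/null 2>/dev/null | grep -i 'Protocol'
openssl s_client -connect <source-host>:<port> -tls1_1 < /dev/null 2>/dev/null | grep -i 'Protocol'
openssl s_client -connect <source-host>:<port> -tls1   < /dev/null 2>/dev/null | grep -i 'Protocol'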
We actively discourage the usage of deprecated versions of TLS. Palantir is not responsible for security risks associated with its usage.
If there is a critical need to temporarily support TLSv1.0 and TLSv1.1, perform the following steps:
1. Navigate to the agent's advanced configuration and select the "Bootstrapper" tab.
2. Add tlsProtocols entries to both the agent and explorer configuration blocks, followed by the protocols you want to enable. Be sure to also include TLSv1.2 so any sources using it will not break. For example:
agent:
  tlsProtocols:
    - TLSv1
    - TLSv1.1
    - TLSv1.2
  ...
explorer:
  tlsProtocols:
    - TLSv1
    - TLSv1.1
    - TLSv1.2
  ...
With this configuration, the agent will continue to allow TLSv1.0 and TLSv1.1 across agent upgrades and restarts. Once the datasource has moved to new TLS versions, revert all changes made to the advanced agent configuration.
To adjust the log storage settings for an agent on its host machine, follow the steps below:
Your new configuration should now be in effect.
The files will remain on disk until the Bootvisor cleans up old process folders (a cleanup is triggered after 30 days or once there are 10 old folders). These files are encrypted, and the keys to decrypt them existed only in the memory of the processes that have since died.
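If you need to see how much disk space those old process folders are consuming before the automatic cleanup runs, one possible check (using the install location placeholder from above) is:
# Show per-folder disk usage for old agent processes, largest last
du -sh <agent-manager-install-location>/var/data/processes/*/ | sort -h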