This page describes several common issues with syncs, along with steps to debug them.

PKIX exceptions and other `SSLHandshakeException`s occur when the agent does not have the correct certificates and therefore cannot authenticate with the source. To ensure that you have the correct certificates installed, follow the guide in our Data Connection and Certificates documentation.
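To check the certificate chain independently of the agent configuration, you can attempt a TLS handshake from the agent host. Below is a minimal diagnostic sketch; the host and port are placeholders, and it validates against Python's default trust store rather than the agent's JVM truststore, so results may differ:

```python
import socket
import ssl

HOST, PORT = "source.example.com", 443  # placeholders; use your source's address

# create_default_context() validates against the system trust store.
ctx = ssl.create_default_context()
with socket.create_connection((HOST, PORT), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        # An untrusted chain raises ssl.SSLCertVerificationError here,
        # the Python analogue of a PKIX failure in the JVM.
        print("Handshake OK; peer subject:", tls.getpeercert()["subject"])
```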
If your sync fails with the error `Response 421 received. Server closed connection`, this suggests you may be connecting with an unsupported SSL protocol / port combination. One example is implicit FTPS over port 991, which is an outdated and unsupported standard. Explicit SSL over port 21 is the preferred method in this case.
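If you want to confirm that the server accepts explicit FTPS on port 21 before reconfiguring the sync, a quick check from the agent host can help. This is a sketch using Python's standard ftplib; the host and credentials are placeholders:

```python
from ftplib import FTP_TLS

ftps = FTP_TLS()
ftps.connect("ftp.example.com", 21)  # explicit FTPS: plain connect, then upgrade
ftps.login("user", "password")       # FTP_TLS issues AUTH TLS before logging in
ftps.prot_p()                        # also encrypt the data channel
print(ftps.nlst())                   # list the remote directory as a smoke test
ftps.quit()
```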
If your sync is an FTP/S sync, ensure that you are not using an egress proxy load balancer. FTP is a stateful protocol, so using a load balancer can cause the sync to fail if sequential requests don't originate from the same IP.
Note that due to the nature of load balancing, failures will be non-deterministic; syncs and previews may sometimes succeed, even with the load-balancing proxy in place.
If your sync or exploration is failing with the error `com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: null; S3 Extended Request ID: null)`, this means that the request is hitting an error as it goes through the egress proxy. If you receive this error, check whether any of the following scenarios apply.
Add the following to the S3 source's `proxyConfiguration` block:
```yaml
host: <address of deployment gateway or egress NLB>
port: 8888
protocol: http
nonProxyHosts: <bucket>.s3.<region>.amazonaws.com,s3.<region>.amazonaws.com,s3-<region>.amazonaws.com
```
For example, allowlisting all VPC buckets would involve adding the following configuration:
```yaml
clientConfiguration:
  proxyConfiguration:
    host: <color>-egress-proxy.palantirfoundry.com
    port: 8888
    protocol: http
    nonProxyHosts: "*.s3.<region>.amazonaws.com,s3.<region>.amazonaws.com,s3-<region>.amazonaws.com"
```
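To confirm that the proxy (rather than the bucket policy) is what returns the 403, it can help to reproduce the request outside of Data Connection. The sketch below uses boto3, which is not part of the product; the proxy address, region, and bucket names are placeholders:

```python
import boto3
from botocore.config import Config

# Route the request through the egress proxy, mirroring the sync's path.
proxied = Config(proxies={"https": "http://<egress-proxy-host>:8888"})
s3 = boto3.client("s3", region_name="<region>", config=proxied)

# A 403 here, but not when the proxy is bypassed (compare with a client
# built without config=proxied), points at the proxy rather than IAM.
print(s3.list_objects_v2(Bucket="<bucket>", MaxKeys=1))
```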
To see the exact query that ran against your source system, refer to `_data-ingestion.log`.
If your sync is an incremental sync, ensure you have provided a monotonically increasing column (e.g. timestamp or id) and an initial value for this column.
Once you've chosen the incremental column, make sure you have added the `?` operator to the SQL query in the sync configuration page (the `?` is replaced with the incremental value, and only a single `?` may be used). For example: `SELECT * FROM table WHERE id > ?`.
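The substitution behaves like an ordinary bind parameter. The following illustration uses Python's built-in sqlite3 purely as a stand-in for your source database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

last_synced_id = 1  # the incremental value recorded by the previous sync
rows = conn.execute("SELECT * FROM t WHERE id > ?", (last_synced_id,)).fetchall()
print(rows)  # [(2, 'b'), (3, 'c')] -- only rows past the saved state are fetched
```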
If you believe there are rows missing from your synced dataset, or that previously synced rows aren't being properly updated, check that your incremental column is truly monotonically increasing: rows are only synced when their incremental value is greater than the last value recorded. For example, if you chose `ID` as your monotonically increasing column and the last `ID` value synced was 10, a row subsequently added with `ID` 5 won't be synced.

If you believe existing rows are being re-synced, check that the incremental value is stored as a `LONG` or a `STRING` (in ISO-8601 format).

If a `NullPointerException` is thrown on your incremental sync, this may indicate that the SQL query is retrieving rows from the database that would cause the incremental column to contain null values.
Consider, for example, the query `SELECT * FROM table WHERE col > ? OR timestamp > 1`, where `col` is the incremental column being used for the sync. The use of `OR` means that the query does not guarantee that `col` contains only non-null values. If a null value for `col` is synced for any row, the sync will fail when Data Connection attempts to update the incremental state, since the current state is compared with the synced null value and an error is thrown. To guard against this, rewrite the query as `SELECT * FROM table WHERE (col > ? OR timestamp > 1) AND col IS NOT NULL`.
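The failure mode can be pictured with a small Python analogue (an illustration of the comparison only, not Data Connection's actual implementation):

```python
previous_state = 10
synced_values = [11, 12, None]  # a NULL slipped through the OR clause

# Advancing the incremental state requires comparing values; a None makes
# the comparison itself blow up:
new_state = max([previous_state] + synced_values)
# TypeError: '>' not supported between instances of 'NoneType' and 'int'
```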
If you wish to change the incremental column used for your sync, we recommend that you create a new sync.
On the Agent host, in the `<bootvisor-directory>/var/data/processes` directory, run `ls -lrt` to find the most recently created `bootstrapper~<uuid>` directory. `cd` into that directory and navigate to `/var/log/`, then check `magritte-agent-output.log`.

If you see an `OutOfMemoryError`, it means that the Agent cannot handle the workload being assigned to it.
Below are some common causes of hanging syncs and their associated fixes:
All syncs: Hanging during the fetching stage
If your sync is hanging during the fetching stage, check that the source is both available and operational; for example, confirm that it is reachable from the Agent host and is responding to queries.
JDBC syncs: Hanging during the fetching stage
If your sync is taking longer than expected to complete the fetching stage, it could be because the agent is making a large number of network and database calls. To tune the number of network and database calls made during a sync, you can alter the `Fetch Size` parameter:

- The `Fetch Size` parameter is located within the "advanced options" section of the source configuration and defines the number of rows fetched during each database round trip for a given query.
- Decreasing the `Fetch Size` results in fewer rows being returned per call to the database, so more calls will be required. However, the agent will use less memory, as fewer rows are stored in the Agent's heap at a given time.
- Increasing the `Fetch Size` results in more rows being returned per call to the database, so fewer calls will be required. However, the agent will use more memory, as more rows are stored in the Agent's heap at a given time.
- We recommend starting with a `Fetch Size` of 500 and tuning accordingly; the sketch below illustrates the trade-off.
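A back-of-envelope estimate can guide the tuning before you touch the configuration (the table size here is a hypothetical example):

```python
total_rows = 10_000_000  # hypothetical table size

for fetch_size in (100, 500, 5_000):
    round_trips = -(-total_rows // fetch_size)  # ceiling division
    print(f"Fetch Size {fetch_size:>5}: ~{round_trips:,} database round trips, "
          f"up to {fetch_size:,} rows in the Agent heap per batch")
```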
JDBC syncs: Hanging during the upload stage
If your sync is taking a long time to upload files or fails during the upload stage, you could be overloading a network link. In this case we suggest tuning the `Max file size` parameter:

- The `Max file size` parameter is located within the "advanced options" section of the source configuration and defines the maximum size (in bytes or rows) of the output files that are uploaded to Foundry.
- Decreasing the `Max file size` can increase pressure on the network, as smaller files are uploaded more frequently; however, if a file upload fails, the cost of re-uploading it is lower.
- Increasing the `Max file size` requires less total bandwidth, but larger uploads are more likely to fail.
- We recommend starting with a `Max file size` of 120 MB.
FTP / SFTP / Directory syncs: Hanging during the fetching stage
The most common reason file-based syncs hang during the fetching stage is that the Agent is crawling a large file system.
Syncs that crawl a filesystem will do two complete crawls of the filesystem (unless configured otherwise). This is to ensure the sync does not upload files which are currently being written to or altered in any way.
REQUEST_ENTITY_TOO_LARGE error

Downloading, processing, and uploading large files is error-prone and slow. `REQUEST_ENTITY_TOO_LARGE` service exceptions occur if an individual file exceeds the maximum size configured for the Agent's upload destination. For the `data-proxy` upload strategy, this is set to 100 GB by default.
Overriding the limit is not recommended; if possible, find a way to access this data as a collection of smaller files. However, if you wish to override this limit as a temporary workaround, use the following steps:
Within Data Connection, navigate to your Agent and select the Advanced configuration tab.
Select the "Agent" tab.
Under the `destinations` block, include the following to increase the limit to 150 GB:
```yaml
uploadStrategy:
  type: data-proxy
  maximumUploadedFileSizeBytes: 161061273600
```
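If a single monolithic file is what trips the limit, one workaround outside of Data Connection is to split it into smaller parts before the sync picks it up. A minimal sketch, where the 1 GiB chunk size and the naming scheme are arbitrary choices:

```python
CHUNK_BYTES = 1024 ** 3  # 1 GiB per output part (example value)

def split_file(path: str) -> None:
    """Write path.part0000, path.part0001, ... next to the original file."""
    with open(path, "rb") as src:
        part = 0
        while chunk := src.read(CHUNK_BYTES):
            with open(f"{path}.part{part:04d}", "wb") as dst:
                dst.write(chunk)
            part += 1
```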
BadPaddingException error

`BadPaddingException` exceptions occur when the source credential encryption key stored within the Agent is not what was expected. This commonly happens when an Agent manager is manually upgraded but the old `/var/data` directory is not copied to the new install location.
The easiest way to resolve this is to re-enter the credentials for each of the sources using the affected Agent.
When rows are synced from a JDBC source and they contain timestamp columns, those timestamp columns will be cast to long columns in Foundry. This behavior exists for backwards compatibility reasons.
To fix the data type for these columns, we recommend using a Python Transform environment to perform this cleaning. Here is an example code snippet that casts the column `"mytimestamp"` back into timestamp form:
```python
from pyspark.sql import functions as F

# The synced value is epoch milliseconds; convert to seconds, then cast.
df = df.withColumn("mytimestamp", (F.col("mytimestamp") / 1000).cast("timestamp"))
```