Connect Foundry to FTP and FTPS servers to sync data between folders and Foundry datasets.
Capability | Status |
---|---|
Exploration | 🟢 Generally available |
Bulk import | 🟢 Generally available |
Incremental | 🟢 Generally available |
The connector can transfer files of any type into Foundry datasets. File formats are preserved and no schemas are applied during or after the transfer. Apply any necessary schema to the output dataset, or write a downstream transformation to access the data.
There is no limit to the size of transferable files. However, network issues can result in failures of large-scale transfers. In particular, direct cloud syncs that take more than two days to run will be interrupted. To avoid network issues, we recommend using smaller file sizes and limiting the number of files that are ingested in every execution of the sync. Syncs can be scheduled to run frequently.
To configure a direct connection, Foundry must have access to the FTP server over the internet. To access on-premise FTP servers, choose to connect through an agent.
Learn more about setting up a connector in Foundry.
FTP/FTPS authentication is completed using username and password. The FTP/FTPS connection may fail if the FTP user does not have permission for the root directory configured for the connection. Refer to Configuration options for more details about the root directory.
Option | Required? | Description |
---|---|---|
Username | Yes | The FTP login username. |
Password | Yes | The FTP login password. This field can be left empty for anonymous logins. Contact your server administrator for more information. |
If a direct connection is running the FTP/FTPS connector, you must add a network egress policy to allowlist the connection.
If connecting through a domain name, egress policies should be created for the FTP server domain on both the control port (usually port 21) and the data ports. We recommend creating two network policies: a single port policy for the control port, and a port range policy for the data ports. Data ports are determined by the administrators of the FTP server. If errors continue to occur despite proper egress policy configuration, file an issue quoting the list of policies applied.
If the domain for the server resolves to multiple domains and/or servers, all of the associated domains and their related IPs need to be allowlisted. To verify whether a server resolves to multiple domains and/or servers, run the command `dig <domain>` from your terminal for the server you are trying to connect to and review the answer section.
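Where `dig` is not available, a similar check can be sketched in Python with the standard library. This is only an illustration: the function names are hypothetical, and `socket.getaddrinfo` returns the resolved IP addresses only, so it will not show intermediate CNAME records the way the `dig` answer section does.

```python
import socket


def resolved_addresses(addrinfo):
    """Return the sorted, de-duplicated IP addresses from getaddrinfo() results."""
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({entry[4][0] for entry in addrinfo})


def resolve(hostname, port=21):
    """Look up every address the hostname resolves to (requires DNS access)."""
    results = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return resolved_addresses(results)


# Example call (placeholder hostname, not a real server):
# print(resolve("ftp.example.com"))
```

If more than one address is returned, each of them needs a matching egress policy.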
If an agent is running your connector, ensure that the agent's server can establish network connections to the FTP/FTPS servers and that firewalls are configured appropriately. We recommend verifying network connections using netcat ↗ or a similar utility when needed.
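If netcat is not installed on the agent host, a comparable reachability check can be sketched in Python. The helper name `can_connect` and the five-second timeout are assumptions for illustration. The check only confirms that a TCP connection can be opened; it says nothing about FTP authentication or about the data ports, which the server assigns per transfer.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example call (placeholder hostname; 21 is the usual control port):
# print(can_connect("ftp.example.com", 21))
```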
Configure additional client or server certificates and private keys to properly set up your connector, using the guidance below.
SSL connections validate server certificates. Normally, SSL validations happen through a certificate chain; by default, both agent and direct connection run times trust most industry standard certificate chains. If the server to which you are connecting has a self-signed certificate, or if a firewall performs TLS interception on the connection, the connector must trust the certificate. Learn more about using certificates in Data Connection.
The server must provide the full certificate chain in order for SSL verification to work. The certificate chain for the FTP server can be obtained by running the command `openssl s_client -connect {hostname}:{port} -showcerts -starttls ftp`. To verify the certificate chain, use the OpenSSL command line utility or any other available tool.
If using FTPS, ensure that the certificate for the FTPS server has been added to the agent's truststore.
Foundry attempts a validation for all egress routes. However, FTP cannot be inspected, resulting in hanging connections and/or timeout errors. If errors continue to occur despite proper egress policy configuration, report an issue with a list of policies for which you want to disable hostname validation.
FTP servers can be configured to support either explicit or implicit SSL. Servers running on port 990 will generally be using implicit SSL.
Confirm the settings of your server with your server administrator. By default, the connector assumes explicit SSL; you may need to change this setting for your environment.
FTP requires `CONTROL` and `DATA` connection types. The `DATA` connection must be configured to be in `ACTIVE` or `PASSIVE` mode.
Default FTP/FTPS connector ports:
We recommend using a passive mode networking connection. In passive mode, all connections are initiated by the client. When using passive mode, ensure the control port (typically 21) and the port range for data transfer (for example 1024–1123) are allowlisted. Contact your FTP/FTPS server administrator to obtain the connection details.
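To see why the data port range matters, it helps to look at how a passive-mode data port is announced. In classic passive mode (the `PASV` command in RFC 959), the server replies with six numbers: four for the IP address and two that encode the port as `p1 * 256 + p2`. The sketch below decodes such a reply and checks the port against a range. The function names are hypothetical, the default range is the example range from the paragraph above, and servers that use extended passive mode (`EPSV`) reply in a different format that this sketch does not handle.

```python
import re


def parse_pasv_reply(reply):
    """Extract (host, port) from a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    numbers = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    # The last two numbers are the high and low bytes of the data port.
    port = numbers[4] * 256 + numbers[5]
    return host, port


def port_is_allowlisted(port, first=1024, last=1123):
    """Return True if the port falls inside the allowlisted data port range."""
    return first <= port <= last


host, port = parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,4,1)")
print(host, port, port_is_allowlisted(port))
```

A transfer whose announced data port falls outside the allowlisted range would be blocked by the egress policy even though the control connection on port 21 succeeds.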
Active mode is an older method of establishing a file transfer. In active mode, the client connects to the server's control port, and the server then connects back to the client to open the data connection. Both the server and client are dependent on each other and require bidirectional network connectivity. This networking method is generally difficult to achieve in most secure environments and is not possible when using direct connections.
Option | Required? | Default | Description |
---|---|---|---|
URL | Yes | | The URL of the FTP/FTPS server. The URL can optionally contain the path to a directory on the server which will be used as the root directory for the connection (for example, `ftp://server.name/folder/name`). |
Configure client certificates and private key | No | | See Certifications and private keys for more information. |
Configure server certificates | No | | See Certifications and private keys for more information. |
Connection timeout | No | 30 seconds | The connection timeout, specified in milliseconds. |
Re-login time | No | 15 minutes | The re-login interval, specified in minutes. |
File change timeout | No | 2 seconds | The amount of time a file must remain unchanged before being considered for upload, specified in milliseconds. |
HTTP proxy URL | No | | URL of the proxy server beginning with `http://` or `https://`. Support for HTTP proxies is highly dependent on the FTP server in use and cannot be used in ACTIVE mode. This is because HTTP proxies do not support client requests to listen on an externally accessible port. ACTIVE mode transfers involve the FTP server connecting back to the client, and this is not possible via an HTTP proxy. |
SSL method | No | EXPLICIT | Whether to use explicit or implicit SSL for FTPS connection. |
Mode | No | PASSIVE | PASSIVE or ACTIVE |
Time zone | No | Timezone of the connector | Timezone of the FTP server. FTP records timestamps without a timezone. To view accurate modification timestamps, specify the FTP server timezone if it is different from the default. |
Timestamp format string | No | MM-dd-yy hh:mma | A format string to parse timestamps from the FTP server. Timestamps are used to determine the files that were modified since the last sync. See Java documentation ↗ on supported formats. |
Control encoding | No | US-ASCII | The encoding for the FTP control messages. Control encoding can be necessary if filenames are in a different encoding than the data connection server default filesystem encoding. Example: On a Windows FTP server, windows-31j is often used for Japanese, and x-windows-949 is often used for Korean. See the Java documentation ↗ for more information. |
Keep alive | No | false | Choose whether to send FTP NOOP commands to keep the control connection alive while downloading a large file. Not supported by all FTP servers. |
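The Time zone and Timestamp format string options work together: the format string turns the listing text into a date, and the time zone anchors that date to a real instant. The sketch below illustrates the idea in Python and is not the connector's implementation. It assumes the default pattern `MM-dd-yy hh:mma` corresponds to the Python format `%m-%d-%y %I:%M%p`, and the function names and the fixed UTC-5 offset are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed Python equivalent of the default pattern "MM-dd-yy hh:mma".
PYTHON_FORMAT = "%m-%d-%y %I:%M%p"


def parse_listing_timestamp(text, server_tz):
    """Parse a listing timestamp and attach the FTP server's time zone."""
    naive = datetime.strptime(text, PYTHON_FORMAT)
    # The listing carries no zone information, so it must be supplied.
    return naive.replace(tzinfo=server_tz)


def modified_since(text, server_tz, last_sync_utc):
    """Return True if the file timestamp is later than the last sync time."""
    modified = parse_listing_timestamp(text, server_tz).astimezone(timezone.utc)
    return modified > last_sync_utc


last_sync = datetime(2024, 3, 15, 18, 0, tzinfo=timezone.utc)
server_tz = timezone(timedelta(hours=-5))  # hypothetical server offset
print(modified_since("03-15-24 02:30PM", server_tz, last_sync))
print(modified_since("03-15-24 02:30PM", timezone.utc, last_sync))
```

The same listing text is treated as modified after the last sync when read as UTC-5 but not when read as UTC, which is why a mismatched time zone setting can cause files to be skipped or picked up again.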
The FTP connector uses the file-based sync interface.
Are you having issues setting up an agent connection? Install an FTP/S client and attempt to connect to the server using the same configuration as that of the source. If this connection fails, the issue is not a connector bug. Investigate network connectivity, authentication, and FTP server configurations before filing an issue.
Are you using an egress proxy load balancer? FTP is a stateful protocol, so using a load balancer can cause the sync to fail (non-deterministically) if sequential requests don't originate from the same IP.
Does your server use a self-signed certificate? Have you added it to the source truststore? See the SSL and hostname validation section above.
Does your FTP server only support legacy TLS versions (for example, TLS 1.1)? If so, the connector runtime might not accept any of the cipher suites offered by the server. File an issue to explore alternatives with a Palantir representative.