PostgreSQL

Connect Foundry to PostgreSQL ↗ to read and sync data between PostgreSQL databases and Foundry. This connector uses the official PostgreSQL driver ↗ on major version 42, which is compatible with all versions of PostgreSQL 8.2 and above.

Supported capabilities

Capability	Status
Exploration	🟢 Generally available
Batch syncs	🟢 Generally available
Incremental	🟢 Generally available
Change data capture syncs	🟡 Beta
Streaming exports	🟡 Beta
Table Exports	🟡 Beta

Setup

Open the Data Connection application and select + New Source in the upper right corner of the screen.
Select PostgreSQL from the available connector types.
Choose to use a direct connection over the Internet or to connect through an agent runtime.
Follow the additional configuration prompts to continue the setup of your connector using the information in the sections below.

Learn more about setting up a connector in Foundry.

Authentication

The Foundry PostgreSQL connector requires the use of a username and password for authentication. We recommend the use of service credentials rather than individual user credentials.

Username and password authentication may be used in conjunction with client and/or server certificates and SSL modes requiring verification of these certificates.

You must ensure that the provided user has the necessary privileges on the target database, as well as permission to read from or write to the target table(s). For change data capture, the user may also require CREATE and REPLICATION permissions on the target database.

Networking

The PostgreSQL connector requires network access to the database instance that you wish to connect to. PostgreSQL connections will normally use a hostname or IP to connect on port 5432.

To enable direct connections out of Foundry to PostgreSQL, the appropriate egress policies must be added when setting up the source in the Data Connection application.

For cloud-hosted PostgreSQL instances accessible over the Internet, such as PostgreSQL in Amazon Relational Database Service (RDS), you must add an egress policy for the hostname of the database or the IP address if you are not using a hostname. Review the official documentation for the provider of your managed, cloud-hosted PostgreSQL instance for more details on the required networking configuration.

You will need to ensure that you have allowed inbound traffic from Foundry to your PostgreSQL instance. You can view the egress IPs where traffic from Foundry will originate in the Network egress page in Control Panel. Review the documentation for your hosting provider to learn how to allow traffic from these IPs to your database instance.

If you are connecting with an agent runtime, you must ensure that the agent host has firewalls open to the hostnames, IP addresses, and ports required to connect to your PostgreSQL database.

PostgreSQL in Amazon RDS

If you are connecting to a managed instance of PostgreSQL hosted in Amazon RDS, you can use a direct connection runtime with the necessary egress policies.

To find the hostname for your PostgreSQL instance hosted in Amazon RDS, navigate to your instance in the AWS console. An example hostname-based egress policy for RDS PostgreSQL is <your-database-name>.<unique-identifier>.<region>.rds.amazonaws.com (port 5432)

When connecting to an RDS PostgreSQL instance, you may need to add the RDS root Certificate Authority (CA) as a server certificate in the source configuration panel. Download the rds-ca-2019-root.pem from the Amazon S3 site ↗, then copy the certificate details into Foundry to trust connections to Amazon RDS. For more information on connecting to RDS database instances using SSL/TLS, review the official AWS documentation ↗.

Connection details

Option	Required?	Description
`Host type`	Yes	Specify how Foundry should connect with your PostgreSQL database. Option 1: Hostname Provide a hostname. This is the recommended option for all PostgreSQL connetions and should always be used when connecting to a cloud-hosted PostgreSQL instance. For example, an instance hosted in Amazon RDS ↗. Option 2: IPv4 Provide an IPv4 address. If you normally connect using an IPv4 address, either within a corporate network or over the Internet, you can use this option. Option 3: IPv6 Provide an IPv6 address. Use this option if you normally connect using an IPv6 address.
`Port`	Yes	Specify a port to use when connecting. The default port for most PostgreSQL instances will be `5432`. For more information on ports, see the official documentation ↗ for PostgreSQL as well as the configuration for your database instance.
`Database name`	Yes	The name of the database you are connecting to within your instance of PostgreSQL.
`Authentication`	Yes	Configure using the Authentication guidance shown above.
`Network Connectivity`	Yes	You must provide a runtime that is networked to your PostgreSQL instance. For instances behind a corporate firewall, you will normally need to use an agent runtime. For cloud-hosted instances, refer to the Networking section for more details.
`SSL Mode`	Yes	Defaults to `verify-full`. For more details, review the official documentation ↗ for the `ssl-mode` connection parameter on the PostgreSQL JDBC driver.

Change data capture [Beta]

Beta

Change data capture syncs for PostgreSQL are in the beta phase of development and may not be available on your enrollment. Functionality may change during active development. Contact Palantir Support to request access to this feature.

The PostgreSQL source supports change data capture (CDC) syncs.

Since PostgreSQL supports logical replication, change data capture can stream changes to configured tables in near realtime. According to the PostgreSQL documentation ↗:

Logical replication is a method of replicating data objects and their changes, based upon their replication identity (usually a primary key).

Foundry change data capture syncs from PostgreSQL function by using an existing replication slot or creating a new replication slot and publication on the target database. Only one replication slot and publication may be configured per data connection source, however any number of tables may be streamed to Foundry over a single connection. If you wish to use multiple replication slots or publications, you can create multiple data connection sources that connect to your database.

Before setting up a change data capture sync, first ensure that you have a working PostgreSQL source connection. Then, navigate to the CDC syncs tab and provide the additional required configuration for change data capture.

CDC syncs tab for a new PostgreSQL source.

Option	Required?	Description
Replication slot name	Yes	The name of the replication slot to use for CDC. If the slot does not exist, it will be created. For more information, see the official PostgreSQL documentation ↗.
Publication name	Yes	The name of the publication to use for CDC. For more information, see the official PostgreSQL documentation ↗.
Auto-create publication	Yes	If enabled, the publication will be created automatically for all selected tables. This requires the user to have the following permissions: `CREATE` and `REPLICATION` on the database, and `SELECT` on the tables.

Once the required settings for change data capture are configured, you can navigate to the Overview page or stay on the CDC syncs page and select + Create CDC sync to create a new change data capture sync.

The exploration runtime must be working to create a change data capture sync. If the runtime is still initializing, you may need to wait a few seconds and refresh the page to proceed with creating a change data capture sync.

For more information on using CDC with PostgreSQL, review the official documentation ↗ on logical replication for the version of PostgreSQL in use.

Use PostgreSQL sources in code

Read from PostgreSQL with an external transform

The following example queries the Postgres source used in the Deep Dive: Creating Your First Data Connection guide.

This example uses the psycopg2 ↗ package to query the Postgres source.

Copied!1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
from transforms.api import transform_pandas, Output
from transforms.external.systems import external_systems, Source
import psycopg2
import pandas as pd


@external_systems(
    postgres_source=Source()
)

@transform_pandas(output=Output())
def compute(postgres_source):

    conn = psycopg2.connect(
        host="host",            # Database host address
        port="5432",            # Database port
        database="postgres",    # Database name
        user="user",            # Database username
        password=postgres_source.get_secret("PASSWORD") # Database password
    )

    query = "SELECT * FROM plants;"

    return pd.read_sql_query(query, conn)