PostgreSQL High Availability with Patroni: 2026 Edition¶
Running a single PostgreSQL instance in production is a liability. When it goes down — and it will — so does your application. High availability is not optional for production databases; it is the baseline.
This guide builds a production-grade PostgreSQL HA cluster using the fully open-source community stack: PostgreSQL 17 from PGDG, Patroni 4.x, etcd 3.5, pgbackrest 2.58, and HAProxy. No vendor-specific packages. No proprietary repositories. Everything here runs on Rocky Linux 9 or AlmaLinux 9 and stays fully open-source.
What You Are Building¶
┌─────────────────────────────┐
Applications │ HAProxy │
──────────► │ :5000 (writes → primary) │
│ :5001 (reads → replicas) │
└──────┬──────┬──────┬─────────┘
│ │ │
▼ ▼ ▼
┌────┐ ┌────┐ ┌────┐
│pg1 │ │pg2 │ │pg3 │ ← PostgreSQL 17 + Patroni 4
│ │ │ │ │ │
│etcd│ │etcd│ │etcd│ ← etcd 3.5 (DCS)
└────┘ └────┘ └────┘
│ │ │
└──────┴──────┘
│
▼
┌─────────────┐
│ S3 Bucket │ ← pgbackrest 2.58 backups
└─────────────┘
4 hosts total:
| Host | IP | Role |
|---|---|---|
| pg1 | 172.31.47.229 | PostgreSQL primary (initially) + etcd |
| pg2 | 172.31.34.169 | PostgreSQL replica + etcd |
| pg3 | 172.31.39.118 | PostgreSQL replica + etcd |
| haproxy | 172.31.10.50 | HAProxy load balancer |
Stack versions (May 2026):
| Component | Version | Source |
|---|---|---|
| Rocky/AlmaLinux | 9.x | OS |
| PostgreSQL | 17.x | PGDG community repo |
| Patroni | 4.1.3 | pip (PyPI) |
| etcd | 3.5.x | GitHub binary release |
| pgbackrest | 2.58.0 | PGDG community repo |
| HAProxy | 2.4+ | EPEL / system repo |
Part 1: OS Preparation (All PostgreSQL Nodes)¶
Run these steps on pg1, pg2, and pg3.
1.1 Hostname and /etc/hosts¶
# On pg1
hostnamectl set-hostname pg1
# On all nodes, add to /etc/hosts:
cat >> /etc/hosts << 'EOF'
172.31.47.229 pg1 pg1.example.com
172.31.34.169 pg2 pg2.example.com
172.31.39.118 pg3 pg3.example.com
EOF
1.2 SELinux and Firewall¶
For a controlled private network, disable both to reduce complexity:
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
systemctl stop firewalld
systemctl disable firewalld
If your security policy requires the firewall, open these ports instead:
1.3 Kernel Parameters¶
Add to /etc/sysctl.conf on all PostgreSQL nodes:
# Shared memory
kernel.shmall = 2097152
kernel.shmmax = 134217728
kernel.shmmni = 4096
# Memory overcommit — prevents OOM kills during large sorts
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
# Network tuning
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 10000
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
# File handles
fs.file-max = 65536
Apply immediately:
1.4 Process Limits¶
Add to /etc/security/limits.conf:
postgres soft nofile 65536
postgres hard nofile 65536
postgres soft nproc 16384
postgres hard nproc 16384
Part 2: etcd Installation (All PostgreSQL Nodes)¶
etcd is the distributed configuration store (DCS) that Patroni uses for leader election and cluster state. Run a 3-node etcd cluster co-located on the PostgreSQL hosts.
2.1 Install etcd Binary¶
ETCD_VER=v3.5.21
DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
wget ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzf etcd-${ETCD_VER}-linux-amd64.tar.gz
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/
rm -rf etcd-${ETCD_VER}-linux-amd64*
etcd --version # verify
2.2 Create etcd User and Directories¶
sudo groupadd --system etcd
sudo useradd -s /sbin/nologin --system -g etcd etcd
sudo mkdir -p /var/lib/etcd /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd
sudo chmod 0700 /var/lib/etcd
2.3 etcd Configuration File¶
Create /etc/etcd/etcd.conf on each node — substitute the correct IP for each host.
pg1 (172.31.47.229):
cat > /etc/etcd/etcd.conf << 'EOF'
ETCD_NAME=pg1
ETCD_DATA_DIR=/var/lib/etcd
ETCD_LISTEN_PEER_URLS=http://172.31.47.229:2380
ETCD_LISTEN_CLIENT_URLS=http://172.31.47.229:2379,http://127.0.0.1:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://172.31.47.229:2380
ETCD_ADVERTISE_CLIENT_URLS=http://172.31.47.229:2379
ETCD_INITIAL_CLUSTER=pg1=http://172.31.47.229:2380,pg2=http://172.31.34.169:2380,pg3=http://172.31.39.118:2380
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_INITIAL_CLUSTER_TOKEN=pg-etcd-cluster-01
EOF
pg2 (172.31.34.169) — same content with pg2 values substituted.
pg3 (172.31.39.118) — same content with pg3 values substituted.
2.4 etcd systemd Service¶
cat > /etc/systemd/system/etcd.service << 'EOF'
[Unit]
Description=etcd distributed key-value store
Documentation=https://etcd.io/docs/
After=network-online.target
Wants=network-online.target
[Service]
User=etcd
Type=notify
EnvironmentFile=/etc/etcd/etcd.conf
ExecStart=/usr/local/bin/etcd
Restart=on-failure
RestartSec=10s
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now etcd
2.5 Verify etcd Cluster¶
# Set API version for v3 commands
export ETCDCTL_API=3
etcdctl --endpoints=http://127.0.0.1:2379 member list -w table
etcdctl --endpoints=http://172.31.47.229:2379,http://172.31.34.169:2379,http://172.31.39.118:2379 endpoint health -w table
etcdctl --endpoints=http://172.31.47.229:2379,http://172.31.34.169:2379,http://172.31.39.118:2379 endpoint status -w table
Expected output: all 3 members healthy, one showing IS LEADER=true.
Part 3: PostgreSQL Installation (All PostgreSQL Nodes)¶
3.1 Install PGDG Repository¶
The PGDG (PostgreSQL Global Development Group) repository is the official community source for all PostgreSQL versions. One RPM enables all supported versions.
# Install PGDG repo
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# Disable the built-in PostgreSQL module (conflicts with PGDG packages)
sudo dnf -qy module disable postgresql
# Install PostgreSQL 17
sudo dnf install -y postgresql17-server postgresql17-contrib
This installs PostgreSQL 17 with binaries at /usr/pgsql-17/bin/ and the default data directory at /var/lib/pgsql/17/data/.
3.2 Install pgbackrest¶
pgbackrest is available directly from the PGDG repo — no extra configuration needed:
3.3 Do NOT Run initdb Yet¶
Patroni will initialize the PostgreSQL data directory (initdb) on the first boot. Running it manually will conflict. Skip postgresql-17-setup initdb.
Part 4: Patroni Installation (All PostgreSQL Nodes)¶
Patroni 4.x is installed via pip for the latest version. It requires Python 3.9+.
4.1 Install Dependencies¶
4.2 Install Patroni¶
Install as the postgres system user with the etcd3 and psycopg3 extras:
sudo -u postgres pip3 install --user 'patroni[psycopg3,etcd3]'
# Add to postgres user's PATH
echo 'export PATH=$HOME/.local/bin:/usr/pgsql-17/bin:$PATH' >> /var/lib/pgsql/.bash_profile
source /var/lib/pgsql/.bash_profile
patroni --version # should show Patroni 4.1.x
Why psycopg3?
Patroni 4.x defaults to psycopg3 (the modern PostgreSQL adapter). The older psycopg2 still works but psycopg3 offers better async support and async connection handling — the recommended choice for new installations.
Part 5: Patroni Configuration (All PostgreSQL Nodes)¶
Create /var/lib/pgsql/patroni.yml on each node. The only values that differ per node are name, restapi.listen, restapi.connect_address, postgresql.listen, and postgresql.connect_address.
5.1 Patroni Configuration File¶
pg1 (172.31.47.229):
scope: pg_cluster
name: pg1
restapi:
listen: '172.31.47.229:8008'
connect_address: '172.31.47.229:8008'
# etcd3 section — uses etcd v3 API (recommended for etcd 3.5+)
etcd3:
hosts: 'pg1:2379,pg2:2379,pg3:2379'
protocol: http
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
hot_standby: 'on'
wal_keep_size: 256MB # replaces deprecated wal_keep_segments
max_wal_senders: 10
max_replication_slots: 10
wal_level: replica
archive_command: 'pgbackrest --stanza=pg_cluster archive-push %p'
archive_mode: 'on'
initdb:
- encoding: UTF8
- locale: en_US.UTF-8
- data-checksums # enables page checksums — catches data corruption
pg_hba:
- local all all trust
- host all postgres 0.0.0.0/0 trust
- host replication replicator 172.31.47.229/32 scram-sha-256
- host replication replicator 172.31.34.169/32 scram-sha-256
- host replication replicator 172.31.39.118/32 scram-sha-256
- host all all 0.0.0.0/0 scram-sha-256
users:
admin:
password: 'ChangeMe2026!'
options:
- createrole
- createdb
postgresql:
listen: '172.31.47.229:5432'
connect_address: '172.31.47.229:5432'
bin_dir: /usr/pgsql-17/bin
data_dir: /var/lib/pgsql/17/data
pgpass: /tmp/pgpass0
authentication:
replication:
username: replicator
password: 'Rep-Pass-2026!'
superuser:
username: postgres
password: 'PgSuperPass2026!'
rewind:
username: rewind_user
password: 'Rewind-Pass-2026!'
create_replica_methods:
- pgbackrest
- basebackup
pgbackrest:
command: 'pgbackrest --stanza=pg_cluster --delta restore'
keep_data: true
no_params: true
basebackup:
checkpoint: fast
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
pg2 and pg3: same file, change name: to pg2/pg3 and update both IP addresses in restapi and postgresql sections.
Change all passwords
Replace all placeholder passwords before deploying. Use a secrets manager or Ansible vault in production — do not store plaintext credentials in config files checked into git.
5.2 Patroni systemd Service¶
cat > /etc/systemd/system/patroni.service << 'EOF'
[Unit]
Description=Patroni — HA PostgreSQL orchestrator
After=syslog.target network.target etcd.service
Wants=etcd.service
[Service]
Type=simple
User=postgres
Group=postgres
Environment=PATH=/var/lib/pgsql/.local/bin:/usr/pgsql-17/bin:/usr/local/bin:/usr/bin:/bin
Environment=PGDATA=/var/lib/pgsql/17/data
ExecStart=/var/lib/pgsql/.local/bin/patroni /var/lib/pgsql/patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=process
TimeoutSec=30
Restart=on-failure
RestartSec=10s
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now patroni
5.3 Verify the Cluster¶
Watch the logs as Patroni initializes. On the first start, one node becomes the primary and initializes the data directory:
Check cluster status:
Expected output:
+ Cluster: pg_cluster (7234567890123456789) ----+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| pg1 | 172.31.47.229:5432 | Leader | running | 1 | |
| pg2 | 172.31.34.169:5432 | Replica | running | 1 | 0 |
| pg3 | 172.31.39.118:5432 | Replica | running | 1 | 0 |
+--------+-----------------+---------+---------+----+-----------+
Part 6: pgbackrest Configuration (All PostgreSQL Nodes)¶
6.1 Create Directories¶
sudo mkdir -p /var/log/pgbackrest /etc/pgbackrest /var/lib/pgbackrest
sudo chmod 0750 /var/lib/pgbackrest
sudo chmod 770 /var/log/pgbackrest
sudo touch /etc/pgbackrest/pgbackrest.conf
sudo chmod 640 /etc/pgbackrest/pgbackrest.conf
sudo chown -R postgres:postgres /var/lib/pgbackrest /var/log/pgbackrest /etc/pgbackrest
6.2 pgbackrest Configuration¶
/etc/pgbackrest/pgbackrest.conf — same content on all three nodes:
[pg_cluster]
pg1-path=/var/lib/pgsql/17/data
pg1-port=5432
pg1-user=postgres
[global]
start-fast=y
process-max=4
archive-timeout=300
compress-type=lz4 # lz4 is faster than gz; available in pgbackrest 2.38+
repo1-type=s3
repo1-path=/pgbackrest/pg_cluster
repo1-retention-full=4
repo1-retention-diff=14
repo1-s3-bucket=your-pgbackrest-bucket
repo1-s3-endpoint=s3.ap-southeast-1.amazonaws.com
repo1-s3-key=YOUR_ACCESS_KEY
repo1-s3-key-secret=YOUR_SECRET_KEY
repo1-s3-region=ap-southeast-1
repo1-s3-uri-style=path
log-level-console=info
log-level-file=detail
log-path=/var/log/pgbackrest
[global:archive-push]
compress-level=3
6.3 Initialize and Verify¶
Run on the primary (pg1) only:
sudo -u postgres pgbackrest --stanza=pg_cluster stanza-create
sudo -u postgres pgbackrest --stanza=pg_cluster check
sudo -u postgres pgbackrest --stanza=pg_cluster backup --type=full
sudo -u postgres pgbackrest info
6.4 Schedule Backups¶
Add to the postgres user's crontab (crontab -u postgres -e):
# Full backup every Sunday at 01:00
0 1 * * 0 pgbackrest --type=full --stanza=pg_cluster backup
# Differential backup Mon-Sat at 01:00
0 1 * * 1-6 pgbackrest --type=diff --stanza=pg_cluster backup
# Incremental backup every 4 hours
0 */4 * * * pgbackrest --type=incr --stanza=pg_cluster backup
Part 7: HAProxy Load Balancer (haproxy host)¶
7.1 Install HAProxy¶
7.2 HAProxy Configuration¶
/etc/haproxy/haproxy.cfg:
global
maxconn 1000
log /dev/log local0
defaults
log global
mode tcp
retries 2
timeout client 30m
timeout connect 4s
timeout server 30m
timeout check 5s
# Statistics dashboard
listen stats
mode http
bind *:7000
stats enable
stats uri /
stats refresh 10s
stats show-node
# Write traffic → primary only
# Patroni REST API: GET /primary returns 200 only on the current leader
listen postgres_write
bind *:5000
option httpchk GET /primary
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server pg1 172.31.47.229:5432 maxconn 100 check port 8008
server pg2 172.31.34.169:5432 maxconn 100 check port 8008
server pg3 172.31.39.118:5432 maxconn 100 check port 8008
# Read traffic → replicas, with primary as fallback
# Patroni REST API: GET /replica returns 200 only on replicas
listen postgres_read
bind *:5001
balance leastconn
option httpchk GET /replica
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server pg1 172.31.47.229:5432 maxconn 100 check port 8008
server pg2 172.31.34.169:5432 maxconn 100 check port 8008
server pg3 172.31.39.118:5432 maxconn 100 check port 8008
Health check endpoint — 2026 update
The 2024 guide used OPTIONS /master. Patroni 4.x keeps /master as a compatibility alias but the canonical endpoints are /primary (for the leader) and /replica (for standbys). Use GET not OPTIONS — HAProxy's option httpchk GET /primary is cleaner and matches standard HTTP semantics.
7.3 Start HAProxy¶
Monitor via the stats dashboard at http://172.31.10.50:7000.
Part 8: Testing¶
8.1 Verify Replication¶
Create data on the primary and read it from replicas:
# On any host — connects through HAProxy write port
psql -h 172.31.10.50 -p 5000 -U postgres << 'SQL'
CREATE DATABASE demo;
\c demo
CREATE TABLE orders (id serial, product text, created_at timestamptz DEFAULT now());
INSERT INTO orders (product) VALUES ('widget'), ('gadget'), ('doohickey');
SELECT * FROM orders;
SQL
# Read through HAProxy read port (hits a replica)
psql -h 172.31.10.50 -p 5001 -U postgres -d demo -c "SELECT * FROM orders;"
8.2 Test Automatic Failover¶
Stop Patroni on the current primary and watch the cluster elect a new leader:
# On pg1 (the current primary)
sudo systemctl stop patroni
# On pg2 or pg3 — watch the election happen in seconds
watch -n2 "patronictl -c /var/lib/pgsql/patroni.yml list"
Patroni will elect a new primary (typically within 30 seconds — the ttl setting). HAProxy detects the change via health checks within ~10 seconds. Writes automatically route to the new primary.
Restart the old primary — it rejoins as a replica:
8.3 Planned Switchover¶
For maintenance or rolling upgrades, trigger a controlled switchover:
Patroni asks which node to promote, waits for replicas to catch up, and performs a zero-data-loss switchover.
Part 9: Day-2 Operations¶
Enable Synchronous Replication¶
For workloads where zero data loss is mandatory:
patronictl -c /var/lib/pgsql/patroni.yml edit-config --apply - --force << 'JSON'
{
"synchronous_mode": true,
"synchronous_mode_strict": true,
"postgresql": {
"parameters": {
"synchronous_commit": "on"
}
}
}
JSON
patronictl -c /var/lib/pgsql/patroni.yml reload pg_cluster
Reinitialize a Corrupted Replica¶
If a replica's data directory is corrupted or severely behind:
Patroni will recreate the replica from pgbackrest (fast path) or pg_basebackup (fallback).
Change PostgreSQL Parameters¶
Never edit postgresql.conf directly on HA clusters. Use Patroni to distribute changes:
This opens your $EDITOR with the current DCS configuration. Changes are applied cluster-wide and persisted in etcd.
Restore from pgbackrest Backup¶
To restore to a point in time on a new cluster:
sudo -u postgres pgbackrest --stanza=pg_cluster \
--delta \
--target="2026-05-25 14:30:00" \
--target-action=promote \
restore
Part 10: Disaster Recovery¶
HA protects you from node failures within a single site — a dead disk, a crashed process, a bad kernel update. It does not protect you from a site-level event: a failed data center, a cloud region outage, or catastrophic data corruption that replicates to all replicas before anyone notices.
DR requires a separate site that can be promoted to production independently of the primary cluster.
This section covers two complementary approaches:
| Approach | RTO | RPO | Complexity |
|---|---|---|---|
| Patroni Standby Cluster (warm standby) | 2–5 min | seconds | medium |
| pgbackrest PITR Restore (cold standby) | 30–120 min | up to 4 hours | low |
Use both: the standby cluster for planned or fast failover, pgbackrest restore as the backstop if the standby cluster itself is unavailable.
10.1 DR Architecture¶
PRIMARY SITE (ap-southeast-1) DR SITE (ap-southeast-2)
───────────────────────────── ──────────────────────────
HAProxy (:5000 write) HAProxy-DR (:5000 write)
│ │
pg1 (Leader) ──streaming──────────────► dr-pg1 (standby leader)
pg2 (Replica) dr-pg2 (standby replica)
pg3 (Replica) dr-pg3 (standby replica)
│ │
└──── pgbackrest ────► S3 (primary) ──S3 replication──► S3 (DR)
The DR cluster streams from the primary cluster's current leader via a dedicated replication slot. On DR activation, the standby cluster is promoted and becomes an independent primary cluster in minutes.
10.2 Prepare the Primary Cluster for DR¶
The primary cluster needs a permanent replication slot that the DR standby cluster will hold. Add this to the bootstrap.dcs section of patroni.yml on all primary-site nodes and apply via edit-config:
patronictl -c /var/lib/pgsql/patroni.yml edit-config --apply - --force << 'JSON'
{
"slots": {
"patroni_standby_leader": {
"type": "physical"
}
}
}
JSON
Verify the slot was created on the primary:
Monitor this slot
A replication slot that is not consumed causes WAL accumulation on the primary. If the DR cluster is disconnected for a long time, the primary's disk fills with retained WAL. Monitor pg_replication_slots.wal_status — if it shows lost, reinitialize the DR cluster from a pgbackrest backup.
10.3 DR Site: Node Setup¶
Provision 3 new nodes in the DR site. They mirror the primary site setup with different IPs:
| Host | IP | Role |
|---|---|---|
| dr-pg1 | 172.32.47.229 | Standby leader (streams from primary) |
| dr-pg2 | 172.32.34.169 | Standby replica |
| dr-pg3 | 172.32.39.118 | Standby replica |
| dr-haproxy | 172.32.10.50 | HAProxy (serves reads only during standby) |
Run Part 1 through Part 4 of this guide on the DR nodes (OS tuning, etcd, PostgreSQL, Patroni installation). The only difference is the Patroni configuration file.
10.4 DR Patroni Configuration¶
The DR cluster's patroni.yml is identical to the primary cluster's except for the standby_cluster block and the bootstrap.dcs section. The standby_cluster block tells Patroni this cluster should follow the primary, not elect its own leader.
/var/lib/pgsql/patroni.yml on dr-pg1:
scope: dr_cluster # different scope name from primary
name: dr-pg1
restapi:
listen: '172.32.47.229:8008'
connect_address: '172.32.47.229:8008'
etcd3:
hosts: 'dr-pg1:2379,dr-pg2:2379,dr-pg3:2379' # DR-site etcd cluster
protocol: http
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
# This block declares the cluster is a standby of another cluster
standby_cluster:
host: 172.31.10.50 # primary site HAProxy write port
port: 5000
primary_slot_name: patroni_standby_leader # slot created in step 10.2
create_replica_methods:
- pgbackrest
- basebackup
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
hot_standby: 'on'
wal_keep_size: 256MB
max_wal_senders: 10
max_replication_slots: 10
wal_level: replica
archive_command: 'pgbackrest --stanza=pg_cluster archive-push %p'
archive_mode: 'on'
pg_hba:
- local all all trust
- host all postgres 0.0.0.0/0 trust
- host replication replicator 172.32.47.229/32 scram-sha-256
- host replication replicator 172.32.34.169/32 scram-sha-256
- host replication replicator 172.32.39.118/32 scram-sha-256
- host all all 0.0.0.0/0 scram-sha-256
postgresql:
listen: '172.32.47.229:5432'
connect_address: '172.32.47.229:5432'
bin_dir: /usr/pgsql-17/bin
data_dir: /var/lib/pgsql/17/data
pgpass: /tmp/pgpass0
authentication:
replication:
username: replicator
password: 'Rep-Pass-2026!' # must match primary cluster
superuser:
username: postgres
password: 'PgSuperPass2026!'
rewind:
username: rewind_user
password: 'Rewind-Pass-2026!'
create_replica_methods:
- pgbackrest
- basebackup
pgbackrest:
command: 'pgbackrest --stanza=pg_cluster --delta restore'
keep_data: true
no_params: true
basebackup:
checkpoint: fast
Start Patroni on all DR nodes. The DR cluster will bootstrap from the primary via pgbackrest, then switch to streaming replication.
10.5 Verify the Standby Cluster¶
On the DR site:
Expected output — note the standby_leader role:
+ Cluster: dr_cluster ─────────────────+─────────+────────────+
| Member | Host | Role | State | Lag |
+---------+-----------------+-----------------+---------+-----+
| dr-pg1 | 172.32.47.229 | Standby Leader | running | 0 |
| dr-pg2 | 172.32.34.169 | Replica | running | 0 |
| dr-pg3 | 172.32.39.118 | Replica | running | 0 |
+---------+-----------------+-----------------+---------+-----+
All nodes are in read-only standby mode. Write attempts will fail — this is correct behavior.
Monitor replication lag from the primary site:
psql -U postgres -c "
SELECT application_name, state, sent_lsn, replay_lsn,
(sent_lsn - replay_lsn) AS lag_bytes
FROM pg_stat_replication
WHERE application_name LIKE 'dr%';
"
10.6 pgbackrest in the DR Site¶
Configure pgbackrest on DR nodes to point to the same S3 bucket (or a replicated DR bucket). Add a second repository pointing to the DR region:
/etc/pgbackrest/pgbackrest.conf on DR nodes:
[pg_cluster]
pg1-path=/var/lib/pgsql/17/data
pg1-port=5432
pg1-user=postgres
[global]
start-fast=y
process-max=4
archive-timeout=300
compress-type=lz4
# Primary S3 repo (cross-region accessible)
repo1-type=s3
repo1-path=/pgbackrest/pg_cluster
repo1-retention-full=4
repo1-s3-bucket=your-pgbackrest-bucket
repo1-s3-endpoint=s3.ap-southeast-1.amazonaws.com
repo1-s3-key=YOUR_ACCESS_KEY
repo1-s3-key-secret=YOUR_SECRET_KEY
repo1-s3-region=ap-southeast-1
repo1-s3-uri-style=path
# DR S3 repo (local region — faster restore on DR activation)
repo2-type=s3
repo2-path=/pgbackrest/pg_cluster
repo2-retention-full=4
repo2-s3-bucket=your-pgbackrest-dr-bucket
repo2-s3-endpoint=s3.ap-southeast-2.amazonaws.com
repo2-s3-key=YOUR_DR_ACCESS_KEY
repo2-s3-key-secret=YOUR_DR_SECRET_KEY
repo2-s3-region=ap-southeast-2
repo2-s3-uri-style=path
log-level-console=info
log-level-file=detail
log-path=/var/log/pgbackrest
Enable S3 cross-region replication from your primary bucket to the DR bucket in AWS — this happens transparently at the storage layer and keeps the DR repo in sync without any pgbackrest configuration.
10.7 DR Activation Runbook¶
When the primary site is unavailable and you need to promote the DR cluster to production, follow this sequence:
Step 1 — Confirm the primary site is truly down
Do not activate DR on a false alarm. Verify:
# Try connecting to primary HAProxy
psql -h 172.31.10.50 -p 5000 -U postgres -c "SELECT 1;" --connect-timeout=5
# Check Patroni REST API on primary nodes
curl -s http://172.31.47.229:8008/primary
curl -s http://172.31.34.169:8008/primary
curl -s http://172.31.39.118:8008/primary
If all return errors or timeouts, proceed.
Step 2 — Check DR cluster lag
If lag is 0, zero data loss. If lag is non-zero, the DR site has not replicated all transactions from the primary — document the lag before promoting (this is your RPO for this incident).
Step 3 — Promote the DR cluster
Remove the standby_cluster block from the DCS configuration to release the DR cluster from standby mode:
patronictl -c /var/lib/pgsql/patroni.yml edit-config --apply - --force << 'JSON'
{"standby_cluster": null}
JSON
Within seconds, Patroni detects the configuration change and elects a leader in the DR cluster:
Expected: dr-pg1 (or whichever node wins election) shows Leader role.
Step 4 — Redirect application traffic
Update your DNS CNAME or load balancer VIP to point to the DR HAProxy:
Applications reconnect automatically on next connection attempt.
Step 5 — Verify writes succeed
psql -h 172.32.10.50 -p 5000 -U postgres -c "
SELECT pg_is_in_recovery(), now();
CREATE TABLE IF NOT EXISTS dr_test (ts timestamptz DEFAULT now());
INSERT INTO dr_test DEFAULT VALUES;
SELECT * FROM dr_test;
"
pg_is_in_recovery() must return false — the cluster is now a writable primary.
10.8 DR via pgbackrest Restore (Cold Standby)¶
If the standby cluster is also unavailable (total regional failure), restore from the S3 backup in the DR region:
# On new DR infrastructure — after installing PostgreSQL and pgbackrest
# List available backups in DR repo
sudo -u postgres pgbackrest --stanza=pg_cluster --repo=2 info
# Restore latest backup
sudo -u postgres pgbackrest --stanza=pg_cluster \
--repo=2 \
--delta \
--target-action=promote \
restore
# Start PostgreSQL
sudo systemctl start postgresql-17
# Verify recovery completed
psql -U postgres -c "SELECT pg_is_in_recovery(), pg_last_xact_replay_timestamp();"
Then initialize Patroni on the restored node as a new primary cluster (remove the standby_cluster block from patroni.yml).
For point-in-time recovery to a specific moment before a data corruption event:
sudo -u postgres pgbackrest --stanza=pg_cluster \
--repo=2 \
--delta \
--type=time \
--target="2026-05-26 08:45:00 UTC" \
--target-action=promote \
restore
10.9 DR Testing Schedule¶
A DR plan that has never been tested is not a DR plan — it is a wish. Run these tests on a regular cadence:
| Test | Frequency | What it proves |
|---|---|---|
| Verify standby lag < 5 seconds | Daily (automated) | Streaming replication is healthy |
| pgbackrest restore to isolated env | Monthly | Backups are actually restorable |
| Full DR activation (non-prod traffic) | Quarterly | Runbook works end-to-end |
| DNS/VIP cutover drill | Quarterly | Application teams know the procedure |
| Primary-site failure simulation | Annually | Full DR confidence |
Automate the daily lag check:
# Add to crontab on monitoring host
*/5 * * * * psql -h 172.31.10.50 -p 5000 -U postgres -t -c \
"SELECT COALESCE(MAX(write_lag), '0')::text FROM pg_stat_replication \
WHERE application_name LIKE 'dr%';" | \
grep -v "^$" | awk '{if ($1 > 5) print "ALERT: DR lag " $1 " seconds"}'
Ports Reference¶
Primary site:
| Port | Host | Purpose |
|---|---|---|
| 2379 | pg1/2/3 | etcd client |
| 2380 | pg1/2/3 | etcd peer (cluster comms) |
| 5432 | pg1/2/3 | PostgreSQL |
| 8008 | pg1/2/3 | Patroni REST API (health checks) |
| 5000 | haproxy | Write port → primary |
| 5001 | haproxy | Read port → replicas |
| 7000 | haproxy | HAProxy stats dashboard |
DR site (mirrors primary, different IPs):
| Port | Host | Purpose |
|---|---|---|
| 2379 | dr-pg1/2/3 | etcd client |
| 2380 | dr-pg1/2/3 | etcd peer |
| 5432 | dr-pg1/2/3 | PostgreSQL (read-only in standby mode) |
| 8008 | dr-pg1/2/3 | Patroni REST API |
| 5000 | dr-haproxy | Write port (active only after DR promotion) |
| 5001 | dr-haproxy | Read port → DR replicas |
Summary¶
Here is what each layer provides:
-
etcd is the neutral arbitrator. It stores cluster state and leader lock — no PostgreSQL node makes a promotion decision without consensus from the etcd quorum.
-
Patroni 4.x is the orchestration layer. It manages initialization, replication, health checks, and automatic failover. Its REST API exposes the current cluster topology, which HAProxy uses for routing.
-
PostgreSQL 17 from PGDG installs cleanly on RHEL 9 / Rocky Linux with a single repo RPM and stays on the community release track — security patches and new features arrive immediately without waiting for a vendor rebuild.
-
pgbackrest 2.58 handles point-in-time backups to S3 and doubles as Patroni's fast replica creation method —
pg_basebackupfrom a running primary is slow; pgbackrest restores from the backup repository in parallel. -
HAProxy routes writes to the current primary (port 5000) and reads to replicas (port 5001) using Patroni's health check endpoints — applications get read/write splitting without code changes.
-
Patroni Standby Cluster (Part 10) extends HA across sites. The DR cluster streams from the primary cluster's leader via a dedicated replication slot. A five-step runbook promotes it to an independent primary in under 5 minutes — without touching the primary site's etcd or Patroni configuration.
-
pgbackrest PITR Restore is the safety net behind the standby cluster. If the standby is also unavailable, a clean restore from the DR-region S3 bucket delivers the database to any point in time, including the moment before a data corruption event propagated.
What this setup survives:
| Event | Recovery mechanism | RTO |
|---|---|---|
| Single node failure | Patroni automatic failover | < 30 seconds |
| All primary-site nodes down | DR standby cluster promotion | 2–5 minutes |
| Data corruption (replicated) | pgbackrest PITR restore | 30–120 minutes |
| Full region loss | pgbackrest restore from DR S3 | 30–120 minutes |
The result is a database stack that handles node failures automatically, can survive a complete site loss, keeps point-in-time backups off-site, and costs nothing to license. Test the DR runbook quarterly — a plan that has never been executed is not a plan.
Questions or discussion? Connect on LinkedIn, X or reach out via email.
Discussion
Have thoughts on this post? Share them below — questions, corrections, or your own experience are all welcome.