Skip to content

PostgreSQL Server Tuning: The Complete 2026 Configuration Guide

A freshly installed PostgreSQL server is configured to run safely on a shared laptop from 2005. shared_buffers = 128MB. work_mem = 4MB. max_connections = 100. These defaults have not changed in over a decade — and they are deliberately conservative so the server starts without crashing on anything.

Your production server is not a 2005 laptop.

This guide covers every postgresql.conf parameter that matters, the OS-level settings that underpin them, and a ready-to-use configuration reference for common server sizes — all updated for PostgreSQL 17 on Linux in 2026. It is the server configuration companion to the query and index tuning guide, which covers EXPLAIN, indexes, and PgBouncer.


How PostgreSQL Uses Memory

Before touching any parameter, understand the memory model. PostgreSQL has two distinct memory regions:

┌─────────────────────────────────────────────────┐
│                   SYSTEM RAM                     │
│                                                  │
│  ┌────────────────────────┐  ┌────────────────┐  │
│  │   Shared Memory Pool   │  │  OS Page Cache │  │
│  │  (shared_buffers)      │  │  (free memory) │  │
│  │  ┌──────────────────┐  │  │                │  │
│  │  │  Buffer Cache    │  │  │  PostgreSQL    │  │
│  │  │  (data pages)    │  │  │  reads data    │  │
│  │  ├──────────────────┤  │  │  through OS    │  │
│  │  │  WAL Buffers     │  │  │  cache too     │  │
│  │  │  (wal_buffers)   │  │  └────────────────┘  │
│  │  └──────────────────┘  │                      │
│  └────────────────────────┘                      │
│                                                  │
│  ┌────────────────────────┐                      │
│  │  Per-Process Memory    │  (× max_connections) │
│  │  work_mem × ops        │                      │
│  │  + stack + overhead    │                      │
│  └────────────────────────┘                      │
└─────────────────────────────────────────────────┘

Shared memory (shared_buffers) is allocated once at startup and shared by all processes. Per-process memory (work_mem) is allocated per-query-operation and scales with the number of connections. The OS page cache is a second cache layer that PostgreSQL reads through — this is why effective_cache_size matters even though it allocates nothing.

PostgreSQL Process Architecture

Process type Role
postmaster Daemon: spawns all other processes, handles connections
checkpointer Flushes dirty pages from shared_buffers to disk at checkpoints
bgwriter Incrementally writes dirty pages between checkpoints
walwriter Flushes WAL buffers to WAL files
autovacuum launcher Schedules autovacuum workers for each database
autovacuum worker Runs VACUUM and ANALYZE on tables
archiver Copies WAL segments to archive location (if enabled)
stats collector Aggregates runtime statistics (merged into postmaster in PG15+)
backend (per connection) Executes queries for one client session
parallel worker Executes a partition of a parallel query

Each backend process consumes private memory on top of shared memory — which is why max_connections has hard limits.


Part 1: Memory Parameters

shared_buffers — The Primary Cache

PostgreSQL's own page cache. When a query reads a page, PostgreSQL checks shared_buffers first, then the OS page cache, then disk.

shared_buffers = 8GB    # 25% of total RAM

Formula: shared_buffers = total_RAM × 0.25

Server RAM Recommended shared_buffers
4 GB 1 GB
16 GB 4 GB
32 GB 8 GB
64 GB 16 GB
128 GB 16–32 GB

Going above 25% has diminishing returns because PostgreSQL also benefits from OS page cache for sequential scans. Hitting 40% and above can starve the OS cache, slowing sequential reads.

Huge pages: For shared_buffers ≥ 8 GB, enable huge pages to reduce TLB pressure:

# postgresql.conf
huge_pages = try    # try | on | off
# /etc/sysctl.conf — set before PostgreSQL starts
vm.nr_hugepages = 4096   # (shared_buffers_bytes / 2MB) + 10% overhead
sysctl -p

work_mem — Per-Sort/Hash Operation Memory

Allocated per sort, hash join, or bitmap index operation — not per connection. A complex query can run 5–10 operations simultaneously; a server with 200 connections could peak at thousands of concurrent work_mem allocations.

work_mem = 32MB    # Conservative OLTP default

Safe formula: work_mem = (total_RAM × 0.25) / max_connections

Server RAM max_connections Safe work_mem
16 GB 100 40 MB
32 GB 100 80 MB
32 GB 200 40 MB
64 GB 200 80 MB

Identifying spills to disk — the sign that work_mem is too low:

-- Look for "Sort Method: external merge Disk" in query plans
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table ORDER BY column1, column2;

Raise work_mem per session for analytics queries rather than globally:

SET work_mem = '512MB';
-- Now run the heavy analytic query
SELECT ...;

maintenance_work_mem — Vacuum and Index Build Memory

Used by VACUUM, CREATE INDEX, CLUSTER, and ALTER TABLE ADD FOREIGN KEY. Unlike work_mem, only a few maintenance operations run simultaneously, so this can be set much higher.

maintenance_work_mem = 2GB    # 1–4 GB is typical for 32 GB+ servers

Autovacuum workers ignore maintenance_work_mem and use autovacuum_work_mem instead:

autovacuum_work_mem = 512MB   # -1 means fall back to maintenance_work_mem

effective_cache_size — Planner Cache Estimate

This parameter allocates nothing. It tells the query planner how much total cache (shared_buffers + OS page cache) it can assume is available. A higher value makes the planner prefer index scans over sequential scans.

effective_cache_size = 24GB   # 75% of total RAM

Formula: effective_cache_size = total_RAM × 0.75

Check your actual OS cache before setting:

free -h
#              total        used        free      shared  buff/cache   available
# Mem:          31Gi        6.2Gi       1.1Gi     1.5Gi      24Gi       23Gi
# → effective_cache_size ≈ 24GB (available column)

temp_buffers — Temporary Table Cache

Per-session cache for temporary tables. Default 8MB is adequate for most applications. Only increase if you create large temporary tables within sessions.

temp_buffers = 32MB    # Increase only if using large temp tables

Part 2: Connection Management

max_connections

Each connection spawns a backend process consuming ~5–10 MB of private memory plus stack overhead. Setting this too high wastes RAM and increases locking contention.

max_connections = 200   # Direct connections without a pooler
max_connections = 100   # With PgBouncer in front (recommended)

The hard constraint:

Peak memory = shared_buffers + (max_connections × work_mem × ops_per_query) + OS overhead

With work_mem = 32MB, 3 ops per query, and 200 connections: 200 × 32MB × 3 = 19.2 GB in the worst case.

Rule: Use PgBouncer in transaction mode to multiplex application connections. Set PostgreSQL max_connections to the number of real concurrent queries you expect, not the number of application threads.

superuser_reserved_connections

superuser_reserved_connections = 3   # Default: 3

Reserves connections for DBA access even when the server is saturated. Never set to 0 — you'll lock yourself out during connection storms.


Part 3: WAL and Checkpoint Tuning

WAL (Write-Ahead Log) is the durability mechanism: every write is recorded in WAL before data pages are modified. Checkpoint is the process that flushes modified data pages to disk and marks a safe WAL restart point.

Poorly tuned checkpoints cause write storms — bursts of I/O that pause queries while dirty pages are flushed.

wal_buffers

In-memory buffer for WAL data before it's written to WAL files. Default (-1) auto-sets to 1/32 of shared_buffers, capped at 64MB. For high-write workloads, set explicitly:

wal_buffers = 64MB    # -1 auto; 64MB is the practical ceiling

checkpoint_completion_target

Spreads checkpoint I/O over time instead of flushing everything at the checkpoint moment. Default 0.9 means "finish writing all dirty pages within 90% of the checkpoint interval." This is already a good default — don't set below 0.7.

checkpoint_completion_target = 0.9

checkpoint_timeout and max_wal_size

checkpoint_timeout sets the maximum time between checkpoints (default 5min). max_wal_size triggers a checkpoint when WAL grows past this size (default 1GB).

checkpoint_timeout = 15min    # Longer = fewer checkpoints = less write I/O
max_wal_size       = 4GB      # Larger = fewer checkpoints; needs more recovery time after crash
min_wal_size       = 1GB      # Keep recycled WAL segments instead of deleting

Diagnosing checkpoint storms:

log_checkpoints = on    # Log every checkpoint with its statistics
LOG: checkpoint complete: wrote 14823 buffers (22%); 0 WAL file(s) added,
     1 removed, 5 recycled; write=120.192 s, sync=0.014 s, total=120.221 s;
     sync files=12, longest=0.014 s, shortest=0.001 s;
     distance=43901 kB, estimate=44765 kB

If write time is close to checkpoint_timeout, the checkpointer is writing continuously. Increase max_wal_size to spread writes over more time.

wal_compression

PostgreSQL 17 supports LZ4, Zstd, and zlib for WAL compression. Compressing WAL reduces I/O at the cost of CPU — almost always worth it on modern hardware.

wal_compression = lz4    # or zstd for better compression; default: off

synchronous_commit

Controls when WAL is considered durable. Default on waits for WAL to be flushed to disk before confirming a commit to the client — safe but slower.

# For maximum durability (default):
synchronous_commit = on

# For higher write throughput (risk: lose ~wal_writer_delay ms of commits on crash):
synchronous_commit = off      # Never for financial data

# For replication:
synchronous_commit = remote_write   # Waits for standby to receive WAL (not fsync)
synchronous_commit = remote_apply   # Waits for standby to apply WAL (zero data loss)

Part 4: Storage and I/O Tuning

random_page_cost and seq_page_cost

These tell the planner the relative cost of random vs sequential page reads. The defaults (4.0 and 1.0) were calibrated for spinning disk. SSD changes the ratio dramatically.

# SSD (NVMe or SATA):
random_page_cost = 1.1
seq_page_cost    = 1.0

# Spinning disk (7200 RPM):
random_page_cost = 4.0
seq_page_cost    = 1.0

# Cloud block storage (EBS, GCP PD — latency varies):
random_page_cost = 1.5

Impact: Setting random_page_cost too high makes the planner prefer sequential scans over index scans — the opposite of what you want on SSD.

effective_io_concurrency

How many concurrent I/O requests PostgreSQL issues when doing bitmap heap scans. Default 1 is for spinning disk. SSD handles many concurrent requests efficiently.

effective_io_concurrency = 200    # SSD/NVMe
effective_io_concurrency = 2      # Spinning disk

wal_level and archive settings

wal_level = replica    # Minimum for streaming replication; 'logical' for logical decoding

If using WAL archiving for PITR:

archive_mode = on
archive_command = 'pgbackrest --stanza=pg_cluster archive-push %p'
archive_timeout = 60    # Force segment switch every 60s even with low write volume

Part 5: Parallel Query Tuning

PostgreSQL can parallelize sequential scans, joins, aggregations, and index builds. The key parameters:

max_worker_processes      = 8     # Total parallel worker pool (system-wide)
max_parallel_workers      = 8     # Subset available for parallel queries
max_parallel_workers_per_gather = 4    # Max workers for a single query node
max_parallel_maintenance_workers = 4   # For CREATE INDEX, VACUUM

Rule of thumb: max_worker_processes = CPU cores. max_parallel_workers_per_gather = CPU cores / 2.

When Parallelism Kicks In

PostgreSQL only parallelizes a query when the estimated cost exceeds parallel_setup_cost and individual chunk cost exceeds parallel_tuple_cost. For OLTP queries touching < 10M rows, parallelism rarely helps. For analytics queries it can be 4–8× faster.

Force/disable parallelism per session:

-- Enable maximum parallelism for this session
SET max_parallel_workers_per_gather = 8;

-- Disable (useful for debugging plan differences)
SET max_parallel_workers_per_gather = 0;

Parallel Index Builds (PostgreSQL 11+)

max_parallel_maintenance_workers = 4
-- Build index using 4 workers simultaneously
SET max_parallel_maintenance_workers = 4;
CREATE INDEX CONCURRENTLY idx_large_table ON large_table (column1);

Part 6: Autovacuum Server Configuration

Autovacuum is PostgreSQL's background maintenance process. It reclaims dead tuples, updates table statistics, and prevents transaction ID wraparound. Under-tuned autovacuum is the most common cause of unexpected performance degradation on growing databases.

Core Autovacuum Parameters

# Enable (should always be on in production)
autovacuum = on

# Worker pool
autovacuum_max_workers        = 5      # Default: 3; increase for busy databases
autovacuum_naptime            = 30s    # How often the launcher checks for work

# Trigger thresholds (when to start vacuuming)
autovacuum_vacuum_threshold   = 500    # Minimum dead tuples before trigger
autovacuum_vacuum_scale_factor = 0.05  # + 5% of live rows (default: 0.20 = 20%)
autovacuum_analyze_threshold  = 250
autovacuum_analyze_scale_factor = 0.02

# Transaction ID wraparound emergency vacuum
autovacuum_freeze_max_age     = 150000000   # Default: 200M; reduce for safety
autovacuum_multixact_freeze_max_age = 350000000

Why scale_factor matters: With the default 0.20, a 100M row table tolerates 20M dead tuples before vacuum runs. At a write rate of 1,000 updates/sec, that's a 5.5-hour lag. Reduce to 0.05 to trigger at 5M dead tuples (about 80 minutes).

Autovacuum I/O Cost Throttling

Autovacuum is throttled to avoid disrupting foreground queries. The throttle is controlled by a credit system: each page read/write consumes credits; when credits are exhausted, autovacuum sleeps.

autovacuum_vacuum_cost_delay  = 2ms     # Sleep duration when cost limit hit (SSD: 2ms, disk: 20ms)
autovacuum_vacuum_cost_limit  = 400     # Default: 200; credits per vacuum cycle

For SSD servers: Lower delay, higher limit:

autovacuum_vacuum_cost_delay  = 2ms
autovacuum_vacuum_cost_limit  = 800

For spinning disk: Higher delay, lower limit to avoid I/O saturation:

autovacuum_vacuum_cost_delay  = 20ms
autovacuum_vacuum_cost_limit  = 200

Monitor Autovacuum Health

-- Tables where autovacuum is falling behind
SELECT
    schemaname,
    relname AS table,
    n_live_tup,
    n_dead_tup,
    round(n_dead_tup * 100.0 / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
    last_autovacuum,
    last_autoanalyze,
    autovacuum_count
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;
-- Currently running autovacuum workers
SELECT
    datname,
    pid,
    now() - xact_start AS runtime,
    query
FROM pg_stat_activity
WHERE query LIKE '%autovacuum%'
  AND query NOT LIKE '%pg_stat_activity%'
ORDER BY xact_start;

Transaction ID Wraparound — the Emergency You Must Never Hit

PostgreSQL uses 32-bit transaction IDs. At 2 billion transactions, it wraps around. PostgreSQL enters read-only emergency mode to prevent data corruption when the wraparound is imminent.

Check how close you are:

SELECT
    datname,
    age(datfrozenxid) AS xid_age,
    2147483647 - age(datfrozenxid) AS xids_remaining
FROM pg_database
ORDER BY age(datfrozenxid) DESC;

If xid_age approaches 1.5 billion, trigger an emergency vacuum immediately:

vacuumdb --all --freeze --analyze-in-stages -j 4

Part 7: Logging and Observability

Turn on the right logs at the right level. Too little logging means you miss slow queries. Too much means the log fills with noise.

# Where to log
log_destination        = 'stderr'
logging_collector      = on
log_directory          = '/var/log/postgresql'
log_filename           = 'postgresql-%Y-%m-%d.log'
log_rotation_age       = 1d
log_rotation_size      = 100MB
log_file_mode          = 0640

# What to log
log_min_messages        = warning      # Server log level
log_min_error_statement = error        # Log failed statements at ERROR+

# Slow query logging — the most important setting
log_min_duration_statement = 1000      # Log queries taking > 1 second (in ms)
                                       # Set to 0 to log everything (dev only)

# Checkpoint and lock logging
log_checkpoints         = on
log_lock_waits          = on
deadlock_timeout        = 1s           # Log deadlocks after 1 second

# Connection logging (enable if debugging connection storms)
log_connections         = off          # High volume on busy servers
log_disconnections      = off

# Temporary file logging (catch queries spilling to disk)
log_temp_files          = 0            # Log any temp file creation (0 = all sizes)

# Statement detail
log_line_prefix         = '%m [%p] %q%u@%d '   # Timestamp [PID] user@database
log_statement           = 'ddl'                  # Log all DDL (CREATE, ALTER, DROP)

auto_explain — Automatic EXPLAIN for Slow Queries

Automatically logs EXPLAIN ANALYZE output for any query exceeding a threshold. Essential for tracking down intermittent slow queries without manual intervention.

shared_preload_libraries = 'pg_stat_statements,auto_explain'

auto_explain.log_min_duration = 2000    # Log plans for queries > 2 seconds
auto_explain.log_analyze      = true    # Include actual vs estimated rows
auto_explain.log_buffers      = true    # Include buffer I/O stats
auto_explain.log_nested_statements = true  # Include subquery plans
auto_explain.sample_rate      = 1.0     # 1.0 = every query; 0.1 = 10% sample

Part 8: OS-Level Tuning

PostgreSQL performance depends on OS settings as much as postgresql.conf. These settings survive reboots when set in /etc/sysctl.conf.

Virtual Memory

# /etc/sysctl.conf

# Prefer swapping file cache over anonymous memory (PostgreSQL manages its own cache)
vm.swappiness = 10           # Default: 60; lower = less aggressive swapping

# Dirty page writeback — controls when kernel starts writing dirty pages
vm.dirty_ratio          = 10   # Flush when dirty pages hit 10% of RAM
vm.dirty_background_ratio = 5  # Start background flush at 5% of RAM

# For large shared_buffers (> 8 GB), increase shared memory limits
kernel.shmmax = 17179869184   # Bytes: >= shared_buffers size
kernel.shmall = 4194304       # Pages: kernel.shmmax / page_size (usually 4096)
sysctl -p   # Apply without reboot

Transparent Huge Pages — Disable for PostgreSQL

THP (Transparent Huge Pages) causes latency spikes as the kernel compacts memory. PostgreSQL manages its own huge pages via huge_pages = try. Disable THP:

# /etc/rc.local or a systemd service
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Verify
cat /sys/kernel/mm/transparent_hugepage/enabled
# [always] madvise never  → should show: always madvise [never]

File Descriptor Limits

Each PostgreSQL connection opens multiple file descriptors (data files, WAL, sockets). The OS default is often 1024 — far too low for a busy database server.

# /etc/security/limits.conf
postgres  soft  nofile  65536
postgres  hard  nofile  65536

# For systemd services (/etc/systemd/system/postgresql-17.service.d/limits.conf):
[Service]
LimitNOFILE=65536
# Verify PostgreSQL's actual limit
cat /proc/$(pgrep -f "postgres -D")/limits | grep "open files"

I/O Scheduler

For NVMe SSDs, use none (no scheduler — the device handles its own queue):

# Check current scheduler
cat /sys/block/nvme0n1/queue/scheduler
# [mq-deadline] none kyber bfq

# Set none for NVMe
echo none > /sys/block/nvme0n1/queue/scheduler

# Persist via udev rule (/etc/udev/rules.d/60-postgresql-io.rules):
ACTION=="add|change", KERNEL=="nvme*", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"

Part 9: Ready-to-Use Configuration Templates

Template A: 16 GB RAM, 8 vCPU, NVMe SSD — OLTP Web Application

# Memory
shared_buffers          = 4GB
effective_cache_size    = 12GB
work_mem                = 32MB
maintenance_work_mem    = 1GB
autovacuum_work_mem     = 512MB
temp_buffers            = 16MB
huge_pages              = try

# Connections
max_connections         = 100    # + PgBouncer for application connections
superuser_reserved_connections = 3

# WAL and Checkpoints
wal_buffers             = 64MB
wal_compression         = lz4
wal_level               = replica
max_wal_size            = 4GB
min_wal_size            = 1GB
checkpoint_timeout      = 15min
checkpoint_completion_target = 0.9
synchronous_commit      = on

# Storage
random_page_cost        = 1.1
seq_page_cost           = 1.0
effective_io_concurrency = 200

# Parallel Query
max_worker_processes    = 8
max_parallel_workers    = 8
max_parallel_workers_per_gather = 2
max_parallel_maintenance_workers = 4

# Autovacuum
autovacuum                        = on
autovacuum_max_workers            = 5
autovacuum_vacuum_scale_factor    = 0.05
autovacuum_analyze_scale_factor   = 0.02
autovacuum_vacuum_cost_delay      = 2ms
autovacuum_vacuum_cost_limit      = 400

# Logging
log_min_duration_statement = 1000
log_checkpoints            = on
log_lock_waits             = on
log_temp_files             = 0
log_line_prefix            = '%m [%p] %q%u@%d '
log_statement              = 'ddl'
deadlock_timeout           = 1s

# Extensions (requires restart)
shared_preload_libraries = 'pg_stat_statements,auto_explain'
pg_stat_statements.max   = 10000
pg_stat_statements.track = all
auto_explain.log_min_duration = 2000
auto_explain.log_analyze      = true
auto_explain.log_buffers      = true

Template B: 64 GB RAM, 32 vCPU, NVMe SSD — Analytics / Mixed Workload

# Memory
shared_buffers          = 16GB
effective_cache_size    = 48GB
work_mem                = 128MB     # Higher for analytics joins/sorts
maintenance_work_mem    = 4GB
autovacuum_work_mem     = 1GB
huge_pages              = on

# Connections
max_connections         = 200
superuser_reserved_connections = 5

# WAL and Checkpoints
wal_buffers             = 64MB
wal_compression         = zstd
wal_level               = replica
max_wal_size            = 16GB
min_wal_size            = 4GB
checkpoint_timeout      = 30min
checkpoint_completion_target = 0.9

# Storage
random_page_cost        = 1.1
effective_io_concurrency = 200

# Parallel Query (analytics benefits from more parallelism)
max_worker_processes    = 32
max_parallel_workers    = 32
max_parallel_workers_per_gather = 8
max_parallel_maintenance_workers = 8

# Autovacuum
autovacuum_max_workers            = 8
autovacuum_vacuum_scale_factor    = 0.05
autovacuum_analyze_scale_factor   = 0.02
autovacuum_vacuum_cost_delay      = 2ms
autovacuum_vacuum_cost_limit      = 800

Part 10: Monitoring Queries Reference

Keep these queries in your DBA toolkit. Run them weekly or trigger them from alerting.

Buffer Cache Hit Rate

-- Target: > 99% for OLTP; < 95% signals shared_buffers is too small
SELECT
    schemaname,
    relname,
    heap_blks_hit,
    heap_blks_read,
    round(heap_blks_hit * 100.0 / nullif(heap_blks_hit + heap_blks_read, 0), 2) AS hit_rate
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 15;

Top Tables in Buffer Cache

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT
    c.relname,
    pg_size_pretty(count(*) * 8192) AS buffered,
    round(count(*) * 100.0 /
        (SELECT setting::int FROM pg_settings WHERE name = 'shared_buffers'), 1) AS pct_of_shared_buffers,
    round(count(*) * 8192 * 100.0 / pg_relation_size(c.oid), 1) AS pct_of_table_cached
FROM pg_class c
JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
JOIN pg_database d ON b.reldatabase = d.oid AND d.datname = current_database()
GROUP BY c.oid, c.relname
ORDER BY count(*) DESC
LIMIT 10;

Active Connections by State

SELECT
    state,
    wait_event_type,
    wait_event,
    count(*) AS connections
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
GROUP BY state, wait_event_type, wait_event
ORDER BY connections DESC;

Blocking Queries

SELECT
    blocked.pid            AS blocked_pid,
    blocked.usename        AS blocked_user,
    left(blocked.query, 80) AS blocked_query,
    blocking.pid           AS blocking_pid,
    blocking.usename       AS blocking_user,
    left(blocking.query, 80) AS blocking_query,
    now() - blocked.query_start AS wait_duration
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
    ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
ORDER BY wait_duration DESC;

Transaction ID Age (Wraparound Safety)

SELECT
    datname,
    age(datfrozenxid)                          AS xid_age,
    2147483647 - age(datfrozenxid)             AS xids_remaining,
    round((age(datfrozenxid)::numeric / 2147483647) * 100, 1) AS pct_used
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
-- Alert if pct_used > 50%

Long-Running Queries

SELECT
    pid,
    now() - query_start AS duration,
    state,
    wait_event_type,
    wait_event,
    left(query, 120) AS query
FROM pg_stat_activity
WHERE state != 'idle'
  AND query_start < now() - interval '30 seconds'
ORDER BY duration DESC;

Top Queries by Total Time (pg_stat_statements)

SELECT
    round(total_exec_time::numeric, 0)          AS total_ms,
    calls,
    round(mean_exec_time::numeric, 2)           AS mean_ms,
    round(stddev_exec_time::numeric, 2)         AS stddev_ms,
    round(rows::numeric / nullif(calls, 0), 1) AS avg_rows,
    left(query, 120)                            AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 15;

Table Bloat (Live vs Dead Rows)

SELECT
    schemaname,
    relname AS table,
    n_live_tup,
    n_dead_tup,
    round(n_dead_tup * 100.0 / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
    last_autovacuum,
    last_autoanalyze,
    autovacuum_count
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;

Summary

PostgreSQL server tuning is a one-time investment that compounds every day. A correctly configured server handles 5–10× the throughput of a default installation — without changing a single query.

The Configuration Hierarchy

Layer Parameters Impact
Memory shared_buffers, work_mem, effective_cache_size Highest — plan choices, disk I/O
WAL / Checkpoint max_wal_size, checkpoint_timeout, wal_compression High — write throughput, I/O spikes
Storage random_page_cost, effective_io_concurrency High — plan choices on SSD vs disk
Autovacuum scale_factor, cost_delay, cost_limit High — table bloat, statistics accuracy
Parallelism max_parallel_workers_per_gather, max_worker_processes Medium — analytics queries
Connections max_connections + PgBouncer Medium — memory safety, stability
OS vm.swappiness, THP, file descriptors, I/O scheduler Medium — underpins all of the above
Logging log_min_duration_statement, auto_explain, log_checkpoints Medium — observability

The 10 Rules That Compound

  1. shared_buffers = 25% of RAM. Never leave it at 128MB. The default dates from PostgreSQL 8.0 era hardware.
  2. effective_cache_size = 75% of RAM. It allocates nothing — it costs zero to set high and wrong plan choices cost throughput.
  3. work_mem is per-operation, not per-connection. Calculate the worst case before setting it: work_mem × ops × connections.
  4. Enable huge pages when shared_buffers ≥ 8 GB. Reduces TLB pressure measurably.
  5. max_wal_size = 4× the data written per checkpoint_timeout. Find the right size from checkpoint logs, not guesswork.
  6. wal_compression = lz4 is free throughput on CPU-bound servers. Compresses WAL with negligible CPU cost.
  7. random_page_cost = 1.1 on SSD. Leaving it at 4.0 (the spinning disk default) makes the planner distrust your indexes.
  8. Autovacuum scale_factor = 0.05. The default 0.20 means a 100M row table accumulates 20M dead tuples. That is table rot, not a feature.
  9. log_min_duration_statement = 1000 always on. Slow query log is the earliest warning system for index regressions.
  10. Monitor transaction ID age weekly. Wraparound emergency mode is one of the few PostgreSQL failure modes that requires a hard stop. Know your xids_remaining before it becomes a crisis.

A database that is correctly configured the first time is a database that doesn't page you at 3 AM.


Tested on PostgreSQL 17.x, Rocky Linux 9.x. Configuration parameters and defaults verified against PostgreSQL 17 official documentation.

Questions or discussion? Connect on LinkedIn, X or reach out via email.

Discussion

Have thoughts on this post? Share them below — questions, corrections, or your own experience are all welcome.