PostgreSQL Backup Strategies: pg_dump vs pg_basebackup

Comprehensive postgresql backup strategies: pg_dump vs pg_basebackup tutorial with step-by-step instructions for PostgreSQL administrators.

Discover expert insights on PostgreSQL backup strategies comparing pg_dump and pg_basebackup to ensure your data is always protected and recoverable.


Overview

Choosing the right backup strategy is critical for database reliability and disaster recovery. PostgreSQL offers two primary backup tools: pg_dump for logical backups and pg_basebackup for physical backups. Understanding when to use each is essential for every DBA.

Critical reminder: Backups are useless unless you regularly test your restore procedures!


Why Backup Strategy Matters

In today's data-driven world, data loss can be catastrophic. A robust backup strategy ensures:

  • Business continuity during disasters
  • Recovery from human errors (accidental deletions, bad updates)
  • Compliance requirements (data retention policies)
  • Development and testing environments

pg_dump: Logical Backups

What is pg_dump?

pg_dump creates logical backups by extracting database objects as SQL statements. It's ideal for selective backups, database migrations, and version upgrades.

💡 Key Features

  • Selective backups: Backup specific tables, schemas, or databases
  • Platform independent: Restore to different OS or PostgreSQL versions
  • Human readable: Plain SQL output (with -Fp format)
  • Online backups: No downtime required
  • Flexible restore: Cherry-pick specific objects during restoration

Usage Examples

Basic Database Backup

# Backup entire database to custom format
pg_dump -U postgres -d mydb -Fc -f mydb_backup.dump

# Backup to SQL file (human readable)
pg_dump -U postgres -d mydb -Fp -f mydb_backup.sql

Selective Backups

# Backup specific table
pg_dump -U postgres -d mydb -t users -Fc -f users_backup.dump

# Backup specific schema
pg_dump -U postgres -d mydb -n public -Fc -f public_schema.dump

# Exclude specific tables
pg_dump -U postgres -d mydb -T logs -T temp_data -Fc -f mydb_without_logs.dump

Backup All Databases

# Use pg_dumpall for cluster-wide backup
pg_dumpall -U postgres -f cluster_backup.sql

# Backup only global objects (roles, tablespaces)
pg_dumpall -U postgres -g -f globals.sql

Restore from pg_dump

# Restore from custom format
pg_restore -U postgres -d mydb -v mydb_backup.dump

# Restore from SQL file
psql -U postgres -d mydb -f mydb_backup.sql

# Restore with parallel jobs (faster)
pg_restore -U postgres -d mydb -j 4 mydb_backup.dump

pg_dump Formats

FormatFlagUse CaseCompressionParallel Restore
Plain-FpHuman readable, version controlNoNo
Custom-FcFlexible restore, best for productionYesYes
Directory-FdLarge databases, parallel dumpYesYes
Tar-FtArchive formatNoNo

✅ When to Use pg_dump

  • Database migrations across PostgreSQL versions
  • Selective backups of specific tables/schemas
  • Development/testing environment refreshes
  • Small to medium databases (< 100GB)
  • Cross-platform migrations
  • Schema-only or data-only backups

⚠️ Limitations

  • Slower for large databases: Dumps and restores can take hours
  • Not point-in-time recovery: Only captures state at backup time
  • Requires more storage: Plain text SQL is less efficient
  • No binary data optimization: Large BLOBs can be slow

pg_basebackup: Physical Backups

What is pg_basebackup?

pg_basebackup creates physical backups by copying the entire PostgreSQL data directory. It's the foundation for Point-in-Time Recovery (PITR) and standby server setup.

💡 Key Features

  • Complete cluster backup: Copies entire data directory
  • Point-in-Time Recovery: Combined with WAL archiving
  • Fast backup and restore: Binary copy is faster than logical
  • Standby server setup: Used for streaming replication
  • Consistent snapshots: Ensures transactional consistency

Usage Examples

Basic Physical Backup

# Backup to directory
pg_basebackup -h localhost -U replicator -D /backup/pgdata -Fp -P

# Backup with compression (tar format)
pg_basebackup -h localhost -U replicator -D /backup -Ft -z -P

Backup for PITR Setup

# Configure WAL archiving in postgresql.conf
# archive_mode = on
# archive_command = 'cp %p /archive/%f'

# Create base backup
pg_basebackup -h localhost -U replicator -D /backup/base -Fp -P -R

# -R flag creates recovery configuration for standby

Streaming Replication Setup

# Create standby server from backup
pg_basebackup -h primary_server -U replicator \
  -D /var/lib/postgresql/data \
  -P --wal-method=stream -R

Command Options Explained

Common flags:

  • -D /path: Target directory for backup
  • -Fp: Plain format (directory copy)
  • -Ft: Tar format (compressed archives)
  • -z: Enable gzip compression
  • -P: Show progress
  • -R: Create standby configuration files
  • --wal-method=stream: Stream WAL during backup

Restore from pg_basebackup

# Stop PostgreSQL service
systemctl stop postgresql

# Replace data directory
rm -rf /var/lib/postgresql/data
cp -r /backup/pgdata /var/lib/postgresql/data

# Set proper permissions
chown -R postgres:postgres /var/lib/postgresql/data
chmod 700 /var/lib/postgresql/data

# Start PostgreSQL
systemctl start postgresql

🚀 Point-in-Time Recovery (PITR)

Combine pg_basebackup with WAL archiving for PITR capabilities:

Step 1: Configure WAL Archiving

# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
archive_timeout = 300

Step 2: Create Base Backup

pg_basebackup -D /backup/base -Fp -P

Step 3: Restore to Specific Point in Time

Create recovery.conf (PostgreSQL < 12) or recovery.signal (PostgreSQL 12+):

# recovery.conf (PostgreSQL < 12)
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2025-08-05 14:30:00'
recovery_target_action = 'promote'

For PostgreSQL 12+, add to postgresql.conf:

restore_command = 'cp /archive/%f %p'
recovery_target_time = '2025-08-05 14:30:00'

Then create recovery.signal file:

touch /var/lib/postgresql/data/recovery.signal

✅ When to Use pg_basebackup

  • Large databases (> 100GB)
  • Point-in-Time Recovery requirements
  • Disaster recovery planning
  • Standby server setup for replication
  • Fast backup and restore needs
  • Complete cluster backup including all databases

⚠️ Limitations

  • Version specific: Must restore to same PostgreSQL major version
  • Platform dependent: OS and architecture must match
  • All or nothing: Cannot selectively restore individual tables
  • Larger storage: Includes all databases in cluster
  • Requires replication user: Need proper permissions

Comparison: pg_dump vs pg_basebackup

Featurepg_dumppg_basebackup
Backup TypeLogicalPhysical
SpeedSlowerFaster
SizeLarger (text)Smaller (binary)
Selective Restore✅ Yes❌ No
Cross-Version✅ Yes❌ No
PITR Support❌ No✅ Yes
Online Backup✅ Yes✅ Yes
Best ForSmall DBs, migrationsLarge DBs, DR

Hybrid Backup Strategy

💡 Best Practice: Use Both!

A robust backup strategy combines both methods:

Daily: pg_basebackup + WAL Archiving

  • Full physical backup weekly
  • Continuous WAL archiving for PITR
  • Retention: 30 days

Weekly: pg_dump

  • Logical backup of critical databases
  • Selective backups of important tables
  • Retention: 90 days

Example Backup Schedule

#!/bin/bash
# Daily physical backup
0 1 * * * pg_basebackup -D /backup/daily/$(date +\%Y\%m\%d) -Fp -P

# Weekly logical backup (Sunday)
0 2 * * 0 pg_dump -U postgres -d production -Fc -f /backup/weekly/prod_$(date +\%Y\%m\%d).dump

# Monthly full cluster backup
0 3 1 * * pg_dumpall -U postgres -f /backup/monthly/cluster_$(date +\%Y\%m\%d).sql

Backup Best Practices

✅ Essential Guidelines

  1. Test restores regularly: Schedule quarterly restore drills
  2. Automate backups: Use cron jobs or backup tools
  3. Monitor backup status: Alert on failures
  4. Encrypt backups: Protect sensitive data
  5. Off-site storage: Use cloud storage or remote servers
  6. Document procedures: Maintain runbooks
  7. Validate backups: Check backup integrity
  8. Set retention policies: Balance storage costs and recovery needs

🚀 Performance Tips

  • Use compression: Save storage space (-z flag)
  • Parallel operations: Use -j flag for pg_dump/pg_restore
  • Network bandwidth: Consider local backups for large databases
  • Backup windows: Schedule during low-traffic periods
  • Incremental backups: Use WAL archiving between full backups

⚠️ Common Mistakes to Avoid

  • Not testing restores: Backups without restore tests are unreliable
  • Single backup location: Always have off-site copies
  • Ignoring WAL files: Essential for PITR and consistency
  • Insufficient retention: Keep backups long enough for compliance
  • Missing global objects: Remember to backup roles and tablespaces
  • No monitoring: Silently failing backups are useless

Backup Automation Example

Complete Backup Script

#!/bin/bash
# PostgreSQL Backup Script
# Combines pg_dump and pg_basebackup

BACKUP_DIR="/var/backups/postgresql"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30

# Logical backup with pg_dump
echo "Starting logical backup..."
pg_dump -U postgres -d production \
  -Fc -f "${BACKUP_DIR}/logical/prod_${DATE}.dump"

# Physical backup with pg_basebackup (weekly)
if [ $(date +%u) -eq 7 ]; then
  echo "Starting weekly physical backup..."
  pg_basebackup -D "${BACKUP_DIR}/physical/${DATE}" -Fp -P
fi

# Compress old backups
find "${BACKUP_DIR}/logical" -name "*.dump" -mtime +7 -exec gzip {} \;

# Delete old backups
find "${BACKUP_DIR}" -mtime +${RETENTION_DAYS} -delete

# Upload to S3 (optional)
aws s3 sync "${BACKUP_DIR}" s3://my-db-backups/ --storage-class GLACIER

echo "Backup completed successfully!"

Conclusion

Both pg_dump and pg_basebackup are essential tools in a PostgreSQL DBA's arsenal. Use pg_dump for flexibility and migrations, and pg_basebackup for speed and disaster recovery.

Quick Decision Guide

Choose pg_dump if you need:

  • Selective table/schema backups
  • Cross-version database migrations
  • Human-readable backup format
  • Development environment refreshes

Choose pg_basebackup if you need:

  • Fast backup of large databases
  • Point-in-Time Recovery (PITR)
  • Standby server setup
  • Complete disaster recovery solution

Remember: The best strategy uses both methods to maximize protection and flexibility!


About the Author

Jamaurice Holt is a Cloud Database Expert and AWS Solutions Architect with over 10 years of experience in database optimization, high availability, and cloud migrations. Specializing in PostgreSQL, SQL optimization, and enterprise database solutions.

Related Articles

Need help with PostgreSQL optimization? Contact us for expert database consulting.

Continue Reading

Leadership

Leadership in Motion: Technical Leadership Lessons from Database Operations

How a decade of database administration shaped my approach to technical leadership — lessons on incident response, mentoring engineers,...

July 10, 2024 · Read article →
Performance

Database Performance Tuning: Lessons from the Trenches

Real-world strategies for optimizing database performance in high-traffic production environments — query plan analysis, indexing, caching,...

August 22, 2024 · Read article →
AI

AI Governance and Hardware: Why Sovereign Refusal Belongs in Silicon

Software guardrails can be prompted around. The real frontier in AI safety is anchoring refusal below the model — at the hardware level. A...

September 15, 2024 · Read article →