PostgreSQL Point-in-Time Recovery (PITR)

Comprehensive postgresql point-in-time recovery (pitr) tutorial with step-by-step instructions for PostgreSQL administrators.

Master PostgreSQL Point-in-Time Recovery (PITR) to protect your data with the ability to restore your database to any specific moment, minimizing data loss from accidents or disasters.


What is Point-in-Time Recovery?

PITR allows you to restore a database to any point in time by combining a base backup with Write-Ahead Log (WAL) archives.

💡 Key Benefit: Unlike regular backups that restore to backup time, PITR can recover to the exact moment before a disaster, minimizing data loss to seconds.

How PITR Works

Base Backup (Sunday 00:00)
    ↓
WAL Archives: 0001 → 0002 → 0003 → 0004 → 0005 → ...
    ↓                                   ↑
    └─────── Replay to target time ─────┘
                   (Tuesday 14:30)

🔧 Prerequisites Setup

Enable WAL Archiving

# postgresql.conf
wal_level = replica                    # Required for PITR
archive_mode = on                      # Enable archiving
archive_command = 'cp %p /archive/%f'  # Archive command
archive_timeout = 300                  # Force archive every 5 minutes

# Optional: Increase WAL retention
max_wal_size = 2GB
min_wal_size = 1GB

⚠️ Important: wal_level = replica (or logical) is required. The default minimal does not support PITR.

Configure Archive Storage

# Create archive directory
sudo mkdir -p /archive
sudo chown postgres:postgres /archive

# For remote storage (recommended)
# Use rsync, S3, or network storage
archive_command = 'rsync -a %p backup_server:/archive/%f'

# For AWS S3
archive_command = 'aws s3 cp %p s3://my-bucket/wal-archive/%f'

Verify Configuration

-- Check WAL level
SHOW wal_level;

-- Check archive status
SHOW archive_mode;
SHOW archive_command;

-- View archive stats
SELECT * FROM pg_stat_archiver;

Creating Base Backups

Using pg_basebackup

# Basic backup
pg_basebackup -D /backups/base_backup -Fp -Xs -P -h localhost -U replication_user

# With compression
pg_basebackup -D /backups/base_backup -Ft -z -Xs -P -h localhost -U replication_user

# Options explained:
# -D: Target directory
# -Fp: Plain format (directory)
# -Ft: Tar format
# -Xs: Stream WAL during backup
# -z: Compress output
# -P: Show progress

Backup with Checkpoint

# Force immediate checkpoint before backup
pg_basebackup -D /backups/base_backup -Fp -Xs -P --checkpoint=fast

Verify Backup

# Check backup label
cat /backups/base_backup/backup_label

# Verify PostgreSQL can start from backup (test only!)
# pg_ctl -D /backups/base_backup start -o "-p 5433"

📊 WAL Archiving Management

Monitor Archive Status

-- Check archiver statistics
SELECT
    archived_count,
    last_archived_wal,
    last_archived_time,
    failed_count,
    last_failed_wal,
    last_failed_time,
    stats_reset
FROM pg_stat_archiver;

-- Calculate WAL generation rate
SELECT
    pg_current_wal_lsn() AS current_lsn,
    pg_walfile_name(pg_current_wal_lsn()) AS current_wal_file;

Verify WAL Continuity

# List archived WAL files
ls -la /archive/ | head -20

# Check for gaps in WAL sequence
ls /archive/*.gz | sort | awk -F'/' '{print $NF}' | \
  awk '{prev=$1; getline; if ($1 != prev+1) print "Gap after: " prev}'

Clean Old Archives

# Find oldest required WAL file
pg_controldata /var/lib/postgresql/data | grep "REDO WAL file"

# Remove WAL files older than base backup
# Use pg_archivecleanup
pg_archivecleanup /archive 000000010000000000000010

🚀 Performing Point-in-Time Recovery

Scenario: Recover from Accidental DELETE

Accidental deletion occurred at 2025-06-15 14:35:00. We want to recover to 14:34:00.

Step 1: Stop PostgreSQL

sudo systemctl stop postgresql

Step 2: Preserve Current Data

# Rename current data directory
mv /var/lib/postgresql/16/main /var/lib/postgresql/16/main_corrupted

Step 3: Restore Base Backup

# Extract base backup
cp -r /backups/base_backup /var/lib/postgresql/16/main

# Or for tar format
mkdir /var/lib/postgresql/16/main
tar -xvf /backups/base_backup.tar -C /var/lib/postgresql/16/main

Step 4: Configure Recovery

Create postgresql.conf (or recovery.conf for PG < 12):

# postgresql.conf - Add these recovery settings
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2025-06-15 14:34:00'
recovery_target_action = 'promote'

Create recovery.signal file:

touch /var/lib/postgresql/16/main/recovery.signal

Step 5: Set Permissions

chown -R postgres:postgres /var/lib/postgresql/16/main
chmod 700 /var/lib/postgresql/16/main

Step 6: Start PostgreSQL

sudo systemctl start postgresql

# Monitor recovery progress
tail -f /var/log/postgresql/postgresql-16-main.log

Step 7: Verify Recovery

-- Check recovery status
SELECT pg_is_in_recovery();

-- Verify data
SELECT COUNT(*) FROM important_table;

-- Check recovery target
SELECT pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();

Recovery Target Options

Recovery to Specific Time

recovery_target_time = '2025-06-15 14:34:00'

Recovery to Specific Transaction

recovery_target_xid = '12345678'

Recovery to Named Restore Point

-- Create restore point before risky operation
SELECT pg_create_restore_point('before_migration');
recovery_target_name = 'before_migration'

Recovery to LSN Position

recovery_target_lsn = '0/1A000060'

Recovery Actions

# What to do when recovery target is reached
recovery_target_action = 'pause'    # Pause and allow inspection
recovery_target_action = 'promote'  # Promote to primary
recovery_target_action = 'shutdown' # Shut down cleanly

Advanced Recovery Techniques

Inclusive vs Exclusive Recovery

# Include the target transaction (default)
recovery_target_inclusive = true

# Stop just before target
recovery_target_inclusive = false

Recovery in New Timeline

# Create new timeline after recovery (default)
recovery_target_timeline = 'latest'

# Stay on specific timeline
recovery_target_timeline = '1'

Parallel Recovery

# Speed up recovery with parallel workers (PG 15+)
max_parallel_workers = 8
max_parallel_recovery_workers = 4

🔧 Automated PITR Setup

Backup Script

#!/bin/bash
# /opt/scripts/pitr_backup.sh

BACKUP_DIR="/backups"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_PATH="${BACKUP_DIR}/base_${DATE}"
RETENTION_DAYS=7

# Create base backup
echo "Starting base backup at $(date)"
pg_basebackup -D "$BACKUP_PATH" -Ft -z -Xs -P \
    -h localhost -U replication_user

if [ $? -eq 0 ]; then
    echo "Backup completed successfully: $BACKUP_PATH"

    # Record backup info
    echo "$DATE" > "${BACKUP_DIR}/latest_backup"

    # Clean old backups
    find "$BACKUP_DIR" -name "base_*" -mtime +${RETENTION_DAYS} -exec rm -rf {} \;

    # Clean old WAL archives
    OLDEST_BACKUP=$(ls -td ${BACKUP_DIR}/base_* 2>/dev/null | tail -1)
    if [ -n "$OLDEST_BACKUP" ]; then
        OLDEST_WAL=$(cat "${OLDEST_BACKUP}/backup_label" | grep "START WAL" | awk '{print $3}')
        pg_archivecleanup /archive "$OLDEST_WAL"
    fi
else
    echo "Backup FAILED!"
    exit 1
fi

Cron Configuration

# Daily base backup at 2 AM
0 2 * * * /opt/scripts/pitr_backup.sh >> /var/log/pitr_backup.log 2>&1

# Verify archive every hour
0 * * * * /opt/scripts/verify_archive.sh >> /var/log/archive_verify.log 2>&1

Archive Verification Script

#!/bin/bash
# /opt/scripts/verify_archive.sh

# Check archive status
FAILED=$(psql -t -c "SELECT failed_count FROM pg_stat_archiver")
if [ "$FAILED" -gt 0 ]; then
    echo "WARNING: Archive failures detected: $FAILED"
    # Send alert
fi

# Check archive lag
LAST_ARCHIVE=$(psql -t -c "SELECT last_archived_time FROM pg_stat_archiver")
LAG_MINUTES=$(psql -t -c "SELECT EXTRACT(EPOCH FROM (now() - '$LAST_ARCHIVE'::timestamp))/60")

if [ "${LAG_MINUTES%.*}" -gt 10 ]; then
    echo "WARNING: Archive lag: $LAG_MINUTES minutes"
    # Send alert
fi

📊 Monitoring PITR Health

Archive Status Dashboard

-- Comprehensive archive health check
SELECT
    'Archive Mode' AS metric,
    current_setting('archive_mode') AS value
UNION ALL
SELECT
    'WAL Level',
    current_setting('wal_level')
UNION ALL
SELECT
    'Archived Count',
    archived_count::text
FROM pg_stat_archiver
UNION ALL
SELECT
    'Failed Count',
    failed_count::text
FROM pg_stat_archiver
UNION ALL
SELECT
    'Last Archived',
    last_archived_time::text
FROM pg_stat_archiver
UNION ALL
SELECT
    'Current WAL',
    pg_walfile_name(pg_current_wal_lsn());

Calculate Recovery Window

-- Estimate recovery capability
WITH archive_info AS (
    SELECT
        pg_walfile_name(pg_current_wal_lsn()) AS current_wal,
        last_archived_wal,
        last_archived_time
    FROM pg_stat_archiver
)
SELECT
    current_wal,
    last_archived_wal,
    last_archived_time,
    now() - last_archived_time AS archive_lag
FROM archive_info;

⚠️ Common Issues and Solutions

Issue 1: Archive Command Failing

-- Check archive status
SELECT * FROM pg_stat_archiver;

Solution:

# Test archive command manually
cp /var/lib/postgresql/16/main/pg_wal/000000010000000000000001 /archive/test

# Check permissions
ls -la /archive

# Check disk space
df -h /archive

Issue 2: WAL Files Accumulating

# Check for archive backlog
ls /var/lib/postgresql/16/main/pg_wal/ | wc -l

# If archive is failing, WAL will accumulate
# Fix archive command first, then:
# Files will be archived automatically

Issue 3: Recovery Not Finding WAL Files

LOG: could not open file "pg_wal/000000010000000000000010": No such file

Solution:

# Ensure restore_command is correct
restore_command = 'cp /archive/%f %p'

# Test restore command manually
cp /archive/000000010000000000000010 /tmp/test

Issue 4: Recovery Target Not Reached

Solution:

# Check target specification
# Ensure time is in correct timezone
recovery_target_time = '2025-06-15 14:34:00 America/New_York'

# Or use UTC explicitly
recovery_target_time = '2025-06-15 18:34:00 UTC'

✅ Best Practices

Backup Strategy

ComponentFrequencyRetention
Base BackupDaily7-30 days
WAL ArchivesContinuousMatch base backups
Full BackupWeekly4-12 weeks

Testing Requirements

  1. Test recovery monthly - Verify backups actually work
  2. Document procedures - Step-by-step recovery runbooks
  3. Train team members - Multiple people should know the process
  4. Time the recovery - Know your RTO (Recovery Time Objective)

Storage Recommendations

# Separate storage for archives
# - Different physical disk
# - Different server (ideally)
# - Cloud storage (S3, GCS, Azure Blob)

# Example S3 archive command
archive_command = 'aws s3 cp %p s3://my-bucket/wal/%f --storage-class STANDARD_IA'

# With wal-g (recommended)
archive_command = 'wal-g wal-push %p'

Quick Reference Commands

# Create base backup
pg_basebackup -D /backup -Fp -Xs -P

# Check archive status
psql -c "SELECT * FROM pg_stat_archiver"

# Create restore point
psql -c "SELECT pg_create_restore_point('my_point')"

# Start recovery
touch /data/recovery.signal
systemctl start postgresql

# Promote from recovery
psql -c "SELECT pg_promote()"

# Clean old archives
pg_archivecleanup /archive 000000010000000000000010

Conclusion

Point-in-Time Recovery is essential for data protection in PostgreSQL. By properly configuring WAL archiving and regularly testing your recovery procedures, you can minimize data loss and recover from almost any disaster scenario.

Key Takeaways

  • ✅ Enable WAL archiving with proper storage
  • ✅ Take regular base backups
  • Test recovery procedures regularly
  • ✅ Monitor archive health continuously
  • ✅ Document and train on recovery procedures

About the Author

Jamaurice Holt is a Cloud Database Expert and AWS Solutions Architect with over 10 years of experience in database optimization, high availability, and cloud migrations. Specializing in PostgreSQL, SQL optimization, and enterprise database solutions.

Related Articles

Need help with PostgreSQL optimization? Contact us for expert database consulting.

Continue Reading

Leadership

Leadership in Motion: Technical Leadership Lessons from Database Operations

How a decade of database administration shaped my approach to technical leadership — lessons on incident response, mentoring engineers,...

July 10, 2024 · Read article →
Performance

Database Performance Tuning: Lessons from the Trenches

Real-world strategies for optimizing database performance in high-traffic production environments — query plan analysis, indexing, caching,...

August 22, 2024 · Read article →
AI

AI Governance and Hardware: Why Sovereign Refusal Belongs in Silicon

Software guardrails can be prompted around. The real frontier in AI safety is anchoring refusal below the model — at the hardware level. A...

September 15, 2024 · Read article →