PostgreSQL Monitoring with pg_stat

Comprehensive postgresql monitoring with pg_stat tutorial with step-by-step instructions for PostgreSQL administrators.

Master PostgreSQL's built-in statistics views to monitor database health, identify performance bottlenecks, and proactively prevent issues before they impact users.


Overview of pg_stat Views

PostgreSQL provides extensive statistics collectors that gather information about database activity. These statistics are exposed through pg_stat_* and pg_statio_* views.

📊 Key Insight: The statistics collector runs as a separate process and updates statistics asynchronously. Data may be slightly delayed but provides accurate trends.

Statistics Categories

CategoryViewsPurpose
Databasepg_stat_databaseOverall database activity
Tablespg_stat_user_tablesTable access patterns
Indexespg_stat_user_indexesIndex usage
I/Opg_statio_user_tablesBuffer cache hits/misses
Backgroundpg_stat_bgwriterBackground writer stats
Replicationpg_stat_replicationReplication lag
Activitypg_stat_activityCurrent connections

📊 Database-Level Monitoring

pg_stat_database

Monitor overall database health:

SELECT
    datname AS database,
    numbackends AS connections,
    xact_commit AS commits,
    xact_rollback AS rollbacks,
    blks_read,
    blks_hit,
    ROUND(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_ratio,
    tup_returned,
    tup_fetched,
    tup_inserted,
    tup_updated,
    tup_deleted,
    conflicts,
    deadlocks,
    temp_files,
    pg_size_pretty(temp_bytes) AS temp_bytes,
    stats_reset
FROM pg_stat_database
WHERE datname = current_database();

Key Metrics to Watch

-- Cache hit ratio (should be > 99%)
SELECT
    datname,
    ROUND(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname NOT LIKE 'template%';

-- Transaction commit/rollback ratio
SELECT
    datname,
    xact_commit,
    xact_rollback,
    ROUND(100.0 * xact_rollback / NULLIF(xact_commit + xact_rollback, 0), 2) AS rollback_pct
FROM pg_stat_database
WHERE datname NOT LIKE 'template%';

-- Deadlock count (should be 0 or very low)
SELECT datname, deadlocks
FROM pg_stat_database
WHERE deadlocks > 0;

⚠️ Alert Thresholds:

  • Cache hit ratio < 95%: Need more shared_buffers or query optimization
  • Rollback ratio > 5%: Application issues
  • Deadlocks > 0: Lock contention problems

📋 Table Statistics

pg_stat_user_tables

Analyze table access patterns:

SELECT
    schemaname,
    relname AS table_name,
    seq_scan,
    seq_tup_read,
    idx_scan,
    idx_tup_fetch,
    n_tup_ins AS inserts,
    n_tup_upd AS updates,
    n_tup_del AS deletes,
    n_tup_hot_upd AS hot_updates,
    n_live_tup AS live_rows,
    n_dead_tup AS dead_rows,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_row_pct,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze,
    vacuum_count,
    autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Tables Needing Indexes

-- Tables with high sequential scan ratio (candidates for indexes)
SELECT
    schemaname || '.' || relname AS table_name,
    seq_scan,
    idx_scan,
    ROUND(100.0 * seq_scan / NULLIF(seq_scan + idx_scan, 0), 2) AS seq_scan_pct,
    n_live_tup AS rows
FROM pg_stat_user_tables
WHERE seq_scan > 1000
  AND n_live_tup > 10000
  AND ROUND(100.0 * seq_scan / NULLIF(seq_scan + idx_scan, 0), 2) > 50
ORDER BY seq_scan DESC;

Tables Needing Vacuum

-- Tables with high dead tuple ratio
SELECT
    schemaname || '.' || relname AS table_name,
    n_live_tup,
    n_dead_tup,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 2) AS dead_pct,
    last_autovacuum,
    last_vacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;

HOT Update Ratio

-- HOT updates ratio (higher is better, reduces index maintenance)
SELECT
    schemaname || '.' || relname AS table_name,
    n_tup_upd AS total_updates,
    n_tup_hot_upd AS hot_updates,
    ROUND(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 2) AS hot_ratio
FROM pg_stat_user_tables
WHERE n_tup_upd > 1000
ORDER BY n_tup_upd DESC;

💡 HOT Updates: Heap-Only Tuple updates avoid index updates when indexed columns aren't modified. A low HOT ratio may indicate need for FILLFACTOR tuning.


🔍 Index Statistics

pg_stat_user_indexes

Monitor index usage:

SELECT
    schemaname || '.' || relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

Unused Indexes

-- Indexes that haven't been used (candidates for removal)
SELECT
    schemaname || '.' || relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
  AND indexrelname NOT LIKE '%_unique%'
ORDER BY pg_relation_size(indexrelid) DESC;

Duplicate Indexes

-- Find potentially duplicate indexes
SELECT
    pg_size_pretty(sum(pg_relation_size(idx))::bigint) AS size,
    (array_agg(idx))[1] AS idx1,
    (array_agg(idx))[2] AS idx2,
    (array_agg(idx))[3] AS idx3,
    (array_agg(idx))[4] AS idx4
FROM (
    SELECT
        indexrelid::regclass AS idx,
        (indrelid::text || E'\n' || indclass::text || E'\n' ||
         indkey::text || E'\n' || coalesce(indexprs::text, '') ||
         E'\n' || coalesce(indpred::text, '')) AS key
    FROM pg_index
) sub
GROUP BY key
HAVING count(*) > 1
ORDER BY sum(pg_relation_size(idx)) DESC;

🔄 I/O Statistics

pg_statio_user_tables

Track buffer cache effectiveness:

SELECT
    schemaname || '.' || relname AS table_name,
    heap_blks_read,
    heap_blks_hit,
    ROUND(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS table_hit_ratio,
    idx_blks_read,
    idx_blks_hit,
    ROUND(100.0 * idx_blks_hit / NULLIF(idx_blks_hit + idx_blks_read, 0), 2) AS index_hit_ratio,
    toast_blks_read,
    toast_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC;

Tables with Poor Cache Performance

-- Tables with low cache hit ratio
SELECT
    schemaname || '.' || relname AS table_name,
    heap_blks_read AS disk_reads,
    heap_blks_hit AS cache_hits,
    ROUND(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS hit_ratio
FROM pg_statio_user_tables
WHERE heap_blks_read > 1000
  AND ROUND(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) < 95
ORDER BY heap_blks_read DESC;

👥 Connection Monitoring

pg_stat_activity

Monitor active connections:

-- Current connection summary
SELECT
    state,
    COUNT(*) AS count,
    MAX(EXTRACT(EPOCH FROM (now() - query_start)))::int AS max_duration_sec
FROM pg_stat_activity
WHERE pid != pg_backend_pid()
GROUP BY state;

-- Active queries with duration
SELECT
    pid,
    usename,
    datname,
    state,
    query_start,
    EXTRACT(EPOCH FROM (now() - query_start))::int AS duration_sec,
    wait_event_type,
    wait_event,
    LEFT(query, 100) AS query_preview
FROM pg_stat_activity
WHERE state = 'active'
  AND pid != pg_backend_pid()
ORDER BY query_start;

Long-Running Queries

-- Queries running longer than 5 minutes
SELECT
    pid,
    usename,
    datname,
    state,
    EXTRACT(EPOCH FROM (now() - query_start))::int AS duration_sec,
    wait_event_type,
    wait_event,
    query
FROM pg_stat_activity
WHERE state != 'idle'
  AND query_start < now() - INTERVAL '5 minutes'
  AND pid != pg_backend_pid()
ORDER BY query_start;

Blocked Queries

-- Find blocked and blocking queries
SELECT
    blocked.pid AS blocked_pid,
    blocked.usename AS blocked_user,
    blocking.pid AS blocking_pid,
    blocking.usename AS blocking_user,
    blocked.query AS blocked_query,
    blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;

Idle Connections

-- Idle connections holding resources
SELECT
    pid,
    usename,
    datname,
    state,
    state_change,
    EXTRACT(EPOCH FROM (now() - state_change))::int AS idle_seconds,
    query
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - INTERVAL '10 minutes'
ORDER BY state_change;

🔧 Background Writer Statistics

pg_stat_bgwriter

Monitor background writer and checkpoint activity:

SELECT
    checkpoints_timed,
    checkpoints_req,
    checkpoint_write_time / 1000 AS checkpoint_write_sec,
    checkpoint_sync_time / 1000 AS checkpoint_sync_sec,
    buffers_checkpoint,
    buffers_clean,
    maxwritten_clean,
    buffers_backend,
    buffers_backend_fsync,
    buffers_alloc,
    stats_reset
FROM pg_stat_bgwriter;

Checkpoint Analysis

-- Checkpoint frequency and performance
SELECT
    checkpoints_timed AS scheduled,
    checkpoints_req AS requested,
    ROUND(100.0 * checkpoints_req / NULLIF(checkpoints_timed + checkpoints_req, 0), 2) AS pct_requested,
    ROUND(checkpoint_write_time / NULLIF(checkpoints_timed + checkpoints_req, 0) / 1000, 2) AS avg_write_sec,
    ROUND(checkpoint_sync_time / NULLIF(checkpoints_timed + checkpoints_req, 0) / 1000, 2) AS avg_sync_sec
FROM pg_stat_bgwriter;

⚠️ Warning Signs:

  • High checkpoints_req: checkpoint_completion_target too low or heavy write load
  • maxwritten_clean > 0: bgwriter not keeping up
  • buffers_backend_fsync > 0: Backends doing fsync (bad for performance)

📡 Replication Monitoring

pg_stat_replication

Monitor streaming replication:

-- Replication status on primary
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    sync_state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replication_lag_bytes,
    pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS lag_pretty,
    write_lag,
    flush_lag,
    replay_lag
FROM pg_stat_replication;

Replication Lag Alert

-- Check if replication lag exceeds threshold
SELECT
    application_name,
    client_addr,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes,
    CASE
        WHEN pg_wal_lsn_diff(sent_lsn, replay_lsn) > 100000000 THEN 'CRITICAL'
        WHEN pg_wal_lsn_diff(sent_lsn, replay_lsn) > 10000000 THEN 'WARNING'
        ELSE 'OK'
    END AS status
FROM pg_stat_replication;

🚀 Query Performance Statistics

pg_stat_statements

Track query performance (requires extension):

-- Enable extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top queries by total time
SELECT
    LEFT(query, 80) AS query_preview,
    calls,
    ROUND(total_exec_time::numeric, 2) AS total_ms,
    ROUND(mean_exec_time::numeric, 2) AS avg_ms,
    ROUND(stddev_exec_time::numeric, 2) AS stddev_ms,
    rows,
    shared_blks_hit,
    shared_blks_read,
    ROUND(100.0 * shared_blks_hit / NULLIF(shared_blks_hit + shared_blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Slowest Queries

-- Queries with highest average execution time
SELECT
    LEFT(query, 100) AS query,
    calls,
    ROUND(mean_exec_time::numeric, 2) AS avg_ms,
    ROUND(min_exec_time::numeric, 2) AS min_ms,
    ROUND(max_exec_time::numeric, 2) AS max_ms,
    rows / NULLIF(calls, 0) AS avg_rows
FROM pg_stat_statements
WHERE calls > 10
ORDER BY mean_exec_time DESC
LIMIT 20;

Most I/O Intensive Queries

-- Queries with most disk reads
SELECT
    LEFT(query, 80) AS query,
    calls,
    shared_blks_read AS disk_reads,
    shared_blks_hit AS cache_hits,
    ROUND(100.0 * shared_blks_read / NULLIF(shared_blks_read + shared_blks_hit, 0), 2) AS disk_read_pct,
    temp_blks_read,
    temp_blks_written
FROM pg_stat_statements
WHERE shared_blks_read > 1000
ORDER BY shared_blks_read DESC
LIMIT 20;

📈 Creating a Monitoring Dashboard

Comprehensive Health Check Query

-- Database health snapshot
WITH db_stats AS (
    SELECT
        ROUND(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_ratio,
        deadlocks,
        conflicts,
        temp_bytes
    FROM pg_stat_database
    WHERE datname = current_database()
),
connection_stats AS (
    SELECT
        COUNT(*) AS total_connections,
        COUNT(*) FILTER (WHERE state = 'active') AS active,
        COUNT(*) FILTER (WHERE state = 'idle') AS idle,
        COUNT(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_tx,
        MAX(EXTRACT(EPOCH FROM (now() - query_start))) FILTER (WHERE state = 'active') AS longest_query_sec
    FROM pg_stat_activity
    WHERE pid != pg_backend_pid()
),
table_stats AS (
    SELECT
        SUM(n_dead_tup) AS total_dead_tuples,
        COUNT(*) FILTER (WHERE n_dead_tup > 10000) AS tables_need_vacuum
    FROM pg_stat_user_tables
)
SELECT
    ds.cache_hit_ratio,
    ds.deadlocks,
    ds.conflicts,
    pg_size_pretty(ds.temp_bytes) AS temp_space_used,
    cs.total_connections,
    cs.active AS active_queries,
    cs.idle AS idle_connections,
    cs.idle_in_tx AS idle_in_transaction,
    ROUND(cs.longest_query_sec::numeric, 2) AS longest_query_sec,
    ts.total_dead_tuples,
    ts.tables_need_vacuum
FROM db_stats ds, connection_stats cs, table_stats ts;

Automated Alert Queries

-- Create function for health alerts
CREATE OR REPLACE FUNCTION get_health_alerts()
RETURNS TABLE(
    alert_type TEXT,
    severity TEXT,
    message TEXT
) AS $$
BEGIN
    -- Cache hit ratio alert
    RETURN QUERY
    SELECT
        'CACHE_HIT'::TEXT,
        CASE WHEN ratio < 90 THEN 'CRITICAL' ELSE 'WARNING' END,
        'Cache hit ratio: ' || ratio || '%'
    FROM (
        SELECT ROUND(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS ratio
        FROM pg_stat_database WHERE datname = current_database()
    ) sub
    WHERE ratio < 95;

    -- Long running queries alert
    RETURN QUERY
    SELECT
        'LONG_QUERY'::TEXT,
        'WARNING'::TEXT,
        'Query running for ' || duration || ' seconds: ' || LEFT(query, 50)
    FROM (
        SELECT
            EXTRACT(EPOCH FROM (now() - query_start))::int AS duration,
            query
        FROM pg_stat_activity
        WHERE state = 'active'
          AND query_start < now() - INTERVAL '5 minutes'
    ) sub;

    -- Deadlock alert
    RETURN QUERY
    SELECT
        'DEADLOCKS'::TEXT,
        'CRITICAL'::TEXT,
        'Deadlocks detected: ' || deadlocks
    FROM pg_stat_database
    WHERE datname = current_database()
      AND deadlocks > 0;

    -- Bloat alert
    RETURN QUERY
    SELECT
        'TABLE_BLOAT'::TEXT,
        'WARNING'::TEXT,
        'Table ' || relname || ' has ' || n_dead_tup || ' dead tuples'
    FROM pg_stat_user_tables
    WHERE n_dead_tup > 100000;

END;
$$ LANGUAGE plpgsql;

-- Run alerts
SELECT * FROM get_health_alerts();

🔄 Resetting Statistics

-- Reset statistics for specific table
SELECT pg_stat_reset_single_table_counters('tablename'::regclass);

-- Reset all database statistics
SELECT pg_stat_reset();

-- Reset background writer statistics
SELECT pg_stat_reset_shared('bgwriter');

-- Reset pg_stat_statements
SELECT pg_stat_statements_reset();

⚠️ Caution: Resetting statistics loses historical data. Consider exporting important metrics before reset.


✅ Best Practices

Monitoring Checklist

MetricTargetFrequency
Cache hit ratio> 99%Every 5 min
Dead tuple ratio< 10%Every 15 min
Replication lag< 1MBEvery 1 min
Long queries< 5 minEvery 1 min
Deadlocks0Every 5 min
Connection count< 80% maxEvery 1 min

Recommended Tools

  1. pg_stat_statements - Query performance tracking
  2. auto_explain - Automatic query plan logging
  3. pgBadger - Log analysis and reporting
  4. Prometheus + Grafana - Real-time dashboards
  5. PgHero - Quick database insights

Conclusion

PostgreSQL's built-in statistics views provide comprehensive monitoring capabilities without external tools. By understanding and regularly checking these views, you can proactively identify and resolve performance issues.

Key Takeaways

  • ✅ Monitor cache hit ratio (target > 99%)
  • ✅ Track dead tuple ratio for vacuum needs
  • ✅ Watch replication lag on replicas
  • ✅ Identify unused indexes for cleanup
  • ✅ Set up automated alerts for critical metrics

About the Author

Jamaurice Holt is a Cloud Database Expert and AWS Solutions Architect with over 10 years of experience in database optimization, high availability, and cloud migrations. Specializing in PostgreSQL, SQL optimization, and enterprise database solutions.

Related Articles

Need help with PostgreSQL optimization? Contact us for expert database consulting.

Continue Reading

Leadership

Leadership in Motion: Technical Leadership Lessons from Database Operations

How a decade of database administration shaped my approach to technical leadership — lessons on incident response, mentoring engineers,...

July 10, 2024 · Read article →
Performance

Database Performance Tuning: Lessons from the Trenches

Real-world strategies for optimizing database performance in high-traffic production environments — query plan analysis, indexing, caching,...

August 22, 2024 · Read article →
AI

AI Governance and Hardware: Why Sovereign Refusal Belongs in Silicon

Software guardrails can be prompted around. The real frontier in AI safety is anchoring refusal below the model — at the hardware level. A...

September 15, 2024 · Read article →