Name: EVE AI Core
Author: Jamaurice Holt

Understand the PostgreSQL query planner to write better queries, create effective indexes, and optimize database performance through data-driven decisions.

How PostgreSQL Executes Queries

When you submit a query, PostgreSQL goes through several stages:

SQL Query → Parser → Analyzer → Rewriter → Planner → Executor → Results

The planner (also called optimizer) is responsible for finding the most efficient execution strategy by:

Generating possible execution plans
Estimating the cost of each plan
Selecting the plan with the lowest cost

💡 Key Insight: The planner makes decisions based on statistics about your data. Accurate statistics lead to good plans; outdated statistics lead to poor performance.

Understanding EXPLAIN

Basic EXPLAIN

-- Show the execution plan
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;

Output:

Seq Scan on orders  (cost=0.00..1834.00 rows=50 width=68)
  Filter: (customer_id = 100)

EXPLAIN ANALYZE

-- Execute the query and show actual timing
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;

Output:

Seq Scan on orders  (cost=0.00..1834.00 rows=50 width=68) (actual time=0.015..15.234 rows=47 loops=1)
  Filter: (customer_id = 100)
  Rows Removed by Filter: 99953
Planning Time: 0.089 ms
Execution Time: 15.267 ms

⚠️ Warning: EXPLAIN ANALYZE actually executes the query. For INSERT/UPDATE/DELETE, wrap in a transaction and rollback.

Full EXPLAIN Options

EXPLAIN (
    ANALYZE,      -- Execute and show actual times
    BUFFERS,      -- Show buffer usage
    COSTS,        -- Show cost estimates (default on)
    TIMING,       -- Show actual timing (default on with ANALYZE)
    VERBOSE,      -- Show additional details
    FORMAT JSON   -- Output format: TEXT, XML, JSON, YAML
)
SELECT * FROM orders WHERE customer_id = 100;

Reading Execution Plans

Cost Estimates

Seq Scan on orders  (cost=0.00..1834.00 rows=50 width=68)
                     └────┬────┘ └───┬──┘ └─┬─┘ └──┬───┘
                   startup   total  rows    row width
                    cost     cost   estimate  (bytes)

Startup cost: Time to return first row
Total cost: Time to return all rows
Rows: Estimated number of rows returned
Width: Average row size in bytes

Actual vs Estimated

EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';

Seq Scan on orders  (cost=0.00..1834.00 rows=100 width=68) (actual time=0.015..12.456 rows=15234 loops=1)
                                           └──┬──┘                                      └───┬───┘
                                        estimated: 100                               actual: 15,234

⚠️ Problem Indicator: Large differences between estimated and actual rows indicate stale statistics or data skew.

Scan Types

Sequential Scan

Reads every row in the table:

-- Forced when no suitable index exists
EXPLAIN SELECT * FROM orders WHERE notes LIKE '%urgent%';

Seq Scan on orders  (cost=0.00..2334.00 rows=500 width=68)
  Filter: (notes ~~ '%urgent%'::text)

Index Scan

Uses index to find rows, then fetches from table:

-- Efficient for selective queries
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;

Index Scan using idx_orders_customer on orders  (cost=0.29..8.31 rows=1 width=68)
  Index Cond: (customer_id = 100)

Index Only Scan

Returns data directly from index without table access:

-- Requires covering index and recent VACUUM
EXPLAIN SELECT customer_id FROM orders WHERE customer_id = 100;

Index Only Scan using idx_orders_customer on orders  (cost=0.29..4.31 rows=1 width=4)
  Index Cond: (customer_id = 100)
  Heap Fetches: 0  -- No table access needed!

Bitmap Scans

Combines multiple index conditions:

EXPLAIN SELECT * FROM orders
WHERE customer_id = 100 OR status = 'pending';

Bitmap Heap Scan on orders  (cost=9.50..55.76 rows=15 width=68)
  Recheck Cond: ((customer_id = 100) OR (status = 'pending'::text))
  ->  BitmapOr  (cost=9.50..9.50 rows=15 width=0)
        ->  Bitmap Index Scan on idx_orders_customer  (cost=0.00..4.43 rows=5 width=0)
              Index Cond: (customer_id = 100)
        ->  Bitmap Index Scan on idx_orders_status  (cost=0.00..5.03 rows=10 width=0)
              Index Cond: (status = 'pending'::text)

Join Strategies

Nested Loop Join

Best for small tables or highly selective joins:

EXPLAIN SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.id = 100;

Nested Loop  (cost=0.57..16.62 rows=5 width=136)
  ->  Index Scan using customers_pkey on customers c  (cost=0.28..8.29 rows=1 width=68)
        Index Cond: (id = 100)
  ->  Index Scan using idx_orders_customer on orders o  (cost=0.29..8.31 rows=5 width=68)
        Index Cond: (customer_id = 100)

Hash Join

Builds hash table from smaller table, probes with larger:

EXPLAIN SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;

Hash Join  (cost=3.25..1859.25 rows=100000 width=136)
  Hash Cond: (o.customer_id = c.id)
  ->  Seq Scan on orders o  (cost=0.00..1334.00 rows=100000 width=68)
  ->  Hash  (cost=2.00..2.00 rows=100 width=68)
        ->  Seq Scan on customers c  (cost=0.00..2.00 rows=100 width=68)

Merge Join

Efficient when both inputs are sorted:

EXPLAIN SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
ORDER BY o.customer_id;

Merge Join  (cost=0.57..4537.82 rows=100000 width=136)
  Merge Cond: (o.customer_id = c.id)
  ->  Index Scan using idx_orders_customer on orders o  (cost=0.29..3085.29 rows=100000 width=68)
  ->  Index Scan using customers_pkey on customers c  (cost=0.28..12.78 rows=100 width=68)

📊 Statistics and the Planner

How Statistics Work

PostgreSQL collects statistics via ANALYZE:

-- Analyze a specific table
ANALYZE orders;

-- Analyze entire database
ANALYZE;

-- View statistics
SELECT
    attname,
    n_distinct,
    most_common_vals,
    most_common_freqs,
    histogram_bounds
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

Key Statistics

Statistic	Purpose
n_distinct	Number of distinct values
most_common_vals	Most frequent values
most_common_freqs	Frequencies of common values
histogram_bounds	Distribution of remaining values
correlation	Physical vs logical ordering

Improving Statistics

-- Increase statistics target for better estimates
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

-- Default is 100; max is 10000
-- Higher = more accurate but slower ANALYZE

🔧 Planner Configuration

Cost Parameters

-- View current settings
SHOW seq_page_cost;      -- Cost of sequential page read (1.0)
SHOW random_page_cost;   -- Cost of random page read (4.0)
SHOW cpu_tuple_cost;     -- CPU cost per tuple (0.01)
SHOW cpu_index_tuple_cost; -- CPU cost per index tuple (0.005)
SHOW cpu_operator_cost;  -- CPU cost per operator (0.0025)
SHOW effective_cache_size; -- Estimate of OS cache size

Adjusting for SSD

-- SSDs have lower random I/O cost
SET random_page_cost = 1.1;  -- Much lower than default 4.0

-- Permanent change in postgresql.conf
-- random_page_cost = 1.1

Enabling/Disabling Plan Types

-- Disable sequential scans (for testing)
SET enable_seqscan = off;

-- Disable hash joins
SET enable_hashjoin = off;

-- These are useful for understanding why planner chose a particular plan
-- Don't use in production permanently!

Common Plan Problems

Problem 1: Wrong Row Estimates

EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'rare_status';

Seq Scan on orders  (cost=0.00..1834.00 rows=10000 width=68) (actual time=0.015..12.456 rows=5 loops=1)
                                        └──┬───┘                                       └┬┘
                                      estimated: 10,000                           actual: 5

Solution: Update statistics or increase statistics target:

ANALYZE orders;
-- Or for skewed distributions:
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

Problem 2: Index Not Used

EXPLAIN SELECT * FROM orders WHERE customer_id = 100;
-- Shows Seq Scan despite index existing

Possible causes and solutions:

-- 1. Stale statistics
ANALYZE orders;

-- 2. Table is too small (index overhead not worth it)
-- This is often correct behavior

-- 3. Low selectivity (too many matching rows)
-- Index only helps for selective queries

-- 4. Type mismatch
-- Check column type vs query constant type
SELECT * FROM orders WHERE customer_id = '100';  -- String vs integer?

-- 5. Function on indexed column
-- Bad: WHERE LOWER(email) = '[email protected]'
-- Good: WHERE email = '[email protected]' (use CITEXT)
-- Good: CREATE INDEX idx_email_lower ON users (LOWER(email));

Problem 3: Nested Loop with Large Tables

EXPLAIN ANALYZE SELECT * FROM orders o JOIN products p ON o.product_id = p.id;

Nested Loop  (cost=0.29..250000.29 rows=100000 width=136) (actual time=0.015..850.234 rows=100000 loops=1)

Solution: Encourage hash or merge join:

-- Increase work_mem for hash joins
SET work_mem = '256MB';

-- Re-run query
EXPLAIN ANALYZE SELECT * FROM orders o JOIN products p ON o.product_id = p.id;

🚀 Advanced Analysis

Using BUFFERS

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 100;

Index Scan using idx_orders_customer on orders  (cost=0.29..8.31 rows=1 width=68) (actual time=0.015..0.016 rows=1 loops=1)
  Index Cond: (customer_id = 100)
  Buffers: shared hit=3  -- All from cache
Planning Time: 0.089 ms
Execution Time: 0.032 ms

Buffer terminology:

shared hit: Pages found in shared buffer cache
shared read: Pages read from disk
shared written: Pages written (for updates)

Using auto_explain

-- Log plans for slow queries automatically
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '100ms';
SET auto_explain.log_analyze = true;
SET auto_explain.log_buffers = true;

Optimization Strategies

Strategy 1: Create Appropriate Indexes

-- Analyze query patterns first
EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 100 AND status = 'pending'
ORDER BY created_at DESC;

-- Create composite index matching query
CREATE INDEX idx_orders_customer_status_date
ON orders (customer_id, status, created_at DESC);

Strategy 2: Rewrite Queries

-- Bad: Function prevents index use
SELECT * FROM users WHERE YEAR(created_at) = 2025;

-- Good: Range query uses index
SELECT * FROM users
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01';

Strategy 3: Use CTEs Wisely

-- PostgreSQL 12+: CTEs can be inlined (not optimization fences)
-- Use MATERIALIZED to prevent inlining if needed

WITH cte AS MATERIALIZED (
    SELECT * FROM orders WHERE status = 'pending'
)
SELECT * FROM cte WHERE customer_id = 100;

Strategy 4: Partition Large Tables

-- Partition pruning eliminates irrelevant partitions
EXPLAIN SELECT * FROM orders_partitioned
WHERE order_date = '2025-06-15';

Append  (cost=0.00..21.50 rows=5 width=40)
  Subplans Removed: 11  -- Only relevant partition scanned
  ->  Seq Scan on orders_2025_06  (cost=0.00..21.50 rows=5 width=40)
        Filter: (order_date = '2025-06-15'::date)

📊 Monitoring Plan Quality

Create Baseline Queries

-- Track query performance over time
CREATE TABLE query_baselines (
    query_hash TEXT,
    query_text TEXT,
    execution_time_ms NUMERIC,
    rows_returned INTEGER,
    plan_hash TEXT,
    captured_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Monitor for Plan Regressions

-- Find queries with degraded performance
SELECT
    query,
    calls,
    mean_exec_time / 1000 AS avg_seconds,
    stddev_exec_time / 1000 AS stddev_seconds
FROM pg_stat_statements
WHERE mean_exec_time > 1000  -- > 1 second
ORDER BY total_exec_time DESC
LIMIT 20;

✅ Best Practices

For DBAs

Run ANALYZE regularly - Enable autovacuum and autoanalyze
Monitor query plans - Use pg_stat_statements
Tune cost parameters - Adjust for SSD, memory, etc.
Increase statistics targets - For skewed columns
Test plan changes - Before production deployment

For Developers

Always use EXPLAIN ANALYZE - Verify actual vs estimated
Check for index usage - Ensure indexes are being used
Avoid functions on indexed columns - Prevents index usage
Use parameterized queries - Enable plan caching
Test with production-like data - Small datasets give different plans

Common EXPLAIN Patterns Cheatsheet

Pattern	Meaning	Action
Seq Scan on large table	Missing index or low selectivity	Add index or refine query
Nested Loop with large outer	Wrong join type	Increase work_mem
Bitmap Heap Scan with many Recheck	Index not selective	Review index or query
Sort with high cost	Missing index for ORDER BY	Add sorted index
Hash with high memory	Large hash table	Increase work_mem
rows=X vs actual rows=Y (big diff)	Stale statistics	Run ANALYZE

Conclusion

Understanding the PostgreSQL query planner is essential for database optimization. By learning to read execution plans and understanding how the planner makes decisions, you can write better queries, create effective indexes, and maintain high-performance databases.

Key Takeaways

✅ Use EXPLAIN ANALYZE to see actual execution
✅ Keep statistics updated with ANALYZE
✅ Match indexes to query patterns
✅ Tune cost parameters for your hardware
✅ Monitor for plan regressions over time

About the Author

Jamaurice Holt is a Cloud Database Expert and AWS Solutions Architect with over 10 years of experience in database optimization, high availability, and cloud migrations. Specializing in PostgreSQL, SQL optimization, and enterprise database solutions.

Need help with PostgreSQL optimization? Contact us for expert database consulting.