Understand the PostgreSQL query planner to write better queries, create effective indexes, and optimize database performance through data-driven decisions.
How PostgreSQL Executes Queries
When you submit a query, PostgreSQL goes through several stages:
SQL Query → Parser → Analyzer → Rewriter → Planner → Executor → Results
The planner (also called optimizer) is responsible for finding the most efficient execution strategy by:
- Generating possible execution plans
- Estimating the cost of each plan
- Selecting the plan with the lowest cost
💡 Key Insight: The planner makes decisions based on statistics about your data. Accurate statistics lead to good plans; outdated statistics lead to poor performance.
Understanding EXPLAIN
Basic EXPLAIN
-- Show the execution plan
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;
Output:
Seq Scan on orders (cost=0.00..1834.00 rows=50 width=68)
Filter: (customer_id = 100)
EXPLAIN ANALYZE
-- Execute the query and show actual timing
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;
Output:
Seq Scan on orders (cost=0.00..1834.00 rows=50 width=68) (actual time=0.015..15.234 rows=47 loops=1)
Filter: (customer_id = 100)
Rows Removed by Filter: 99953
Planning Time: 0.089 ms
Execution Time: 15.267 ms
⚠️ Warning:
EXPLAIN ANALYZEactually executes the query. For INSERT/UPDATE/DELETE, wrap in a transaction and rollback.
Full EXPLAIN Options
EXPLAIN (
ANALYZE, -- Execute and show actual times
BUFFERS, -- Show buffer usage
COSTS, -- Show cost estimates (default on)
TIMING, -- Show actual timing (default on with ANALYZE)
VERBOSE, -- Show additional details
FORMAT JSON -- Output format: TEXT, XML, JSON, YAML
)
SELECT * FROM orders WHERE customer_id = 100;
Reading Execution Plans
Cost Estimates
Seq Scan on orders (cost=0.00..1834.00 rows=50 width=68)
└────┬────┘ └───┬──┘ └─┬─┘ └──┬───┘
startup total rows row width
cost cost estimate (bytes)
- Startup cost: Time to return first row
- Total cost: Time to return all rows
- Rows: Estimated number of rows returned
- Width: Average row size in bytes
Actual vs Estimated
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
Seq Scan on orders (cost=0.00..1834.00 rows=100 width=68) (actual time=0.015..12.456 rows=15234 loops=1)
└──┬──┘ └───┬───┘
estimated: 100 actual: 15,234
⚠️ Problem Indicator: Large differences between estimated and actual rows indicate stale statistics or data skew.
Scan Types
Sequential Scan
Reads every row in the table:
-- Forced when no suitable index exists
EXPLAIN SELECT * FROM orders WHERE notes LIKE '%urgent%';
Seq Scan on orders (cost=0.00..2334.00 rows=500 width=68)
Filter: (notes ~~ '%urgent%'::text)
Index Scan
Uses index to find rows, then fetches from table:
-- Efficient for selective queries
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;
Index Scan using idx_orders_customer on orders (cost=0.29..8.31 rows=1 width=68)
Index Cond: (customer_id = 100)
Index Only Scan
Returns data directly from index without table access:
-- Requires covering index and recent VACUUM
EXPLAIN SELECT customer_id FROM orders WHERE customer_id = 100;
Index Only Scan using idx_orders_customer on orders (cost=0.29..4.31 rows=1 width=4)
Index Cond: (customer_id = 100)
Heap Fetches: 0 -- No table access needed!
Bitmap Scans
Combines multiple index conditions:
EXPLAIN SELECT * FROM orders
WHERE customer_id = 100 OR status = 'pending';
Bitmap Heap Scan on orders (cost=9.50..55.76 rows=15 width=68)
Recheck Cond: ((customer_id = 100) OR (status = 'pending'::text))
-> BitmapOr (cost=9.50..9.50 rows=15 width=0)
-> Bitmap Index Scan on idx_orders_customer (cost=0.00..4.43 rows=5 width=0)
Index Cond: (customer_id = 100)
-> Bitmap Index Scan on idx_orders_status (cost=0.00..5.03 rows=10 width=0)
Index Cond: (status = 'pending'::text)
Join Strategies
Nested Loop Join
Best for small tables or highly selective joins:
EXPLAIN SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.id = 100;
Nested Loop (cost=0.57..16.62 rows=5 width=136)
-> Index Scan using customers_pkey on customers c (cost=0.28..8.29 rows=1 width=68)
Index Cond: (id = 100)
-> Index Scan using idx_orders_customer on orders o (cost=0.29..8.31 rows=5 width=68)
Index Cond: (customer_id = 100)
Hash Join
Builds hash table from smaller table, probes with larger:
EXPLAIN SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
Hash Join (cost=3.25..1859.25 rows=100000 width=136)
Hash Cond: (o.customer_id = c.id)
-> Seq Scan on orders o (cost=0.00..1334.00 rows=100000 width=68)
-> Hash (cost=2.00..2.00 rows=100 width=68)
-> Seq Scan on customers c (cost=0.00..2.00 rows=100 width=68)
Merge Join
Efficient when both inputs are sorted:
EXPLAIN SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
ORDER BY o.customer_id;
Merge Join (cost=0.57..4537.82 rows=100000 width=136)
Merge Cond: (o.customer_id = c.id)
-> Index Scan using idx_orders_customer on orders o (cost=0.29..3085.29 rows=100000 width=68)
-> Index Scan using customers_pkey on customers c (cost=0.28..12.78 rows=100 width=68)
📊 Statistics and the Planner
How Statistics Work
PostgreSQL collects statistics via ANALYZE:
-- Analyze a specific table
ANALYZE orders;
-- Analyze entire database
ANALYZE;
-- View statistics
SELECT
attname,
n_distinct,
most_common_vals,
most_common_freqs,
histogram_bounds
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';
Key Statistics
| Statistic | Purpose |
|---|---|
| n_distinct | Number of distinct values |
| most_common_vals | Most frequent values |
| most_common_freqs | Frequencies of common values |
| histogram_bounds | Distribution of remaining values |
| correlation | Physical vs logical ordering |
Improving Statistics
-- Increase statistics target for better estimates
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;
-- Default is 100; max is 10000
-- Higher = more accurate but slower ANALYZE
🔧 Planner Configuration
Cost Parameters
-- View current settings
SHOW seq_page_cost; -- Cost of sequential page read (1.0)
SHOW random_page_cost; -- Cost of random page read (4.0)
SHOW cpu_tuple_cost; -- CPU cost per tuple (0.01)
SHOW cpu_index_tuple_cost; -- CPU cost per index tuple (0.005)
SHOW cpu_operator_cost; -- CPU cost per operator (0.0025)
SHOW effective_cache_size; -- Estimate of OS cache size
Adjusting for SSD
-- SSDs have lower random I/O cost
SET random_page_cost = 1.1; -- Much lower than default 4.0
-- Permanent change in postgresql.conf
-- random_page_cost = 1.1
Enabling/Disabling Plan Types
-- Disable sequential scans (for testing)
SET enable_seqscan = off;
-- Disable hash joins
SET enable_hashjoin = off;
-- These are useful for understanding why planner chose a particular plan
-- Don't use in production permanently!
Common Plan Problems
Problem 1: Wrong Row Estimates
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'rare_status';
Seq Scan on orders (cost=0.00..1834.00 rows=10000 width=68) (actual time=0.015..12.456 rows=5 loops=1)
└──┬───┘ └┬┘
estimated: 10,000 actual: 5
Solution: Update statistics or increase statistics target:
ANALYZE orders;
-- Or for skewed distributions:
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;
Problem 2: Index Not Used
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;
-- Shows Seq Scan despite index existing
Possible causes and solutions:
-- 1. Stale statistics
ANALYZE orders;
-- 2. Table is too small (index overhead not worth it)
-- This is often correct behavior
-- 3. Low selectivity (too many matching rows)
-- Index only helps for selective queries
-- 4. Type mismatch
-- Check column type vs query constant type
SELECT * FROM orders WHERE customer_id = '100'; -- String vs integer?
-- 5. Function on indexed column
-- Bad: WHERE LOWER(email) = '[email protected]'
-- Good: WHERE email = '[email protected]' (use CITEXT)
-- Good: CREATE INDEX idx_email_lower ON users (LOWER(email));
Problem 3: Nested Loop with Large Tables
EXPLAIN ANALYZE SELECT * FROM orders o JOIN products p ON o.product_id = p.id;
Nested Loop (cost=0.29..250000.29 rows=100000 width=136) (actual time=0.015..850.234 rows=100000 loops=1)
Solution: Encourage hash or merge join:
-- Increase work_mem for hash joins
SET work_mem = '256MB';
-- Re-run query
EXPLAIN ANALYZE SELECT * FROM orders o JOIN products p ON o.product_id = p.id;
🚀 Advanced Analysis
Using BUFFERS
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 100;
Index Scan using idx_orders_customer on orders (cost=0.29..8.31 rows=1 width=68) (actual time=0.015..0.016 rows=1 loops=1)
Index Cond: (customer_id = 100)
Buffers: shared hit=3 -- All from cache
Planning Time: 0.089 ms
Execution Time: 0.032 ms
Buffer terminology:
- shared hit: Pages found in shared buffer cache
- shared read: Pages read from disk
- shared written: Pages written (for updates)
Using auto_explain
-- Log plans for slow queries automatically
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '100ms';
SET auto_explain.log_analyze = true;
SET auto_explain.log_buffers = true;
Optimization Strategies
Strategy 1: Create Appropriate Indexes
-- Analyze query patterns first
EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 100 AND status = 'pending'
ORDER BY created_at DESC;
-- Create composite index matching query
CREATE INDEX idx_orders_customer_status_date
ON orders (customer_id, status, created_at DESC);
Strategy 2: Rewrite Queries
-- Bad: Function prevents index use
SELECT * FROM users WHERE YEAR(created_at) = 2025;
-- Good: Range query uses index
SELECT * FROM users
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01';
Strategy 3: Use CTEs Wisely
-- PostgreSQL 12+: CTEs can be inlined (not optimization fences)
-- Use MATERIALIZED to prevent inlining if needed
WITH cte AS MATERIALIZED (
SELECT * FROM orders WHERE status = 'pending'
)
SELECT * FROM cte WHERE customer_id = 100;
Strategy 4: Partition Large Tables
-- Partition pruning eliminates irrelevant partitions
EXPLAIN SELECT * FROM orders_partitioned
WHERE order_date = '2025-06-15';
Append (cost=0.00..21.50 rows=5 width=40)
Subplans Removed: 11 -- Only relevant partition scanned
-> Seq Scan on orders_2025_06 (cost=0.00..21.50 rows=5 width=40)
Filter: (order_date = '2025-06-15'::date)
📊 Monitoring Plan Quality
Create Baseline Queries
-- Track query performance over time
CREATE TABLE query_baselines (
query_hash TEXT,
query_text TEXT,
execution_time_ms NUMERIC,
rows_returned INTEGER,
plan_hash TEXT,
captured_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Monitor for Plan Regressions
-- Find queries with degraded performance
SELECT
query,
calls,
mean_exec_time / 1000 AS avg_seconds,
stddev_exec_time / 1000 AS stddev_seconds
FROM pg_stat_statements
WHERE mean_exec_time > 1000 -- > 1 second
ORDER BY total_exec_time DESC
LIMIT 20;
✅ Best Practices
For DBAs
- Run ANALYZE regularly - Enable autovacuum and autoanalyze
- Monitor query plans - Use pg_stat_statements
- Tune cost parameters - Adjust for SSD, memory, etc.
- Increase statistics targets - For skewed columns
- Test plan changes - Before production deployment
For Developers
- Always use EXPLAIN ANALYZE - Verify actual vs estimated
- Check for index usage - Ensure indexes are being used
- Avoid functions on indexed columns - Prevents index usage
- Use parameterized queries - Enable plan caching
- Test with production-like data - Small datasets give different plans
Common EXPLAIN Patterns Cheatsheet
| Pattern | Meaning | Action |
|---|---|---|
| Seq Scan on large table | Missing index or low selectivity | Add index or refine query |
| Nested Loop with large outer | Wrong join type | Increase work_mem |
| Bitmap Heap Scan with many Recheck | Index not selective | Review index or query |
| Sort with high cost | Missing index for ORDER BY | Add sorted index |
| Hash with high memory | Large hash table | Increase work_mem |
| rows=X vs actual rows=Y (big diff) | Stale statistics | Run ANALYZE |
Conclusion
Understanding the PostgreSQL query planner is essential for database optimization. By learning to read execution plans and understanding how the planner makes decisions, you can write better queries, create effective indexes, and maintain high-performance databases.
Key Takeaways
- ✅ Use EXPLAIN ANALYZE to see actual execution
- ✅ Keep statistics updated with ANALYZE
- ✅ Match indexes to query patterns
- ✅ Tune cost parameters for your hardware
- ✅ Monitor for plan regressions over time
About the Author
Jamaurice Holt is a Cloud Database Expert and AWS Solutions Architect with over 10 years of experience in database optimization, high availability, and cloud migrations. Specializing in PostgreSQL, SQL optimization, and enterprise database solutions.
Related Articles
- PostgreSQL Index Types: When to Use Each
- PostgreSQL Performance Tuning Configuration
- PostgreSQL Monitoring with pg_stat
Need help with PostgreSQL optimization? Contact us for expert database consulting.