MySQL Partitioning Strategies
Table partitioning is a technique for dividing large tables into smaller, more manageable pieces called partitions. Each partition can be accessed and maintained independently, providing significant performance benefits for large-scale databases.
🎯 Why Use Partitioning?
| Benefit | Description |
|---|---|
| Query Performance | Partition pruning eliminates scanning irrelevant data |
| Maintenance | Manage partitions independently (backup, optimize, drop) |
| Data Lifecycle | Easily archive or delete old data by dropping partitions |
| Parallel Operations | Some operations can run in parallel across partitions |
| I/O Distribution | Spread data across multiple storage devices |
💡 Key Insight: Partitioning is most beneficial for tables with hundreds of millions of rows where queries typically access a subset of the data based on the partition key.
🔧 Partitioning Types
Overview
┌─────────────────────────────────────────────────────────────────────┐
│ PARTITIONING TYPES │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ RANGE LIST HASH KEY │
│ ───── ──── ──── ─── │
│ Values in Values match Hash of column MySQL's │
│ continuous discrete modulo N internal hash │
│ ranges values partitions function │
│ │
│ Best for: Best for: Best for: Best for: │
│ - Time series - Regions - Even - Natural │
│ - Date ranges - Categories distribution distribution │
│ - Numeric IDs - Status codes │
│ │
└─────────────────────────────────────────────────────────────────────┘
📦 Range Partitioning
Range partitioning assigns rows based on column values falling within given ranges.
Basic Syntax
CREATE TABLE orders (
order_id BIGINT AUTO_INCREMENT,
customer_id INT NOT NULL,
order_date DATE NOT NULL,
total_amount DECIMAL(12,2),
status VARCHAR(20),
PRIMARY KEY (order_id, order_date) -- Partition key must be in PK
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(order_date)) (
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
Production Example: Time-Series Data
-- Monthly partitions for high-volume transaction data
CREATE TABLE transactions (
id BIGINT AUTO_INCREMENT,
account_id INT NOT NULL,
transaction_date DATETIME NOT NULL,
amount DECIMAL(15,2) NOT NULL,
transaction_type ENUM('CREDIT', 'DEBIT') NOT NULL,
description VARCHAR(255),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id, transaction_date),
INDEX idx_account (account_id),
INDEX idx_type_date (transaction_type, transaction_date)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(transaction_date)) (
PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
PARTITION p202403 VALUES LESS THAN (TO_DAYS('2024-04-01')),
PARTITION p202404 VALUES LESS THAN (TO_DAYS('2024-05-01')),
PARTITION p202405 VALUES LESS THAN (TO_DAYS('2024-06-01')),
PARTITION p202406 VALUES LESS THAN (TO_DAYS('2024-07-01')),
PARTITION p202407 VALUES LESS THAN (TO_DAYS('2024-08-01')),
PARTITION p202408 VALUES LESS THAN (TO_DAYS('2024-09-01')),
PARTITION p202409 VALUES LESS THAN (TO_DAYS('2024-10-01')),
PARTITION p202410 VALUES LESS THAN (TO_DAYS('2024-11-01')),
PARTITION p202411 VALUES LESS THAN (TO_DAYS('2024-12-01')),
PARTITION p202412 VALUES LESS THAN (TO_DAYS('2025-01-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
Automatic Partition Management
-- Create procedure to manage partitions automatically
DELIMITER //
CREATE PROCEDURE sp_maintain_partitions(
IN p_table_name VARCHAR(64),
IN p_months_ahead INT,
IN p_months_to_keep INT
)
BEGIN
DECLARE v_partition_name VARCHAR(64);
DECLARE v_partition_date DATE;
DECLARE v_sql VARCHAR(2000);
DECLARE v_current_date DATE;
DECLARE v_i INT DEFAULT 0;
SET v_current_date = CURRENT_DATE;
-- Add future partitions
WHILE v_i < p_months_ahead DO
SET v_partition_date = DATE_ADD(v_current_date, INTERVAL v_i MONTH);
SET v_partition_name = CONCAT('p', DATE_FORMAT(v_partition_date, '%Y%m'));
-- Check if partition exists
IF NOT EXISTS (
SELECT 1 FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME = p_table_name
AND PARTITION_NAME = v_partition_name
) THEN
-- Reorganize MAXVALUE partition
SET v_sql = CONCAT(
'ALTER TABLE ', p_table_name,
' REORGANIZE PARTITION p_future INTO (',
'PARTITION ', v_partition_name,
' VALUES LESS THAN (TO_DAYS(''',
DATE_FORMAT(DATE_ADD(v_partition_date, INTERVAL 1 MONTH), '%Y-%m-01'),
''')),',
'PARTITION p_future VALUES LESS THAN MAXVALUE)'
);
PREPARE stmt FROM v_sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END IF;
SET v_i = v_i + 1;
END WHILE;
-- Drop old partitions
SET v_partition_date = DATE_SUB(v_current_date, INTERVAL p_months_to_keep MONTH);
SET v_partition_name = CONCAT('p', DATE_FORMAT(v_partition_date, '%Y%m'));
-- Get partitions older than retention period
SET @drop_list = NULL;
SELECT GROUP_CONCAT(PARTITION_NAME)
INTO @drop_list
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME = p_table_name
AND PARTITION_NAME != 'p_future'
AND PARTITION_DESCRIPTION < TO_DAYS(DATE_SUB(CURRENT_DATE, INTERVAL p_months_to_keep MONTH));
IF @drop_list IS NOT NULL THEN
SET v_sql = CONCAT('ALTER TABLE ', p_table_name, ' DROP PARTITION ', @drop_list);
PREPARE stmt FROM v_sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END IF;
END //
DELIMITER ;
-- Schedule partition maintenance
CREATE EVENT evt_partition_maintenance
ON SCHEDULE EVERY 1 DAY
STARTS (TIMESTAMP(CURRENT_DATE) + INTERVAL 1 HOUR)
DO
CALL sp_maintain_partitions('transactions', 6, 24);
📋 List Partitioning
List partitioning assigns rows based on column values matching values in discrete sets.
-- Partition by region
CREATE TABLE customers (
customer_id INT AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
email VARCHAR(255),
region_code CHAR(2) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (customer_id, region_code)
) ENGINE=InnoDB
PARTITION BY LIST COLUMNS (region_code) (
PARTITION p_northeast VALUES IN ('NY', 'NJ', 'CT', 'MA', 'PA', 'VT', 'NH', 'ME', 'RI'),
PARTITION p_southeast VALUES IN ('FL', 'GA', 'NC', 'SC', 'VA', 'WV', 'TN', 'AL', 'MS'),
PARTITION p_midwest VALUES IN ('IL', 'OH', 'MI', 'IN', 'WI', 'MN', 'IA', 'MO', 'KS'),
PARTITION p_southwest VALUES IN ('TX', 'OK', 'NM', 'AZ'),
PARTITION p_west VALUES IN ('CA', 'WA', 'OR', 'NV', 'CO', 'UT', 'ID', 'MT', 'WY'),
PARTITION p_other VALUES IN ('HI', 'AK', 'DC', 'PR', 'VI')
);
-- Partition by status
CREATE TABLE orders_by_status (
order_id BIGINT AUTO_INCREMENT,
customer_id INT NOT NULL,
order_date DATE NOT NULL,
status TINYINT NOT NULL,
total_amount DECIMAL(12,2),
PRIMARY KEY (order_id, status)
) ENGINE=InnoDB
PARTITION BY LIST (status) (
PARTITION p_pending VALUES IN (1, 2), -- New, Processing
PARTITION p_shipped VALUES IN (3, 4), -- Shipped, In Transit
PARTITION p_completed VALUES IN (5), -- Delivered
PARTITION p_cancelled VALUES IN (6, 7, 8) -- Cancelled, Returned, Refunded
);
🔢 Hash Partitioning
Hash partitioning distributes rows evenly across a predefined number of partitions using a hash function.
-- Even distribution across partitions
CREATE TABLE sessions (
session_id VARCHAR(64) PRIMARY KEY,
user_id INT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_activity TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
data JSON
) ENGINE=InnoDB
PARTITION BY HASH (user_id)
PARTITIONS 16;
-- Linear hash for easier partition management
CREATE TABLE cache_entries (
cache_key VARCHAR(255) PRIMARY KEY,
cache_value MEDIUMBLOB,
expires_at DATETIME NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB
PARTITION BY LINEAR HASH (CRC32(cache_key))
PARTITIONS 32;
🔑 Key Partitioning
Key partitioning is similar to hash partitioning but uses MySQL's internal hashing function.
-- Partition on primary key
CREATE TABLE user_activities (
id BIGINT AUTO_INCREMENT,
user_id INT NOT NULL,
activity_type VARCHAR(50),
activity_data JSON,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id, user_id)
) ENGINE=InnoDB
PARTITION BY KEY (user_id)
PARTITIONS 8;
-- Partition on multiple columns
CREATE TABLE logs (
id BIGINT AUTO_INCREMENT,
server_id INT NOT NULL,
log_date DATE NOT NULL,
log_level TINYINT,
message TEXT,
PRIMARY KEY (id, server_id, log_date)
) ENGINE=InnoDB
PARTITION BY KEY (server_id, log_date)
PARTITIONS 16;
🎯 Subpartitioning
Combine partitioning methods for more granular control.
-- Range-Hash subpartitioning
CREATE TABLE sales (
sale_id BIGINT AUTO_INCREMENT,
store_id INT NOT NULL,
sale_date DATE NOT NULL,
amount DECIMAL(12,2),
PRIMARY KEY (sale_id, store_id, sale_date)
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(sale_date))
SUBPARTITION BY HASH (store_id)
SUBPARTITIONS 4 (
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- This creates: p2023sp0, p2023sp1, p2023sp2, p2023sp3, p2024sp0, etc.
📊 Partition Pruning
Partition pruning is MySQL's optimization that skips irrelevant partitions during query execution.
How to Verify Partition Pruning
-- Check which partitions are accessed
EXPLAIN PARTITIONS
SELECT *
FROM transactions
WHERE transaction_date BETWEEN '2024-06-01' AND '2024-06-30';
-- Output shows only relevant partition(s)
-- partitions: p202406
-- Non-partition column doesn't help pruning
EXPLAIN PARTITIONS
SELECT *
FROM transactions
WHERE account_id = 12345;
-- Output shows ALL partitions scanned
-- partitions: p202401,p202402,p202403,...
-- Combine partition key with other filters
EXPLAIN PARTITIONS
SELECT *
FROM transactions
WHERE transaction_date >= '2024-06-01'
AND transaction_date < '2024-07-01'
AND account_id = 12345;
-- Output: partitions: p202406 (pruned to one partition)
Best Practices for Pruning
-- ✅ DO: Use partition column in WHERE clause
SELECT * FROM transactions
WHERE transaction_date >= '2024-06-01'
AND transaction_date < '2024-07-01';
-- ❌ DON'T: Apply functions to partition column
SELECT * FROM transactions
WHERE DATE(transaction_date) = '2024-06-15'; -- No pruning!
-- ✅ DO: Use proper range
SELECT * FROM transactions
WHERE transaction_date = '2024-06-15'; -- Pruning works
-- ✅ DO: Include partition column in JOINs
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2024-01-01'
AND o.order_date < '2024-02-01';
🔧 Partition Management Operations
Adding Partitions
-- Add new partition (RANGE)
ALTER TABLE transactions
REORGANIZE PARTITION p_future INTO (
PARTITION p202501 VALUES LESS THAN (TO_DAYS('2025-02-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Add new partition (LIST)
ALTER TABLE customers
ADD PARTITION (
PARTITION p_territories VALUES IN ('GU', 'AS', 'MP')
);
-- Add more hash partitions
ALTER TABLE sessions COALESCE PARTITION 4; -- Reduce from 16 to 12
ALTER TABLE sessions ADD PARTITION PARTITIONS 4; -- Increase from 16 to 20
Dropping Partitions
-- Drop old partitions (fast data deletion!)
ALTER TABLE transactions DROP PARTITION p202301, p202302, p202303;
-- This is MUCH faster than:
-- DELETE FROM transactions WHERE transaction_date < '2023-04-01';
Exchanging Partitions
-- Create archive table with same structure (no partitioning)
CREATE TABLE transactions_archive LIKE transactions;
ALTER TABLE transactions_archive REMOVE PARTITIONING;
-- Exchange partition with archive table
ALTER TABLE transactions
EXCHANGE PARTITION p202301
WITH TABLE transactions_archive;
-- Now p202301 is empty, data is in transactions_archive
-- Drop the empty partition
ALTER TABLE transactions DROP PARTITION p202301;
Rebuilding Partitions
-- Rebuild specific partitions (reclaims space, reorganizes data)
ALTER TABLE transactions REBUILD PARTITION p202406, p202407;
-- Optimize partitions (defragment, update statistics)
ALTER TABLE transactions OPTIMIZE PARTITION p202406;
-- Analyze partition statistics
ALTER TABLE transactions ANALYZE PARTITION p202406;
-- Repair partition
ALTER TABLE transactions REPAIR PARTITION p202406;
📊 Monitoring Partitions
Viewing Partition Information
-- Partition summary
SELECT
TABLE_NAME,
PARTITION_NAME,
PARTITION_METHOD,
PARTITION_EXPRESSION,
PARTITION_DESCRIPTION,
TABLE_ROWS,
AVG_ROW_LENGTH,
DATA_LENGTH,
INDEX_LENGTH
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'your_database'
AND TABLE_NAME = 'transactions'
ORDER BY PARTITION_ORDINAL_POSITION;
-- Space usage by partition
SELECT
PARTITION_NAME,
TABLE_ROWS AS estimated_rows,
ROUND(DATA_LENGTH / 1024 / 1024, 2) AS data_mb,
ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS index_mb,
ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 2) AS total_mb
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'your_database'
AND TABLE_NAME = 'transactions'
AND PARTITION_NAME IS NOT NULL
ORDER BY PARTITION_ORDINAL_POSITION;
Create Monitoring View
CREATE VIEW v_partition_stats AS
SELECT
TABLE_SCHEMA,
TABLE_NAME,
COUNT(*) AS partition_count,
SUM(TABLE_ROWS) AS total_rows,
ROUND(SUM(DATA_LENGTH) / 1024 / 1024 / 1024, 2) AS data_gb,
ROUND(SUM(INDEX_LENGTH) / 1024 / 1024 / 1024, 2) AS index_gb,
MIN(PARTITION_NAME) AS oldest_partition,
MAX(PARTITION_NAME) AS newest_partition
FROM information_schema.PARTITIONS
WHERE PARTITION_NAME IS NOT NULL
GROUP BY TABLE_SCHEMA, TABLE_NAME;
SELECT * FROM v_partition_stats;
⚠️ Common Pitfalls
Pitfall 1: Foreign Key Constraints
-- ❌ Foreign keys are NOT supported with partitioned tables
CREATE TABLE orders (
order_id BIGINT,
customer_id INT,
order_date DATE,
PRIMARY KEY (order_id, order_date),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id) -- ERROR!
) PARTITION BY RANGE (YEAR(order_date)) (...);
-- ✅ Enforce referential integrity at application level or with triggers
Pitfall 2: Unique Constraints
-- ❌ All unique keys must include partition columns
CREATE TABLE orders (
order_id BIGINT AUTO_INCREMENT PRIMARY KEY, -- ERROR!
order_date DATE
) PARTITION BY RANGE (YEAR(order_date)) (...);
-- ✅ Include partition column in primary key
CREATE TABLE orders (
order_id BIGINT AUTO_INCREMENT,
order_date DATE,
PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE (YEAR(order_date)) (...);
Pitfall 3: Wrong Partition Key Choice
-- ❌ Partition on rarely-filtered column
CREATE TABLE orders (...)
PARTITION BY HASH (shipping_address_id)
PARTITIONS 16;
-- Most queries filter by date, not shipping address!
-- ✅ Partition on commonly-filtered column
CREATE TABLE orders (...)
PARTITION BY RANGE (TO_DAYS(order_date)) (...);
-- Queries typically filter by date range
✅ Best Practices Checklist
Design Phase
- [ ] Choose partition key based on common query patterns
- [ ] Include partition column in all unique indexes
- [ ] Plan partition maintenance strategy before implementation
- [ ] Estimate rows per partition (aim for millions, not billions)
- [ ] Consider time-based partitioning for historical data
Operations
- [ ] Automate partition creation and removal
- [ ] Monitor partition sizes regularly
- [ ] Test partition pruning with EXPLAIN
- [ ] Document partition strategy
Performance
- [ ] Ensure queries can leverage partition pruning
- [ ] Avoid functions on partition columns in WHERE clauses
- [ ] Keep number of partitions manageable (< 8192 recommended)
- [ ] Use exchange partition for fast archival
🎓 Conclusion
MySQL partitioning is a powerful tool for managing large tables, improving query performance, and simplifying data lifecycle management. The key is choosing the right partitioning strategy based on your query patterns and maintenance requirements.
💡 Pro Tip: Start with partitioning only when tables exceed 50-100 million rows and queries naturally filter on a specific column. Premature partitioning can add complexity without significant benefits.
About the Author
Jamaurice Holt is a Cloud Database Expert and AWS Solutions Architect with over 10 years of experience in database optimization, high availability, and cloud migrations. Specializing in MySQL, SQL optimization, and enterprise database solutions.
Related Articles
- MySQL Index Optimization Strategies
- MySQL Table Optimization and Maintenance
- MySQL Performance Schema Monitoring
Need help with MySQL partitioning? Contact us for expert database consulting.