MySQL Partitioning Strategies

Complete guide to MySQL table partitioning strategies including range, list, hash, and key partitioning with real-world examples and performance optimization.

MySQL Partitioning Strategies

Table partitioning is a technique for dividing large tables into smaller, more manageable pieces called partitions. Each partition can be accessed and maintained independently, providing significant performance benefits for large-scale databases.


🎯 Why Use Partitioning?

BenefitDescription
Query PerformancePartition pruning eliminates scanning irrelevant data
MaintenanceManage partitions independently (backup, optimize, drop)
Data LifecycleEasily archive or delete old data by dropping partitions
Parallel OperationsSome operations can run in parallel across partitions
I/O DistributionSpread data across multiple storage devices

💡 Key Insight: Partitioning is most beneficial for tables with hundreds of millions of rows where queries typically access a subset of the data based on the partition key.


🔧 Partitioning Types

Overview

┌─────────────────────────────────────────────────────────────────────┐
│                    PARTITIONING TYPES                                │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  RANGE            LIST             HASH              KEY            │
│  ─────            ────             ────              ───            │
│  Values in        Values match     Hash of column    MySQL's       │
│  continuous       discrete         modulo N          internal hash │
│  ranges           values           partitions        function      │
│                                                                      │
│  Best for:        Best for:        Best for:         Best for:     │
│  - Time series    - Regions        - Even            - Natural     │
│  - Date ranges    - Categories     distribution      distribution  │
│  - Numeric IDs    - Status codes                                    │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

📦 Range Partitioning

Range partitioning assigns rows based on column values falling within given ranges.

Basic Syntax

CREATE TABLE orders (
    order_id BIGINT AUTO_INCREMENT,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    total_amount DECIMAL(12,2),
    status VARCHAR(20),
    PRIMARY KEY (order_id, order_date)  -- Partition key must be in PK
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

Production Example: Time-Series Data

-- Monthly partitions for high-volume transaction data
CREATE TABLE transactions (
    id BIGINT AUTO_INCREMENT,
    account_id INT NOT NULL,
    transaction_date DATETIME NOT NULL,
    amount DECIMAL(15,2) NOT NULL,
    transaction_type ENUM('CREDIT', 'DEBIT') NOT NULL,
    description VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, transaction_date),
    INDEX idx_account (account_id),
    INDEX idx_type_date (transaction_type, transaction_date)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(transaction_date)) (
    PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION p202403 VALUES LESS THAN (TO_DAYS('2024-04-01')),
    PARTITION p202404 VALUES LESS THAN (TO_DAYS('2024-05-01')),
    PARTITION p202405 VALUES LESS THAN (TO_DAYS('2024-06-01')),
    PARTITION p202406 VALUES LESS THAN (TO_DAYS('2024-07-01')),
    PARTITION p202407 VALUES LESS THAN (TO_DAYS('2024-08-01')),
    PARTITION p202408 VALUES LESS THAN (TO_DAYS('2024-09-01')),
    PARTITION p202409 VALUES LESS THAN (TO_DAYS('2024-10-01')),
    PARTITION p202410 VALUES LESS THAN (TO_DAYS('2024-11-01')),
    PARTITION p202411 VALUES LESS THAN (TO_DAYS('2024-12-01')),
    PARTITION p202412 VALUES LESS THAN (TO_DAYS('2025-01-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

Automatic Partition Management

-- Create procedure to manage partitions automatically
DELIMITER //

CREATE PROCEDURE sp_maintain_partitions(
    IN p_table_name VARCHAR(64),
    IN p_months_ahead INT,
    IN p_months_to_keep INT
)
BEGIN
    DECLARE v_partition_name VARCHAR(64);
    DECLARE v_partition_date DATE;
    DECLARE v_sql VARCHAR(2000);
    DECLARE v_current_date DATE;
    DECLARE v_i INT DEFAULT 0;

    SET v_current_date = CURRENT_DATE;

    -- Add future partitions
    WHILE v_i < p_months_ahead DO
        SET v_partition_date = DATE_ADD(v_current_date, INTERVAL v_i MONTH);
        SET v_partition_name = CONCAT('p', DATE_FORMAT(v_partition_date, '%Y%m'));

        -- Check if partition exists
        IF NOT EXISTS (
            SELECT 1 FROM information_schema.PARTITIONS
            WHERE TABLE_SCHEMA = DATABASE()
                AND TABLE_NAME = p_table_name
                AND PARTITION_NAME = v_partition_name
        ) THEN
            -- Reorganize MAXVALUE partition
            SET v_sql = CONCAT(
                'ALTER TABLE ', p_table_name,
                ' REORGANIZE PARTITION p_future INTO (',
                'PARTITION ', v_partition_name,
                ' VALUES LESS THAN (TO_DAYS(''',
                DATE_FORMAT(DATE_ADD(v_partition_date, INTERVAL 1 MONTH), '%Y-%m-01'),
                ''')),',
                'PARTITION p_future VALUES LESS THAN MAXVALUE)'
            );
            PREPARE stmt FROM v_sql;
            EXECUTE stmt;
            DEALLOCATE PREPARE stmt;
        END IF;

        SET v_i = v_i + 1;
    END WHILE;

    -- Drop old partitions
    SET v_partition_date = DATE_SUB(v_current_date, INTERVAL p_months_to_keep MONTH);
    SET v_partition_name = CONCAT('p', DATE_FORMAT(v_partition_date, '%Y%m'));

    -- Get partitions older than retention period
    SET @drop_list = NULL;
    SELECT GROUP_CONCAT(PARTITION_NAME)
    INTO @drop_list
    FROM information_schema.PARTITIONS
    WHERE TABLE_SCHEMA = DATABASE()
        AND TABLE_NAME = p_table_name
        AND PARTITION_NAME != 'p_future'
        AND PARTITION_DESCRIPTION < TO_DAYS(DATE_SUB(CURRENT_DATE, INTERVAL p_months_to_keep MONTH));

    IF @drop_list IS NOT NULL THEN
        SET v_sql = CONCAT('ALTER TABLE ', p_table_name, ' DROP PARTITION ', @drop_list);
        PREPARE stmt FROM v_sql;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END IF;
END //

DELIMITER ;

-- Schedule partition maintenance
CREATE EVENT evt_partition_maintenance
    ON SCHEDULE EVERY 1 DAY
    STARTS (TIMESTAMP(CURRENT_DATE) + INTERVAL 1 HOUR)
DO
    CALL sp_maintain_partitions('transactions', 6, 24);

📋 List Partitioning

List partitioning assigns rows based on column values matching values in discrete sets.

-- Partition by region
CREATE TABLE customers (
    customer_id INT AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255),
    region_code CHAR(2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (customer_id, region_code)
) ENGINE=InnoDB
PARTITION BY LIST COLUMNS (region_code) (
    PARTITION p_northeast VALUES IN ('NY', 'NJ', 'CT', 'MA', 'PA', 'VT', 'NH', 'ME', 'RI'),
    PARTITION p_southeast VALUES IN ('FL', 'GA', 'NC', 'SC', 'VA', 'WV', 'TN', 'AL', 'MS'),
    PARTITION p_midwest VALUES IN ('IL', 'OH', 'MI', 'IN', 'WI', 'MN', 'IA', 'MO', 'KS'),
    PARTITION p_southwest VALUES IN ('TX', 'OK', 'NM', 'AZ'),
    PARTITION p_west VALUES IN ('CA', 'WA', 'OR', 'NV', 'CO', 'UT', 'ID', 'MT', 'WY'),
    PARTITION p_other VALUES IN ('HI', 'AK', 'DC', 'PR', 'VI')
);

-- Partition by status
CREATE TABLE orders_by_status (
    order_id BIGINT AUTO_INCREMENT,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    status TINYINT NOT NULL,
    total_amount DECIMAL(12,2),
    PRIMARY KEY (order_id, status)
) ENGINE=InnoDB
PARTITION BY LIST (status) (
    PARTITION p_pending VALUES IN (1, 2),        -- New, Processing
    PARTITION p_shipped VALUES IN (3, 4),        -- Shipped, In Transit
    PARTITION p_completed VALUES IN (5),         -- Delivered
    PARTITION p_cancelled VALUES IN (6, 7, 8)    -- Cancelled, Returned, Refunded
);

🔢 Hash Partitioning

Hash partitioning distributes rows evenly across a predefined number of partitions using a hash function.

-- Even distribution across partitions
CREATE TABLE sessions (
    session_id VARCHAR(64) PRIMARY KEY,
    user_id INT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_activity TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    data JSON
) ENGINE=InnoDB
PARTITION BY HASH (user_id)
PARTITIONS 16;

-- Linear hash for easier partition management
CREATE TABLE cache_entries (
    cache_key VARCHAR(255) PRIMARY KEY,
    cache_value MEDIUMBLOB,
    expires_at DATETIME NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB
PARTITION BY LINEAR HASH (CRC32(cache_key))
PARTITIONS 32;

🔑 Key Partitioning

Key partitioning is similar to hash partitioning but uses MySQL's internal hashing function.

-- Partition on primary key
CREATE TABLE user_activities (
    id BIGINT AUTO_INCREMENT,
    user_id INT NOT NULL,
    activity_type VARCHAR(50),
    activity_data JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, user_id)
) ENGINE=InnoDB
PARTITION BY KEY (user_id)
PARTITIONS 8;

-- Partition on multiple columns
CREATE TABLE logs (
    id BIGINT AUTO_INCREMENT,
    server_id INT NOT NULL,
    log_date DATE NOT NULL,
    log_level TINYINT,
    message TEXT,
    PRIMARY KEY (id, server_id, log_date)
) ENGINE=InnoDB
PARTITION BY KEY (server_id, log_date)
PARTITIONS 16;

🎯 Subpartitioning

Combine partitioning methods for more granular control.

-- Range-Hash subpartitioning
CREATE TABLE sales (
    sale_id BIGINT AUTO_INCREMENT,
    store_id INT NOT NULL,
    sale_date DATE NOT NULL,
    amount DECIMAL(12,2),
    PRIMARY KEY (sale_id, store_id, sale_date)
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(sale_date))
SUBPARTITION BY HASH (store_id)
SUBPARTITIONS 4 (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- This creates: p2023sp0, p2023sp1, p2023sp2, p2023sp3, p2024sp0, etc.

📊 Partition Pruning

Partition pruning is MySQL's optimization that skips irrelevant partitions during query execution.

How to Verify Partition Pruning

-- Check which partitions are accessed
EXPLAIN PARTITIONS
SELECT *
FROM transactions
WHERE transaction_date BETWEEN '2024-06-01' AND '2024-06-30';

-- Output shows only relevant partition(s)
-- partitions: p202406

-- Non-partition column doesn't help pruning
EXPLAIN PARTITIONS
SELECT *
FROM transactions
WHERE account_id = 12345;

-- Output shows ALL partitions scanned
-- partitions: p202401,p202402,p202403,...

-- Combine partition key with other filters
EXPLAIN PARTITIONS
SELECT *
FROM transactions
WHERE transaction_date >= '2024-06-01'
    AND transaction_date < '2024-07-01'
    AND account_id = 12345;

-- Output: partitions: p202406 (pruned to one partition)

Best Practices for Pruning

-- ✅ DO: Use partition column in WHERE clause
SELECT * FROM transactions
WHERE transaction_date >= '2024-06-01'
    AND transaction_date < '2024-07-01';

-- ❌ DON'T: Apply functions to partition column
SELECT * FROM transactions
WHERE DATE(transaction_date) = '2024-06-15';  -- No pruning!

-- ✅ DO: Use proper range
SELECT * FROM transactions
WHERE transaction_date = '2024-06-15';  -- Pruning works

-- ✅ DO: Include partition column in JOINs
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2024-01-01'
    AND o.order_date < '2024-02-01';

🔧 Partition Management Operations

Adding Partitions

-- Add new partition (RANGE)
ALTER TABLE transactions
    REORGANIZE PARTITION p_future INTO (
        PARTITION p202501 VALUES LESS THAN (TO_DAYS('2025-02-01')),
        PARTITION p_future VALUES LESS THAN MAXVALUE
    );

-- Add new partition (LIST)
ALTER TABLE customers
    ADD PARTITION (
        PARTITION p_territories VALUES IN ('GU', 'AS', 'MP')
    );

-- Add more hash partitions
ALTER TABLE sessions COALESCE PARTITION 4;  -- Reduce from 16 to 12
ALTER TABLE sessions ADD PARTITION PARTITIONS 4;  -- Increase from 16 to 20

Dropping Partitions

-- Drop old partitions (fast data deletion!)
ALTER TABLE transactions DROP PARTITION p202301, p202302, p202303;

-- This is MUCH faster than:
-- DELETE FROM transactions WHERE transaction_date < '2023-04-01';

Exchanging Partitions

-- Create archive table with same structure (no partitioning)
CREATE TABLE transactions_archive LIKE transactions;
ALTER TABLE transactions_archive REMOVE PARTITIONING;

-- Exchange partition with archive table
ALTER TABLE transactions
    EXCHANGE PARTITION p202301
    WITH TABLE transactions_archive;

-- Now p202301 is empty, data is in transactions_archive
-- Drop the empty partition
ALTER TABLE transactions DROP PARTITION p202301;

Rebuilding Partitions

-- Rebuild specific partitions (reclaims space, reorganizes data)
ALTER TABLE transactions REBUILD PARTITION p202406, p202407;

-- Optimize partitions (defragment, update statistics)
ALTER TABLE transactions OPTIMIZE PARTITION p202406;

-- Analyze partition statistics
ALTER TABLE transactions ANALYZE PARTITION p202406;

-- Repair partition
ALTER TABLE transactions REPAIR PARTITION p202406;

📊 Monitoring Partitions

Viewing Partition Information

-- Partition summary
SELECT
    TABLE_NAME,
    PARTITION_NAME,
    PARTITION_METHOD,
    PARTITION_EXPRESSION,
    PARTITION_DESCRIPTION,
    TABLE_ROWS,
    AVG_ROW_LENGTH,
    DATA_LENGTH,
    INDEX_LENGTH
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'your_database'
    AND TABLE_NAME = 'transactions'
ORDER BY PARTITION_ORDINAL_POSITION;

-- Space usage by partition
SELECT
    PARTITION_NAME,
    TABLE_ROWS AS estimated_rows,
    ROUND(DATA_LENGTH / 1024 / 1024, 2) AS data_mb,
    ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS index_mb,
    ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 2) AS total_mb
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'your_database'
    AND TABLE_NAME = 'transactions'
    AND PARTITION_NAME IS NOT NULL
ORDER BY PARTITION_ORDINAL_POSITION;

Create Monitoring View

CREATE VIEW v_partition_stats AS
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    COUNT(*) AS partition_count,
    SUM(TABLE_ROWS) AS total_rows,
    ROUND(SUM(DATA_LENGTH) / 1024 / 1024 / 1024, 2) AS data_gb,
    ROUND(SUM(INDEX_LENGTH) / 1024 / 1024 / 1024, 2) AS index_gb,
    MIN(PARTITION_NAME) AS oldest_partition,
    MAX(PARTITION_NAME) AS newest_partition
FROM information_schema.PARTITIONS
WHERE PARTITION_NAME IS NOT NULL
GROUP BY TABLE_SCHEMA, TABLE_NAME;

SELECT * FROM v_partition_stats;

⚠️ Common Pitfalls

Pitfall 1: Foreign Key Constraints

-- ❌ Foreign keys are NOT supported with partitioned tables
CREATE TABLE orders (
    order_id BIGINT,
    customer_id INT,
    order_date DATE,
    PRIMARY KEY (order_id, order_date),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)  -- ERROR!
) PARTITION BY RANGE (YEAR(order_date)) (...);

-- ✅ Enforce referential integrity at application level or with triggers

Pitfall 2: Unique Constraints

-- ❌ All unique keys must include partition columns
CREATE TABLE orders (
    order_id BIGINT AUTO_INCREMENT PRIMARY KEY,  -- ERROR!
    order_date DATE
) PARTITION BY RANGE (YEAR(order_date)) (...);

-- ✅ Include partition column in primary key
CREATE TABLE orders (
    order_id BIGINT AUTO_INCREMENT,
    order_date DATE,
    PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE (YEAR(order_date)) (...);

Pitfall 3: Wrong Partition Key Choice

-- ❌ Partition on rarely-filtered column
CREATE TABLE orders (...)
PARTITION BY HASH (shipping_address_id)
PARTITIONS 16;
-- Most queries filter by date, not shipping address!

-- ✅ Partition on commonly-filtered column
CREATE TABLE orders (...)
PARTITION BY RANGE (TO_DAYS(order_date)) (...);
-- Queries typically filter by date range

✅ Best Practices Checklist

Design Phase

  • [ ] Choose partition key based on common query patterns
  • [ ] Include partition column in all unique indexes
  • [ ] Plan partition maintenance strategy before implementation
  • [ ] Estimate rows per partition (aim for millions, not billions)
  • [ ] Consider time-based partitioning for historical data

Operations

  • [ ] Automate partition creation and removal
  • [ ] Monitor partition sizes regularly
  • [ ] Test partition pruning with EXPLAIN
  • [ ] Document partition strategy

Performance

  • [ ] Ensure queries can leverage partition pruning
  • [ ] Avoid functions on partition columns in WHERE clauses
  • [ ] Keep number of partitions manageable (< 8192 recommended)
  • [ ] Use exchange partition for fast archival

🎓 Conclusion

MySQL partitioning is a powerful tool for managing large tables, improving query performance, and simplifying data lifecycle management. The key is choosing the right partitioning strategy based on your query patterns and maintenance requirements.

💡 Pro Tip: Start with partitioning only when tables exceed 50-100 million rows and queries naturally filter on a specific column. Premature partitioning can add complexity without significant benefits.


About the Author

Jamaurice Holt is a Cloud Database Expert and AWS Solutions Architect with over 10 years of experience in database optimization, high availability, and cloud migrations. Specializing in MySQL, SQL optimization, and enterprise database solutions.

Related Articles

Need help with MySQL partitioning? Contact us for expert database consulting.

Continue Reading

Leadership

Leadership in Motion: Technical Leadership Lessons from Database Operations

How a decade of database administration shaped my approach to technical leadership — lessons on incident response, mentoring engineers,...

July 10, 2024 · Read article →
Performance

Database Performance Tuning: Lessons from the Trenches

Real-world strategies for optimizing database performance in high-traffic production environments — query plan analysis, indexing, caching,...

August 22, 2024 · Read article →
AI

AI Governance and Hardware: Why Sovereign Refusal Belongs in Silicon

Software guardrails can be prompted around. The real frontier in AI safety is anchoring refusal below the model — at the hardware level. A...

September 15, 2024 · Read article →