SQL Server Data Compression Techniques

Expert guide on sql server data compression techniques with practical examples and best practices for database administrators.

🚀 SQL Server Data Compression: The Complete DBA Guide

Data compression is one of the most powerful yet underutilized features in SQL Server. When implemented correctly, it can reduce storage costs by 50-90%, improve I/O performance, and even accelerate query execution. This comprehensive guide covers everything you need to know about implementing data compression in production environments.


📊 Understanding SQL Server Compression Types

SQL Server offers several compression options, each designed for specific use cases and workload patterns.

Row Compression

Row compression is the lighter option that stores fixed-length data types using variable-length storage formats. It's the foundation upon which page compression builds.

-- Enable row compression on an existing table
ALTER TABLE Sales.OrderDetails
REBUILD WITH (DATA_COMPRESSION = ROW);

-- Enable row compression on a specific partition
ALTER TABLE Sales.OrderHistory
REBUILD PARTITION = 5 WITH (DATA_COMPRESSION = ROW);

-- Create a new table with row compression
CREATE TABLE Sales.CompressedOrders
(
    OrderID INT NOT NULL,
    CustomerID INT NOT NULL,
    OrderDate DATETIME2 NOT NULL,
    TotalAmount DECIMAL(18,2) NOT NULL,
    Status VARCHAR(20) NOT NULL,
    CONSTRAINT PK_CompressedOrders PRIMARY KEY CLUSTERED (OrderID)
)
WITH (DATA_COMPRESSION = ROW);

Page Compression

Page compression includes row compression plus additional algorithms: prefix compression and dictionary compression. This typically achieves 50-80% compression ratios.

-- Enable page compression on an existing table
ALTER TABLE Sales.OrderDetails
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Enable page compression on an index
ALTER INDEX IX_OrderDetails_CustomerID
ON Sales.OrderDetails
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Create index with page compression
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON Sales.Orders (OrderDate)
INCLUDE (CustomerID, TotalAmount)
WITH (DATA_COMPRESSION = PAGE);

Columnstore Compression

Columnstore indexes use a highly efficient compression algorithm designed for analytics workloads. They can achieve 10x or better compression compared to row-store tables.

-- Create clustered columnstore index
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON DW.FactSales
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

-- Create nonclustered columnstore index for hybrid workloads
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders_Analytics
ON Sales.Orders (OrderDate, CustomerID, TotalAmount, Quantity)
WITH (DATA_COMPRESSION = COLUMNSTORE);

💡 Estimating Compression Savings

Before implementing compression, always estimate the potential savings using the built-in stored procedure.

Using sp_estimate_data_compression_savings

-- Estimate row compression savings
EXEC sp_estimate_data_compression_savings
    @schema_name = 'Sales',
    @object_name = 'OrderDetails',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'ROW';

-- Estimate page compression savings
EXEC sp_estimate_data_compression_savings
    @schema_name = 'Sales',
    @object_name = 'OrderDetails',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';

Comprehensive Compression Analysis Script

-- Analyze all tables for compression opportunities
SET NOCOUNT ON;

CREATE TABLE #CompressionAnalysis
(
    SchemaName SYSNAME,
    TableName SYSNAME,
    IndexName SYSNAME NULL,
    IndexID INT,
    PartitionNumber INT,
    CurrentSizeKB BIGINT,
    RowCompressedSizeKB BIGINT,
    PageCompressedSizeKB BIGINT,
    RowSavingsPercent DECIMAL(5,2),
    PageSavingsPercent DECIMAL(5,2)
);

DECLARE @SchemaName SYSNAME, @TableName SYSNAME;
DECLARE @SQL NVARCHAR(MAX);

DECLARE table_cursor CURSOR FOR
SELECT
    s.name AS SchemaName,
    t.name AS TableName
FROM sys.tables t
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE t.is_ms_shipped = 0
  AND t.type = 'U'
ORDER BY s.name, t.name;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @SchemaName, @TableName;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        -- Get current size
        DECLARE @CurrentSize BIGINT;
        SELECT @CurrentSize = SUM(ps.reserved_page_count) * 8
        FROM sys.dm_db_partition_stats ps
        INNER JOIN sys.tables t ON ps.object_id = t.object_id
        INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
        WHERE s.name = @SchemaName AND t.name = @TableName;

        -- Estimate row compression
        CREATE TABLE #RowEst
        (
            object_name SYSNAME,
            schema_name SYSNAME,
            index_id INT,
            partition_number INT,
            size_with_current_compression_setting BIGINT,
            size_with_requested_compression_setting BIGINT,
            sample_size_with_current_compression_setting BIGINT,
            sample_size_with_requested_compression_setting BIGINT
        );

        INSERT INTO #RowEst
        EXEC sp_estimate_data_compression_savings
            @schema_name = @SchemaName,
            @object_name = @TableName,
            @index_id = NULL,
            @partition_number = NULL,
            @data_compression = 'ROW';

        -- Estimate page compression
        CREATE TABLE #PageEst
        (
            object_name SYSNAME,
            schema_name SYSNAME,
            index_id INT,
            partition_number INT,
            size_with_current_compression_setting BIGINT,
            size_with_requested_compression_setting BIGINT,
            sample_size_with_current_compression_setting BIGINT,
            sample_size_with_requested_compression_setting BIGINT
        );

        INSERT INTO #PageEst
        EXEC sp_estimate_data_compression_savings
            @schema_name = @SchemaName,
            @object_name = @TableName,
            @index_id = NULL,
            @partition_number = NULL,
            @data_compression = 'PAGE';

        -- Insert results
        INSERT INTO #CompressionAnalysis
        SELECT
            @SchemaName,
            @TableName,
            i.name,
            r.index_id,
            r.partition_number,
            r.size_with_current_compression_setting,
            r.size_with_requested_compression_setting,
            p.size_with_requested_compression_setting,
            CASE WHEN r.size_with_current_compression_setting > 0
                 THEN (1.0 - (CAST(r.size_with_requested_compression_setting AS DECIMAL) / r.size_with_current_compression_setting)) * 100
                 ELSE 0 END,
            CASE WHEN p.size_with_current_compression_setting > 0
                 THEN (1.0 - (CAST(p.size_with_requested_compression_setting AS DECIMAL) / p.size_with_current_compression_setting)) * 100
                 ELSE 0 END
        FROM #RowEst r
        INNER JOIN #PageEst p ON r.index_id = p.index_id AND r.partition_number = p.partition_number
        LEFT JOIN sys.indexes i ON i.object_id = OBJECT_ID(@SchemaName + '.' + @TableName) AND i.index_id = r.index_id;

        DROP TABLE #RowEst;
        DROP TABLE #PageEst;
    END TRY
    BEGIN CATCH
        -- Skip tables that can't be analyzed
        PRINT 'Error analyzing: ' + @SchemaName + '.' + @TableName;
    END CATCH

    FETCH NEXT FROM table_cursor INTO @SchemaName, @TableName;
END

CLOSE table_cursor;
DEALLOCATE table_cursor;

-- Display results sorted by potential savings
SELECT
    SchemaName,
    TableName,
    IndexName,
    CurrentSizeKB / 1024 AS CurrentSizeMB,
    RowCompressedSizeKB / 1024 AS RowCompressedMB,
    PageCompressedSizeKB / 1024 AS PageCompressedMB,
    RowSavingsPercent,
    PageSavingsPercent,
    (CurrentSizeKB - PageCompressedSizeKB) / 1024 AS PotentialSavingsMB
FROM #CompressionAnalysis
WHERE CurrentSizeKB > 10240  -- Only show tables > 10MB
ORDER BY (CurrentSizeKB - PageCompressedSizeKB) DESC;

DROP TABLE #CompressionAnalysis;

⚠️ When to Use Each Compression Type

Row Compression Best Candidates

CharacteristicWhy It Helps
Fixed-length columns (INT, CHAR, DATETIME)Maximum storage savings
OLTP workloads with frequent updatesLower CPU overhead
Tables with high insert ratesMinimal performance impact
Mixed workloadsGood balance of compression and speed

Page Compression Best Candidates

CharacteristicWhy It Helps
Read-heavy workloadsDecompression overhead acceptable
Tables with repetitive data patternsDictionary compression excels
Large tables (> 1GB)Significant storage savings
Archive/historical dataRarely accessed, maximum compression

💡 Pro Tip: Start with row compression for OLTP tables and page compression for reporting/archive tables. Monitor CPU impact before expanding.


🔧 Implementing Compression in Production

Online vs Offline Compression

-- Online index rebuild with compression (Enterprise Edition)
ALTER INDEX PK_Orders ON Sales.Orders
REBUILD WITH (
    DATA_COMPRESSION = PAGE,
    ONLINE = ON,
    MAXDOP = 4,
    SORT_IN_TEMPDB = ON
);

-- Offline compression (all editions)
ALTER TABLE Sales.OrderHistory
REBUILD WITH (DATA_COMPRESSION = PAGE);

Compression During Index Creation

-- Create compressed indexes from the start
CREATE NONCLUSTERED INDEX IX_Customer_Email
ON Sales.Customers (Email)
INCLUDE (FirstName, LastName, Phone)
WITH (
    DATA_COMPRESSION = PAGE,
    FILLFACTOR = 90,
    ONLINE = ON
);

Partition-Level Compression Strategy

-- Different compression for hot vs cold partitions
-- Hot partitions (recent data): Row compression for write performance
-- Cold partitions (historical data): Page compression for storage savings

-- Create partition function
CREATE PARTITION FUNCTION PF_OrderDate (DATETIME2)
AS RANGE RIGHT FOR VALUES (
    '2023-01-01', '2023-04-01', '2023-07-01', '2023-10-01',
    '2024-01-01', '2024-04-01', '2024-07-01', '2024-10-01',
    '2025-01-01'
);

-- Create partition scheme
CREATE PARTITION SCHEME PS_OrderDate
AS PARTITION PF_OrderDate
ALL TO ([PRIMARY]);

-- Create table with mixed compression
CREATE TABLE Sales.PartitionedOrders
(
    OrderID BIGINT NOT NULL,
    OrderDate DATETIME2 NOT NULL,
    CustomerID INT NOT NULL,
    TotalAmount DECIMAL(18,2) NOT NULL,
    CONSTRAINT PK_PartitionedOrders PRIMARY KEY CLUSTERED (OrderID, OrderDate)
)
ON PS_OrderDate (OrderDate);

-- Apply different compression to each partition
ALTER TABLE Sales.PartitionedOrders
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = PAGE);  -- Oldest, archive
ALTER TABLE Sales.PartitionedOrders
REBUILD PARTITION = 2 WITH (DATA_COMPRESSION = PAGE);
ALTER TABLE Sales.PartitionedOrders
REBUILD PARTITION = 3 WITH (DATA_COMPRESSION = PAGE);
ALTER TABLE Sales.PartitionedOrders
REBUILD PARTITION = 4 WITH (DATA_COMPRESSION = PAGE);
ALTER TABLE Sales.PartitionedOrders
REBUILD PARTITION = 5 WITH (DATA_COMPRESSION = ROW);   -- Recent, active
ALTER TABLE Sales.PartitionedOrders
REBUILD PARTITION = 6 WITH (DATA_COMPRESSION = ROW);
ALTER TABLE Sales.PartitionedOrders
REBUILD PARTITION = 7 WITH (DATA_COMPRESSION = ROW);
ALTER TABLE Sales.PartitionedOrders
REBUILD PARTITION = 8 WITH (DATA_COMPRESSION = ROW);
ALTER TABLE Sales.PartitionedOrders
REBUILD PARTITION = 9 WITH (DATA_COMPRESSION = NONE);  -- Current, hottest

📊 Monitoring Compression Performance

Check Current Compression Status

-- View compression settings for all objects
SELECT
    SCHEMA_NAME(t.schema_id) AS SchemaName,
    t.name AS TableName,
    i.name AS IndexName,
    i.type_desc AS IndexType,
    p.partition_number,
    p.data_compression_desc AS CompressionType,
    ps.reserved_page_count * 8 / 1024 AS SizeMB,
    ps.row_count AS RowCount
FROM sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN sys.dm_db_partition_stats ps ON p.partition_id = ps.partition_id
WHERE t.is_ms_shipped = 0
ORDER BY ps.reserved_page_count DESC;

Monitor Compression CPU Impact

-- Track CPU usage for compression operations
SELECT
    qs.total_worker_time / 1000 AS TotalCPU_ms,
    qs.execution_count,
    qs.total_worker_time / qs.execution_count / 1000 AS AvgCPU_ms,
    SUBSTRING(st.text, (qs.statement_start_offset/2)+1,
        ((CASE qs.statement_end_offset
            WHEN -1 THEN DATALENGTH(st.text)
            ELSE qs.statement_end_offset
        END - qs.statement_start_offset)/2) + 1) AS QueryText
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.text LIKE '%COMPRESSION%'
ORDER BY qs.total_worker_time DESC;

Compression Savings Report

-- Generate compression savings report
WITH CompressionStats AS (
    SELECT
        SCHEMA_NAME(t.schema_id) AS SchemaName,
        t.name AS TableName,
        p.data_compression_desc AS CompressionType,
        SUM(ps.reserved_page_count) * 8 / 1024 AS CurrentSizeMB,
        SUM(ps.row_count) AS TotalRows
    FROM sys.tables t
    INNER JOIN sys.partitions p ON t.object_id = p.object_id
    INNER JOIN sys.dm_db_partition_stats ps ON p.partition_id = ps.partition_id
    WHERE t.is_ms_shipped = 0
    GROUP BY SCHEMA_NAME(t.schema_id), t.name, p.data_compression_desc
)
SELECT
    SchemaName,
    TableName,
    CompressionType,
    CurrentSizeMB,
    TotalRows,
    CASE
        WHEN TotalRows > 0
        THEN CAST(CurrentSizeMB AS DECIMAL(18,2)) / TotalRows * 1000000
        ELSE 0
    END AS BytesPerRow
FROM CompressionStats
ORDER BY CurrentSizeMB DESC;

✅ Best Practices for Data Compression

Pre-Implementation Checklist

  1. Analyze workload patterns

    -- Check read vs write ratio
    SELECT
        OBJECT_NAME(s.object_id) AS TableName,
        SUM(user_seeks + user_scans + user_lookups) AS Reads,
        SUM(user_updates) AS Writes,
        CASE
            WHEN SUM(user_updates) > 0
            THEN CAST(SUM(user_seeks + user_scans + user_lookups) AS FLOAT) / SUM(user_updates)
            ELSE 999999
        END AS ReadToWriteRatio
    FROM sys.dm_db_index_usage_stats s
    WHERE database_id = DB_ID()
    GROUP BY s.object_id
    HAVING SUM(user_seeks + user_scans + user_lookups) + SUM(user_updates) > 0
    ORDER BY ReadToWriteRatio DESC;
    
  2. Test compression impact on CPU

    -- Baseline CPU before compression
    SELECT
        record_id,
        SQLProcessUtilization,
        SystemIdle,
        100 - SystemIdle - SQLProcessUtilization AS OtherProcesses
    FROM (
        SELECT
            record.value('(./Record/@id)[1]', 'int') AS record_id,
            record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS SystemIdle,
            record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS SQLProcessUtilization
        FROM (
            SELECT CONVERT(XML, record) AS record
            FROM sys.dm_os_ring_buffers
            WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
              AND record LIKE '%<SystemHealth>%'
        ) AS x
    ) AS y
    ORDER BY record_id DESC;
    
  3. Schedule compression during maintenance windows

    -- Automated compression job
    DECLARE @StartTime TIME = '02:00:00';  -- Start at 2 AM
    DECLARE @EndTime TIME = '06:00:00';    -- End by 6 AM
    
    IF CAST(GETDATE() AS TIME) BETWEEN @StartTime AND @EndTime
    BEGIN
        ALTER TABLE Sales.OrderHistory
        REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
    END
    

Compression Decision Matrix

ScenarioRecommended CompressionRationale
High-volume OLTPRowMinimal CPU overhead
Data warehouse fact tablesPage or ColumnstoreMaximum compression
Archive tablesPageRarely accessed
Temp tablesNone or RowShort-lived data
Indexes on OLTPRowBalance of performance
Reporting indexesPageRead-heavy access

🚀 Advanced Compression Scenarios

Backup Compression (Separate from Data Compression)

-- Enable backup compression at server level
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;

-- Compress individual backup
BACKUP DATABASE AdventureWorks
TO DISK = 'D:\Backups\AdventureWorks.bak'
WITH COMPRESSION, STATS = 10;

-- Compare compressed vs uncompressed backup size
SELECT
    database_name,
    backup_start_date,
    backup_size / 1024 / 1024 AS BackupSizeMB,
    compressed_backup_size / 1024 / 1024 AS CompressedSizeMB,
    CAST(100.0 - (compressed_backup_size * 100.0 / backup_size) AS DECIMAL(5,2)) AS CompressionPercent
FROM msdb.dbo.backupset
WHERE database_name = 'AdventureWorks'
ORDER BY backup_start_date DESC;

Unicode Compression (SQL Server 2019+)

-- Unicode compression is automatic in page compression
-- Verify Unicode compression is working
SELECT
    OBJECT_NAME(object_id) AS TableName,
    name AS ColumnName,
    system_type_name,
    max_length,
    collation_name
FROM sys.dm_exec_describe_first_result_set(
    N'SELECT * FROM Sales.Customers', NULL, 0
)
WHERE collation_name IS NOT NULL;

📈 Real-World Compression Results

Case Study: E-Commerce Order History

MetricBeforeAfter (Page)Improvement
Table Size450 GB125 GB72% reduction
Backup Size180 GB45 GB75% reduction
Backup Time4.5 hours1.2 hours73% faster
I/O per Query125,00042,00066% reduction
CPU ImpactBaseline+8%Acceptable

⚠️ Warning: Always test compression in a non-production environment first. Monitor CPU utilization closely during initial implementation.


🔧 Troubleshooting Compression Issues

Common Problems and Solutions

-- Issue: Compression taking too long
-- Solution: Use MAXDOP to control parallelism
ALTER INDEX ALL ON Sales.LargeTable
REBUILD WITH (
    DATA_COMPRESSION = PAGE,
    ONLINE = ON,
    MAXDOP = 2  -- Limit to 2 CPUs
);

-- Issue: High CPU during business hours
-- Solution: Use Resource Governor
CREATE RESOURCE POOL CompressionPool
WITH (MAX_CPU_PERCENT = 20);

CREATE WORKLOAD GROUP CompressionGroup
WITH (REQUEST_MAX_CPU_TIME_SEC = 0)
USING CompressionPool;

-- Issue: Log file growing during compression
-- Solution: Rebuild in batches
DECLARE @PartitionNumber INT = 1;
WHILE @PartitionNumber <= 12
BEGIN
    ALTER TABLE Sales.PartitionedTable
    REBUILD PARTITION = @PartitionNumber
    WITH (DATA_COMPRESSION = PAGE);

    -- Allow log backup between partitions
    WAITFOR DELAY '00:01:00';

    SET @PartitionNumber = @PartitionNumber + 1;
END

Conclusion

SQL Server data compression is a powerful feature that can dramatically reduce storage costs and improve I/O performance. The key to success is:

  • Analyze before implementing - Use sp_estimate_data_compression_savings
  • Choose the right type - Row for OLTP, Page for analytics/archive
  • Monitor CPU impact - Compression trades CPU for I/O
  • Implement incrementally - Start with largest, least-active tables
  • Combine with partitioning - Different compression for hot vs cold data

With proper planning and implementation, you can achieve 50-80% storage savings while maintaining or even improving query performance.


About the Author

Jamaurice Holt is a Cloud Database Expert and AWS Solutions Architect with over 10 years of experience in database optimization, high availability, and cloud migrations. Specializing in SQL Server, SQL optimization, and enterprise database solutions.

Related Articles

Need help with SQL Server optimization? Contact us for expert database consulting.

Continue Reading

Leadership

Leadership in Motion: Technical Leadership Lessons from Database Operations

How a decade of database administration shaped my approach to technical leadership — lessons on incident response, mentoring engineers,...

July 10, 2024 · Read article →
Performance

Database Performance Tuning: Lessons from the Trenches

Real-world strategies for optimizing database performance in high-traffic production environments — query plan analysis, indexing, caching,...

August 22, 2024 · Read article →
AI

AI Governance and Hardware: Why Sovereign Refusal Belongs in Silicon

Software guardrails can be prompted around. The real frontier in AI safety is anchoring refusal below the model — at the hardware level. A...

September 15, 2024 · Read article →