SQL Data Compression is no Scrooge

OK, so it’s one week to Christmas and I have to resort to a holiday theme of some sort. Things are a bit laid back this year. Our family has gone the Secret Santa route to cut down on presents so they can actually fit under the tree. Times are tough, but maybe that makes us a bit more sensible in our priorities. Reduce, Reuse, Recycle. That applies to Data Storage too. (Flagrant contrived segue…). Let’s take a look…

Data Storage has been relatively cheap for a long time now, but that has lulled us into a false sense of security, turning our disk volumes into a “dumping ground” for anything and everything imaginable. Wouldn’t it be great if we could reduce the amount of storage we use without paying for it with poor performance? If you have ever used Windows compression (“Compress Drive to save space”), you will have made a mental note never to do that again! That’s because it is an unintelligent compression strategy: every file is compressed to save space, but every time one is accessed it has to be decompressed. That takes time and produces horrendous performance as you watch your machine become severely “IO bound”. Do not try this at home.

SQL Server 2008 Data Compression is an intelligent compression strategy: it saves space and delivers better performance through less IO. (By the way, SQL Server 2008 also includes Backup Compression, which is a completely separate technology – I’ll discuss that another time. Yes, the two technologies can be used together.)
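If you are curious, Backup Compression is just a clause on the BACKUP statement. A quick sketch (the database name and file path here are placeholders):

    -- Backup Compression: independent of Data Compression, and they can coexist
    BACKUP DATABASE AdventureWorks
    TO DISK = 'D:\Backups\AdventureWorks.bak'
    WITH COMPRESSION;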

Microsoft started down this road in SQL Server 2005 SP2 with the introduction of vardecimal compression. Previously, a column of type decimal(19,4) always took a fixed 9 bytes regardless of the value it contained. Setting the table option “vardecimal storage format” to “on” shrinks the storage to fit the value actually stored in the column; for instance, a value of 4.00 needs only about 5 bytes instead of the full 9. That can be significant in a Data Warehouse and was a good step in the right direction. The option still exists in 2008 but is deprecated, because Row Compression (below) subsumes it.
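Enabling it was a two-step affair: allow the format at the database level, then turn it on per table. A sketch against SQL Server 2005 SP2 (the database and table names are made up):

    -- Step 1: allow vardecimal storage in the database
    EXEC sp_db_vardecimal_storage_format 'SalesDW', 'ON';

    -- Step 2: switch the storage format on for a specific table
    EXEC sp_tableoption 'dbo.FactSales', 'vardecimal storage format', 1;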

Data Compression in 2008 is different but follows a similar strategy. There are two levels: “Row Compression” and “Page Compression”. Row Compression is a light-weight compression in which each row is handled individually. In this mode, all columns are effectively stored as variable-length data (even traditional fixed-length columns) so that space is optimized for the value each column actually contains. Leading zeroes and trailing spaces are trimmed, so blank strings, zeroes and NULLs take up no space at all. There is a per-column overhead, but even that is minimized: 4 bits versus the 2 bytes of classic variable-length columns. The vardecimal behavior is also active in this mode, saving more space. The effect is less space, fewer data pages and therefore less IO. There is no formal decompression phase; data pages are read into memory as-is, and the database engine has the intelligence to understand the new data format.
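Turning it on is a single DDL statement. A minimal sketch (the table name is assumed):

    -- Rebuild an existing table with Row Compression
    ALTER TABLE dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = ROW);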

Page Compression includes everything Row Compression does but adds some significant intelligence to the strategy. In this mode, the database engine looks for similar column values across multiple rows on a single data page. Two page-level algorithms try to save space: Column-Prefix and Dictionary. Column-Prefix looks for a common starting value in a column across the rows on the page; if it finds one, the prefix is stored only once for the page. If the algorithm cannot save space, it does nothing. The Dictionary algorithm runs after Column-Prefix and looks for values repeated anywhere on the page; each repeated value can be replaced by a single-byte symbol pointing into an on-page dictionary of up to 256 entries. Again, the algorithm only acts if it nets space savings on the data page.
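Enabling it looks just like Row Compression, and indexes can be compressed independently of the base table. A sketch (the table and index names are assumed):

    -- Rebuild the table with Page Compression
    ALTER TABLE dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- Nonclustered indexes are compressed separately
    ALTER INDEX IX_FactSales_OrderDate ON dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = PAGE);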

One caveat: there is no compression for “off-row” LOB data, so applications that rely heavily on large-object columns, like SharePoint, may not benefit from this type of compression. However, at the SQL PASS Summit in Seattle, the Microsoft SQLCAT team (sqlcat.com) let us know that NASDAQ realized 40-60% better throughput for its applications using SQL Server 2008 Page Compression. Another customer, from the healthcare industry, saw only 1% better throughput but realized significant space savings. As they kept telling us: “Your mileage may vary”.

The good news: once compression is enabled on a table, all the intelligence is handled by SQL Server, your application code does not have to change, and you may see immediate space savings with better performance. The bad news is familiar: this is an Enterprise Edition feature, so you have to pay for it. Still, one customer, ServiceU, estimated it would save between $25K and $50K in its first year using SQL Server 2008 Data Compression.
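Before you commit, SQL Server will even predict the payoff for you: the sp_estimate_data_compression_savings procedure samples an object and reports its current versus estimated compressed size. A sketch (object names assumed):

    -- Estimate what Page Compression would save, without changing anything
    EXEC sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'FactSales',
        @index_id         = NULL,    -- NULL = all indexes
        @partition_number = NULL,    -- NULL = all partitions
        @data_compression = 'PAGE';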

Now, that news will make even Ebenezer Scrooge happy. (Any similarity to actual persons or your boss is purely coincidental!)

Happy Holidays!

Brian

