Make Big Data small with data compression in BLU Acceleration

Amit Patel, Data Warehousing Expert

by Amit Patel, Program Director, IBM Data Warehouse Marketing

In this post I’d like to discuss the importance of data compression for analytical systems and how BLU Acceleration in-memory technology delivers 10x-20x storage savings to maximize both business value and query performance.

Leverage storage savings and I/O performance enhancements

A key attribute of Big Data is explosive growth in data volumes. Enterprise storage can be the most expensive component of a data warehouse, so reducing your data footprint with the help of compression is a no-brainer when it comes to being efficient with storage. Yet cost savings from storage efficiency are only one side of the coin. The other is performance improvement. Spinning disk is often the weak link in the performance of analytic queries, so reducing input/output (I/O) is a well-accepted way to accelerate them. Keeping data compressed significantly reduces I/O, because fewer bytes have to be read from disk for the same number of rows. (Of course, BLU Acceleration in-memory technology does much more to reduce I/O by taking full advantage of the CPU cache and the system memory, along with techniques like data skipping.)

Organize data in columns and compress more common values more tightly
BLU Acceleration takes data compression to the next level. To begin with, BLU Acceleration organizes data by column instead of by row. Columns tend to have similar and repeating data patterns (e.g., states, last names, zip codes) that compress better. The technology further optimizes compression based on how frequently each value occurs, so that more commonly repeating values are compressed more tightly. For example, a common last name like “Smith” will be compressed more tightly than uncommon last names. Each column value is encoded in its entirety and packed as tightly as possible into a collection of bits sized to best fit the register width of the CPU.
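
To make the frequency idea concrete, here is a minimal sketch in Python. It uses classic Huffman coding as a stand-in, since this post does not spell out the exact encoding scheme used inside BLU Acceleration. The column contents and the function name `huffman_code_lengths` are hypothetical, chosen only for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(values):
    """Return {value: code length in bits} for a Huffman code built
    from the frequency of each value in the column."""
    freq = Counter(values)
    if len(freq) == 1:
        # A single distinct value still needs at least one bit per row.
        return {next(iter(freq)): 1}
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(n, next(tie), {v: 0}) for v, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups; every value in the
        # merged group ends up one bit deeper in the code tree.
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        merged = {v: depth + 1 for v, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


# Hypothetical "last name" column: one very common value, a few rare ones.
last_names = ["Smith"] * 6 + ["Jones"] * 3 + ["Nguyen"] * 2 + ["Zamora"]

lengths = huffman_code_lengths(last_names)
compressed_bits = sum(lengths[v] for v in last_names)
fixed_bits = len(last_names) * 2  # 4 distinct values -> 2 bits each, fixed width

print(lengths)
print(compressed_bits, "bits vs", fixed_bits, "bits with fixed-width codes")
```

In this toy column, “Smith” gets the shortest code and the rarest names get the longest, so the column takes fewer bits overall than it would with fixed-width codes. A production engine would also pack these codes into register-sized words, which the sketch leaves out.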

Preserve the order of data and analyze while compressed
A key aspect of compression in BLU Acceleration is that it is order-preserving. This allows a broad range of analytical operations (e.g., predicate comparisons, joins, aggregates) to be performed directly on compressed data. When you can operate on compressed data, the CPU does not have to spend cycles decompressing it until it’s time to materialize the results and present them to the user. This is why we refer to compression in BLU Acceleration as “Actionable Compression”: the data is actionable for analytics while it’s still compressed. Compression in BLU Acceleration is automatic and always on.
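
The following Python sketch shows why order preservation matters. It is a simplified illustration, not BLU Acceleration's actual implementation: it assigns integer codes from a sorted dictionary, so comparing two codes gives the same answer as comparing the original values. A range predicate can then be evaluated on the codes alone. The zip code data and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_order_preserving_dictionary(values):
    """Assign integer codes so that code order matches value order."""
    distinct = sorted(set(values))
    return distinct, {v: code for code, v in enumerate(distinct)}


def encode(values, value_to_code):
    """Replace every column value with its integer code."""
    return [value_to_code[v] for v in values]


def count_in_range(codes, dictionary, low, high):
    """Count rows with low <= value <= high by comparing codes only."""
    # Translate the predicate bounds into code space once...
    lo_code = bisect_left(dictionary, low)        # smallest code with value >= low
    hi_code = bisect_right(dictionary, high) - 1  # largest code with value <= high
    # ...then scan the encoded column without decoding any row.
    return sum(1 for c in codes if lo_code <= c <= hi_code)


# Hypothetical zip code column.
zip_codes = ["94105", "10001", "60601", "10001", "73301", "60601", "10001"]

dictionary, value_to_code = build_order_preserving_dictionary(zip_codes)
codes = encode(zip_codes, value_to_code)
matches = count_in_range(codes, dictionary, "10001", "60601")

print(codes)
print(matches)
```

Only the two predicate bounds are looked up in the dictionary; the scan itself touches nothing but small integers. Decoding is deferred until matching rows actually need to be shown to the user.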

Eliminate the need for indexes to save on storage and maintenance
Can it get any better? Yes, indeed. BLU Acceleration is designed for ease of use and load-and-go simplicity, which means there is no longer a need to build and maintain indexes. In traditional data warehouse systems, up to half of the storage can be used by indexes. BLU Acceleration in-memory technology eliminates the need for these indexes, which can provide massive additional savings in storage and maintenance.

Ask our clients
There is no better validation than the success our clients are seeing firsthand with BLU Acceleration. Clients commonly see 10x compression with this technology, and some tables compress 20x or more. Here are a few client quotes:

“The DB2 Cancun Release with BLU Acceleration gave us astounding compression results. In one of our largest customer databases, we saw compression ranging from 7x to 20x as compared to the uncompressed tables. Our largest and most critical operations table saw a compression rate of 11x! This compression improvement is only on table data. With BLU tables we don’t need all the indices either; this provides even more storage savings. These amazing results will save us a great deal of space on disk and in-memory.”

Mike Petkau – Director of Database Architecture & Administration, TMW Systems (A Trimble Company)

“Using DB2 10.5 with BLU Acceleration, our storage consumption went down by about 10x compared to our storage requirements for uncompressed tables and indexes.”

Kent Collins, Database Solutions Architect, BNSF Railway

“10x. That’s how much smaller our tables are with BLU Acceleration. Moreover, I don’t have to create indexes or aggregates, or partition the data, among other things. When I take that into account in our mixed table-type environment, that number becomes 10-25x.”

Andrew Juarez, Lead SAP Basis and DBA, Coca Cola Bottling Company Consolidated

To learn more about Actionable Compression and other technical innovations in BLU Acceleration in-memory technology, visit the Get Technical section or learn about next-generation capabilities in BLU Acceleration in-memory technology.