Use BLU Acceleration to speed up analytics on Linux on z Systems

By Michael Kwok and Peter Kokosielis 

DB2 with BLU Acceleration is a native column-oriented analytic engine capable of high compression rates and high-speed processing on compressed data. Simply put, when it comes to analytics and reporting it’s fast, easy to use (load and go) and hardware-efficient. BLU Acceleration became available for Linux on z Systems beginning with DB2 10.5, fix pack 5.

For customers who would like to run analytics on z Systems data in Linux, BLU Acceleration offers a great path forward. Whether you want to maximize your investment in a new or an existing z Systems machine, it is not hard to see why BLU Acceleration would be the preferred solution. Why? In our testing on the latest IBM z13 system, it delivered a 62x improvement in query processing over a traditional row-oriented database running on Linux on z Systems (see below). It also delivers a lower TCO (total cost of ownership) along with these fast analytics.

DB2 with BLU Acceleration is new to Linux on z Systems, but keep in mind that it is a proven technology on Linux for IBM Power Systems and x86 servers. It works pretty much out of the box. Just set the DB2_WORKLOAD registry variable to ANALYTICS (for example, with db2set DB2_WORKLOAD=ANALYTICS) and let the magic happen!

BLU Acceleration support for Linux on z Systems is included as a feature in DB2 for Linux, UNIX and Windows. It shares more than 99% of its codebase with the other platforms, so the look and feel and the skills required are very similar. It is supported on both RHEL and SLES, with RHEL 7 or SLES 11 SP3 preferred. It also has the same 98%+ Oracle compatibility in terms of PL/SQL and data types, which is continually enhanced.

Let’s highlight some benefits that BLU Acceleration brings to z Systems:

Super Compression

The biggest impact of BLU Acceleration is its ability to compress data at much higher rates than traditional row-oriented databases. Fundamentally, this is because of the nature of native column organization and the principle that “like data compresses better than unlike data”. If you think about it, a column represents a single attribute, such as an item name or an item price. All the values in the column are then of the same data type, typically a string or a number, and may be further constrained by range (e.g., prices may fall between 9.99 and 19.99), possibly with many duplicates or similar-looking pieces of data.

Contrast this with trying to compress a row, which can have many different data types, patterns and an arbitrarily large number of columns. All of this makes compression more difficult in a row-oriented database. It is not uncommon to see a compression ratio of 10x to 20x or even better in actual customer scenarios using BLU Acceleration with column-organized data. A side note: BLU Acceleration can compress column values down to as little as 1 bit each!
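To make the “like data compresses better” idea concrete, here is a minimal sketch of dictionary encoding for a single column. It is only an illustration under simplifying assumptions, not the encoding BLU Acceleration actually uses; the function names dictionary_encode and bits_per_code are hypothetical. It shows why a column with few distinct values needs very few bits per value, down to 1 bit when only two values occur.

```python
def dictionary_encode(values):
    """Map each distinct value in a column to a small integer code."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in values]
    return dictionary, codes


def bits_per_code(dictionary):
    """Bits needed to tell every dictionary entry apart (at least 1)."""
    return max(1, (len(dictionary) - 1).bit_length())


# Hypothetical price column: many rows, only three distinct values.
prices = [9.99, 19.99, 9.99, 14.99, 9.99, 19.99, 14.99, 9.99]
price_dict, price_codes = dictionary_encode(prices)
print(price_codes)                # [0, 2, 0, 1, 0, 2, 1, 0]
print(bits_per_code(price_dict))  # 2 bits per value instead of a full number

# Hypothetical yes/no column: two distinct values fit in 1 bit each.
flag_dict, flag_codes = dictionary_encode(["Y", "N", "Y", "Y"])
print(bits_per_code(flag_dict))   # 1
```

A row, by contrast, mixes names, prices and flags, so no single small dictionary covers it.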

Better Memory Utilization

BLU Acceleration stores data in its buffer pool (i.e., memory) in compressed format. In the section above, we’ve seen how well data can be compressed with BLU Acceleration. In other words, more data can now fit into the same amount of memory. Also, due to the nature of columnar storage, only the columns a query actually references (rather than entire rows) need to be in memory. These factors significantly increase our data density in memory.

A state-of-the-art, scan-friendly cache replacement algorithm determines dynamically at query runtime which data should reside in memory. Together with the prefetching algorithm, it allows BLU Acceleration to dramatically reduce memory requirements. All of these technologies combine to give you much better memory utilization. In simple terms, you get more out of your existing resources!

CPU Efficiency

BLU Acceleration with its multi-core parallelism provides improved performance and better utilization of available CPU resources. Data is processed in chunks using vectorized operations, so more data is handled per CPU cycle than if each value were processed one at a time. This can be significant in environments such as z Systems running multiple virtual machines: CPU capacity can be freed up so that virtual machines can perform other tasks.
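As a rough picture of what “more data per cycle” means, the sketch below packs the per-row results of two conditions into single integers, so that one bitwise AND combines many rows at once instead of looping row by row. This is a bit-parallel stand-in chosen for illustration, not DB2’s vectorized engine; the helper names and the sample columns are hypothetical.

```python
def pack_flags(flags):
    """Pack a list of booleans into one integer, with row i stored in bit i."""
    word = 0
    for i, flag in enumerate(flags):
        if flag:
            word |= 1 << i
    return word


def count_matches(word):
    """Count the rows whose bit is set."""
    return bin(word).count("1")


# Hypothetical per-row results of two predicates over the same eight rows.
in_stock = [True, True, False, True, False, True, True, False]
on_sale = [True, False, False, True, True, True, False, False]

# One AND evaluates "in_stock AND on_sale" for all eight rows at once.
both = pack_flags(in_stock) & pack_flags(on_sale)
print(count_matches(both))  # 3
```

Hardware vector instructions apply the same idea to fixed-width registers holding several column values at a time.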

Actionable Compression

We said that column data is compressed in the buffer pool, that is, in memory. For some operations, such as group-by, joins and predicate evaluation, BLU Acceleration can process data in the compressed format, with no decompression required. This is known as “actionable compression”, and it saves a significant number of CPU cycles and further improves CPU efficiency!
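The idea of working on compressed data can be sketched with an order-preserving dictionary: if codes are assigned in sorted value order, a range predicate can be translated once into a comparison on codes, and the column is never decoded. This is a simplified illustration under that assumption, not a description of DB2 internals; encode and count_at_most are hypothetical names.

```python
from bisect import bisect_right


def encode(values):
    """Dictionary-encode a column; codes follow the sorted order of values."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def count_at_most(dictionary, codes, limit):
    """Count rows with value <= limit by comparing codes only."""
    # Translate the predicate once: every code below this bound
    # stands for a value that is <= limit.
    code_bound = bisect_right(dictionary, limit)
    # The scan touches only the small integer codes, never the values.
    return sum(1 for code in codes if code < code_bound)


# Hypothetical price column.
prices = [9.99, 19.99, 9.99, 14.99, 9.99, 19.99, 14.99, 9.99]
dictionary, codes = encode(prices)
print(count_at_most(dictionary, codes, 14.99))  # 6
```

The same trick applies to equality predicates and to grouping, where rows can be grouped by code and only the group keys need to be decoded at the end.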

Last but not least, we ran an internal analytic benchmark of BLU Acceleration on Linux on z Systems. The benchmark showed that the average workload elapsed time improved by a factor of 54 with BLU Acceleration compared to a traditional row-oriented DB2 database on exactly the same Linux on zEnterprise EC12 configuration. After upgrading to a z13 system, the improvement was 62x over the same baseline! This is a stunning speedup!

[Figure: DB2 for zLinux]

As performance experts, we always feel that no matter how good things may look, there is always room for improvement. Our research and development teams continue to look for opportunities to make BLU Acceleration even faster, more efficient and more scalable on z Systems. Our journey has just begun! Join us and let us know what you think.

About Michael,

Michael Kwok is the Program Director and Architect of Analytic Warehouse Performance (dashDB, BLU Acceleration and DB2 Warehouse) in the IBM Analytics Platform. He focuses on performance in the analytic warehouse space, helping to ensure that the products continue to deliver the best performance. He works extensively with customers and provides technical consultation. He is also one of the authors of the Best Practices paper on “Optimizing Analytic Workloads using DB2 10.5 with BLU Acceleration.” Michael Kwok holds a Ph.D. degree in the area of scalability analysis of distributed systems.

About Peter,

Peter Kokosielis is an advisory software developer in the IBM Analytics organization. A 15-year veteran of IBM, he is interested in adapting, optimizing and analyzing the performance of DB2 on various hardware platforms, virtualized platforms, hardware accelerators, and new and upcoming technology.
