Hybrid Columnar Compression
Disclaimer: This post was written in 2012 and covers the Exadata X2 model. I have no plans to update it to cover the new X3 system, so please be aware that the information below may no longer be up to date.
Exadata Hybrid Columnar Compression (HCC) is a feature introduced in Oracle Database 11g Release 2 which Oracle only allows to be used on Exadata storage, Pillar Axiom storage systems and Sun ZFS storage appliances. Oracle claims that HCC allows “10x to 50x compression for most datasets” and can give 3x to 5x reductions in storage footprint over EMC and NetApp solutions.
In HCC, tables are stored in column-order to allow for better compression and deduplication of data. Traditionally, column-order databases have had performance issues because of the expensive nature of reading a single-row; to fetch the value of each column in a single row the entire table would need to be scanned. Exadata HCC attempts to alleviate this issue by splitting tables up into chunks known as compression units. Each compression unit contains a subset of the rows – and is arranged in column order. Thus a single row lookup would only require an entire compression unit to be read rather than the entire table. Note however that updates result in the target rows being migrated out of HCC compression, which can lead to unpredictable performance during subsequent scans.
Two types of HCC are available: Warehouse Compression and Archive Compression. Warehouse compression is aimed at providing a reduction in storage footprint combined with a performance increase (gained through the ability for Smart Scan queries to decompress HCC blocks on the storage cells), whilst Archive Compression is intended as an Information Lifecycle Management (ILM) solution to be used on data that is rarely accessed. Consequently Archive Compression gives better compression ratios at the expense of performance.
In order to load data into HCC format loads must be in bulk and use direct path i.e. bypass the buffer cache. In addition to the two types of HCC there are two levels, HIGH and LOW. The HIGH level offers better compression ratios but at the expense of increased data load times.
Comment: My experience of HCC has been that some customers have failed to understand its scope and ideal use-case. HCC can give very good compression ratios on table data but it is not designed to be used with data which is frequently modified. Additionally, it needs to be understood that HCC only applies to table segments, not index or LOG segments. During my time at Oracle I encountered a number of customers who apparently came out of the sales cycle believing that HCC would take their (for example) 30TB database and compress it to 3TB, saving 27TB of space. In reality it would have been more likely that only 50% of the database was table segments. If a portion of this table data was then regularly updated this would result in only around 10TB of data being eligible for HCC. Applying the 10x compression claim to that gives a total saving of 8TB… That’s a significant difference in impact.
These are not real figures, obviously I cannot talk about actual customers that I worked with. But the point is, HCC is a technology which has benefits and drawbacks, like all technologies I suppose. It cannot be suitable for every customer’s situation, so it needs to be understood before Exadata machines are bought based on its predicted impact. And that doesn’t mean just looking at the sales and marketing slides…
Note that there is a patch for the DBMS_COMPRESSION utility which allows the compression advisor to test HCC on real data (last time I checked it was patch 8896202). I strongly recommend using this with as much real data as possible before sizing any Exadata machine or other HCC-enabled storage.
If you are interested in what real-world customers are seeing when they attempt to use HCC, I recommend reading this blog series by Ofir Manor.