Storage Myths: Dedupe for Databases
November 26, 2013
Storage for DBAs: Data deduplication – or “dedupe” – is a technology which falls under the umbrella of data reduction, i.e. reducing the amount of capacity required to store data. In very simple terms it involves looking for repeating patterns and replacing them with a marker: as long as the marker requires less space than the pattern it replaces, you have achieved a reduction in capacity. Deduplication can happen anywhere: on storage, in memory, over networks, even in database design – for example, the standard database star or snowflake schema. However, in this article we’re going to stick to talking about dedupe on storage, because this is where I believe there is a myth that needs debunking: databases are not a great use case for dedupe.
Deduplication Basics: Inline or Post-Process
If you are using data deduplication, either through a storage platform or via software on the host layer, you have two basic choices: you can deduplicate data at the time it is written (known as inline dedupe) or allow it to arrive and then dedupe it at your leisure in some transparent manner (known as post-process dedupe). Inline dedupe adds to the time taken to complete every write, directly affecting I/O performance. The benefit of post-process dedupe therefore appears to be that it does not affect performance – but think again: post-process dedupe first requires data to be written to storage, then read back out into the dedupe algorithm, before being written to storage again in its deduped format – thus magnifying the amount of I/O traffic and indirectly affecting I/O performance. In addition, post-process dedupe requires more available capacity to provide room for staging the inbound data prior to dedupe.
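As a rough illustration of that I/O amplification – deliberately simplistic, ignoring caching, write coalescing and the metadata I/O itself – here is a sketch of the back-end operations generated by a single host write under each approach:

```python
# Deliberately simplistic sketch: back-end I/O generated per host write.
# Real arrays coalesce writes and cache aggressively; this just counts the
# basic steps described above.

def inline_io_per_write():
    # Hash and lookup happen in the write path, then the (possibly
    # deduplicated) block is written once to persistent storage.
    return {"writes": 1, "reads": 0}

def post_process_io_per_write():
    # 1. Data lands on storage in its raw form (staging).
    # 2. It is read back so the dedupe engine can hash and compare it.
    # 3. Unique blocks are written out again in deduplicated form.
    return {"writes": 2, "reads": 1}

print("inline      :", inline_io_per_write())
print("post-process:", post_process_io_per_write())
```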
Deduplication Basics: (Block) Size Matters
In most storage systems dedupe takes place at a defined block size, whereby each block is hashed to produce a unique key before being compared with a master lookup table containing all known hash keys. If the newly-generated key already exists in the lookup table, the block is a duplicate and does not need to be stored again. The block size is therefore pretty important, because the smaller the granularity, the higher the chances of finding a duplicate:
In the picture you can see that the pattern “1234” repeats twice over a total of 16 digits. With an 8-digit block size (the lower line) this repeat is not picked up, since the second half of each 8-digit block is different. However, by reducing the block size to 4 digits (the upper line) we can now get a match on our unique key, meaning that the “1234” pattern only needs to be stored once.
This sounds like great news – let’s just choose a really small block size, right? But no, nothing comes without a price, and in this case the price is paid in the size of the hashing lookup table. This table, which contains one key for every unique block, can range from a single entry (the “ideal” scenario where all data is duplicated) to one entry for each block (the worst-case scenario where every block is unique). By making the block size smaller we increase the maximum potential size of the hash table: halving the block size doubles the potential number of hash entries.
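Here’s a toy version of the example above – my own simplification, using SHA-256 as the hash and ignoring the space taken by the keys themselves – which chunks a 16-digit string at both block sizes:

```python
import hashlib

def dedupe(data: bytes, block_size: int):
    """Toy dedupe: hash each fixed-size block, store one copy per unique hash."""
    table = {}  # hash key -> stored block (the "lookup table")
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        key = hashlib.sha256(block).hexdigest()
        table.setdefault(key, block)  # a duplicate block is not stored again
    return len(table), sum(len(b) for b in table.values())

data = b"1234567812349876"  # 16 digits, with "1234" appearing twice
for block_size in (8, 4):
    entries, stored = dedupe(data, block_size)
    print(f"{block_size}-digit blocks: {entries} hash entries, {stored} digits stored")
```

The 4-digit block size stores less data but needs more hash entries – which is exactly the trade-off we are about to run into.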
Hash Abuse
Why do we care about having more hash entries? There are a few reasons. First there is the additional storage overhead: if your data is relatively free of duplication (or the block size does not allow duplicates to be detected) then not only will you fail to reclaim any space but you may end up using extra space to store all of the unique keys associated with each block. This is clearly not a great outcome when using a technology designed to reduce the footprint of your data. Secondly, the more hash entries you have, the more entries you need to scan through when comparing freshly-hashed blocks during writes or locating existing blocks during reads. In other words, the more of a performance overhead you will suffer in order to read your data and (in the case of inline dedupe) write it.
If this is sounding familiar, it’s because the hash data is effectively a database in which storage metadata is stored and retrieved. Just like any database, its performance will be dictated by the volume of data as well as the compute resource used to manipulate it, which is why many vendors choose to store this metadata in DRAM. Keeping the data in memory brings certain performance benefits, but at the price of volatility: changes in memory will be lost if the power is interrupted, so regular checkpoints to persistent storage are required. Even then, battery backup is often needed, because the loss of even one hash key means data corruption. If you are going to replace your data with markers from a lookup table, you absolutely cannot afford to lose that lookup table, or there will be no coming back.
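To get a feel for why that metadata ends up in DRAM, here’s a crude worked example – the 32 bytes per entry (hash key plus pointer/reference count) is purely an assumption I’ve picked for illustration, and real implementations will differ:

```python
# Worst-case lookup table sizing: every block unique, one entry per block.
# The 32-byte entry size is an illustrative assumption, not a vendor figure.

TIB = 1024 ** 4

def max_metadata_bytes(capacity_bytes, block_size, entry_bytes=32):
    return (capacity_bytes // block_size) * entry_bytes

for block_size in (8192, 4096):
    md = max_metadata_bytes(10 * TIB, block_size)
    print(f"10 TiB at {block_size // 1024} KiB blocks: "
          f"up to {md / 1024**3:.0f} GiB of metadata")
```

Tens of gigabytes of metadata that must never be lost is exactly why the checkpointing and battery backup described above become necessary.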
Database Deduplication – Don’t Be Duped
Now that we know what dedupe is all about, let’s attempt to apply it to databases and see what happens. You may be considering the use of dedupe technology with a database system, or you may simply be looking at one of a number of recent storage products where inline dedupe is an “always on” feature, i.e. you cannot turn it off regardless of whether it helps or hinders. The vendor may make all sorts of claims about the possibilities of dedupe, but how much benefit will you actually see?
Let’s consider the different components of a database environment in the context of duplication:
- Oracle datafiles contain data blocks which have block headers at the start of the block. These headers contain values, such as the block’s own address, which differ from block to block, making deduplication impossible at the database block size. In addition, the end of each block contains a tailcheck section which features a number generated using data such as the SCN, so even if the block were divided in two the second half would offer limited opportunity for dedupe while the first half would offer none (see the sketch after this list).
- Even if you were able to break down Oracle blocks into small enough chunks to make dedupe realistic, any duplication of data is really a massive warning about your database design: normalise your data! Also, consider features like index key compression which are part of the Enterprise Edition license.
- Most Oracle installations have multiplexed copies of important files like online redo logs and controlfiles. These files are so important that Oracle synchronously maintains multiple copies to protect against data loss. If your storage system deduplicates those copies back down to a single physical copy, it silently undoes that protection – particularly bad if it’s an always-on feature that gives you no choice.
- While unallocated space (e.g. in an ASM diskgroup) might appear to offer the potential for dedupe, this is actually a problem which you should solve using another storage technology: thin provisioning.
- You may have copies of datafiles residing on the same storage as production, which therefore allow large-scale deduplication to take place; perhaps they are used as backups or test/development environments. However, in the latter case, test/dev environments are a use case for space-efficient snapshots rather than dedupe. And if you are keeping your backups on the same storage system as your production data, well… good luck to you. There is nothing more for you here.
- Maybe we aren’t talking about production data at all. You have a large storage array which contains multiple copies of your database for use with test/dev environments – and thus large portions of the data are duplicated. Bingo! The perfect use case for storage dedupe, right? Wrong. Database-level problems require database-level solutions, not storage-level workarounds. Get yourself some licenses for Delphix and you won’t look back.
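To illustrate the first point in that list, here’s a deliberately crude mock-up – nothing like the real Oracle block format, just a stand-in with a unique address in the header and an SCN-derived tailcheck – showing why two blocks containing identical rows still produce different hashes when the dedupe window is the full block:

```python
import hashlib
import struct

def mock_block(dba: int, scn: int, rows: bytes, size: int = 8192) -> bytes:
    """Crude stand-in for an Oracle block: a header containing the block's own
    address, the row data, and an SCN-derived tailcheck. NOT the real format."""
    header = struct.pack(">II", dba, scn)       # unique per block
    tail = struct.pack(">I", scn & 0xFFFFFFFF)  # tailcheck at the end
    body = rows.ljust(size - len(header) - len(tail), b"\x00")
    return header + body + tail

rows = b"exactly the same row data in both blocks"
block_a = mock_block(dba=0x00400081, scn=1001, rows=rows)
block_b = mock_block(dba=0x00400082, scn=1002, rows=rows)

# A full-block dedupe window hashes headers and tailcheck along with the rows...
print(hashlib.sha256(block_a).hexdigest() == hashlib.sha256(block_b).hexdigest())  # False
# ...so identical row data never shows up as a duplicate.
```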
To conclude, while dedupe is great in use cases like VDI, it offers very limited benefit in database environments while potentially making performance worse. That in itself is worrying, but what I really see as a problem is the way that certain storage vendors appear to be selling their capacity based on assumed levels of dedupe, i.e. “Sure we are only giving you X terabytes of storage for Y price, but actually you’ll get 10:1 dedupe which means the price is really ten times lower!”
Sizing should be based on facts, not assumptions. Just like in the real world, nothing comes for free in I.T. – and we’ve all learnt that the hard way at some point. Don’t be duped.
Hi Flash,
Glad to see folks pointing out that non-duplicate data can’t be de-duped. A tired topic for sure!
May I call out one of your points? You state:
“[..]Secondly, the more hash entries you have, the more entries you need to scan through when comparing freshly-hashed blocks during writes or locating existing blocks during reads. ”
I need to point out that nobody with computer science skills would implement a scan for locating such structures. Even in a case of zero de-duplication, at 100% capacity, there would be no reason to implement a metadata scan to service a read from a particular offset in a LUN. Content-aware writes don’t result in a scan either. Compute a hash and store the hashes in a hash table. If it’s not a perfect hash then, sure, there are collisions, but hash collision resolution options abound. One can implement a re-hash (sub-hash) or a hash-btree or any other such collision resolution. We aren’t talking Oracle SGA db block hash chains here (which is a hash and *chain-walk* for collision resolution).
I’m not suggesting metadata is free. However, once metadata is larger than processor cache on the storage processor then the next cliff is DRAM. Unless someone makes an array with a storage processor based on a COMA architecture then we are talking about DRAM metadata–but not scanning DRAM.
Hey Kevin, thanks for keeping me honest. The term “scan” is incorrect – I hold my hands up. In fact, in trying (but probably failing) to make this article short enough that people will actually read it, I’ve made a number of simplifications which someone of your experience will immediately spot. The other obvious one that you’ve mentioned is hash collisions – a topic which I find fascinating but which doesn’t easily lend itself to bite-size reading. I’ve chosen to ignore it entirely for now, but may revisit it at a later date.
I guess the main point I want to get across is that if you reduce your dedupe block size you’ll (potentially) end up with more metadata, which has consequences you need to consider. And yes, you are absolutely right – it is a tired topic, I’d love it if someone could put it to bed once and for all. Yet I still hear of vendors talking about usable capacity *after* dedupe… as if it’s guaranteed!
Banking on a “usable capacity after dedup” is as wrong as saying stupid things like “average compression is up to 10x” when ignoramuses speak of technology like Oracle Hybrid Columnar Compression. It’s just wrong because one can’t presume duplication across the board. The word “if” needs to be used more often in our industry.
The only product out there I’m aware of that has a sufficiently small dedup window (as to effect deduplication on database blocks) would be Pure Storage. I don’t know if that plays as a strength or a metadata overload for them. I’m just not willing to sit here and suggest that all products other than those of the company I currently work for are total garbage. I don’t do that even with the most shamelessly over-marketed products out there–Oracle’s “Engineered Systems.”
“The word “if” needs to be used more often in our industry”
Amen to that.
Incidentally, you mentioned Pure Storage, but I’m not planning on specifically commenting on their product for the reasons I’ve outlined here:
https://flashdba.com/about/competition/
…I have the same creed
…a topic which I find fascinating but which doesn’t easily lend itself to bite-size reading. I’ve chosen to ignore it entirely for now, but may revisit it at a later date…
I’m looking forward to a topic on this, not byte size, from either of you…
G
Thanks, flash – very useful information and rather timely!
Thank you! Finally someone wrote a blog entry about it – I wanted to do it years ago, because I was tired of vendor marketing… now I’ll just pass on the link 😉
…and yes, I can also state that my tests in the past showed more erratic service times [ms] when it came down to I/O latency on deduplicating storage (sometimes even as bad as spikes of 2-3 ms more with the feature enabled)…
You could also state that encrypted (TDE) and/or compressed (ACO) DBs won’t benefit from it either, so why pay for the same stuff twice?
-J.
Reblogged this on Easy Oracle DBA and commented:
One of the most interesting reads on de-duplication from a DBA’s perspective.
We are mostly talking about flash arrays here, but we should also probably mention the possible impact of de-dupe for DBMSs if you happen to be using disk-based or hybrid arrays with flash and disks.
This is the performance angle. Even assuming that de-dupe has no impact on the array’s ingestion performance – and that is a tricky assumption – you also have to factor in the effect on read performance.
For reads, de-dupe using spinning drives becomes problematic, because if de-dupe is at all effective it will also have the effect of randomising the block layout on the backend storage system. What was a nice contiguous data object becomes one which in practice is scattered all over the backend storage platform – and the more effective the de-dupe, the more scattered the blocks become.
While this is much less of a problem for most flash arrays, turning what should be sequential reads into random reads on a spinning-disk-based array is at best going to put more pressure on the caches in the controllers and at worst going to increase I/O wait time.
Hi Flash,
Have you done any testing showing Oracle’s performance with and without deduplication?
I have – and deduplication does tend to have a measurable (and often unwanted) effect on Oracle database performance. But I’m afraid I don’t have data I can share. However, you should test it yourself and see how your specific systems are affected.