Understanding Disk: Caching and Tiering



When I was a child, about four or five years old, my dad told me a joke. It wasn’t a very funny joke, but it stuck in my mind because of what happened next. The joke went like this:

Dad: “What’s big at the bottom, small at the top and has ears?”

Me: “I don’t know?”

Dad: “A mountain!”

Me: “Er…<puzzled>…  What about the ears?”

Dad: (Triumphantly) “Haven’t you heard of mountaineers?!”

So as I say, not very funny. But, by a twist of fate, the following week at primary school my teacher happened to say, “Now then children, does anybody know any jokes they’d like to share?”. My hand shot up so fast that I was immediately given the chance to bring the house down with my new comedy routine. “What’s big at the bottom, small at the top and has ears?” I said, with barely repressed glee. “I don’t know”, murmured the teacher and other children expectantly. “A mountain!”, I replied.

Silence. Awkwardness. Tumbleweed. Someone may have said “Duuh!” under their breath. Then the teacher looked faintly annoyed and said, “That’s not really how a joke works…” before waiving away my attempts to explain and moving on to hear someone else’s (successful and funny) joke. Why had it been such a disaster? The joke had worked perfectly on the previous occasion, so why didn’t it work this time?

There are two reasons I tell this story: firstly because, as you can probably tell, I am scarred for life by what happened. And secondly, because it highlights what happens when you assume you can predict the unpredictable.*

Gambling With Performance

So far in this mini-series on Understanding Disk we’ve covered the design of hard drives, their mechanical limitations and some of the compromises that have to be made in order to achieve acceptable performance. The topic of this post is more about bandaids; the sticking plasters that users of disk arrays have to employ to try and cover up their performance problems. Or as my boss likes to call it, lipstick on a pig.

roulette-wheelIf you currently use an enterprise disk array the chances are it has some sort of DRAM cache within the array. Blocks stored in this cache can be read at a much lower latency than those residing only on disk, because the operation avoids paying the price of seek time and rotational latency. If the cache is battery-backed, it can also be used to accelerate writes too. But DRAM caches in storage area networks are notoriously expensive in relation to their size, which is often significantly smaller than the size of the active data set. For this reason, many array vendors allow you to use SSDs as an additional layer of slower (but higher capacity) cache.

Another common approach to masking the performance of disk arrays is tiering. This is where different layers of performance are identified (e.g. SATA disk, fibre-channel disk, SSD, etc) and data moved about according to its performance needs. Tiering can be performed manually, which requires a lot of management overhead, or automatically by software – perhaps the most well-known example being EMC’s Fully Automated Storage Tiering (or “FAST”) product. Unlike caching, which creates a copy of the data somewhere temporary, tiering involves permanently relocating the persistent location of the data. This relocation has a performance penalty, particularly if data is being moved frequently. Moreover, some automatic tiering solutions can take 24 hours to respond to changes in access patterns – now that’s what I call bad latency.

The Best Predictor of Future Behaviour is Past Behaviour

The problem with automatic tiering is that, just like caching, it relies on past behaviour to predict the future. That principle works well in psychology, but isn’t always as successful in computing. It might be acceptable if your workload is consistent and predictable, but what happens when you run your end of month reporting? What happens when you want to run a large ad-hoc query? What happens when you tell a joke about mountains and expect everyone to ask “but what about the ears”? You end up looking pretty stupid, I can tell you.

las-vegasI have no problem with caching or tiering in principle. After all, every computer system uses cache in multiple places: your CPUs probably have two or three levels of cache, your server is probably stuffed with DRAM and your Oracle database most likely has a large block buffer cache. What’s more, in my day job I have a lot of fun helping customers overcome the performance of nasty old spinning disk arrays using Violin’s Maestro memory services product.

But ultimately, caching and tiering are bandaids. They reduce the probability of horrible disk latency but they cannot eliminate it. And like a gambler on a winning streak, if you become more accustomed to faster access times and put more of your data at risk, the impact when you strike out is felt much more deeply. The more you bet, the more you have to lose.

Shifting the Odds in Your Favour

I have a customer in the finance industry who doesn’t care (within reason) what latency their database sees from storage. All they care about is that their end users see the same consistent and sustained performance. It doesn’t have to be lightning fast, but it must not, ever, feel slower than “normal”. As soon as access times increase, their users’ perception of the system’s performance suffers… and the users abandon them to use rival products.

poker-gameThey considered high-end storage arrays but performance was woefully unpredictable, no matter how much cache and SSD they used. They considered Oracle Exadata but ruled it out because Exadata Flash Cache is still a cache – at some point a cache miss will mean fetching data from horrible, spinning disk. Now they use all flash arrays, because the word “all” means their data is always on flash: no gambling with performance.

Caching and tiering will always have some sort of place in the storage world. But never forget that you cannot always win – at some point (normally the worst possible time) you will need to access data from the slowest media used by your platform. Which is why I like all flash arrays: you have a 100% chance of your data being on flash. If I’m forced to gamble with performance, those are the odds I prefer…

* I know. It’s a tenuous excuse for telling this story, but on the bright side I feel a lot better for sharing it with you.