Understanding Disk: Caching and Tiering



When I was a child, about four or five years old, my dad told me a joke. It wasn’t a very funny joke, but it stuck in my mind because of what happened next. The joke went like this:

Dad: “What’s big at the bottom, small at the top and has ears?”

Me: “I don’t know?”

Dad: “A mountain!”

Me: “Er…<puzzled>…  What about the ears?”

Dad: (Triumphantly) “Haven’t you heard of mountaineers?!”

So as I say, not very funny. But, by a twist of fate, the following week at primary school my teacher happened to say, “Now then children, does anybody know any jokes they’d like to share?”. My hand shot up so fast that I was immediately given the chance to bring the house down with my new comedy routine. “What’s big at the bottom, small at the top and has ears?” I said, with barely repressed glee. “I don’t know”, murmured the teacher and other children expectantly. “A mountain!”, I replied.

Silence. Awkwardness. Tumbleweed. Someone may have said “Duuh!” under their breath. Then the teacher looked faintly annoyed and said, “That’s not really how a joke works…” before waving away my attempts to explain and moving on to hear someone else’s (successful and funny) joke. Why had it been such a disaster? The joke had worked perfectly on the previous occasion, so why didn’t it work this time?

There are two reasons I tell this story: firstly because, as you can probably tell, I am scarred for life by what happened. And secondly, because it highlights what happens when you assume you can predict the unpredictable.*

Gambling With Performance

So far in this mini-series on Understanding Disk we’ve covered the design of hard drives, their mechanical limitations and some of the compromises that have to be made in order to achieve acceptable performance. The topic of this post is more about bandaids: the sticking plasters that users of disk arrays have to employ to try and cover up their performance problems. Or, as my boss likes to call it, lipstick on a pig.

If you currently use an enterprise disk array, the chances are it has some sort of DRAM cache within the array. Blocks stored in this cache can be read at a much lower latency than those residing only on disk, because the operation avoids paying the price of seek time and rotational latency. If the cache is battery-backed, it can be used to accelerate writes too. But DRAM caches in storage area networks are notoriously expensive in relation to their size, which is often significantly smaller than the size of the active data set. For this reason, many array vendors allow you to use SSDs as an additional layer of slower (but higher-capacity) cache.
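As a rough illustration of the idea, here is a minimal sketch of a read cache in front of a slow backing store. The LRU policy and the latency figures are hypothetical choices for illustration, not any particular array's implementation:

```python
from collections import OrderedDict

# Hypothetical latencies (microseconds), for illustration only.
DRAM_LATENCY_US = 100
DISK_LATENCY_US = 8000  # seek time + rotational latency on spinning disk

class ReadCache:
    """A minimal LRU read cache in front of a slow backing store."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> data, ordered by recency

    def read(self, block_id, backing_store):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # refresh recency on a hit
            return self.blocks[block_id], DRAM_LATENCY_US
        data = backing_store[block_id]          # miss: pay the disk penalty
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return data, DISK_LATENCY_US

disk = {n: f"block-{n}" for n in range(10)}
cache = ReadCache(capacity=2)
_, first = cache.read(0, disk)   # cold read: disk latency
_, second = cache.read(0, disk)  # repeat read: DRAM latency
```

The point the sketch makes is the one that matters: the cache only helps on the second access, and only while the block survives eviction.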

Another common approach to masking the performance of disk arrays is tiering. This is where different layers of performance are identified (e.g. SATA disk, fibre-channel disk, SSD, etc.) and data is moved about according to its performance needs. Tiering can be performed manually, which requires a lot of management overhead, or automatically by software – perhaps the best-known example being EMC’s Fully Automated Storage Tiering (or “FAST”) product. Unlike caching, which creates a temporary copy of the data, tiering relocates the data’s persistent location. This relocation has a performance penalty, particularly if data is being moved frequently. Moreover, some automatic tiering solutions can take 24 hours to respond to changes in access patterns – now that’s what I call bad latency.
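To see why automatic tiering lags behind the workload, consider this toy sketch (my own illustration, not any vendor's algorithm): promotion decisions happen only at the end of each evaluation window, based on access counts gathered during that window:

```python
from collections import Counter

class TieringEngine:
    """Toy automatic tiering: after each evaluation window, the hottest
    blocks are relocated to the fast tier and everything else stays on
    (or is demoted to) the slow tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast_tier = set()          # block ids currently on flash
        self.window_counts = Counter()  # accesses seen this window

    def record_access(self, block_id):
        self.window_counts[block_id] += 1

    def rebalance(self):
        # Promotion uses only *past* accesses in the window -- a block
        # that suddenly turns hot stays on slow disk until the next cycle.
        hottest = [b for b, _ in
                   self.window_counts.most_common(self.fast_capacity)]
        self.fast_tier = set(hottest)
        self.window_counts.clear()

engine = TieringEngine(fast_capacity=2)
for b in [1, 1, 1, 2, 2, 3]:
    engine.record_access(b)
engine.rebalance()  # blocks 1 and 2 get flash; block 3 stays on disk
```

If the evaluation window is 24 hours, any change in access pattern – an end-of-month report, an ad-hoc query – runs entirely against the slow tier before the engine reacts.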

The Best Predictor of Future Behaviour is Past Behaviour

The problem with automatic tiering is that, just like caching, it relies on past behaviour to predict the future. That principle works well in psychology, but isn’t always as successful in computing. It might be acceptable if your workload is consistent and predictable, but what happens when you run your end of month reporting? What happens when you want to run a large ad-hoc query? What happens when you tell a joke about mountains and expect everyone to ask “but what about the ears”? You end up looking pretty stupid, I can tell you.

I have no problem with caching or tiering in principle. After all, every computer system uses cache in multiple places: your CPUs probably have two or three levels of cache, your server is probably stuffed with DRAM and your Oracle database most likely has a large block buffer cache. What’s more, in my day job I have a lot of fun helping customers overcome the performance limitations of nasty old spinning disk arrays using Violin’s Maestro memory services product.

But ultimately, caching and tiering are bandaids. They reduce the probability of horrible disk latency but they cannot eliminate it. And like a gambler on a winning streak, if you become more accustomed to faster access times and put more of your data at risk, the impact when you strike out is felt much more deeply. The more you bet, the more you have to lose.

Shifting the Odds in Your Favour

I have a customer in the finance industry who doesn’t care (within reason) what latency their database sees from storage. All they care about is that their end users see the same consistent and sustained performance. It doesn’t have to be lightning fast, but it must not, ever, feel slower than “normal”. As soon as access times increase, their users’ perception of the system’s performance suffers… and the users abandon them to use rival products.

They considered high-end storage arrays, but performance was woefully unpredictable no matter how much cache and SSD they used. They considered Oracle Exadata but ruled it out because Exadata Flash Cache is still a cache – at some point a cache miss will mean fetching data from horrible, spinning disk. Now they use all flash arrays, because the word “all” means their data is always on flash: no gambling with performance.

Caching and tiering will always have some sort of place in the storage world. But never forget that you cannot always win – at some point (normally the worst possible time) you will need to access data from the slowest media used by your platform. Which is why I like all flash arrays: you have a 100% chance of your data being on flash. If I’m forced to gamble with performance, those are the odds I prefer…

* I know. It’s a tenuous excuse for telling this story, but on the bright side I feel a lot better for sharing it with you.


4 Responses to Understanding Disk: Caching and Tiering

  1. sshdba says:

    Interesting read. But Oracle claims that in Exadata, most of the time your database will be sitting in the flash cache instead of being fetched from disk. And what about the IBM FlashSystem 820? I thought it is supposed to have some neat auto-tiering tricks up its sleeve.

    I love your blog. No offence, but as an honest critic I sometimes feel you push all flash arrays way too much due to the nature of your employer’s business :))

    • flashdba says:

      “Oracle claims…” – those two words tend to crop up a lot, don’t they? If the flash cache is smaller than the total usable space on disk, how can that be possible? Magic? I know that Oracle also claims (there’s that phrase again) to have hardware compression built into their OEMed LSI Nytro Warpdrive flash cards but you need to purchase the Advanced Compression Option to use that, which hikes up the price considerably.

      The IBM FlashSystem 820 (which I’m no expert on, but I think it’s being phased out now for the newer 840?) does not have integral tiering capabilities that I am aware of. At least, if it does, it’s not mentioned in the IBM Red Book, which is freely available. What is mentioned is the ability to use it as a tier within IBM’s Easy Tier solution, which is just another form of tiering to rival EMC’s FAST and all the other solutions out there.

      Tiering is what it is – a way to try and overcome poor latency through the use of probability to predict behaviour. My point is that you can use it, but you just need to keep in mind that your slowest tier will still, at some point, be the place where you need to go to access your data.

      As for your other comment, I welcome your honest criticism. And yes, you are right, I am a believer in all flash arrays. I cannot hide this or deny any inherent bias I have as a result. But what I can do is be open about it so anyone reading can form their own opinion…

  2. sshdba says:

    I have worked with the IBM FlashSystem 820 and the auto-tiering on that thing is horrible. It is a 24-hour cycle before the algorithm takes the decision to move the blocks to the flash tier. We have seen some horrible results after the obvious IBM sales drivel. We went to IBM to tweak the refresh cycle, but apparently it is some sort of proprietary setting and IBM wouldn’t let us touch it. Though to their credit, they are now focussing on All-Flash arrays and seem to have a solid road map ahead.
    With Exadata my only gripe is that the compression on the flash cache comes through Advanced Compression licenses. I think that’s unfair to customers who have put a significant investment into its hardware/software, only to find something as basic as compression behind a hefty license fee.

  3. Pingback: Visualize the IO source thanks to Tableau and AWR | bdt's oracle blog
