The Biggest Gap In The Clouds? High Performance RDBMS

Over the course of the last few blog posts, we’ve looked at how an increasing number of database workloads are migrating to the cloud, how there is more than one path to get there… and why overprovisioning is one of the biggest challenges to overcome.

We’re talking about business-critical application workloads here: big, complex, demanding, mission-critical, sensitive, performance-hungry… When on-prem, they are almost certainly running on dedicated, high-end infrastructure. And that’s a potential issue when you then migrate them to run on “someone else’s computer“.

As we’ve discussed before, the cloud is really a big pool of discrete resources and services, all of which are available on demand. You want a managed PostgreSQL instance? Click! It’s yours. You want three hundred virtual machines on which you can install your own software? Clickety-click! Off you go. If you’ve got the budget, the cloud has got a way for you to spend it. But underneath it all, whether you are using PaaS databases supplied by the cloud provider or installing the database software on IaaS systems, you are sharing that infrastructure – and the available performance – with the rest of the world.

Cloud Outcomes: Optimization versus Modernization

For some database workloads moving to the cloud, the modernization path will be the best fit, which means they will likely move to Platform-as-a-Service solutions where the day-to-day management of the database, operating system and infrastructure is taken care of by the cloud provider. Some examples of this path: on-prem SQL Server databases moving to Azure SQL Managed Instances; Oracle Databases moving to AWS’s Oracle RDS solution, etc.

But there is usually a certain class of database workload which doesn’t easily fit into these pre-packaged PaaS solutions: the big, the complex, the gnarly… the monsters of your data centre. And they inevitably end up in Infrastructure-as-a-Service… or stuck on-prem. For customers choosing the IaaS route (the “optimization path” in cloud-speak), the cloud provider manages the infrastructure but the customer is still responsible for the database and operating system.

Obviously, IaaS has a higher management overhead than PaaS, but often the journey to IaaS is simpler (essentially more of a lift and shift approach), while PaaS solutions often require a more complex migration. Especially with some cloud providers, where the recommended PaaS solution is actually a different database product entirely (for example, Oracle customers moving to Google Cloud or Microsoft Azure will be recommended by those cloud providers to move to Cloud SQL and Managed PostgreSQL respectively).

I/O Performance Is The Biggest Challenge

My view is that PaaS solutions are the best path for all appropriate workloads, but there will always be some outliers which need to move to IaaS. Almost by definition, those are the most high-profile, demanding, expensive, revenue-affecting… in fact… the most interesting workloads. And in all the cases I’ve seen*, I/O performance has been the limiting factor.

It’s relatively easy to get a lot of compute power in the cloud. But as soon as you start ramping up the amount of data you need to read and write, or demanding that those reads and writes have very fast, predictable response times, you hit problems. In other words, if latency, IOPS or throughput are your metrics of choice, you’d better be ready to start doing unnatural things.

And it’s not necessarily the case that your required level of performance cannot be achieved. Often, it’s more correct to say that your required levels of performance cannot be achieved at an acceptable cost. Because it turns out that the following statement is just as true in the cloud as it ever was on-prem:

Performance and Cost are two sides of the same coin…

This is why I believe that the biggest gap in the cloud providers’ product portfolios today is in the area of high performance relational databases: primarily Oracle Database and Microsoft SQL Server. The PaaS solutions are designed for the average workloads, not the high-end. A complex database running on, for example, Oracle Exadata will struggle to run on a vanilla IaaS deployment – while the refactoring required to take that database and migrate it to Managed PostreSQL is almost unimaginable.

And so, now, we have finally come to the point: the Silk Platform. I have spent the last few years at Silk bringing to market a solution specifically for this problem. Over the next few posts, I’m going to explain what Silk is and how we built our solution to deliver very high performance for databases in the cloud.

But you won’t have to rely on just my word for it. I have backup 🙂

* I work primarily with customers of Microsoft Azure and Google Cloud

Advertisement

Cloud Compromises: Constrained and Optimized CPUs

Imagine the scenario where you wonder into a clothing store to buy a t-shirt. You find a design you like in size “Medium” but it’s too tight (I guess #lockdown has been unkind to us all…) so you ask for the next size up. But when it arrives, you notice something bizarre: the “Large” is not only wider and longer, it also has an extra arm hole. Yes, there are enough holes for three arms as well as your head. Even more bizarrely, the “XL” size has four sets of sleeves, while the “Small” has only one and the “XS” none at all!

Surprisingly, this analogy is very applicable to cloud computing, where properties like compute power, memory, network bandwidth, capacity and performance are often tied together. As we saw in the previous post, a requirement for a certain amount of read I/O Operations Per Second (IOPS) can result in the need to overprovision unwanted capacity and possibly even unnecessary amounts of compute power.

But there is one situation where this causes extra levels of pain: when the workload in question is database software which is licensable by CPU cores (e.g. Oracle Database, Microsoft SQL Server).

To extend the opening analogy into total surrealism, imagine that the above clothing store exists in a state which collects a Sleeve Tax of %100 of the item value per sleeve. Now, your chosen t-shirt might be $40 but the Medium size will cost you $120, the Large $160 and the XXXXXL (suitable for octopods) a massive $360.

Luckily, the cloud providers have a way to help you out here. But it kind of sucks…

Constrained / Optimized VM Sizes

If you need large amounts of memory or I/O, the chances are you will have to pick a VM type which has a larger number of cores. But if you don’t want to buy databases licenses for these additional cores (because you don’t need the extra CPU power), you can choose to restrict the VM instance so that it only uses a subset of the total available cores. This is similar to the concept of logical partitioning which you may already have used on prem. Here are two examples of this practice from the big hyperscalers:

Microsoft Azure: Constrained vCPU capable VM sizes

Amazon Web Services: Introducing Optimize CPUs for Amazon EC2 Instances

As you can see, Microsoft and AWS have different names for this concept, but the idea is the same. You provision, let’s say, a 128 vCPU instance and then you restrict it to only using, for example, 32 vCPUs. Boom – you’ve dropped your database license requirement to 25% of the total number of vCPUs. Ok so you only get the compute performance of 25% too, but that’s still a big win on the license cost… right?

Well yes but…

There’s a snag. You still have to pay the full cost of the virtual machine despite only using a fraction of its resources. The monthly cost from the cloud provider is the same as if you were using the whole machine!

To quote Amazon (emphasis mine):

Please note that CPU optimized instances will have the same price as full-sized EC2 instances of the same size.

Or to quote the slightly longer version from Microsoft (emphasis mine):

The licensing fees charged for SQL Server or Oracle are constrained to the new vCPU count, and other products should be charged based on the new vCPU count. This results in a 50% to 75% increase in the ratio of the VM specs to active (billable) vCPUs. These new VM sizes allow customer workloads to use the same memory, storage, and I/O bandwidth while optimizing their software licensing cost. At this time, the compute cost, which includes OS licensing, remains the same one as the original size.

It’s great to be able to avoid the (potentially astronomical) cost of unnecessary database licences, but this is still a massive compromise – and the cost will add up over each month you are billed for compute cores that you literally cannot use. Again, this is the public cloud demonstrating that inefficiency and overprovisioning are to be accepted as a way of life.

Surely there must be a better way?

Spoiler alert: there IS a better way

Overprovisioning: The Curse Of The Cloud

I want you to imagine that you check in to a nice hotel. You’ve had a good day and you feel like treating yourself, so you decide to order breakfast in your room for the following morning. Why not? You fill out the menu checkboxes… Let’s see now: granola, toast, coffee, some fruit. Maybe a juice. That will do nicely.

You hang the menu on the door outside, but later a knock at the door brings bad news: You can only order a maximum of three items for breakfast. What? That’s crazy… but no amount of arguing will change their rules. Yet you really don’t want to choose just three of your five items. So what do you do? The answer is simple: you pay for a second hotel room so you can order a second breakfast.

Welcome to the world of overprovisioning.

Overprovisioning = Inefficiency

Overprovisioning is the act of deploying – and paying for – resources you don’t need, usually as a compromise to get enough of some other resource. It’s a technical challenge which results in a commercial or financial penalty. More simply, it’s just inefficiency.

The history of Information Technology is full of examples of this as well as technologies to overcome it: virtualization is a solution designed to overcome the inefficiency of deploying multiple physical servers; containerisation overcomes the inefficiency of virtualising a complete operating system many times… it’s all about being more efficient so you don’t have to pay for resources you don’t really need.

In the cloud, the biggest source of overprovisioning is the way that cloud resources like compute, memory, network bandwidth, storage capacity and performance are packaged together. If you need one of these in abundance, the chances are you will need to pay for more of the others regardless of whether they are required or not.

Overprovisioning = Compromise

As an example, at the time of writing, Google Cloud Platform’s pd-balanced block storage options provide 6 read IOPS and 6 write IOPS per GB of capacity:

* Persistent disk IOPS and throughput performance depends on disk size, instance vCPU count, and I/O block size, among other factors.

Consider a 1TB database with a reasonable requirement of 30,000 read IOPS during peak load. To build a solution capable of this, 5000GB (i.e. 5TB) of capacity would need to be provisioned… meaning 80% of the capacity is wasted!

Worse still, the “Read IOPS per instance” row of the table tells us that some of the available GCP instance types may not be able to hit our 30,000 requirement, meaning we may have to (over)provision a larger virtual machine type and pay for cores and RAM that aren’t necessary (by the way, I’m not picking on GCP here, this is common to all public clouds).

But the real sucker punch is that, if this database is licensed by CPU cores (e.g. Oracle, SQL Server) and we are having to overprovision CPU cores to get the required IOPS numbers, we now have to pay for additional, unwanted – and very expensive – database licenses.

Overprovisioning = Overpaying

My (old) front door

Let’s not imagine that this is a new phenomenon. If you’ve ever over-specced a server in your data centre (me), if you’ve ever convinced your boss that you need the Enterprise Edition of something because you thought it would be better for your career prospects (also me), or if you’ve ever spent £350 on a thermal imaging camera just so you can win an argument about whether you need a new front door (I neither admit nor deny this) then you have been overprovisioning.

It’s just that the whole nature of cloud computing, with it’s self-service, on-demand, limitlessly-scalable charateristics make it so easy to overprovision things all the time. So while the amounts may seem small when shown on the cloud provider’s Price per hour list, when you multiply them by the number of VMs, the number of regions and the number of hours in a year, they start to look massive on your bill.

And when you consider the knock on effects on database licensing, things really get painful. But let’s save that for the next blog post

The Public Cloud: The Hotel For Your Applications

Unless you are Larry Ellison (hi Larry!), the chances are you probably live in a normal house or an apartment, maybe with your family. You have a limited number of bedrooms, so if you want to have friends or relatives come to stay with you, there will come point where you cannot fit anybody else in without it being uncomfortable. Of course, for a large investment of time and money, you could extend your existing accommodation or maybe buy somewhere bigger, but that feels a bit extreme if you only want to invite a few people On to your Premises for the weekend.

Another option would be to sell up and move into a hotel. Pick the right hotel and you have what is effectively a limitless ability to scale up your accommodation – now everybody can come and stay in comfort. And as an added bonus, hotels take care of many dull or monotonous daily tasks: cooking, cleaning, laundry, valet parking… Freeing up your time so you can concentrate on more important, high-level tasks – like watching Netflix. And the commercial model is different too: you only pay for rooms on the days when you use them. There is no massive up-front capital investment in property, no need to plan for major construction works at the end of your five year property refresh cycle. It’s true pay-as-you-go!

It’s The Cloud, Stupid

The public cloud really is the hotel for your applications and databases. Moving from an investment model to a consumption-based expense model? Tick. Effectively limitless scale on demand? Tick. Being relieved of all the low-level operational tasks that come with running your own infrastructure? Tick. Watching more Netflix? Definite Tick.

But, of course, the public cloud isn’t better (or worse) than On Prem, it’s just different. It has potential benefits, like those above, but it also has potential disadvantages which stem from the fact that it’s a pre-packaged service, a common offering. Everyone has different, unique requirements but the major cloud providers cannot tailor everything they do to you individual needs – that level of customisation would dilute their profit margins. So you have to adapt your needs to their offering.

To illustrate this, we need to talk about car parking:

Welcome To The Hotel California

So… you decide to uproot your family and move into one of Silicon Valley’s finest hotels (maybe we could call it Hotel California?) so you can take advantage of all those cloud benefits discussed above. But here’s the problem, your $250/day suite only comes with one allocated parking bay in the hotel garage, yet your family has two cars. You can “burst” up by parking in the visitor spaces, but that costs $50/day and there is no guarantee of availability, so the only solution which guarantees you a second allocated bay is to rent a second room from the hotel!

This is an example of how the hotel product doesn’t quite fit with your requirements, so you have to bend your requirement to their offering – at the sacrifice of cost efficiency. (Incurring the cost of a second room that you don’t always need is called overprovisioning.) It happens all the time in every industry: any time a customer has to fit a specific requirement to a vendor’s generic offering, something somewhere won’t quite fit – and the only way to fix it is to pay more.

The public cloud is full of situations like this. The hyperscalers have extensive offerings but their size means they are less flexible to individual needs. Smaller cloud companies can be more attentive to an individual customer’s requirements, but lack the economies of scale of companies like Amazon Web Services, Microsoft and Google, meaning their products are less complete and their prices potentially higher. The only real way to get exactly what you want 100% of the time is… of course… to host your data on your own kit, managed by you, on your premises.

Such A Lovely Place

I should state here for the record that I am not anti-public cloud. Far from it. I just think it’s important to understand the implications of moving to the public cloud. There are a lot of articles written about this journey – and many of them talk about “giving up control of your data”. I’m not sure I entirely buy that argument, other than in a literal data-sovereignty sense, but one thing I believe to be absolutely beyond doubt is that a move to the public cloud will require an inevitable amount of compromise.

That should be the end of this post, but I’m afraid that I cannot now pass up the opportunity to mention one other compromise of the public cloud, purely because it fits into the Hotel California theme. I know, I’m a sucker for a punchline.

You and your family have enjoyed your break at the hotel, but you feel that it’s not completely working – those car parking charges, the way you aren’t allowed to decorate the walls of your room, the way the hotel suddenly discontinued Netflix and replaced it with Crackle. What the …? So you decide to move out, maybe to another hotel or maybe back to your own premises. But that’s when you remember about the egress charges; for every family member checking out of the hotel, you have to pay $50,000. Yikes!

I guess it turns out that, just like with the cloud, you can check out anytime you like… but you can never leave.

Don’t Call It A Comeback

I’ve Been Here For Years…

Ok, look. I know what I said before: I retired the jersey. But like all of the best superheroes, I’ve been forced to come out of retirement and face a fresh challenge… maybe my biggest challenge yet.

Back in 2012, I started this blog at the dawn of a new technology in the data centre: flash memory, also known as solid state storage. My aim was to fight ignorance and misinformation by shining the light of truth upon the facts of storage. Yes, I just used the phrase “the light of truth”, get over it, this is serious. Over five years and more than 200 blog posts, I oversaw the emergence of flash as the dominant storage technology for tier one workloads (basically, databases plus other less interesting stuff). I’m not claiming 100% of the credit here, other people clearly contributed, but it’s fair to say* that without me you would all still be using hard disk drives and putting up with >10ms latencies. Thus I retired to my beach house, secure in the knowledge that my legend was cemented into history.

But then, one day, everything changed…

Everybody knows that Information Technology moves in phases, waves and cycles. Mainframes, client/server, three-tier architectures, virtualization, NoSQL, cloud… every technology seems to get its moment in the sun…. much like me recently, relaxing by the pool with a well-earned mojito. And it just so happened that on this particular day, while waiting for a refill, I stumbled across a tech news article which planted the seed of a new idea… a new vision of the future… a new mission for the old avenger.

It’s time to pull on the costume and give the world the superhero it needs, not the superhero it wants…

Guess who’s back?

* It’s actually not fair to say that at all, but it’s been a while since I last blogged so I have a lot of hyperbole to get off my chest.

The Final Post: Hardware Is Dead

Hanging up the jersey

Well, my friends, this is it. The time has come to retire the flashdba jersey after more than seven years of fun and frolics. In part one of this post, I looked back at my time in the All-Flash storage industry and marvelled at the crazy, Game of Thrones-style chaos that saw so many companies arrive, fight, merge, split up and burn out. Throughout that time, I wrote articles on this blog site which attempted to explain the technical aspects of All-Flash as the industry went from niche to mainstream. Like many technical bloggers, I found this writing process enjoyable and fulfilling, because it helped me put some order to my own thoughts on the subject. But back in 2017, something changed and my blogs became less and less frequent… eventually leading here. I’ll explain why in a minute, but first we need to talk about the title of this post.

Hardware Is Meh

Back in my Oracle days, I worked with a product called Exadata – a converged database appliance which Oracle marketed as “hardware and software engineered to work together”. For a time, Oracle’s “Engineered Systems” were the future of the company and, therefore, the epicentre of their marketing campaigns. Today? It’s all about the Oracle Cloud. And this is actually a perfect representation of the I.T. industry as a whole… because, here in 2019, nobody wants to talk about hardware anymore. Whether it’s hyper-converged systems, All-Flash storage, “Engineered” database appliances or basic server and networking infrastructure, hardware is just not cool anymore.

Hardware is MehFor a long time, companies have purchased hardware systems as a capital expense, the cost then being written off over a number of years, at which point the dreaded hardware refresh is required. Choosing the correct specifications of for hardware (capacity, performance, number of ports etc) has always been extremely challenging because business is unpredictable: buy too small and you will need to upgrade at some point down the line, which could be expensive; buy too big and you are overpaying for resources you may never use. And also, if you are a small company or a startup, those capital expenses can be very hard to fund while you wait for revenue to build.

Today, nobody needs to do this anymore. The cloud – and in particular the public cloud – allows companies to consume exactly what they need, just when they need it – and fund it as an operating expense, with complete flexibility. One of the great joys of the public cloud is that hardware has been commoditised and abstracted to such a degree that you just don’t need to care about it anymore. Serverless, you might say… (IT has aways been fond of a ridiculous buzz word)

The Vendor View: AWS Is The New Enemy

For infrastructure vendors, the industry has reached a new tipping point. A few years ago, if you worked in sales for a storage startup (like me), you found business by targeting EMC customers who were unhappy with the prices they were paying / service they were getting / quality of steaks being bought for them by their EMC rep. Ditto, to a lesser extent, with HP and IBM, but EMC was the big gorilla of the marketplace. Today, everybody in storage has a new number #1 enemy: Amazon Web Services, with Microsoft Azure and Google Cloud Platform making up the top three. But make no mistake, AWS is eating everybody’s lunch – and the biggest challenge for the rest is that in many customer’s eyes, the public cloud is Amazon Web Services. (EMC, meanwhile, doesn’t even exist anymore but is instead a part of Dell… that would have been impossible to imagine five years ago).

Cloud ≠ Public Cloud

However, nobody (sane) is predicting that 100% of workloads will end up in the public cloud (and let’s be honest now, when we say “public cloud” we basically mean AWS, Azure and GCP). For some companies – where I.T. is not their core business – it makes perfect sense to do everything in the cloud. But for others, various reasons relating to control, risk, performance, security and regulation will mean that at least some data remains on premises, in private or hybrid clouds. You can argue among yourselves about how much.

So, for those people who still require their own infrastructure, what now? Once you’ve seen how easy it is to use the public cloud, sampled all the rich functionality of AWS and fallen into the trap of having staff paying for AWS instances on their credit cards (so-called “Shadow IT“), how do you go back to the old days of five-year up-front capital investments into large boxes of tin which sit in the corner of your data centre and remain stubbornly inflexible?

Consumption-Based Infrastructure

Consumption-Based Infrastructure

Ok let’s get to the conclusion. A couple of years ago, Kaminario (my employer) decided to exit the hardware business and become a software company. Like most (almost all) All-Flash storage vendors, Kaminario uses commodity whitebox components (basically, Intel x86 servers and enterprise-class SSDs) for the hardware chassis and then runs their own software on top to turn them into high-performance, highly-resilient and feature-rich storage platforms. Everybody does it: DellEMC’s XtremIO, Pure Storage, Kaminario, HP Nimble, NetApp… all of the differentiation in the AFA business is in software. So why purchase hardware components, manufacture and integrate them, keep them in inventory and then pass on all that extra cost to customers when your core business is actually software?

Kaminario decided to take a new route by disaggregating the hardware from the software and then handing over the hardware part to someone who already sells millions of hardware units all around the globe. Now, when you buy a Kaminario storage array, you get exactly the same physical appliance, but you (or your reseller) actually buy the hardware from Tech Data at commodity component cost. You then buy a consumption-based license to use the software from Kaminario based on the number of terabytes of data stored. This can be on a monthly Pay As You Go model or via a pre-paid subscription for a number of years. In a real sense, it is the cloud consumption model for people who require on-prem infrastructure.

There are all sorts of benefits to this (most customers never fill their storage arrays above 80% capacity, so why always pay for 100%?), but I’m not going to delve into them here because this is not a sales pitch, it’s an explanation for what I did next.

What I Did Next

Seeing as Kaminario decided to make a momentous shift, I thought it was a good time to make one of my own. So, two years ago, I took the decision to leave the world of technical presales and become a software sales executive. As in, a quota-carrying, non-technical, commercial sales guy with targets to hit and commission to earn. Presales people also earn commission, but are far more protected from the “lumpy” highs and lows that come with complex and lengthy high-value sales cycles (what sales people call “big ticket sales”). In commercial sales, the highs are higher and the lows are lower – and the risks are definitely riskier. Since my new role coincided with the company going through an entire change of business model, the risk was pretty hard to quantify, but I’m pleased to say that 2018 was the company’s best ever year, not just globally but also in the territory that I now manage (the United Kingdom).

More importantly for me, I’m now two years into this new journey and I have zero regrets about the decision to leave my technical past behind. I’ve learnt more than ever before (often the hard way) and I’ve experienced all the highs and lows one might expect, but I still get the same excitement from this role that I used to get in the early days of my technical career.

So, the time is right to hang up the technical jersey and bid flashdba farewell. It’s been fun and I want to say thank you to everybody who read, commented, agreed or disagreed with my content. There are almost 200 posts and pages on this site which I will leave here in the hope that they remain useful to others – and as a sort of virtual monument to my former career.

In the meantime, I’ve got to go now, because there are meetings to be had, customers to be entertained, dinners to be expensed and (hopefully) deals to be closed. Farewell, my friends, stay in touch… and remember, if you need to buy something… call me, yeah?

— flashdba —

[September 2020 Spoiler Alert: I couldn’t stay away]

Flash Debrief: The End (part 1)

Seven years ago this month, I created a blog and online presence called flashdba to mark the start of my journey away from Oracle databases (and DBAing) into the newly-born All-Flash Storage industry. Six years ago this month, I posted the first in what transpired to be a very long blog series attempting to explain the concepts of All Flash to those few who were interested. At least, I always assumed it would be a few, but now here we are in 2019 and the flashdba.com blog has been read over a million times, referenced in all sorts of surprising places and alluded to by Chris Mellor at The Register. One of my articles even (allegedly) got a mention by Mark Hurd during an Oracle forecast call!

But now, for various reasons that I will explain later, it’s time to draw it to a conclusion.

Review

Wow! What a ride it’s been, huh? Seven years ago, I joined a company called Violin Memory who were at the forefront of the infant (or should that be infantile?) flash industry. At one point, Violin had a global partnership with HP to make an “Exadata-killer” machine and had a valuation estimated to be around $2bn. EMC even wrote a secret briefing document in which they said, “Violin … is XtremIO’s #1 competitor in the all-flash storage market”. Meanwhile, numerous other small flash companies were being acquired for ridiculous, crazy and obscene money despite often being “pre-product” or pre-GA.So it took a particularly special effort for Violin Memory to take that head start and end up in Chapter 11 bankruptcy in December 2016. (The company is reborn as Violin Systems now, of course – and I still have friends there, so out of respect for them I have to keep my Violin stories under wraps. Which is a shame, because boy do I have some great stories…)divorce

Meanwhile, back in 2015, I’d decided to leave Violin Memory and join another All Flash pioneer, Kaminario – where I remain today. It’s fair to say that Violin Memory didn’t appreciate that decision, with the result that I had to spend a lot of time dealing with their lawyers. You feel very small when you are a sole person engaged in a legal dispute with a corporation who can afford an expensive legal team – you become enormously aware of the difference in spending power (although, in hindsight, perhaps Violin could have used those legal fees elsewhere to better effect). So, when the CEO of Kaminario interrupted his family holiday to call me and assure me that they would stand by me throughout the dispute, it left me with a real glimpse into the different in culture between my former employer and my current one. Also, Kaminario’s lawyers were a lot better!

The Flash Storage Wars (available now as a boxset)

The road from 2012 to 2019 is littered with the bloody carcasses of failed flash companies. From the disasters (Violin Memory, Skyera, FusionIO, Tintri, Whiptail, DSSD) to the acquisitions (Texas Memory Systems, XtremIO, Virident, SolidFire, Nimble) – not all of which could be considered successful – to the home-grown products which never really delivered (I’m looking at you, Oracle FS1). One company, Pure Storage, managed to beat the odds, ride out some stormy times and go from startup to fully-established player, although following their IPO the stock market has never really given them a lot of love. Meanwhile, EMC – the ultimate big dog of storage – was acquired by Dell, while HP split into two companies and NetApp continued to be linked with an acquisition by Lenovo or Cisco. Someday, somebody is going to turn the whole story into a boxset and sell it to Netflix for millions. Game of Thrones eat your heart out.

Yet there can be no doubt that All Flash itself has succeeded in its penetration of the previously disk-dominated enterprise storage market, with IDC regularly reporting huge year-on-year growth figures (e.g. 39.3% between Q3-2017 and Q4-2018). I vividly remember, back in 2012, having to explain to every prospective customer what flash was and why it was important. Today, every prospect has already decided they want All Flash. In fact, AFAs have become so mainstream now that, starting this year, Gartner will be merging its Solid State Array Magic Quadrant with the more traditional MQ for General-Purpose Disk Arrays. It just doesn’t make sense to have two separate models now.

So Who Won?

Good question. Was it DellEMC, the biggest company in storage and the current #1 in market share? Was it Pure Storage, who led Gartner’s most recent Solid State Array Magic Quadrant (but have it all to lose when the SSA MQ merges with the general-purpose MQ)? Or was it any number of investors and venture capitalists who managed to make money on the back of such market disruption? It’s a subjective question so you can choose your own answer. But for me, it’s very clear that there was only one winner… and back in 2012 we had no idea (although my old boss called it over a decade ago… I should have paid more attention). The ultimate winner of this war – and many other wars besides – is the cloud.

In part two – the final ever blog in this series (and possibly at all – spoiler alert), I’ll explain why I think the cloud is the ultimate winner… and why I’m calling time on flashdba after all these years. Wipe away those tears, my friends – not long now.

See also: this article apparently inspired the highly respected storage-industry journalist Chris Mellor to write A Potted History of All-Flash Arrays over at Blocks and Files. Thanks Chris, I’m honoured!

Oracle ASM and Thin Provisioning – How To Reclaim Space

It came to my attention last November that I had crossed the one year anniversary since my last post on flashdba.com. I was so surprised that I immediately decided to write a new post, which took another three months. There are reasons why I’m no longer posting technical blogs about databases and flash, but I’ll cover them in a later post. No, not that late – I hope.

In the meantime, I thought I’d write a note on this subject because I’ve lost count of the number of times I’ve been asked questions on the topic of Oracle ASM and Thin Provisioning. Normally, I’m asked by customers or prospects who think there is an issue with their storage system… whereas, in fact, the problem is entirely storage-agnostic.

But first, some background.

Thin Provisioning

Thin Provisioning (TP) is used to describe the overcommitment of storage capacity. Your host may think it’s been allocated 10TB of capacity and is currently using 2TB, but the storage platform has only really allocated the 2TB used and the remaining 8TB may not even exist. Why would you want this? Because in a multi-host environment (where hosts could be virtual or physical), the amount of allocated-but-unused capacity could be significant. Without TP, serious amounts of capacity would need to be provisioned which may never be used, but with TP all the hosts can be “fooled” into thinking they have been allocated what they want while the actual utilised capacity is only the sum of what each host has used.

Where things can get a bit complicated with TP is that many layers in your stack may be thin provisioning storage to the layers above them. Most storage arrays are capable of TP (or indeed mandate its use), but hypervisors often have thin provisioning options too. Meanwhile, some applications which create data store structures have options which can help or hinder the use of TP. For example, VMware has the ability to create virtual disks which are thin, thick (lazy zeroed) or thick (eager zeroed). As a result, it isn’t always obvious to the underlying storage whether a particular set of allocated blocks are really in use or not. Won’t somebody think of the poor storage array?

Trim and Unmap

Consider the situation where a large file is created and then deleted in a filesystem on a typical operating system. Commonly, the deletion process doesn’t really delete anything other than the metadata telling the filesystem where the file resided. Thus the underlying file data remains until such time as something else comes along and overwrites it. This is beneficial because it is faster and requires less work than trying to overwrite the file with (for example) zeros. But if the filesystem resides on a storage array which uses TP, how will the storage array know that the space allocated to the file is now free? It can’t – unless the filesystem has a way of telling it.

For this purpose there exists a set of OS calls known as trim commands – and for the SCSI protocol (used by most block storage devices such as SANs) the command is known as UNMAP. Issuing one of these commands allows the calling layer (the filesystem, or perhaps a volume manager) to notify the storage platform that a specific set of blocks are no longer in use and can be “unmapped”, freeing space. As a side note, large calls to UNMAP can often have temporary but unexpected consequences on storage performance, as large amounts of metadata may need to be updated.

Oracle ASM: Unmap is for Wimps

Let’s get straight to the point here: Oracle’s Automatic Storage Manager doesn’t natively use UNMAP commands. Quelle surprise. But there are still ways to free up space back to thin provisioned arrays. Two in fact: let’s call them the bad way and the good way. First though, let’s set up the scenario:

Test Scenario

Consider the situation where an Oracle ASM diskgroup is created on a 10TB volume group presented from a thin provisioning All-Flash storage array. The DBA then creates a large “bigfile” tablespace in the diskgroup, with a 5TB datafile (the rest of the database resides elsewhere). Anyone who has sat waiting for the CREATE TABLESPACE command for any period of time will be aware that, during the datafile creation process, Oracle likes to fill the whole file with empty blocks. From Oracle’s perspective, this has the advantage of ensuring that the entire datafile capacity has been marked as used by the storage array. In other words, it’s not “fake” thin provisioned space which may or may not be available, but real available capacity which now belongs to Oracle. (You may also recall that Oracle no longer takes this approach with tempfiles, instead using the faster “sparse” allocation method.)

At this point, what will the volumes on the storage array will be showing? We know that 10TB has been allocated, of which 5TB has been used. So shouldn’t that leave 5TB free? Probably not, because almost every All-Flash storage array uses data reduction technologies such as compression, deduplication and zero-detect. Since each block in the tablespace contains a unique block number, deduplication isn’t going to add any value (which is why arrays like the Kaminario allow dedupe to be disabled on a per-volume basis), but compression is going to have great fun with all the emptiness inside each Oracle block so the storage array will probably show significantly less than 5TB used.

Next, our enterprising DBA watches a Connor McDonald video about DBMS_RANDOM and gets a little overexcited, then fills the entire tablespace with random data to the point that the storage array can hardly achieve any compression. The outcome? Allocated = 10TB, Used = 5TB, Free = 5TB.

Finally, after watching a video of Larry Ellison explaining that the Oracle Autonomous Database needs “no human intervention” and thus fearing for his job, the DBA deletes the tablespace and goes home. Back to 10TB free? No.
The tablespace deletion command does a number of things, including notifying Oracle ASM that the file’s allocation units are no longer in use and removing the datafile from the database’s controlfile. But at no point does anybody bother to tell the storage array that the used space is now free, so the array’s capacity statistics remain: Allocated = 10TB, Used = 5TB, Free = 5TB.

ASRU: The Bad Way

ASRU is Oracle’s ASM Reclamation Utility, a PERL script developed in conjunction with 3PAR (a storage array now owned by HPE) and designed to free up space from scenarios such as the one above. It is, in my personal opinion, a terrible botched solution which was created to serve a purpose which no longer exists – although, interestingly, many storage vendors still seem to recommend it by default (for example, Pure Storage still describe it as the only solution for reclaiming unused space with Oracle ASM).
ASRU doesn’t issue UNMAP commands. Instead, it takes advantage of the fact that most modern storage platforms (including 3PAR, Pure Storage and Kaminario) treat blocks full of zeros as free space (a feature known as zero detect). Thus what ASRU does – when manually run by a DBA (presumably during a change window in the middle of the night while rubbing a lucky rabbit’s foot and praying to the gods of all major religions) – is compact the remaining data in any diskgroup toward the start of the volume and then write zeros above the high watermark where this compacted data ends.
In our example above, this should return the capacity statistics to approximately: Allocated = 10TB, Used = 0TB, Free = 10TB. However, because zero detect is often considered to be a type of data reduction, some arrays then show horribly-skewed data reduction ratios as a result of ASRU.
Don’t get me wrong, many people have successfully used ASRU – and in some situations it may be your only choice. But there is another way…

ASM Filter Driver: The Good Way

Since Oracle Database version 12.1.0.2, the option has been available to install a piece of software called ASMFD, the ASM Filter Driver. ASMFD is a kernel module which resides in the I/O path of Oracle ASM disks – and is the natural successor to the Linux-only ASMLib kernel driver. Unlike ASMLib, or indeed native ASM, the ASMFD module contains support for SCSI UNMAP commands, which really is the missing piece of the jigsaw. Providing you use ASMFD, the deletion of files from within ASM will result in the storage array being notified as allocation units are freed up, resulting in the correct recalculation of Free and Used Capacity statistics – and without the unnecessary hack of writing zeros all over the place. It really is a no brainer.
Unless, of course, you’ve already installed your database and ASM and are now looking for some way to return freed capacity. In which case, installing ASMFD on an existing system may seem even more challenging than running ASRU. But you know what they say: it’s better to do it right first time than to be constantly forced into bodging it with PERL scripts.

TL;DR

If you want Oracle ASM to correctly free space back to your thin provisioned storage array, you need to choose between the correct method of using ASM Filter Driver or the botched method of running the ASRU reclamation tool, which comes in the form of a PERL script. Either way, it’s nothing to do with the storage platform, so don’t blame the storage guy…

All Flash Arrays: Scale Up vs Scale Out (Part 2)

In the first post on the subject of Scale Up versus Scale Out, we looked at the reasons why scalability is a key requirement for storage platforms, as well as discussing the limits of Scale Up only architectures, i.e. systems where more capacity is added to the same fixed number of controllers. In this article, we look at the alternative architecture known as Scale Out.

Scale Out – Adding Performance

In a scale out architecture, the possibility exists to add more storage controllers, thereby adding more performance capability. You may remember that the performance of a storage array is approximately proportional to the number of (and power of) its storage controllers, while the capacity is determined by the amount of flash or SSD media addressed by those controllers.

In this scenario, the base model typically consists of a pair of controllers (after all, at least two are required to provide resiliency). Most scale-up-capable storage arrays work by adding more pairs of these controllers, which are considered indivisible units and have names like “K-Block” (Kaminario K2) or “X-Brick” (DellEMC XtremIO). For now, let’s just call them controller pairs.

There are a number of technical challenges to overcome when building a system which can scale out. In the first post, we covered Active/Passive solutions, where only one controller processes I/O from the underlying media. In this scenario, the performance limitations are determined by the characteristics of the single active node – with the remaining (passive) node simply waiting to spring into action in the event of a failover. Clearly this type of architecture makes less sense as the number of nodes increases above two, since the additional nodes will also be passive and therefore adding little benefit.

Scale out architectures, then, typically employ an active/active solution whereby each controller contributes more performance capacity as it joins the system. And that means building a high-availability cluster, with all of the associated cluster management technologies that entails (failover, virtual IPs, protection against split brain scenarios, etc). No wonder some vendors stick to Scale Up only.

Of course, the biggest issue with a Scale Out only architecture is the question of what happens when additional capacity needs to be added. The answer is that another controller or set of controllers must be added too, complete with their attached storage media – but the controllers are an expensive and unnecessary addition if the only requirement is simply more capacity.

Scale Up and Scale Out – The Perfect Solution?

So what we’ve seen here is that a Scale Up architecture allows for more capacity to be added to existing controllers, while a Scale Out architecture allows for more performance to be added to existing capacity. It would therefore seem logical that the ultimate goal is to build a system which can (independently) scale both up and out. Scaling up allows more capacity to be added without the cost of more controllers. Scaling out allows more performance (controllers) to be added when required. And thus the characteristics of the storage platform can be extended in either of these two dimensions as needed. Perfect?

Arrays which can support both scale up and scale out have been surprisingly rare in the All Flash market so far, but they do exist. The concept is simple: customers that purchase storage typically do so over a three to five year period. Most people simply cannot guess how their requirements will change in that period of time… more users, more customers, more data? Yes, probably – but how much and over what time period? Choosing an architecture which allows independent (non-disruptive) scale of both capacity and performance insures against the risk associated with capacity planning, while also allowing customers to start off by purchasing only what they need today and then expanding at their own pace. Sounds a bit like cloud computing when you put it like that, doesn’t it?

Which conveniently leads us to…

The Future: Dynamically Composable / Disaggregated Storage?

None of us know how the future will look, especially in the technology industry. But one vision of the future comes from my employer, Kaminario, who is one of a number of companies exploring the concept of composable infrastructure. I think this is a very interesting new direction, which is why I’m writing about it here – but since I’m an employee of this vendor I must first give you a mandatory sales warning and you should treat the next paragraphs with a healthy dose of “well he would say that, wouldn’t he?”

In a dynamically composable storage environment, the two elements we have been discussing above (storage controllers and storage media) become completely disaggregated so that any shelf of media can be addressed by any set of controllers. These sets of systems can then be dynamically composed, so that – out of a set of multiple shelves of media and storage controllers – subsets of the two can be brought together to form virtual private arrays designed to serve specific applications.

© 2017 Kaminario

If you think about the potential of this method of presentation, it opens up many possibilities. For example, the dynamic and non-disruptive reallocation of storage resources offers customers the ability to constantly adapt to unpredictable workloads. Furthermore, concepts from AI can be used to automate this reallocation and even predict changes to requirements in advance.

This is useful for any customer with a complex estate of mixed workloads, but it’s incredibly useful to Cloud Service Providers and MSPs. After all, these organisations have no knowledge of what their customers are doing on their systems or what they will do in the future, so the ability to dynamically adapt performance and capacity requirements could provide a competitive edge.

Conclusion

We all know that I.T. is full of buzzwords, like agility or transformation. Is scalability another one? Maybe it’s in danger of becoming one. But if you think about it, one of the fundamental characteristics of any platform is its ability to scale. It essentially defines the limitations of the platform that you may meet as you grow – and growth is pretty much the point of any business. So take the time to understand what scale actually means in a storage context and you might avoid learning about those limitations the hard way…

All Flash Arrays: Scale Up vs Scale Out (Part 1)

Imagine you want to buy some more storage for your laptop – let’s say an external USB drive for backups. What are the fundamental questions you need to ask before you get down to the thorny issue of price? Typically, there is only one key question:

  • How much capacity do I need?

Of course there will be lesser questions, such as connectivity, brand, colour, weight, what it actually looks like and so on. But those are qualifying questions – ways to filter the drop down list on Amazon so you have less decisions to make.

However, different rules apply when buying enterprise storage: we might care less about colour and more about physical density, power requirements and the support capabilities of the vendor. We might care less about what the product looks like and more about how simple it is to administer. But most of all, for enterprise storage, there are now two fundamental questions instead of one:

  • How much capacity do I need?
  • How much performance do I need?

Of course, there is the further issue of what exactly we mean by “performance”, given that it can be measured in a number of different ways. The answer is dependant on the platform being used: for disk-based storage systems it was typically the number of I/O Operations Per Second (IOPS), while for modern storage systems it is more likely to be the bandwidth (the volume of data read and written per second). And just to add a little spice, when considering IOPS or bandwidth on All Flash array platforms the read/write ratio is also important.

So the actual requirement for, say, a three-node data warehouse cluster with 200 users might turn out to be:

“I need 50TB of usable capacity and the ability to deliver 1GB/second at 90% reads. What will this cost?”

Are we ready to spec a solution yet? Not quite. First we have to consider Rule #1.

Rule #1: Requirements Change

Most enterprise storage customers purchase their hardware for use over a period of time – with the most typical period being five years. So it stands to reason that whatever your requirements are at the time of purchase, they will change before the platform is retired. In fact, they will change many times. graph-growingData volumes will grow, because data volumes only ever get bigger… right? But also, those 200 users might grow to 500 users. The three node cluster might be extended to six nodes. The chances are that, in some way, you will need more performance and/or more capacity.

The truth is that, while customers buy their hardware based on a five year period, now more than ever they cannot even predict what will happen over the next 12 months. Forgive the cliche, the only thing predictable is unpredictability.

So as a customer in need of enterprise storage, what do you do? Clearly you won’t want to purchase all of the capacity and performance you might possibly maybe need in the future. That would be a big up-front investment which may never achieve a return. So your best bet is to purchase a storage platform which can scale. This way you can start with what you need and scale as your requirements grow.

This is where architecture becomes key.

Scalability is an Architectural Decision

You may remember from a previous post that the basic building blocks of any storage array are controllers and media, with various networking devices used to string them together. scale-starting-pointAs a gross simplification, the performance of a storage array is a product of the number of controllers and the power that those controllers have (assuming they aren’t held back by the media). The capacity of a storage array is clearly a product of the amount of media.

In a simple world you would just add more media when you wanted more capacity, you would add more controllers (or increase their power) when you needed more performance, and you would do both when you need both. But this is not a simple world.

I always think that the best way to visualise the two requirements of capacity and performance is to use two different dimensions on a graph, with performance as X and capacity as Y. So let’s use that here – we start with a single flash array which has one pair of controllers and one set of flash media.

Scale Up – Adding Capacity

The simple way to achieve scale up is to just add more media to an existing array. Media is typically arranged in some sort of indivisible unit, such as a shelf of SSDs arranged in a RAID configuration (so that it has inbuilt redundancy). In principle, adding another shelf of SSDs sounds easy, but complications arise when you consider the thorny issue of metadata.

To illustrate, consider that most enterprise All Flash arrays available today have inbuilt data reduction features including deduplication. At a high level, dedupe works by computing a hash value for each block written to the array and then comparing it to a table containing all the hash values of previously written blocks. If the hash is discovered in the table, the block already exists on the array and does not need to be written again, reducing the amount of physical media used.

This hash table is an example of the metadata storage arrays need to store and maintain in order to function; other examples of features which utilise metadata are thin provisioning, snapshots and replication. This makes metadata a critical factor in the performance of a storage array

To ensure the highest speed of access, most metadata is pinned in DRAM on the storage controllers. This has a knock-on effect in that the amount of addressable storage in an array can become directly linked to the amount of DRAM in the controllers. DRAM is a costly resource and affects the manufacturing cost of the array, so there is a balancing act required in order to have enough DRAM to store the necessary metadata without inflating the cost any more than is absolutely possible.

Hopefully you see where this is going. Adding more shelves of media increases the storage capacity of an array… thereby increasing the metadata footprint… and so increasing the need for DRAM. At some point the issue becomes whether it is even possible (technically or commercially) to support the metadata overhead of adding more shelves of physical media.

Scale-Up Only Architectures

There are many storage arrays on the market which have a scale-up only architecture, with Pure Storage being an obvious example. There are various arguments presented as to why this is the case, but my view is that these architectures were used as a compromise in order to get the products to market faster (especially if they also adopt an active/passive architecture). Having said that, it’s obvious that I am biased by the fact that I work for vendor which does not have this restriction – and who believes in offering scale up and scale out. So please don’t take my word for it – go read what the other vendors say and then form your own opinion.

One counter claim by proponents of scale-up only architecture is that performance can be added by upgrading array controllers in-place, non-disruptively. In other words, the controllers can be replaced with high-specification models with more CPU cores and more DRAM, bringing more performance capability to the array. The issue here is that this is a case of diminishing returns. Moving up through the available CPU models brings step changes in cost but only incremental increases in performance.

To try and illustrate this, let’s look at some figures for the Pure Storage //m series of All Flash arrays. There are three models increasing in price and performance: m20, m50 and m70. We can get performance figures measured in maximum IOPS (measured with a 32k block size as is Pure’s preferred way) from this datasheet and we can get details of the CPU and DRAM specifications from this published validation report by the NSA. Let’s use Wikipedia’s List of Intel Xeon microprocessors page to find the list price of those CPUs and compare the increments in price to those of the maximum stated performance:

Here we can see that the list price for the CPUs alone rises 238% and then 484% moving from //m20 to //m50 to //m70, yet the maximum performance measured in IOPS rises just 147% and 200%. You can argue that the CPU price is not a perfect indicator of the selling price of each //m series array, but it’s certainly a factor. As anyone familiar with purchasing servers will attest, buying higher-spec models takes more out of your pocket than it gives you back in performance.

My point here – and this is a general observation rather than one about Pure in particular – is that this is not a cost effective scaling strategy in comparison to the alternative, which is the ability to scale out by adding more controllers.

Coming Next: Scale Out

In the next post we’ll look at Scale Out architectures and what they mean for customers with independently varying requirements for capacity and performance.

Scale out allows the cost-effective addition of more controllers and therefore more performance capability, along with other benefits such as the addition of more ports. But there are potential downsides too…