October 19, 2016 8 Comments
I want you to imagine that you are about to run a race. You have your trainers on, your pre-race warm up is complete and you are at the start line. You look to your right… and see the guy next to you, the one with the bright orange trainers, is hopping up and down on one leg. He does have two legs – the other one is held up in the air – he’s just choosing to hop the whole race on one foot. Why?
You can’t think of a valid reason so you call across, “Hey buddy… why are you running on one leg?”
His reply blows your mind: “Because I want to be sure that, if one of my legs falls off, I can still run at the same speed”.
Welcome, my friends, to the insane world of storage marketing.
High Availability Clusters
The principles of high availability are fairly standard, whether you are discussing enterprise storage, databases or any other form of HA. The basic premise is that, to maintain service in the event of unexpected component failures, you need to have at least two of everything. In the case of storage array HA, we are usually talking about the storage controllers which are the interfaces between the outside world and the persistent media on which data resides.
Ok so let’s start at the beginning: if you only have one controller then you are running at risk, because a controller failure equals a service outage. No enterprise-class storage array would be built in this manner. So clearly you are going to want a minimum of two controllers… which happens to be the most common configuration you’ll find.
So now we have two controllers, let’s call them A and B. Each controller has CPU, memory and so on which allow it to deliver a certain level of performance – so let’s give that an arbitrary value: one controller can deliver 1P of performance. And finally, let’s remember that those controllers cost money – so let’s say that a controller capable of giving 1P of performance costs five groats.
In a basic active/passive design, one controller (A) handles all traffic while the other (B) simply sits there waiting for its moment of glory. That moment comes when A suffers some kind of failure – and B then leaps into action, immediately replacing A by providing the same service. There might be a minor delay as the system performs a failover, but with multipathing software in place it will usually be quick enough to go unnoticed.
So what are the downsides of active/passive? There are a few, but the most obvious one is that you are architecturally limited to seeing only 50% of your total available performance. You bought two controllers (costing you ten groats!) which means you have 2P of performance in your pocket, but you will forever be limited to a maximum of 1P of performance under this design.
In an active/active architecture, both controllers (A and B) are available to handle traffic. This means that under normal operation you now have 2P of performance – and all for the same price of ten groats. Both the overall performance and the price/performance have doubled.
What about in a failure situation? Well, if controller A fails you still have controller B functioning, which means you are now down to 1P of performance. It’s now half the performance you are used to in this architecture, but remember that 1P is still the same performance as the active/passive model. Yes, that’s right… the performance under failure is identical for both designs.
What About The Cost?
Smart people look at technical criteria and choose the one which best fits their requirements. But really smart people (like my buddy Shai Maskit) remember that commercial criteria matter too. So with that in mind, let’s go back and consider those prices a little more. For ten groats, the active/active solution delivered performance of 2P under normal operation. The active/passive solution only delivered 1P. What happens if we attempt to build an active/passive system with 2P of performance?
To build an active/passive solution which delivers 2P of performance we now need to use bigger, more powerful controllers. Architecturally that’s not much of a challenge – after all, most modern storage controllers are just x86 servers and there are almost always larger models available. The problem comes with the cost. To paraphrase Shai’s blog on this same subject:
Cost of storage controller capable of 1P performance < Cost of storage controller capable of 2P performance
In other words, building an active/passive system requires more expensive hardware than building a comparable active/active system. It might not be double, as in my picture, but it will sure as hell be more expensive – and that cost is going to be passed on to the end user.
Does It Scale?
Another question that really smart people ask is, “How does it scale?”. So let’s think about what happens when you want to add more performance.
In an active/active design you have the option of adding more performance by adding more controllers. As long as your architecture supports the ability for all controllers to be active concurrently, adding performance is as simple as adding nodes into a cluster.
But what happens when you add a node to an active/passive solution? Nothing. You are architecturally limited to the performance of one controller. Adding more controllers just makes the price/performance even worse. This means that the only solution for adding performance to an active/passive system is to replace the controllers with more powerful versions…
The Pure Storage Architecture
Pure Storage is an All Flash Array vendor who knows how to play the marketing game better than most, so let’s have a look at their architecture. The PS All Flash Array is a dual-controller design where both controllers send and receive I/Os to the hosts. But… only one controller processes I/Os to and from the underlying persistent media (the SSDs). So what should we call this design, active/active or active/passive?
According to an IDC white paper published on PS’s website, PS controllers are sized so that each controller can deliver 100% of the published performance of the array. The paper goes on to explain that under normal operation each controller is loaded to a maximum of 50% on the host side. This way, PS promises that performance under failure will be equal to the performance under normal operations.
In other words, as an architectural decision, the sum of the performance of both controllers can never be delivered.
So which of the above designs does that sound like to you? It sounds like active/passive to me, but of course that’s not going to help PS sell its flash arrays. Unsurprisingly, on the PS website the product is described as “active/active” at every opportunity.
Yet even PS’s chief talking head, Vaughn Stewart, has to ask the question, “Is the FlashArray an Active/Active or Active/Passive Architecture?” and eventually comes to the conclusion that, “Active/Active or Active/Passive may be debatable”.
There’s no debate in my view.
You will obviously draw your own conclusions on everything I’ve discussed above. I don’t usually pick on other AFA vendors during these posts because I’m aiming for an educational tone rather than trying to fling FUD. But I’ll be honest, it pisses me off when vendors appear to misuse technical jargon in a way which conveniently masks their less-glamorous architectural decisions.
My advice is simple. Always take your time to really look into each claim and then frame it in your own language. It’s only then that you’ll really start to understand whether something you read about is an innovative piece of design from someone like PS… or more likely just another load of marketing BS.
* Many thanks to my colleague Rob Li for the excellent running-on-one-leg metaphor