All Flash Arrays: Scale Up vs Scale Out (Part 2)

In the first post on the subject of Scale Up versus Scale Out, we looked at the reasons why scalability is a key requirement for storage platforms, as well as discussing the limits of Scale Up only architectures, i.e. systems where more capacity is added to the same fixed number of controllers. In this article, we look at the alternative architecture known as Scale Out.

Scale Out – Adding Performance

In a scale out architecture, the possibility exists to add more storage controllers, thereby adding more performance capability. You may remember that the performance of a storage array is approximately proportional to the number of (and power of) its storage controllers, while the capacity is determined by the amount of flash or SSD media addressed by those controllers.

In this scenario, the base model typically consists of a pair of controllers (after all, at least two are required to provide resiliency). Most scale-out-capable storage arrays work by adding more pairs of these controllers, which are considered indivisible units and have names like “K-Block” (Kaminario K2) or “X-Brick” (DellEMC XtremIO). For now, let’s just call them controller pairs.

There are a number of technical challenges to overcome when building a system which can scale out. In the first post, we covered Active/Passive solutions, where only one controller processes I/O from the underlying media. In this scenario, the performance limitations are determined by the characteristics of the single active node – with the remaining (passive) node simply waiting to spring into action in the event of a failover. Clearly this type of architecture makes less sense as the number of nodes increases above two, since the additional nodes will also be passive and therefore adding little benefit.

Scale out architectures, then, typically employ an active/active solution whereby each controller contributes more performance capacity as it joins the system. And that means building a high-availability cluster, with all of the associated cluster management technologies that entails (failover, virtual IPs, protection against split brain scenarios, etc). No wonder some vendors stick to Scale Up only.

Of course, the biggest issue with a Scale Out only architecture is the question of what happens when additional capacity needs to be added. The answer is that another controller or set of controllers must be added too, complete with their attached storage media – but the controllers are an expensive and unnecessary addition if the only requirement is simply more capacity.

Scale Up and Scale Out – The Perfect Solution?

So what we’ve seen here is that a Scale Up architecture allows for more capacity to be added to existing controllers, while a Scale Out architecture allows for more performance to be added to existing capacity. It would therefore seem logical that the ultimate goal is to build a system which can (independently) scale both up and out. Scaling up allows more capacity to be added without the cost of more controllers. Scaling out allows more performance (controllers) to be added when required. And thus the characteristics of the storage platform can be extended in either of these two dimensions as needed. Perfect?

Arrays which can support both scale up and scale out have been surprisingly rare in the All Flash market so far, but they do exist. The concept is simple: customers that purchase storage typically do so over a three to five year period. Most people simply cannot guess how their requirements will change in that period of time… more users, more customers, more data? Yes, probably – but how much and over what time period? Choosing an architecture which allows independent (non-disruptive) scale of both capacity and performance insures against the risk associated with capacity planning, while also allowing customers to start off by purchasing only what they need today and then expanding at their own pace. Sounds a bit like cloud computing when you put it like that, doesn’t it?

Which conveniently leads us to…

The Future: Dynamically Composable / Disaggregated Storage?

None of us know how the future will look, especially in the technology industry. But one vision of the future comes from my employer, Kaminario, who is one of a number of companies exploring the concept of composable infrastructure. I think this is a very interesting new direction, which is why I’m writing about it here – but since I’m an employee of this vendor I must first give you a mandatory sales warning and you should treat the next paragraphs with a healthy dose of “well he would say that, wouldn’t he?”

In a dynamically composable storage environment, the two elements we have been discussing above (storage controllers and storage media) become completely disaggregated so that any shelf of media can be addressed by any set of controllers. These sets of systems can then be dynamically composed, so that – out of a set of multiple shelves of media and storage controllers – subsets of the two can be brought together to form virtual private arrays designed to serve specific applications.
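
To make the idea a little more concrete, here is a deliberately simplistic sketch (the shelf and controller names are made up for illustration; this is not a description of Kaminario's actual implementation) of what "composing" virtual private arrays from pools of controllers and media shelves might look like:

```python
# Hypothetical resource pools - every name here is invented for illustration.
media_shelves = {"shelf1", "shelf2", "shelf3", "shelf4"}
controllers = {"c-node1", "c-node2", "c-node3", "c-node4"}

# A "virtual private array" is simply a subset of controllers paired with a
# subset of media shelves, dedicated to serving a specific application.
virtual_arrays = {
    "oltp-database": {"controllers": {"c-node1", "c-node2"}, "shelves": {"shelf1"}},
    "analytics":     {"controllers": {"c-node3", "c-node4"}, "shelves": {"shelf2", "shelf3"}},
}

def recompose(app: str, extra_shelf: str) -> None:
    """Dynamically grow one virtual array by assigning it another media shelf."""
    virtual_arrays[app]["shelves"].add(extra_shelf)

# Reallocate the spare shelf to the database array without touching the other one.
recompose("oltp-database", "shelf4")
print(virtual_arrays["oltp-database"]["shelves"])
```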


If you think about the potential of this method of presentation, it opens up many possibilities. For example, the dynamic and non-disruptive reallocation of storage resources offers customers the ability to constantly adapt to unpredictable workloads. Furthermore, concepts from AI can be used to automate this reallocation and even predict changes to requirements in advance.

This is useful for any customer with a complex estate of mixed workloads, but it’s incredibly useful to Cloud Service Providers and MSPs. After all, these organisations have no knowledge of what their customers are doing on their systems or what they will do in the future, so the ability to dynamically adapt performance and capacity requirements could provide a competitive edge.

Conclusion

We all know that I.T. is full of buzzwords, like agility or transformation. Is scalability another one? Maybe it’s in danger of becoming one. But if you think about it, one of the fundamental characteristics of any platform is its ability to scale. It essentially defines the limitations of the platform that you may meet as you grow – and growth is pretty much the point of any business. So take the time to understand what scale actually means in a storage context and you might avoid learning about those limitations the hard way…

All Flash Arrays: Scale Up vs Scale Out (Part 1)

Imagine you want to buy some more storage for your laptop – let’s say an external USB drive for backups. What are the fundamental questions you need to ask before you get down to the thorny issue of price? Typically, there is only one key question:

  • How much capacity do I need?

Of course there will be lesser questions, such as connectivity, brand, colour, weight, what it actually looks like and so on. But those are qualifying questions – ways to filter the drop-down list on Amazon so you have fewer decisions to make.

However, different rules apply when buying enterprise storage: we might care less about colour and more about physical density, power requirements and the support capabilities of the vendor. We might care less about what the product looks like and more about how simple it is to administer. But most of all, for enterprise storage, there are now two fundamental questions instead of one:

  • How much capacity do I need?
  • How much performance do I need?

Of course, there is the further issue of what exactly we mean by “performance”, given that it can be measured in a number of different ways. The answer is dependent on the platform being used: for disk-based storage systems it was typically the number of I/O Operations Per Second (IOPS), while for modern storage systems it is more likely to be the bandwidth (the volume of data read and written per second). And just to add a little spice, when considering IOPS or bandwidth on All Flash Array platforms the read/write ratio is also important.

So the actual requirement for, say, a three-node data warehouse cluster with 200 users might turn out to be:

“I need 50TB of usable capacity and the ability to deliver 1GB/second at 90% reads. What will this cost?”

Are we ready to spec a solution yet? Not quite. First we have to consider Rule #1.

Rule #1: Requirements Change

Most enterprise storage customers purchase their hardware for use over a period of time – with the most typical period being five years. So it stands to reason that whatever your requirements are at the time of purchase, they will change before the platform is retired. In fact, they will change many times. Data volumes will grow, because data volumes only ever get bigger… right? But also, those 200 users might grow to 500 users. The three node cluster might be extended to six nodes. The chances are that, in some way, you will need more performance and/or more capacity.

The truth is that, while customers buy their hardware based on a five year period, now more than ever they cannot even predict what will happen over the next 12 months. Forgive the cliché, but the only thing that is predictable is unpredictability.

So as a customer in need of enterprise storage, what do you do? Clearly you won’t want to purchase all of the capacity and performance you might possibly maybe need in the future. That would be a big up-front investment which may never achieve a return. So your best bet is to purchase a storage platform which can scale. This way you can start with what you need and scale as your requirements grow.

This is where architecture becomes key.

Scalability is an Architectural Decision

You may remember from a previous post that the basic building blocks of any storage array are controllers and media, with various networking devices used to string them together. As a gross simplification, the performance of a storage array is a product of the number of controllers and the power that those controllers have (assuming they aren’t held back by the media). The capacity of a storage array is clearly a product of the amount of media.

In a simple world you would just add more media when you wanted more capacity, you would add more controllers (or increase their power) when you needed more performance, and you would do both when you need both. But this is not a simple world.

I always think that the best way to visualise the two requirements of capacity and performance is to use two different dimensions on a graph, with performance as X and capacity as Y. So let’s use that here – we start with a single flash array which has one pair of controllers and one set of flash media.

Scale Up – Adding Capacity

The simple way to achieve scale up is to just add more media to an existing array. Media is typically arranged in some sort of indivisible unit, such as a shelf of SSDs arranged in a RAID configuration (so that it has inbuilt redundancy). In principle, adding another shelf of SSDs sounds easy, but complications arise when you consider the thorny issue of metadata.

To illustrate, consider that most enterprise All Flash arrays available today have inbuilt data reduction features including deduplication. At a high level, dedupe works by computing a hash value for each block written to the array and then comparing it to a table containing all the hash values of previously written blocks. If the hash is discovered in the table, the block already exists on the array and does not need to be written again, reducing the amount of physical media used.
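
As a rough illustration of the mechanism (and nothing more – no vendor's actual implementation looks this simple), the core of inline dedupe can be sketched in a few lines of Python: hash each incoming block, look the hash up in a table, and only write blocks whose hash has not been seen before:

```python
import hashlib

class DedupeStore:
    """Minimal sketch of inline block deduplication (illustrative only)."""

    def __init__(self):
        self.hash_table = {}        # hash -> physical block location (the metadata)
        self.physical_blocks = []   # stands in for the flash media

    def write_block(self, block: bytes) -> int:
        digest = hashlib.sha256(block).hexdigest()
        if digest in self.hash_table:
            # Duplicate: no new physical write, just reference the existing copy
            return self.hash_table[digest]
        # New unique block: write it to "media" and record its hash
        location = len(self.physical_blocks)
        self.physical_blocks.append(block)
        self.hash_table[digest] = location
        return location

store = DedupeStore()
a = store.write_block(b"A" * 4096)
b = store.write_block(b"A" * 4096)          # duplicate - consumes no extra media
print(a == b, len(store.physical_blocks))   # True 1
```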

This hash table is an example of the metadata storage arrays need to store and maintain in order to function; other examples of features which utilise metadata are thin provisioning, snapshots and replication. This makes metadata a critical factor in the performance of a storage array.

To ensure the highest speed of access, most metadata is pinned in DRAM on the storage controllers. This has a knock-on effect in that the amount of addressable storage in an array can become directly linked to the amount of DRAM in the controllers. DRAM is a costly resource and affects the manufacturing cost of the array, so there is a balancing act required in order to have enough DRAM to store the necessary metadata without inflating the cost any more than absolutely necessary.

Hopefully you see where this is going. Adding more shelves of media increases the storage capacity of an array… thereby increasing the metadata footprint… and so increasing the need for DRAM. At some point the issue becomes whether it is even possible (technically or commercially) to support the metadata overhead of adding more shelves of physical media.
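
A back-of-the-envelope calculation shows why. The block size and bytes-per-entry figures below are assumptions chosen purely for illustration (not any vendor's numbers), but the shape of the result is the point: metadata grows linearly with addressable capacity.

```python
def metadata_dram_gib(capacity_tb, block_size_kb=8, bytes_per_entry=32):
    """Rough estimate of DRAM needed to hold per-block metadata in memory."""
    blocks = capacity_tb * 1e12 / (block_size_kb * 1024)
    return blocks * bytes_per_entry / 2**30

for tb in (50, 100, 500):
    print(f"{tb} TB of addressable flash -> ~{metadata_dram_gib(tb):.0f} GiB of metadata")
# 50 TB -> ~182 GiB, 100 TB -> ~364 GiB, 500 TB -> ~1,821 GiB
```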

Scale-Up Only Architectures

There are many storage arrays on the market which have a scale-up only architecture, with Pure Storage being an obvious example. There are various arguments presented as to why this is the case, but my view is that these architectures were used as a compromise in order to get the products to market faster (especially if they also adopt an active/passive architecture). Having said that, it’s obvious that I am biased by the fact that I work for a vendor which does not have this restriction – and which believes in offering both scale up and scale out. So please don’t take my word for it – go read what the other vendors say and then form your own opinion.

One counter claim by proponents of scale-up only architecture is that performance can be added by upgrading array controllers in-place, non-disruptively. In other words, the controllers can be replaced with high-specification models with more CPU cores and more DRAM, bringing more performance capability to the array. The issue here is that this is a case of diminishing returns. Moving up through the available CPU models brings step changes in cost but only incremental increases in performance.

To try and illustrate this, let’s look at some figures for the Pure Storage //m series of All Flash arrays. There are three models increasing in price and performance: m20, m50 and m70. We can get performance figures measured in maximum IOPS (measured with a 32k block size as is Pure’s preferred way) from this datasheet and we can get details of the CPU and DRAM specifications from this published validation report by the NSA. Let’s use Wikipedia’s List of Intel Xeon microprocessors page to find the list price of those CPUs and compare the increments in price to those of the maximum stated performance:

Here we can see that the list price for the CPUs alone rises 238% and then 484% moving from //m20 to //m50 to //m70, yet the maximum performance measured in IOPS rises just 147% and 200%. You can argue that the CPU price is not a perfect indicator of the selling price of each //m series array, but it’s certainly a factor. As anyone familiar with purchasing servers will attest, buying higher-spec models takes more out of your pocket than it gives you back in performance.
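
To put that diminishing-returns argument into numbers, here is a quick sketch using the relative figures quoted above (treated as multiples of the //m20 baseline – an illustration of the argument, not a pricing analysis):

```python
# Relative CPU list price and maximum IOPS for the three models,
# expressed as multiples of the entry-level //m20 (figures quoted above).
models = {
    "m20": {"cpu_price": 1.00, "iops": 1.00},
    "m50": {"cpu_price": 2.38, "iops": 1.47},
    "m70": {"cpu_price": 4.84, "iops": 2.00},
}

for name, m in models.items():
    cost_per_iops = m["cpu_price"] / m["iops"]
    print(f"{name}: relative CPU cost per IOPS = {cost_per_iops:.2f}x")
# m20 = 1.00x, m50 = 1.62x, m70 = 2.42x - each step up costs more per unit of performance
```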

My point here – and this is a general observation rather than one about Pure in particular – is that this is not a cost effective scaling strategy in comparison to the alternative, which is the ability to scale out by adding more controllers.

Coming Next: Scale Out

In the next post we’ll look at Scale Out architectures and what they mean for customers with independently varying requirements for capacity and performance.

Scale out allows the cost-effective addition of more controllers and therefore more performance capability, along with other benefits such as the addition of more ports. But there are potential downsides too…


All Flash Arrays: Active/Active versus Active/Passive


I want you to imagine that you are about to run a race. You have your trainers on, your pre-race warm up is complete and you are at the start line. You look to your right… and see the guy next to you, the one with the bright orange trainers, is hopping up and down on one leg. He does have two legs – the other one is held up in the air – he’s just choosing to hop the whole race on one foot. Why?

You can’t think of a valid reason so you call across, “Hey buddy… why are you running on one leg?”

His reply blows your mind: “Because I want to be sure that, if one of my legs falls off, I can still run at the same speed”.

Welcome, my friends, to the insane world of storage marketing.

High Availability Clusters

The principles of high availability are fairly standard, whether you are discussing enterprise storage, databases or any other form of HA. The basic premise is that, to maintain service in the event of unexpected component failures, you need to have at least two of everything. In the case of storage array HA, we are usually talking about the storage controllers which are the interfaces between the outside world and the persistent media on which data resides.

Ok so let’s start at the beginning: if you only have one controller then you are running at risk, because a controller failure equals a service outage. No enterprise-class storage array would be built in this manner. So clearly you are going to want a minimum of two controllers… which happens to be the most common configuration you’ll find.

So now we have two controllers, let’s call them A and B. Each controller has CPU, memory and so on which allow it to deliver a certain level of performance – so let’s give that an arbitrary value: one controller can deliver 1P of performance. And finally, let’s remember that those controllers cost money – so let’s say that a controller capable of giving 1P of performance costs five groats.

Active/Passive Design

In a basic active/passive design, one controller (A) handles all traffic while the other (B) simply sits there waiting for its moment of glory. That moment comes when A suffers some kind of failure – and B then leaps into action, immediately replacing A by providing the same service. There might be a minor delay as the system performs a failover, but with multipathing software in place it will usually be quick enough to go unnoticed.


So what are the downsides of active/passive? There are a few, but the most obvious one is that you are architecturally limited to seeing only 50% of your total available performance. You bought two controllers (costing you ten groats!) which means you have 2P of performance in your pocket, but you will forever be limited to a maximum of 1P of performance under this design.

Active/Active Design

In an active/active architecture, both controllers (A and B) are available to handle traffic. This means that under normal operation you now have 2P of performance – and all for the same price of ten groats. Both the overall performance and the price/performance have doubled.


What about in a failure situation? Well, if controller A fails you still have controller B functioning, which means you are now down to 1P of performance. That’s half of what you are used to in this architecture, but remember: 1P is exactly what the active/passive design delivers too. Yes, that’s right… the performance under failure is identical for both designs.
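
To make the comparison concrete, here is that toy model expressed as a few lines of Python (the “P” and “groat” units are, of course, entirely illustrative):

```python
# Toy model from the text: each controller delivers 1P and costs 5 groats,
# so a two-controller system costs 10 groats in either design.
designs = {
    "active/passive": {"normal_perf": 1, "failure_perf": 1, "cost": 10},
    "active/active":  {"normal_perf": 2, "failure_perf": 1, "cost": 10},
}

for name, d in designs.items():
    print(f"{name}: {d['normal_perf']}P normal, {d['failure_perf']}P under failure, "
          f"{d['cost'] / d['normal_perf']:.1f} groats per P of usable performance")
# active/passive: 1P normal, 1P failure, 10.0 groats per P
# active/active:  2P normal, 1P failure,  5.0 groats per P
```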

What About The Cost?

Smart people look at technical criteria and choose the one which best fits their requirements. But really smart people (like my buddy Shai Maskit) remember that commercial criteria matter too. So with that in mind, let’s go back and consider those prices a little more. For ten groats, the active/active solution delivered performance of 2P under normal operation. The active/passive solution only delivered 1P. What happens if we attempt to build an active/passive system with 2P of performance?


To build an active/passive solution which delivers 2P of performance we now need to use bigger, more powerful controllers. Architecturally that’s not much of a challenge – after all, most modern storage controllers are just x86 servers and there are almost always larger models available. The problem comes with the cost. To paraphrase Shai’s blog on this same subject:

Cost of storage controller capable of 1P performance  <  Cost of storage controller capable of 2P performance

In other words, building an active/passive system requires more expensive hardware than building a comparable active/active system. It might not be double, as in my picture, but it will sure as hell be more expensive – and that cost is going to be passed on to the end user.

Does It Scale?

Another question that really smart people ask is, “How does it scale?”. So let’s think about what happens when you want to add more performance.

In an active/active design you have the option of adding more performance by adding more controllers. As long as your architecture supports the ability for all controllers to be active concurrently, adding performance is as simple as adding nodes into a cluster.

But what happens when you add a node to an active/passive solution? Nothing. You are architecturally limited to the performance of one controller. Adding more controllers just makes the price/performance even worse. This means that the only solution for adding performance to an active/passive system is to replace the controllers with more powerful versions…

The Pure Storage Architecture

Pure Storage is an All Flash Array vendor who knows how to play the marketing game better than most, so let’s have a look at their architecture. The PS All Flash Array is a dual-controller design where both controllers send and receive I/Os to the hosts. But… only one controller processes I/Os to and from the underlying persistent media (the SSDs). So what should we call this design, active/active or active/passive?

According to an IDC white paper published on PS’s website, PS controllers are sized so that each controller can deliver 100% of the published performance of the array. The paper goes on to explain that under normal operation each controller is loaded to a maximum of 50% on the host side. This way, PS promises that performance under failure will be equal to the performance under normal operations.

In other words, as an architectural decision, the sum of the performance of both controllers can never be delivered.

So which of the above designs does that sound like to you? It sounds like active/passive to me, but of course that’s not going to help PS sell its flash arrays. Unsurprisingly, on the PS website the product is described as “active/active” at every opportunity.

Yet even PS’s chief talking head, Vaughn Stewart, has to ask the question, “Is the FlashArray an Active/Active or Active/Passive Architecture?” and eventually comes to the conclusion that, “Active/Active or Active/Passive may be debatable”.

There’s no debate in my view.

Conclusion

You will obviously draw your own conclusions on everything I’ve discussed above. I don’t usually pick on other AFA vendors during these posts because I’m aiming for an educational tone rather than trying to fling FUD. But I’ll be honest, it pisses me off when vendors appear to misuse technical jargon in a way which conveniently masks their less-glamorous architectural decisions.

My advice is simple. Always take your time to really look into each claim and then frame it in your own language. It’s only then that you’ll really start to understand whether something you read about is an innovative piece of design from someone like PS… or more likely just another load of marketing BS.

* Many thanks to my colleague Rob Li for the excellent running-on-one-leg metaphor

All Flash Arrays: Controllers Are The New Bottleneck


Today’s storage array market contains a wild variation of products: block storage, file storage or object storage; direct attached, SANs or NAS systems; fibre-channel, iSCSI or Infiniband… Even the SAN section of the market is full of diversity: from legacy hard disk drive-based arrays through the transitory step of tiered disk+flash hybrid systems and on to modern All-Flash Arrays (AFAs).

If you were partial to the odd terrible pun, you might even say that it was a bewildering array of choices. [*Array* of choices? Oh come on. If you’re expecting a higher class of humour than that, I’m afraid this is not the blog for you]

Anyway, one thing that pretty much all storage arrays have in common is the basic configuration blocks of which they comprise:

  • Array controllers
  • Internal networking
  • Persistent Media (flash or disk)

Over the course of this blog series I’ve talked a lot about both flash and disk media, but now it’s time to concentrate a little more on the other stuff – specifically, the controllers. A typical Storage Area Network delivers a lot more functionality than would be expected from just connecting a bunch of disks or flash – and it’s the controllers that are responsible for most of that added functionality.

Storage Array Controllers

Think of your storage system as a private network on which is located a load of dumb disk or flash drives. I say dumb because they can do little else other than accept I/O requests: reads and writes. The controllers are therefore required to provide the intelligence needed to present those drives to the outside world and add all of the functionality associated with enterprise-class storage:

  • Resilience, automatic fault tolerance and high availability, RAID
  • Mirroring and/or replication
  • Data reduction technologies (compression, deduplication, thin provisioning)
  • Data management features (snapshots, clones, etc)
  • Management and monitoring interfaces
  • Vendor support integration such as call-home and predictive analytics

Controllers are able to add this “intelligence” because they are actually computers in their own right, acting as intermediaries between the back-end storage devices and the front-end storage fabric which connects the array to its clients. And as computers, they rely on the three classic computing resources (which I’m going to list using fancy colours so I can sneak them up on you again later in this post):

  1. Memory (DRAM)
  2. Processing (CPU)
  3. Networking

It’s the software running on the array controllers – and utilising these resources – that describes their behaviour. But with the rise of flash storage, this behaviour has had to change… drastically.


Disk Array Controllers vs Flash Array Controllers

In the days when storage arrays were crammed full of disk drives, the controllers in those arrays spent lots of time waiting on mechanical latency. This meant that the CPUs within the controllers had plenty of idle cycles where they simply had to wait for data to be stored or retrieved by the disks they were addressing. To put it another way, CPU power wasn’t such an important priority when specifying the controller hardware.

This is clearly not the case with the controllers in an all flash array, since mechanical latency is a thing of the past. The result is that controller CPUs now have no time to spare – data is constantly being handled, addressed, moved and manipulated. Suddenly, the choice of CPU has a direct effect on the array’s ability to process I/O requests.

Dedupe Kills It

But there’s more. One of the biggest shifts in behaviour seen with flash arrays is the introduction of data reduction technology – specifically deduplication. This functionality, known colloquially as dedupe, intervenes in the write process to see if an exact copy of any written block already exists somewhere else on the array. If the block does exist, the duplicate copy does not need to be written – and instead a pointer to the existing version can be stored, saving considerable space. This pointer is an example of what we call metadata – information about data.

I will cover deduplication at greater length in another post, but for now there are three things to consider about the effect dedupe has on storage array controllers:

  1. Dedupe requires the creation of a fairly complex set of metadata structures – and for performance reasons much of this will need to reside in DRAM on the controllers. And as more data is stored on the array, the amount of metadata created increases – hence a growing dependency on the availability of (expensive) memory in those array controllers.
  2. The process of checking each incoming block (which involves calculating a hash value) and comparing it against a table of metadata stored in DRAM is very CPU intensive. Thus array controllers which support dedupe have increasing requirements for (expensive) processing power.
  3. For storage arrays which run in an active/active configuration (i.e. with multiple redundant controllers, each of which actively send and receive data from the persistent storage layer), much of this information will need to be passed between controllers over the array’s internal networking.

Did you spot the similarities between this list and the colourful one from earlier? If you didn’t, you must be colour blind. Flash array controllers are much more dependent on their resources – particularly CPU and DRAM – than disk array controllers.

Summary

Flash array controllers have to do almost everything that their ancestors, the venerable disk array controllers, used to do. But they have to do it much faster and in greater volume. Not only that, but they have to do so much more… especially for the process known as data reduction. And as we’ve seen, the overhead of all these tasks causes a much greater strain on memory, processing and networking than was previously seen in the world of disk arrays – which is one of the reasons you cannot simply retrofit SSDs into a disk array architecture.

With the introduction of flash into storage, the bottleneck has moved away from the persistence layer and is now with the controllers. Over the next few articles we’ll look at what that means and consider the implications of various AFA architecture strategies on that bottleneck. After all, as is so often the case when it comes to matters of performance, you can’t always remove the bottleneck… but you can choose the one which works best for you 🙂

All Flash Arrays: Hybrid Means Compromise


Sometimes the transition between two technologies is long and complicated. It may be that the original technology is so well established that it’s entrenched in people’s minds as simply “the way things are” – inertia, you might say. It could be that there is more than one form of the new technology to choose from, with smart customers holding back to wait and see which emerges as a stronger contender for their investment. Or it could just be that the newer form of technology doesn’t yet deliver all of the benefits of the legacy version.

The automotive industry seems like a good example here. After over a century of using internal combustion engines, we are now at the point where electric vehicles are a serious investment for manufacturers. However, fully-electric vehicles still have issues to overcome, while there is continued debate over which approach is better: batteries or hydrogen fuel cells. Needless to say, the majority of vehicles on the road today still use what you could call the legacy method of propulsion.

However, one type of vehicle which has been successful in gaining market share is the hybrid electric vehicle. This solution attempts to offer customers the best of both worlds: the lower fuel consumption and claimed environmental benefits of an electric vehicle, but with the range, performance and cost of a fuel-powered vehicle. Not everybody believes it makes sense, but enough do to make it a worthwhile venture for the manufacturers.

Now here’s the interesting thing about hybrid vehicles… the thing that motivated me to write two paragraphs about cars instead of flash arrays… Nobody believes that hybrid electric vehicles are the permanent solution. Everybody knows that hybrids are a transient solution on the way to somewhere else. Nobody at all thinks that hybrid is the end-game. But the people who buy hybrid cars also believe that this state of affairs will not change during the period in which they own the car.

Hybrid Flash Arrays (HFAs)

There are two types of flash storage architecture which could be labelled a hybrid – those where a disk array has been repopulated with flash and those which are designed specifically for the purpose of mixing flash and disk. I’ve talked about naming conventions before, it’s a tricky subject. But for the purposes of this article I am only discussing the latter: systems where the architecture has been designed so that disk and flash co-exist as different tiers of storage. Think along the lines of Nimble Storage, Tegile and Tintri.

Why do this? Well, as with hybrid electric vehicles the idea is to bridge the gap between two technologies (disk and flash) by giving customers the best of both worlds. That means the performance of flash plus its low power, cooling and physical space requirements – combined with the density of disk and its corresponding impact on price. In other words, if disk is cheap but slow while flash is fast but expensive, HFAs are aimed at filling the gap.

Hybrid (adjective)
of mixed character; composed of different elements.
bred as a hybrid from different species or varieties.

As you can see there are a lot of synergies between this trend and that of the electric vehicle. Also, most storage systems are purchased with a five-year refresh cycle in mind, which is not dissimilar to the average length of ownership of a car. But there’s a massive difference: the rate of change in the development of flash memory technology.

In recent years the density of NAND flash has increased by orders of magnitude, especially with the introduction of 3D NAND technology and the subsequent use of Triple-Level Cell (TLC). And when the density goes up, the price comes down – closing the gap between disk and flash. In fact we’re at the point now where Wikibon predicts that “flash … will become a lower cost media than disk … for almost all storage in 2016”:

Image courtesy of Wikibon, “Evolution of All-Flash Array Architectures” by David Floyer (2015)

That’s great news for customers – but definitely not for HFA vendors.

Conclusion

And so we reach the root of the problem with HFAs. It’s not just that they are slower than All Flash Arrays. It’s not even that they rely on the guesswork of automatic tiering algorithms to move data between their tiers of disk and flash. It’s simply that their entire existence is predicated on the idea of being a transitory solution designed to bridge a gap which is already closing faster than they can fill it.

If you want proof of this, just look at the three HFA vendors I name checked earlier – all of which are rushing to bring out All Flash versions of their arrays. Nimble Storage is the only one of the three to be publicly listed – and its recent results indicate a strategic rethink may be required.

When it comes to hybrid electric vehicles, it’s true that the concept of mass-owned fully-electric cars still belongs in the future. But when it comes to hybrid flash arrays, the adoption of All-Flash is already happening today. The advice to customers looking to invest in a five-to-seven year storage project is therefore pretty simple: Mind the gap.

All Flash Arrays: SSD-based versus Ground-Up Design


In recent articles in this series I’ve been looking at the architectural choices for building All Flash Arrays (AFAs). I surmised that there are three main approaches:

  • Hybrid Flash Arrays
  • SSD-based All Flash Arrays
  • Ground-Up All Flash Arrays (which from here on I’ll refer to as Custom Flash Module arrays or CFM arrays)

I’ve already blown metaphorical raspberries at the hybrid approach, so now it’s time to cover the other two.

SSD or CFM: The Big Question

I think the most interesting question in the AFA industry right now is the one of whether the SSD or CFM design will win. Of course, it’s easy to say “win” like that as if it’s a simple race, but this is I.T. – there’s never a simple answer. However, the reality is that each method offers benefits and drawbacks, so I’m going to use this blog post to simply describe them as I see them.

Before I do that, let me just remind you of what the vendor landscape looks like at this time:

SSD-based architecture: Right now you can buy SSD-based arrays from EMC (XtremIO), Pure Storage, Kaminario, Solidfire, HP 3PAR and Huawei to name a few. It’s fair to say that the SSD-based design has been the most common in the AFA space so far.

CFM-based architecture: On the other hand, you can now buy ground-up CFM-based arrays from Violin Memory, IBM (FlashSystem), HDS (VSP), Pure Storage (FlashArray//m) and EMC (DSSD). The latter has caused some excitement because of DSSD’s current air of mystery in the marketplace – in other words, the product isn’t yet generally available.

So which approach is “the best”?

The SSD-based Approach

If you were going to start an All Flash Array company and needed to bring a product to market as soon as possible, it’s quite likely you would go down the SSD route. Apart from anything else, flash management is hard work – and needs constant attention as new types of flash come to market. A flash hardware engineer friend of mine used to say that each new flash chip is like a snowflake – they all behave slightly differently. So by buying flash in the ready-made form of an SSD you bypass the requirement to put in all this work. The flash controller from the SSD vendor does it for you, leaving you to concentrate on the other stuff that’s needed in enterprise storage: resilience, availability, data services, etc.

On the other hand, it seems clear that an SSD is a package of flash pretending to behave like a disk. That often means I/Os are taking place via protocols that were designed for disk, such as Serial Attached SCSI. Also, in a unit the size of an all flash array there are likely to be many SSDs… but because each one is an isolated package of flash, they cannot work together and manage the flash holistically. In other words, if one SSD is experiencing issues due to garbage collection (for example), the others cannot take the strain.

The Ground-Up Approach

For a number of years I worked for Violin Memory, which adopted the ground-up approach at its very core. Violin’s position was that only the CFM approach could unlock the full potential benefits from NAND flash. By tightly integrating the NAND flash into its array – and by using its own controllers to manage that flash – Violin believed it could deliver the best performance in the AFA market. On the other hand, many SSD vendors build products for the consumer market where the highest levels of performance simply aren’t necessary. All that’s required is something faster than disk – it doesn’t always have to be the fastest possible solution.

It could also be argued that any CFM vendor who has a good relationship with a flash fabricator (for example, Violin was partly-owned by Toshiba) could gain a competitive advantage by working on the very latest NAND flash technologies before they are available in SSD form. What’s more, SSDs represent an additional step in the process of taking NAND flash from chip to All Flash Array, which potentially means there’s an extra party needing to make their margin. Could it be that the CFM approach is more cost effective? [Update from Jan 2017: Violin Memory has now filed for chapter 11 bankruptcy protection]

SSD Economics

The argument about economics is an interesting one. Many technical people have a tendency to focus on what they know and love: technology. I’m as guilty of this as anyone – given two solutions to a problem I tend to gravitate toward the one that has the most elegant technical design, even if it isn’t necessarily the most commercially-favourable. Taking raw flash and integrating it into a custom flash module sounds great, but what is the cost of manufacturing those CFMs?

Manufacturing is all about economies of scale. If you design something and then build thousands of them, it will obviously cost you more per unit than if you build millions of them. How many ground-up all flash vendors are building their custom flash modules by the millions? In May 2015, IBM issued this press release in which they claimed that they were the “number one all-flash storage array vendor in 2014“. How many units did they ship? 2,100.

In just the second quarter of 2015, almost 24 million SSDs were shipped to customers, with Samsung responsible for 43.8% of that total (according to US analyst firm Trendfocus, Inc). Who do you think was able to achieve the best economy of scale?

Design Agility

The other important question is the one about New Stuff ™. We are always being told about fantastic new storage technologies that are going to change our lives, so who is best placed to adopt them first?

Again there’s an argument to be made on both sides. If the CFM flash vendor is working hand-in-glove with a fabricator, they may have access to the latest technology coming down the line. That means they can be prepared ahead of the pack – a clear competitive advantage, right?

But how agile is the CFM design? Changing the NVM media requires designing an entirely-new flash module, with all the associated hardware engineering costs such as prototyping, testing, QA and limited initial manufacturing runs.

For an SSD all flash array vendor, however, that work is performed by the SSD vendor… again somebody like Samsung, Intel or Micron who have vast infrastructures in place to perform that sort of work all the time. After all, a finished SSD must behave exactly like a disk, regardless of what NVM technology it uses under the covers.

Conclusion

There are obviously two sides to this argument. The SSD was designed to replace a fundamental bottleneck in storage systems: the hard disk drive. Ironically, it may be the fate of the SSD to become exactly what it replaced. For flash to become mainstream it was necessary to create a “flash-behaving-as-disk” package, but the flip side of this is the way that SSDs stifle the true potential of the underlying flash. (Although perhaps NVMe technologies will offer us some salvation…)

However, unless you are a company the size of Samsung, Intel or Micron it seems unlikely that you would be able to retain the manufacturing agility and economies of scale required to produce custom flash modules at the price point of SSDs. Nor would you be likely to have the agility to adopt new NVM technologies at the moment that they become economically preferable to whatever medium you were using previously.

Whatever happens, you can be sure that each side will claim victory. With the entire primary data market to play for, this is a high stakes game. Every vendor has to invest a large amount of money to enter the field, so nobody wants to end up being consigned to the history books as the Betamax of flash…

For younger readers, Betamax was the loser in a battle with VHS over who would dominate the video tape market. You can read about it here. What do you mean, “What is a video tape?” Those things your parents used to watch movies on before the days of DVDs. What do you mean, “What is a DVD?” Jeez, I feel old.

All Flash Arrays: Where’s My Capacity? Effective, Usable and Raw Explained


What’s the most important attribute to consider when you want to buy a new storage system? More critical than performance, more interesting than power and cooling requirements, maybe even more important than price? Whether it’s an enterprise-class All Flash Array, a new drive for your laptop or just a USB flash key, the first question on anybody’s mind is usually: how big is it?

Yet surprisingly, at least when it comes to All Flash Arrays, it is becoming increasingly difficult to get an accurate answer to this question. So let’s try and bring some clarity to that in this post.

Before we start, let’s quickly address the issue of binary versus decimal capacity measurements. For many years the computer industry has lived with two different definitions for capacity: memory is typically measured in binary values (powers of two) e.g. one kibibyte = 2 ^ 10 bytes = 1024 bytes. On the other hand, hard disk drive manufacturers have always used decimal values (powers of ten) e.g. one kilobyte = 10 ^ 3 bytes = 1000 bytes. Since flash memory is commonly used for the same purpose as disk drives, it is usually sold with capacities measured in decimal values – so make sure you factor this in when sizing your environments.
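
The difference is worth quantifying, because it comes to roughly 9% at the terabyte scale. A quick sketch of the conversion:

```python
def decimal_tb_to_gib(tb):
    """Convert a vendor-quoted decimal terabyte figure into binary gibibytes."""
    return tb * 10**12 / 2**30

print(f"1 TB (decimal)  = {decimal_tb_to_gib(1):,.0f} GiB")   # ~931 GiB, not 1,024 GiB
print(f"10 TB (decimal) = {decimal_tb_to_gib(10):,.0f} GiB")  # ~9,313 GiB
```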

Definitions

Now that’s covered, let’s look at the three ways in which capacity is most commonly described: raw capacity, usable capacity and effective capacity. To ensure we don’t stray from the truth, I’m going to use definitions from SNIA, the Storage Networking Industry Association.

Raw Capacity: The sum total amount of addressable capacity of the storage devices in a storage system.

The raw capacity of a flash storage product is the sum of the available capacity of each and every flash chip on which data can be stored. Imagine an SSD containing 18 Intel MLC NAND die packages, each of which has 32GB of addressable flash. This therefore contains 576GB of raw capacity. The word addressable is important because the packages actually contain additional unaddressable flash which is used for purposes such as error correction – but since this cannot be addressed by either you or the firmware of the SSD, it doesn’t count towards the raw value.

Usable Capacity: (synonymous with Formatted Capacity in SNIA terminology) The total amount of bytes available to be written after a system or device has been formatted for use… [it] is less than or equal to raw capacity.

Possibly one of the most abused terms in storage, usable capacity (also frequently called net capacity) is what you have left after taking raw capacity and removing the space set aside for system use, RAID parity, over-provisioning (i.e. headroom for garbage collection) and so on. It is guaranteed capacity, meaning you can be certain that you can store this amount of data regardless of what the data looks like.

That last statement is important once data reduction technologies come into play, i.e. compression, deduplication and (arguably) thin provisioning. Take 10TB of usable space and write 5TB of data into it – you now have 5TB of usable capacity remaining. Sounds simple? But take 10TB of usable space and write 5TB of data which dedupes and compresses at a 5:1 ratio – now you only need 1TB of usable space to store it, meaning you have 9TB of usable capacity left available.
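
That arithmetic can be captured in a couple of lines (a sketch of the example above, not how any array actually reports free space):

```python
def usable_remaining(usable_tb, data_written_tb, reduction_ratio):
    """How much guaranteed (usable) capacity is left after a write."""
    physical_used_tb = data_written_tb / reduction_ratio
    return usable_tb - physical_used_tb

print(usable_remaining(10, 5, 1.0))  # 5.0 TB left: the data does not reduce at all
print(usable_remaining(10, 5, 5.0))  # 9.0 TB left: the data reduces at 5:1
```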

Effective Capacity: The amount of data stored on a storage system … There is no way to precisely predict the effective capacity of an unloaded system. This measure is normally used on systems employing space optimization technologies.

The effective capacity of a storage system is the amount of data you could theoretically store on it in certain conditions. These conditions are assumptions, such as “my data will reduce by a factor of x:1”. There is much danger here. The assumptions are almost always related to the ability of a dataset to reduce in some way (i.e. compress, dedupe etc) – and that cannot be known until the data is actually loaded. What’s more, data changes… as does its ability to reduce.

For this reason, effective capacity is a terrible measurement on which to make any firm plans unless you have some sort of guarantee from the vendor. This takes the form of something like, “We guarantee you x terabytes of effective capacity – and if you fail to realise this we will provide you with additional storage free of charge to meet that guarantee”. This would typically then be called the guaranteed effective capacity.

The most commonly used assumptions in the storage industry are that databases reduce by around 2:1 to 4:1, VSI systems around 5:1 to 6:1 and VDI systems anything from 8:1 right up to 18:1 or even further. This means an average data reduction of around 6:1, which is the typical ratio you will see on most vendor’s data sheets. If you take 10TB of usable capacity and assume an average of 6:1 data reduction, you therefore end up with an effective capacity of 60TB. Some vendors use a lower ratio, such as 3:1 – and this is good for you the customer, because it gives you more protection from the risk of your data not reducing.

But it’s all meaningless in the real world. You simply cannot know what the effective capacity of a storage system is until you put your data on it. And you cannot guarantee that it will remain that way if your data is going to change. Never, ever buy a storage system based purely on the effective capacity offered by the vendor unless it comes with a guarantee – and always consider whether the assumed data reduction ratio is relevant to you.

Think of it this way: if you are being sold effective capacity you are taking on the financial risk associated with that data reduction. However, if you are being sold guaranteed effective capacity, the vendor is taking on that financial risk instead. Which scenario would you prefer?

Use and Abuse of Capacities

Three different ways to measure capacity? Sounds complicated. And with complexity come opportunities for certain flash array vendors to use smoke and mirrors in order to make their products seem more appealing. I’m going to highlight what I think are the three most common tactics here.

1. Confusing Usable Capacity with Effective Capacity

Many flash array vendors have Always-On data reduction services. This is often claimed to be for the customer’s benefit, but in reality it is frequently more about reducing the amount of writes taking place to the flash media (to alleviate performance and endurance issues). For some vendors, not having the ability to disable data reduction can be spun around to their advantage: they simply make out that the terms usable and effective are synonymous, or splice them together into the unforgivable phrase effective usable capacity to make their products look larger. How convenient.

Let me tell you this now: every flash array has a usable capacity… it is the maximum amount of unique, incompressible data that can be stored, i.e. the effective capacity if the data reduction ratio were 1:1. I would argue that this is a much more important figure to know when you buy a flash system, because if you buy on effective capacity you are just buying into dreams. Make your vendor tell you the usable capacity at 1:1 data reduction and then calculate the price per GB based on that value.

2. My Data Reduction Is Better Than Yours

Every flash vendor thinks their data reduction technologies are the best. They can’t all be right. Yet sometimes you will hear claims so utterly ridiculous that you’ll think it’s some kind of joke. I guess we all believe what we want to hear.

Here’s the truth. Compression and deduplication are mature technologies – they have been around for decades. Nobody in the world of flash storage is going to suddenly invent something that is remarkably better than the competition. Sometimes one vendor’s tech might deliver better results than another, but on other days (and, crucially, with other datasets) that will reverse. For this reason, as well as for your own sanity, you should assume they will all be roughly the same… at least until you can test them with your data. When you evaluate competitive flash products, make them commit to a guaranteed effective capacity and then hold them to it. If they won’t commit, beware.

3. Including Savings From Thin Provisioning

Thin Provisioning is a feature whereby physical storage is allocated only when it is used, rather than being preallocated at the moment of volume creation. Thus if a 10TB volume is created and then 1TB of data written to it, only 1TB of physical storage is used. The host is fooled into thinking that it has all 10TB available, but in reality there may be far less physical capacity free on the storage system.
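
A minimal sketch of that accounting, using the 10TB example above (illustrative only – real arrays track allocation at a much finer granularity):

```python
class ThinVolume:
    """Sketch of thin provisioning: physical space is consumed only when written."""

    def __init__(self, provisioned_tb):
        self.provisioned_tb = provisioned_tb   # what the host is told it has
        self.allocated_tb = 0.0                # what is physically consumed on the array

    def write(self, tb):
        self.allocated_tb += tb                # physical allocation happens on first write

vol = ThinVolume(provisioned_tb=10)
vol.write(1)
print(vol.provisioned_tb, vol.allocated_tb)    # host sees 10 TB, array has used only 1 TB
```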

Some vendors show the benefits from thin provisioned volumes as a separate saving from compression and deduplication, but some roll all three up into a single number. This conveniently makes their data reduction ratios look amazing – but in my opinion this then becomes a joke. Thin provisioning isn’t data reduction, because no data is being reduced. It’s a cheap trick – and it says a lot about the vendor’s faith in their compression and deduplication technologies.

Conclusion

Don’t let your vendors set the agenda when it comes to sizing. If you are planning on buying a certain capacity of flash, make sure you know the raw and usable capacities, plus the effective capacity and the assumed data reduction ratio used to calculate it. Remember that usable should be lower than raw, while effective (which is only relevant when data reduction technologies are present) will commonly be higher.

Keep in mind that Effective Capacity = Usable Capacity × Data Reduction Factor.

Be aware that when a product with an “Always-On” data reduction architecture tells you how much capacity you have left, it’s basically a guess. In reality, it’s entirely dependent on the data you intend to write. I’ve always thought that “Always-On” was another bit of marketing spin; you could easily rename it as an “Unavoidable” or “No Choice” architecture.

In my opinion, the best data reduction technology will be selectable and granular. That means you can choose, at a LUN level, whether you want to take advantage of compression and/or deduplication or not – you aren’t tied in by the architecture. Like with all features, the architecture should allow you to have a choice rather than enforce a compromise.

Finally, remember the rule about effective capacity. If it’s guaranteed, the risk is on the vendor. If they won’t guarantee, the risk is on you.

So there we have it: clarity and choice. Because in my opinion – and no matter which way you measure it – one size simply doesn’t fit all.

All Flash Arrays: Can’t I Just Stick Some SSDs In My Disk Array?


In the previous post of this series I outlined three basic categories of All Flash Array (AFA): the hybrid AFA, the SSD-based AFA and the ground-up AFA. This post addresses the first one and is therefore aimed at answering one of the questions I hear most often: why can’t I just stick a bunch of SSDs in my existing disk array?

Data Centre Dinosaurs

Disk arrays – and in this case we are mainly talking about storage area networks – have been around for a long time. Every large company has a number of monolithic, multi-controller cache-based disk arrays in their data centre. They are the workhorses of storage: ever reliable, able to host multiple, mixed workloads and deliver predictable performance. Predictably slow performance, of course – but you mustn’t underestimate just how safe these things feel to the people who are paid to ensure the safety of their data. Add to this the full suite of data services that come with them (replication, mirroring, snapshots etc) and you have all that you could ask for.

Battleship!

Except of course that they are horribly expensive, terribly slow and use up vast amounts of power, cooling and floor space. They are also a dying breed, memorably described by Chris Mellor of The Register as like outdated battleships in an era of modern warfare.

The SSD Power-Up

Every large vendor has a top-end product: EMC’s VMAX, IBM’s DS8000, HDS’s VSP… and pretty much every product has the ability to use SSDs in place of disk drives. So why not fill one of these monsters with flash drives and then call it an “All Flash Array”? Just like in those computer games where your spacecraft hits a power-up and suddenly it’s bigger and faster with better weapons… surely a bunch of SSDs would convert your ageing battleship into a modern cruiser with new-found agility and seaworthiness. Ahoy! [Ok, I’ll stop with the naval analogies now]

Well, no. And to understand why, let’s look at how most disk arrays are architected.

The Classic Disk Array Architecture

Let’s consider what it takes to build a typical SAN disk array. We’ll start with the most obvious component, which is the hard disk drive itself. These things have been around for decades so we are fairly familiar with them. They offer pretty reasonable capacity of up to 4TB (in fact there are even a few 6TB models out now) but they have a limitation with regard to performance: you are unlikely to be able to drive much more than 200 transactions per second.

At this point, stop and think about the performance characteristics of a hard disk drive. Disks don’t really care whether you are doing read or write I/Os – the performance is fairly symmetrical. However, there is a drastic difference in the performance of random versus sequential I/Os: every random I/O incurs the penalty of head seek time and rotational latency, while a sequential stream pays that penalty only once. A single, large I/O is therefore much more efficient than many small I/Os.
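If you're wondering where that figure of roughly 200 transactions per second comes from, here's a quick back-of-the-envelope calculation. The seek and rotation figures below are typical published values for a 15K RPM enterprise drive, used purely for illustration:

    # Rough sketch of why a hard drive tops out at roughly 200 random IOPS.
    # Typical figures for a 15K RPM enterprise drive (illustrative, not measured).
    avg_seek_ms = 3.5                                 # average seek time
    rotational_latency_ms = 0.5 * (60_000 / 15_000)   # half a revolution = 2 ms

    service_time_ms = avg_seek_ms + rotational_latency_ms   # ~5.5 ms per random I/O
    random_iops = 1000 / service_time_ms

    print(f"~{random_iops:.0f} random IOPS per drive")       # ~180

    # A sequential stream pays the seek and rotational penalty only once, which is
    # why one large I/O is so much cheaper than many small random ones.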

This is the architectural constraint of every hard disk drive, and it is therefore the design challenge around which we must architect our disk array.

In the next section we’re going to build a disk array from scratch, considering all the possibilities that need to be accounted for. If it looks like it will take too long to read, you can skip down to the conclusion section at the end.

Building A Disk Array

We’re now going to build a disk array, and the first ingredient is clearly going to be hard drives. So let’s start by taking a bunch of disks and putting them into a shelf, or some other form of enclosure:

[Diagram: a shelf of hard disk drives]

At this point the density is limited by the number of disks we can fit into the enclosure, which might typically be 25 for the smaller 2.5 inch form factor or around 14 for the larger 3.5 inch variety.

Next we’re going to need a controller – and in that controller we’re going to want a large chunk of DRAM to act as a cache in order to try and minimise the number of I/Os hitting the disk enclosure. We’ll allocate some of that DRAM to work as a read cache, in the hope that many of the reads will be hitting a small subset of the data stored, i.e. the “hot” blocks. If this gamble is successful we will have taken some load off the disks – and that is a Good Thing:

[Diagram: disk shelf plus a controller with DRAM cache]
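To see why this gamble matters, here's a rough sketch of the effective read latency for a few different cache hit ratios. The latency figures are illustrative assumptions rather than benchmarks of any particular product:

    # Effective read latency = weighted average of DRAM cache hits and disk misses.
    dram_latency_ms = 0.1    # assumed service time for a cache hit
    disk_latency_ms = 5.5    # random read from a spinning disk (see earlier estimate)

    for hit_ratio in (0.5, 0.8, 0.95):
        effective_ms = hit_ratio * dram_latency_ms + (1 - hit_ratio) * disk_latency_ms
        print(f"hit ratio {hit_ratio:.0%}: ~{effective_ms:.2f} ms average read latency")

    # The better the cache captures the "hot" blocks, the fewer I/Os hit the disks.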

The rest of the DRAM will be allocated to a write cache, because clearly we don’t want to have to incur the penalty of rotational latency every time a write I/O is performed. By writing the data to the DRAM buffer and then issuing the acknowledgement back to the client, we can take our time over writing the data to the persistent storage in the disk enclosure.
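In pseudo-Python, the write-back idea looks something like this. It's a conceptual sketch only (no real controller is implemented as a dictionary!), but it shows why the client sees DRAM latency rather than disk latency:

    # Simplified sketch of write-back caching: acknowledge the client as soon as the
    # write lands in DRAM, then destage to the disks later at our leisure.
    class WriteBackCache:
        def __init__(self):
            self.dirty = {}              # block address -> data held only in DRAM
            self.disk = {}               # stand-in for the persistent disk shelf

        def write(self, address, data):
            self.dirty[address] = data
            return "ACK"                 # client sees DRAM latency, not disk latency

        def destage(self):
            while self.dirty:            # background flush to persistent storage
                address, data = self.dirty.popitem()
                self.disk[address] = data

    cache = WriteBackCache()
    cache.write(0x1000, b"changed block")   # acknowledged immediately
    cache.destage()                         # the slow disk write happens afterwards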

Now, this is an enterprise-class product we are trying to build, so that means there are requirements for resiliency, redundancy, online maintenance etc. It therefore seems pretty obvious that having only one controller is a single point of failure, so let’s add another one:

[Diagram: disk shelf with two controllers]

This brings up a new challenge concerning that write cache we just discussed. Since we are acknowledging writes as soon as they hit DRAM, it would be possible for a controller to crash before changed blocks are persisted to disk – resulting in data loss. Likewise, an old copy of a block could sit in the cache of one controller while a newly-changed version exists in the other. These possibilities cannot be allowed, so we will need to mirror our write cache between the controllers. In this setup we won’t acknowledge the write until it’s been written to both write caches.
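Extending the earlier sketch, a mirrored write cache only acknowledges the client once both controllers hold the changed block. Again, this is purely conceptual; in a real array the mirror copy travels over the high-speed interconnect described next:

    # Conceptual sketch: a write is acknowledged only after it sits in the DRAM of
    # both controllers, so a single controller failure cannot lose an acknowledged
    # write.
    class MirroredWriteCache:
        def __init__(self):
            self.dirty = {}          # blocks held in this controller's DRAM
            self.partner = None      # the other controller's cache

        def write(self, address, data):
            self.dirty[address] = data             # local copy
            self.partner.dirty[address] = data     # mirrored copy on the peer
            return "ACK"                           # only now do we tell the client

    a, b = MirroredWriteCache(), MirroredWriteCache()
    a.partner, b.partner = b, a
    a.write(0x2000, b"changed block")
    assert 0x2000 in b.dirty    # the peer held a copy before the client saw the ACK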

Of course, this introduces a further delay, so we’ll need to add some sort of high speed interconnect into the design to make the process of mirroring as fast as possible:

[Diagram: two controllers with mirrored write caches connected by a high-speed interconnect]

This mirroring may protect us from losing data in the event of a single controller failure, but what about if power to the entire system was lost? Changed blocks in the mirrored write cache would still be lost, resulting in lost data… so now we need to add some batteries to each controller in order to provide sufficient power that cached writes can be flushed to persistent media in the event of any systemic power issue:

[Diagram: controllers with battery backup added]

That’s everything we need from the controllers – so now we need to connect them together with the disks. Traditionally, disk arrays have tended to use serial architectures to attach disks onto a back end network which essentially acts as a loop. This has some limitations in terms of performance but when your fundamental building blocks are each limited to 200 IOPS it’s hardly the end of the world:

[Diagram: controllers attached to the disk shelf via a loop-based back end]

So there we have it. We’ve built a disk array complete with redundant controllers and battery-backed DRAM cache. Put a respectable logo on the front and you will find this basic design used in data centres around the world.

But does it still make sense if you switch to flash?

From HDD To SSD

Let’s take our finished disk array design and replace the disks with SSDs:

[Diagram: the same array design with the disks replaced by SSDs]

And now let’s take a moment to consider the performance characteristics of flash: the latency is much lower than disk, meaning the penalty for performing random I/Os instead of sequential I/Os is negligible. However, the performance of read versus write I/Os is asymmetrical: writes take substantially longer than reads – especially sustained writes. What does that do to all of the design principles in our previous architecture?

  • With so many more transactions per second available from the SSDs, it no longer makes sense to use a serial / loop based back end network. Some sort of switched infrastructure is probably more suitable.
  • Because the flash media is so much faster than disk (i.e. has a significantly lower latency), we can do away with the read cache. Depending on our architecture, we may also be able to avoid using a write cache too – resulting in complete removal of the DRAM in those controllers (although this would not be the case if deduplication were to be included in the design – more on that another time).
  • If we no longer have data in DRAM, we no longer need the batteries and may also be able to remove or at least downsize the high speed network connecting the controllers.
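To put some very rough numbers on that contrast, here's a back-of-the-envelope comparison of the same shelf populated with disks versus SSDs. The per-device figures are illustrative assumptions, but the gap is large enough that the exact numbers hardly matter:

    # Illustrative comparison of back-end random IOPS for one 25-slot shelf.
    drives_per_shelf = 25
    disk_iops_each = 200        # random IOPS per spinning disk (see earlier estimate)
    ssd_iops_each = 50_000      # a conservative assumption for an enterprise SSD

    print(f"Disk shelf: ~{drives_per_shelf * disk_iops_each:,} random IOPS")   # ~5,000
    print(f"SSD shelf:  ~{drives_per_shelf * ssd_iops_each:,} random IOPS")    # ~1,250,000

    # With hundreds of times more back-end IOPS on offer, the loop-based back end,
    # the DRAM read cache and the rest of the disk-era design stop making sense.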

All we are left with now is the enclosure full of SSDs – and there is an argument to be had about whether that is the most efficient way of packaging NAND flash. It’s certainly not the most dense method, which is why Violin Memory and IBM’s FlashSystem both use their own custom flash modules to package their flash.

Conclusion

Did you notice how pretty much every design decision that we made building the disk array architecture turned out to be the wrong one for a flash-based solution? This shouldn’t really be a surprise, since flash is fundamentally different to disk in its behaviour and performance.

[Image: sinking battleship]

Architecture matters. Filling a legacy disk array with SSDs simply isn’t playing to the strengths of flash. Perhaps if it were a low cost option it would be a sensible stop-gap solution, but typically the SSD options for these legacy arrays are astonishingly expensive.

So next time you look at a hybrid disk array product that’s being marketed as “all flash”, do yourself a favour. Think about the architecture. If it was designed for disk, the chances are it’ll perform like it was designed for disk. And you don’t want to end up with that sinking feeling…

Thanks to my friend and former colleague Steve “yeah” Willson for the concepts behind this blog post. Steve, I dedicate the picture of a velociraptor at the top of this page to you. You have earned it.

All Flash Arrays: What Is An AFA?

[Image: All Flash Arrays – Hybrid, SSD-based or Ground-Up]

For the last couple of years I’ve been writing a series of blog posts introducing the concepts of flash-memory and solid state storage to those who aren’t part of the storage industry. I’ve covered storage fundamentals, some of what I consider to be the enduring myths of storage, a section of unashamed disk-bashing and then a lengthy set of articles about NAND flash itself.

Now it’s time to talk about all flash arrays. But first, a warning.

Although I work for a flash array vendor, I have attempted to keep my posts educational and relatively unbiased. That’s pretty tricky when talking about the flash media, but it’s next to impossible when talking about arrays themselves. So from here on this is all just my opinion – you can form your own and disagree with me if you choose – there’s a comment box below. But please be up front if you work for a vendor yourself.

All Flash Array Definition(s)

It is surprisingly hard to find a common definition of the All Flash Array (or AFA), but one thing that everyone appears to agree on is the shared nature of AFAs – they are network-attached, shared storage (i.e. SAN or NAS). After that, things get tricky.

IDC, in its 2015 paper Worldwide Flash Storage Solutions in the Datacenter Taxonomy, divides network-attached flash storage into All Flash Arrays (AFAs) and Hybrid Flash Arrays (HFAs). It further divides AFAs into categories based on their use of custom flash modules (CFMs) and solid state disks (SSDs), while HFAs are divided into categories of mixed (where both disks and flash are used) and all-flash (using CFMs or SSDs but with no disk media present).

Did you make it through that last paragraph? Perhaps, like me, you find the HFA category “all-flash” confusingly named given the top-level category of “all flash arrays”? Then let’s go and see what Gartner says.

Gartner doesn’t even get as far as using the term AFA, preferring the term Solid State Array (or SSA). I once asked Gartner’s Joe Unsworth about this (I met him in the kitchen at a party – he was considerably more sober than I) and he explained that the SSA term is designed to cope with any future NAND-flash replacement technology, rather than restricting itself to flash-based arrays… which seems reasonable enough, but it does not appear to have caught on outside of Gartner.

The big catch with Gartner’s SSA definition is that, to qualify, any potential SSA product must be positioned and marketed “with specific model numbers, which cannot be used as, upgraded or converted to general-purpose or hybrid storage arrays”. In other words, if you can put a disk in it, you won’t see it on the Gartner SSA magic quadrant – a decision which has drawn criticism from industry commentators for the way it arbitrarily divides the marketplace (with a response from Gartner here).

The All Flash Array Definition at flashdba.com

So that’s IDC and Gartner covered; now I’m going to give my definition of the AFA market sector. I may not be as popular or as powerful as IDC or Gartner but hey, this is my website and I make the rules.

In my humble opinion, an AFA should be defined as follows:

An all flash array is a shared storage array in which all of the persistent storage media comprises flash memory.

Yep, if it’s got a disk in it, it’s not an AFA.

This then leads us to consider three categories of AFA:

Hybrid AFAs

The hybrid AFA is the poor man’s flash array. Its performance can best be described as “disk plus” and it is extremely likely to descend from a product which is available in all-disk, mixed (disk+SSD) or all-SSD configurations. Put simply, a hybrid AFA is a disk array in which the disks have been swapped out for SSDs. There are many of these products out there (EMC’s VNX-F and HP’s all-flash 3PAR StoreServ spring to mind) – and often the vendors are at pains to distance themselves from this definition. But the truth lies in the architecture: a hybrid AFA may contain flash in the form of SSDs, but it is fundamentally and inescapably architected for disk. I will discuss this in more detail in a future article.

SSD-based AFAs

The next category covers all-flash arrays that have been architected with flash in mind but which only use flash in the form of solid state drives (SSDs). A typical SSD-based AFA consists of two controllers (usually Intel x86-based servers) and one or more shelves of SSDs – examples would be EMC’s XtremIO, Pure Storage, Kaminario and SolidFire. Since these SSDs are usually sourced from a third party vendor – as indeed are the servers – the majority of the intellectual property of an SSD-based array concerns the software running on the controllers. In other words, for the majority of SSD-based array vendors the secret sauce is all software. What’s more, that software generally doesn’t cover the tricky management of the flash media, since that task is offloaded to the SSD vendor’s firmware. And from a purely go-to-market position (imagine you were founding a company that made one of these arrays), this approach is the fastest.

Ground-Up AFAs

The final category is the ground-up designed AFA – one that is architected and built from the ground up to use raw flash media straight from the NAND flash fabricators. There are, at the time of writing, only two vendors in the industry who offer this type of array: Violin Memory (my employer) and IBM with its FlashSystem. A ground-up array implements many of its features in hardware and also takes a holistic approach to managing the NAND flash media, because it is able to orchestrate the behaviour of the flash across the entire array (whereas SSDs are essentially isolated packages of flash). So in contrast with the SSD-based approach, the ground-up array has a much larger proportion of its intellectual property in its hardware. The flash itself is usually located on cards or boards known as Custom Flash Modules (or CFMs).

Why are there only two ground-up AFAs on the market? Well, mainly because it takes a lot longer to create this sort of product: Violin is ten years old this year, while IBM acquired the RamSan product from Texas Memory Systems, which had been around since 1987. In comparison, the remaining AFA companies are mostly under six years old. It also requires hardware engineering with NAND flash knowledge, usually coupled with a relationship with a NAND flash foundry (Violin, for example, has a strategic alliance with Toshiba – the inventor of NAND flash).

Which Is Best?

Ahh well that’s the question, isn’t it? Which architecture will win the day, or will something else replace them all? So that’s what we’ll be looking at next… starting with the Hybrid Array. And while I don’t want to give away too much too soon <spoiler alert>, in my book the hybrid array is an absolute stinker.