All Flash Arrays: Scale Up vs Scale Out (Part 2)

In the first post on the subject of Scale Up versus Scale Out, we looked at the reasons why scalability is a key requirement for storage platforms, as well as discussing the limits of Scale Up only architectures, i.e. systems where more capacity is added to the same fixed number of controllers. In this article, we look at the alternative architecture known as Scale Out.

Scale Out – Adding Performance

In a scale out architecture, the possibility exists to add more storage controllers, thereby adding more performance capability. You may remember that the performance of a storage array is approximately proportional to the number of (and power of) its storage controllers, while the capacity is determined by the amount of flash or SSD media addressed by those controllers.

In this scenario, the base model typically consists of a pair of controllers (after all, at least two are required to provide resiliency). Most scale-out-capable storage arrays work by adding more pairs of these controllers, which are considered indivisible units and have names like “K-Block” (Kaminario K2) or “X-Brick” (DellEMC XtremIO). For now, let’s just call them controller pairs.

There are a number of technical challenges to overcome when building a system which can scale out. In the first post, we covered Active/Passive solutions, where only one controller processes I/O from the underlying media. In this scenario, the performance limitations are determined by the characteristics of the single active node – with the remaining (passive) node simply waiting to spring into action in the event of a failover. Clearly this type of architecture makes less sense as the number of nodes increases above two, since the additional nodes will also be passive and therefore adding little benefit.

Scale out architectures, then, typically employ an active/active solution whereby each controller contributes more performance capacity as it joins the system. And that means building a high-availability cluster, with all of the associated cluster management technologies that entails (failover, virtual IPs, protection against split brain scenarios, etc). No wonder some vendors stick to Scale Up only.

Of course, the biggest issue with a Scale Out only architecture is the question of what happens when additional capacity needs to be added. The answer is that another controller or set of controllers must be added too, complete with their attached storage media – but the controllers are an expensive and unnecessary addition if the only requirement is simply more capacity.

Scale Up and Scale Out – The Perfect Solution?

So what we’ve seen here is that a Scale Up architecture allows for more capacity to be added to existing controllers, while a Scale Out architecture allows for more performance to be added to existing capacity. It would therefore seem logical that the ultimate goal is to build a system which can (independently) scale both up and out. Scaling up allows more capacity to be added without the cost of more controllers. Scaling out allows more performance (controllers) to be added when required. And thus the characteristics of the storage platform can be extended in either of these two dimensions as needed. Perfect?
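To make the two dimensions concrete, here is a minimal Python sketch of an array that can scale in either direction independently. The per-unit figures (100,000 IOPS per controller pair, 50TB per shelf) and the strictly linear scaling are assumptions made purely for illustration; they are not taken from any real product.

```python
from dataclasses import dataclass


@dataclass
class FlashArray:
    """Toy model: performance follows controllers, capacity follows media."""
    controller_pairs: int = 1
    shelves: int = 1
    iops_per_pair: int = 100_000  # hypothetical figure
    tb_per_shelf: int = 50        # hypothetical figure

    @property
    def performance_iops(self) -> int:
        # Assumes linear scaling, which is a gross simplification.
        return self.controller_pairs * self.iops_per_pair

    @property
    def capacity_tb(self) -> int:
        return self.shelves * self.tb_per_shelf

    def scale_up(self, shelves: int = 1) -> None:
        """Add media behind the existing controllers (more capacity only)."""
        self.shelves += shelves

    def scale_out(self, pairs: int = 1) -> None:
        """Add controller pairs (more performance only)."""
        self.controller_pairs += pairs


array = FlashArray()
array.scale_up()   # capacity grows, performance unchanged
array.scale_out()  # performance grows, capacity unchanged
print(array.performance_iops, array.capacity_tb)
```

In a Scale Out only design the two methods would collapse into one, because every new controller pair would arrive with its own media whether or not it was needed.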

Arrays which can support both scale up and scale out have been surprisingly rare in the All Flash market so far, but they do exist. The concept is simple: customers that purchase storage typically do so over a three- to five-year period. Most people simply cannot guess how their requirements will change in that period of time… more users, more customers, more data? Yes, probably – but how much and over what time period? Choosing an architecture which allows independent (non-disruptive) scale of both capacity and performance insures against the risk associated with capacity planning, while also allowing customers to start off by purchasing only what they need today and then expanding at their own pace. Sounds a bit like cloud computing when you put it like that, doesn’t it?

Which conveniently leads us to…

The Future: Dynamically Composable / Disaggregated Storage?

None of us know how the future will look, especially in the technology industry. But one vision of the future comes from my employer, Kaminario, who is one of a number of companies exploring the concept of composable infrastructure. I think this is a very interesting new direction, which is why I’m writing about it here – but since I’m an employee of this vendor I must first give you a mandatory sales warning and you should treat the next paragraphs with a healthy dose of “well he would say that, wouldn’t he?”

In a dynamically composable storage environment, the two elements we have been discussing above (storage controllers and storage media) become completely disaggregated so that any shelf of media can be addressed by any set of controllers. These sets of systems can then be dynamically composed, so that – out of a set of multiple shelves of media and storage controllers – subsets of the two can be brought together to form virtual private arrays designed to serve specific applications.
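As a rough sketch of the idea, the pool of disaggregated resources can be modelled as two free lists from which virtual private arrays are carved and later returned. Every name here (ResourcePool, VirtualArray, compose, release, the "c1"/"s1" identifiers) is hypothetical and chosen for illustration; this is not a description of any vendor's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualArray:
    name: str
    controllers: list[str]
    shelves: list[str]


@dataclass
class ResourcePool:
    """Free controllers and media shelves that any virtual array may use."""
    free_controllers: list[str]
    free_shelves: list[str]
    arrays: dict[str, VirtualArray] = field(default_factory=dict)

    def compose(self, name: str, controllers: int, shelves: int) -> VirtualArray:
        """Bring a subset of controllers and shelves together as one array."""
        if name in self.arrays:
            raise ValueError(f"array {name!r} already exists")
        if controllers > len(self.free_controllers) or shelves > len(self.free_shelves):
            raise ValueError("not enough free resources in the pool")
        array = VirtualArray(
            name,
            [self.free_controllers.pop(0) for _ in range(controllers)],
            [self.free_shelves.pop(0) for _ in range(shelves)],
        )
        self.arrays[name] = array
        return array

    def release(self, name: str) -> None:
        """Dissolve an array and hand its resources back to the pool."""
        array = self.arrays.pop(name)
        self.free_controllers.extend(array.controllers)
        self.free_shelves.extend(array.shelves)


pool = ResourcePool(
    free_controllers=["c1", "c2", "c3", "c4"],
    free_shelves=["s1", "s2", "s3"],
)
oltp = pool.compose("oltp", controllers=2, shelves=1)
warehouse = pool.compose("warehouse", controllers=2, shelves=2)
```

Releasing one array and composing another with a different mix of controllers and shelves is the toy equivalent of reallocating resources between applications.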

© 2017 Kaminario

If you think about the potential of this method of presentation, it opens up many possibilities. For example, the dynamic and non-disruptive reallocation of storage resources offers customers the ability to constantly adapt to unpredictable workloads. Furthermore, concepts from AI can be used to automate this reallocation and even predict changes to requirements in advance.

This is useful for any customer with a complex estate of mixed workloads, but it’s incredibly useful to Cloud Service Providers and MSPs. After all, these organisations have no knowledge of what their customers are doing on their systems or what they will do in the future, so the ability to dynamically adapt performance and capacity requirements could provide a competitive edge.

Conclusion

We all know that I.T. is full of buzzwords, like agility or transformation. Is scalability another one? Maybe it’s in danger of becoming one. But if you think about it, one of the fundamental characteristics of any platform is its ability to scale. It essentially defines the limitations of the platform that you may meet as you grow – and growth is pretty much the point of any business. So take the time to understand what scale actually means in a storage context and you might avoid learning about those limitations the hard way…

All Flash Arrays: Scale Up vs Scale Out (Part 1)

Imagine you want to buy some more storage for your laptop – let’s say an external USB drive for backups. What are the fundamental questions you need to ask before you get down to the thorny issue of price? Typically, there is only one key question:

  • How much capacity do I need?

Of course there will be lesser questions, such as connectivity, brand, colour, weight, what it actually looks like and so on. But those are qualifying questions – ways to filter the drop-down list on Amazon so you have fewer decisions to make.

However, different rules apply when buying enterprise storage: we might care less about colour and more about physical density, power requirements and the support capabilities of the vendor. We might care less about what the product looks like and more about how simple it is to administer. But most of all, for enterprise storage, there are now two fundamental questions instead of one:

  • How much capacity do I need?
  • How much performance do I need?

Of course, there is the further issue of what exactly we mean by “performance”, given that it can be measured in a number of different ways. The answer is dependent on the platform being used: for disk-based storage systems it was typically the number of I/O Operations Per Second (IOPS), while for modern storage systems it is more likely to be the bandwidth (the volume of data read and written per second). And just to add a little spice, when considering IOPS or bandwidth on All Flash array platforms the read/write ratio is also important.

So the actual requirement for, say, a three-node data warehouse cluster with 200 users might turn out to be:

“I need 50TB of usable capacity and the ability to deliver 1GB/second at 90% reads. What will this cost?”
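To unpack that requirement a little, the short Python sketch below splits the bandwidth by the read/write ratio and converts it to IOPS using the general relationship bandwidth = IOPS × block size. Treating 1GB/second as 1000 MB/second (decimal units) and using a 32 KiB block size are assumptions made only for the example.

```python
def split_bandwidth(total_mb_s: int, read_percent: int) -> tuple[int, int]:
    """Return (read MB/s, write MB/s) for a given read percentage."""
    read = total_mb_s * read_percent // 100
    return read, total_mb_s - read


def iops_for_bandwidth(bandwidth_bytes_per_s: int, block_bytes: int) -> float:
    """Bandwidth = IOPS x block size, so IOPS = bandwidth / block size."""
    return bandwidth_bytes_per_s / block_bytes


REQUIRED_MB_S = 1000       # assumption: 1GB/second in decimal units
READ_PERCENT = 90
BLOCK_BYTES = 32 * 1024    # assumption: 32 KiB I/O size

read_mb_s, write_mb_s = split_bandwidth(REQUIRED_MB_S, READ_PERCENT)
total_iops = iops_for_bandwidth(REQUIRED_MB_S * 10**6, BLOCK_BYTES)
print(f"reads: {read_mb_s} MB/s, writes: {write_mb_s} MB/s, "
      f"about {total_iops:,.0f} IOPS at 32 KiB")
```

The same bandwidth figure implies a very different IOPS figure at a different block size, which is one reason a requirement should state both the metric and the I/O profile.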

Are we ready to spec a solution yet? Not quite. First we have to consider Rule #1.

Rule #1: Requirements Change

Most enterprise storage customers purchase their hardware for use over a period of time – with the most typical period being five years. So it stands to reason that whatever your requirements are at the time of purchase, they will change before the platform is retired. In fact, they will change many times. Data volumes will grow, because data volumes only ever get bigger… right? But also, those 200 users might grow to 500 users. The three-node cluster might be extended to six nodes. The chances are that, in some way, you will need more performance and/or more capacity.

The truth is that, while customers buy their hardware based on a five-year period, now more than ever they cannot even predict what will happen over the next 12 months. Forgive the cliche, but the only thing predictable is unpredictability.

So as a customer in need of enterprise storage, what do you do? Clearly you won’t want to purchase all of the capacity and performance you might possibly maybe need in the future. That would be a big up-front investment which may never achieve a return. So your best bet is to purchase a storage platform which can scale. This way you can start with what you need and scale as your requirements grow.

This is where architecture becomes key.

Scalability is an Architectural Decision

You may remember from a previous post that the basic building blocks of any storage array are controllers and media, with various networking devices used to string them together. As a gross simplification, the performance of a storage array is a product of the number of controllers and the power that those controllers have (assuming they aren’t held back by the media). The capacity of a storage array is clearly a product of the amount of media.

In a simple world you would just add more media when you wanted more capacity, you would add more controllers (or increase their power) when you needed more performance, and you would do both when you need both. But this is not a simple world.

I always think that the best way to visualise the two requirements of capacity and performance is to use two different dimensions on a graph, with performance as X and capacity as Y. So let’s use that here – we start with a single flash array which has one pair of controllers and one set of flash media.

Scale Up – Adding Capacity

The simple way to achieve scale up is to just add more media to an existing array. Media is typically arranged in some sort of indivisible unit, such as a shelf of SSDs arranged in a RAID configuration (so that it has inbuilt redundancy). In principle, adding another shelf of SSDs sounds easy, but complications arise when you consider the thorny issue of metadata.

To illustrate, consider that most enterprise All Flash arrays available today have inbuilt data reduction features including deduplication. At a high level, dedupe works by computing a hash value for each block written to the array and then comparing it to a table containing all the hash values of previously written blocks. If the hash is discovered in the table, the block already exists on the array and does not need to be written again, reducing the amount of physical media used.
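The lookup described above can be sketched in a few lines of Python. This is a toy model only: the choice of SHA-256, the 4 KiB block size and the class and method names are assumptions for illustration, and it ignores concerns a real array has to handle, such as hash collisions and persistence.

```python
import hashlib


class DedupeStore:
    """Toy block store that keeps one physical copy per unique block."""

    def __init__(self) -> None:
        self.hash_table: dict[bytes, int] = {}  # block hash -> physical index
        self.physical: list[bytes] = []         # blocks actually stored
        self.logical: list[int] = []            # one physical index per write

    def write(self, block: bytes) -> bool:
        """Write a block; return True if new physical media was consumed."""
        digest = hashlib.sha256(block).digest()
        index = self.hash_table.get(digest)
        is_new = index is None
        if is_new:
            # Hash not seen before: store the block and record its hash.
            index = len(self.physical)
            self.physical.append(block)
            self.hash_table[digest] = index
        # Either way, the logical address just points at the physical block.
        self.logical.append(index)
        return is_new

    def read(self, logical_index: int) -> bytes:
        return self.physical[self.logical[logical_index]]


store = DedupeStore()
results = [store.write(b) for b in (b"A" * 4096, b"B" * 4096, b"A" * 4096)]
print(results, len(store.physical))
```

Three logical writes consume only two physical blocks here, because the third block hashes to a value already in the table.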

This hash table is an example of the metadata storage arrays need to store and maintain in order to function; other examples of features which utilise metadata are thin provisioning, snapshots and replication. This makes metadata a critical factor in the performance of a storage array.

To ensure the highest speed of access, most metadata is pinned in DRAM on the storage controllers. This has a knock-on effect in that the amount of addressable storage in an array can become directly linked to the amount of DRAM in the controllers. DRAM is a costly resource and affects the manufacturing cost of the array, so there is a balancing act required in order to have enough DRAM to store the necessary metadata without inflating the cost any more than is absolutely necessary.

Hopefully you see where this is going. Adding more shelves of media increases the storage capacity of an array… thereby increasing the metadata footprint… and so increasing the need for DRAM. At some point the issue becomes whether it is even possible (technically or commercially) to support the metadata overhead of adding more shelves of physical media.
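A back-of-the-envelope calculation shows how the DRAM requirement tracks capacity. The sketch below assumes one fixed-size metadata entry per block of addressable capacity; the 50 TiB shelf, 4 KiB block and 32-byte entry are hypothetical figures picked to make the arithmetic easy, and real designs will differ.

```python
TIB = 2**40
GIB = 2**30


def metadata_dram_bytes(capacity_bytes: int, block_bytes: int, entry_bytes: int) -> int:
    """DRAM needed to hold one metadata entry per block of capacity."""
    blocks = -(-capacity_bytes // block_bytes)  # ceiling division
    return blocks * entry_bytes


# Hypothetical figures, for illustration only.
SHELF_CAPACITY = 50 * TIB
BLOCK_BYTES = 4096
ENTRY_BYTES = 32

for shelves in (1, 2, 4):
    dram = metadata_dram_bytes(shelves * SHELF_CAPACITY, BLOCK_BYTES, ENTRY_BYTES)
    print(f"{shelves} x 50 TiB -> {dram / GIB:.1f} GiB of DRAM for metadata")
```

Under these assumptions the metadata footprint grows linearly with the number of shelves (400, 800 and 1600 GiB respectively), while the controllers' DRAM does not grow at all unless the controllers themselves are changed.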

Scale-Up Only Architectures

There are many storage arrays on the market which have a scale-up only architecture, with Pure Storage being an obvious example. There are various arguments presented as to why this is the case, but my view is that these architectures were used as a compromise in order to get the products to market faster (especially if they also adopt an active/passive architecture). Having said that, it’s obvious that I am biased by the fact that I work for a vendor which does not have this restriction – and who believes in offering scale up and scale out. So please don’t take my word for it – go read what the other vendors say and then form your own opinion.

One counter claim by proponents of scale-up only architecture is that performance can be added by upgrading array controllers in-place, non-disruptively. In other words, the controllers can be replaced with high-specification models with more CPU cores and more DRAM, bringing more performance capability to the array. The issue here is that this is a case of diminishing returns. Moving up through the available CPU models brings step changes in cost but only incremental increases in performance.

To try and illustrate this, let’s look at some figures for the Pure Storage //m series of All Flash arrays. There are three models increasing in price and performance: m20, m50 and m70. We can get performance figures measured in maximum IOPS (measured with a 32k block size as is Pure’s preferred way) from this datasheet and we can get details of the CPU and DRAM specifications from this published validation report by the NSA. Let’s use Wikipedia’s List of Intel Xeon microprocessors page to find the list price of those CPUs and compare the increments in price to those of the maximum stated performance:

Here we can see that the list price for the CPUs alone rises 238% and then 484% moving from //m20 to //m50 to //m70, yet the maximum performance measured in IOPS rises just 147% and 200%. You can argue that the CPU price is not a perfect indicator of the selling price of each //m series array, but it’s certainly a factor. As anyone familiar with purchasing servers will attest, buying higher-spec models takes more out of your pocket than it gives you back in performance.

My point here – and this is a general observation rather than one about Pure in particular – is that this is not a cost effective scaling strategy in comparison to the alternative, which is the ability to scale out by adding more controllers.

Coming Next: Scale Out

In the next post we’ll look at Scale Out architectures and what they mean for customers with independently varying requirements for capacity and performance.

Scale out allows the cost-effective addition of more controllers and therefore more performance capability, along with other benefits such as the addition of more ports. But there are potential downsides too…