What’s the most important attribute to consider when you want to buy a new storage system? More critical than performance, more interesting than power and cooling requirements, maybe even more important than price? Whether it’s an enterprise-class All Flash Array, a new drive for your laptop or just a USB flash key, the first question on anybody’s mind is usually: how big is it?
Yet surprisingly, at least when it comes to All Flash Arrays, it is becoming increasingly difficult to get an accurate answer to this question. So let’s try and bring some clarity to that in this post.
Before we start, let’s quickly address the issue of binary versus decimal capacity measurements. For many years the computer industry has lived with two different definitions for capacity: memory is typically measured in binary values (powers of two) e.g. one kibibyte = 2 ^ 10 bytes = 1024 bytes. On the other hand, hard disk drive manufacturers have always used decimal values (powers of ten) e.g. one kilobyte = 10 ^ 3 bytes = 1000 bytes. Since flash memory is commonly used for the same purpose as disk drives, it is usually sold with capacities measured in decimal values – so make sure you factor this in when sizing your environments.
Now that’s covered, let’s look at the three ways in which capacity is most commonly described: raw capacity, usable capacity and effective capacity. To ensure we don’t stray from the truth, I’m going to use definitions from SNIA, the Storage Networking Industry Association.
Raw Capacity: The sum total amount of addressable capacity of the storage devices in a storage system.
The raw capacity of a flash storage product is the sum of the available capacity of each and every flash chip on which data can be stored. Imagine an SSD containing 18 Intel MLC NAND die packages, each of which has 32GB of addressable flash. This therefore contains 576GB of raw capacity. The word addressable is important because the packages actually contain additional unaddressable flash which is used for purposes such as error correction – but since this cannot be addressed by either you or the firmware of the SSD, it doesn’t count towards the raw value.
Usable Capacity: (synonymous with Formatted Capacity in SNIA terminology) The total amount of bytes available to be written after a system or device has been formatted for use… [it] is less than or equal to raw capacity.
Possibly one of the most abused terms in storage, usable capacity is what you have left after taking raw capacity and removing the space set aside for system use, RAID parity, over-provisioning (i.e. headroom for garbage collection) and so on. It is guaranteed capacity, meaning you can be certain that you can store this amount of data regardless of what the data looks like.
That last statement is important once data reduction technologies come into play, i.e. compression, deduplication and (arguably) thin provisioning. Take 10TB of usable space and write 5TB of data into it – you now have 5TB of usable capacity remaining. Sounds simple? But take 10TB of usable space and write 5TB of data which dedupes and compresses at a 5:1 ratio – now you only need 1TB of usable space to store it, meaning you have 9TB of usable capacity left available.
Effective Capacity: The amount of data stored on a storage system … There is no way to precisely predict the effective capacity of an unloaded system. This measure is normally used on systems employing space optimization technologies.
The effective capacity of a storage system is the amount of data you could theoretically store on it in certain conditions. These conditions are assumptions, such as “my data will reduce by a factor of x:1”. There is much danger here. The assumptions are almost always related to the ability of a dataset to reduce in some way (i.e. compress, dedupe etc) – and that cannot be known until the data is actually loaded. What’s more, data changes… as does its ability to reduce.
For this reason, effective capacity is a terrible measurement on which to make any firm plans unless you have some sort of guarantee from the vendor. This takes the form of something like, “We guarantee you x terabytes of effective capacity – and if you fail to realise this we will provide you with additional storage free of charge to meet that guarantee”. This would typically then be called the guaranteed effective capacity.
The most commonly used assumptions in the storage industry are that databases reduce by around 2:1 to 4:1, VSI systems around 5:1 to 6:1 and VDI systems anything from 8:1 right up to 18:1 or even further. This means an average data reduction of around 6:1, which is the typical ratio you will see on most vendor’s data sheets. If you take 10TB of usable capacity and assume an average of 6:1 data reduction, you therefore end up with an effective capacity of 60TB. Some vendors use a lower ratio, such as 3:1 – and this is good for you the customer, because it gives you more protection from the risk of your data not reducing.
But it’s all meaningless in the real world. You simply cannot know what the effective capacity of a storage system is until you put your data on it. And you cannot guarantee that it will remain that way if your data is going to change. Never, ever buy a storage system based purely on the effective capacity offered by the vendor unless it comes with a guarantee – and always consider whether the assumed data reduction ratio is relevant to you.
Think of it this way: if you are being sold effective capacity you are taking on the financial risk associated with that data reduction. However, if you are being sold guaranteed effective capacity, the vendor is taking on that financial risk instead. Which scenario would you prefer?
Use and Abuse of Capacities
Three different ways to measure capacity? Sounds complicated. And in complexity comes opportunities for certain flash array vendors to use smoke and mirrors in order to make their products seem more appealing. I’m going to highlight what I think are the two most common tactics here.
1. Confusing Usable Capacity with Effective Capacity
Many flash array vendors have Always-On data reduction services. This is often claimed to be for the customer’s benefit but is often more about reducing the amount of writes taking place to the flash media (to alleviate performance and endurance issues). For some vendors, not having the ability to disable data reduction can be spun around to their advantage: they simply make out that the terms usable and effective are synonymous, or splice them together into the unforgivable phrase effective usable capacity to make their products look larger. How convenient.
Let me tell you this now: every flash array has a usable capacity… it is the maximum amount of unique, incompressible data that can be stored, i.e. the effective capacity if the data reduction ratio were 1:1. I would argue that this is a much more important figure to know when you buy a flash system, because if you buy on effective capacity you are just buying into dreams. Make your vendor tell you the usable capacity at 1:1 data reduction and then calculate the price per GB based on that value.
2. My Data Reduction Is Better Than Yours
Every flash vendor thinks their data reduction technologies are the best. They can’t all be right. Yet sometimes you will hear claims so utterly ridiculous that you’ll think it’s some kind of joke. I guess we all believe what we want to hear.
Here’s the truth. Compression and deduplication are mature technologies – they have been around for decades. Nobody in the world of flash storage is going to suddenly invent something that is remarkably better than the competition. Sometimes one vendor’s tech might deliver better results than another, but on other days (and, crucially, with other datasets) that will reverse. For this reason, as well as for your own sanity, you should assume they will all be roughly the same… at least until you can test them with your data. When you evaluate competitive flash products, make them commit to a guaranteed effective capacity and then hold them to it. If they won’t commit, beware.
3. Including Savings From Thin Provisioning
Thin Provisioning is a feature whereby physical storage is allocated only when it is used, rather than being preallocated at the moment of volume creation. Thus if a 10TB volume is created and then 1TB of data written to it, only 1TB of physical storage is used. The host is fooled into thinking that it has all 10TB available, but in reality there may be far less physical capacity free on the storage system.
Some vendors show the benefits from thin provisioned volumes as a separate saving from compression and deduplication, but some roll all three up into a single number. This conveniently makes their data reduction ratios look amazing – but in my opinion this then becomes a joke. Thin provisioning isn’t data reduction, because no data is being reduced. It’s a cheap trick – and it says a lot about the vendor’s faith in their compression and deduplication technologies.
Don’t let your vendors set the agenda when it comes to sizing. If you are planning on buying a certain capacity of flash, make sure you know the raw and usable capacities, plus the effective capacity and the assumed data reduction ratio used to calculate it. Remember that usable should be lower than raw, while effective (which is only relevant when data reduction technologies are present) will commonly be higher.
Keep in mind that Effective Capacity = Usable Capacity X Data Reduction Factor.
Be aware that when a product with an “Always-On” data reduction architecture tells you how much capacity you have left, it’s basically a guess. In reality, it’s entirely dependent on the data you intend to write. I’ve always thought that “Always-On” was another bit of marketing spin; you could easily rename it as an “Unavoidable” or “No Choice” architecture.
In my opinion, the best data reduction technology will be selectable and granular. That means you can choose, at a LUN level, whether you want to take advantage of compression and/or deduplication or not – you aren’t tied in by the architecture. Like with all features, the architecture should allow you to have a choice rather then enforce a compromise.
Finally, remember the rule about effective capacity. If it’s guaranteed, the risk is on the vendor. If they won’t guarantee, the risk is on you.
So there we have it: clarity and choice. Because in my opinion – and no matter which way you measure it – one size simply doesn’t fit all.