July 6, 2016 Leave a comment
Today’s storage array market contains a wild variation of products: block storage, file storage or object storage; direct attached, SANs or NAS systems; fibre-channel, iSCSI or Infiniband… Even the SAN section of the market is full of diversity: from legacy hard disk drive-based arrays through the transitory step of tiered disk+flash hybrid systems and on to modern All-Flash Arrays (AFAs).
If you were partial to the odd terrible pun, you might even say that it was a bewildering array of choices. [*Array* of choices? Oh come on. If you’re expecting a higher class of humour than that, I’m afraid this is not the blog for you]
Anyway, one thing that pretty much all storage arrays have in common is the basic configuration blocks of which they comprise:
- Array controllers
- Internal networking
- Persistent Media (flash or disk)
Over the course of this blog series I’ve talked a lot about both flash and disk media, but now it’s time to concentrate a little more on the other stuff – specifically, the controllers. A typical Storage Area Network delivers a lot more functionality than would be expected from just connecting a bunch of disks or flash – and it’s the controllers that are responsible for most of that added functionality.
Storage Array Controllers
Think of your storage system as a private network on which is located a load of dumb disk or flash drives. I say dumb because they can do little else other than accept I/O requests: reads and writes. The controllers are therefore required to provide the intelligence needed to present those drives to the outside world and add all of the functionality associated with enterprise-class storage:
- Resilience, automatic fault tolerance and high availability, RAID
- Mirroring and/or replication
- Data reduction technologies (compression, deduplication, thin provisioning)
- Data management features (snapshots, clones, etc)
- Management and monitoring interfaces
- Vendor support integration such as call-home and predictive analytics
Controllers are able to add this “intelligence” because they are actually computers in their own right, acting as intermediaries between the back-end storage devices and the front-end storage fabric which connects the array to its clients. And as computers, they rely on the three classic computing resources (which I’m going to list using fancy colours so I can sneak them up on you again later in this post):
- Memory (DRAM)
- Processing (CPU)
It’s the software running on the array controllers – and utilising these resources – that describes their behaviour. But with the rise of flash storage, this behaviour has had to change… drastically.
Disk Array Controllers vs Flash Array Controllers
In the days when storage arrays were crammed full of disk drives, the controllers in those arrays spent lots of time waiting on mechanical latency. This meant that the CPUs within the controllers had plenty of idle cycles where they simply had to wait for data to be stored or retrieved by the disks they were addressing. To put it another way, CPU power wasn’t such an important priority when specifying the controller hardware.
This is clearly not the case with the controllers in an all flash array, since mechanical latency is a thing of the past. The result is that controller CPUs now have no time to spare – data is constantly being handled, addressed, moved and manipulated. Suddenly, the choice of CPU has a direct effect on the array’s ability to process I/O requests.
Dedupe Kills It
But there’s more. One of the biggest shifts in behaviour seen with flash arrays is the introduction of data reduction technology – specifically deduplication. This functionality, known colloquially as dedupe, intervenes in the write process to see if an exact copy of any written block already exists somewhere else on the array. If the block does exist, the duplicate copy does not need to be written – and instead a pointer to the existing version can be stored, saving considerable space. This pointer is an example of what we call metadata – information about data.
I will cover deduplication at greater length in another post, but for now there are three things to consider about the effect dedupe has on storage array controllers:
- Dedupe requires the creation of a fairly complex set of metadata structures – and for performance reasons much of this will need to reside in DRAM on the controllers. And as more data is stored on the array, the amount of metadata created increases – hence a growing dependency on the availability of (expensive) memory in those array controllers.
- The process of checking each incoming block (which involves calculating a hash value) and comparing against a table of metadata stored in DRAM is very CPU intensive. Thus array controllers which support DRAM have increasing requirements for (expensive) processing power.
- For storage arrays which run in an active/active configuration (i.e. with multiple redundant controllers, each of which actively send and receive data from the persistent storage layer), much of this information will need to be passed between controllers over the array’s internal networking.
Did you spot the similarities between this list and the colourful one from earlier? If you didn’t, you must be colour blind. Flash Array controllers are much more dependant on their resources – particularly CPU and DRAM – than disk array controllers.
Flash array controllers have to do almost everything that their ancestors, the venerable disk array controllers, used to do. But they have to do it much faster and in greater volume. Not only that, but they have to do so much more… especially for the process known as data reduction. And as we’ve seen, the overhead of all these tasks causes a much greater strain on memory, processing and networking than was previously seen in the world of disk arrays – which is one of the reasons you cannot simply retrofit SSDs into a disk array architecture.
With the introduction of flash into storage, the bottleneck has moved away from the persistence layer and is now with the controllers. Over the next few articles we’ll look at what that means and consider the implications of various AFA architecture strategies on that bottleneck. After all, as is so often the case when it comes to matters of performance, you can’t always remove the bottleneck… but you can choose the one which works best for you🙂