Cloud Compromises: Constrained and Optimized CPUs

Imagine the scenario where you wander into a clothing store to buy a t-shirt. You find a design you like in size “Medium”, but it’s too tight (I guess #lockdown has been unkind to us all…) so you ask for the next size up. But when it arrives, you notice something bizarre: the “Large” is not only wider and longer, it also has an extra arm hole. Yes, there are enough holes for three arms as well as your head. Even more bizarrely, the “XL” size has four sets of sleeves, while the “Small” has only one and the “XS” none at all!

Surprisingly, this analogy is very applicable to cloud computing, where properties like compute power, memory, network bandwidth, storage capacity and storage performance are often tied together. As we saw in the previous post, a requirement for a certain number of read I/O Operations Per Second (IOPS) can result in the need to overprovision unwanted capacity and possibly even unnecessary amounts of compute power.

But there is one situation where this causes an extra level of pain: when the workload in question is database software that is licensed per CPU core (e.g. Oracle Database, Microsoft SQL Server).

To extend the opening analogy into total surrealism, imagine that the above clothing store exists in a state which collects a Sleeve Tax of 100% of the item value per sleeve. Now, your chosen t-shirt might be $40 but the Medium size will cost you $120, the Large $160 and the XXXXXL (suitable for octopods) a massive $360.
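The Sleeve Tax arithmetic can be checked with a few lines of Python. This is only a sketch of the analogy: the size-to-sleeve mapping assumes each size up adds one sleeve (so the octopod-sized XXXXXL has eight), and the names used are purely illustrative.

```python
BASE_PRICE = 40         # dollars: the t-shirt price from the example
TAX_PER_SLEEVE = 1.00   # Sleeve Tax: 100% of the item value per sleeve

# Illustrative mapping: one extra sleeve per size step, as in the opening analogy.
SLEEVES_BY_SIZE = {"XS": 0, "S": 1, "M": 2, "L": 3, "XL": 4, "XXXXXL": 8}


def price_with_sleeve_tax(sleeves: int) -> float:
    """Base price plus a tax of TAX_PER_SLEEVE x base price for every sleeve."""
    return BASE_PRICE * (1 + TAX_PER_SLEEVE * sleeves)


for size, sleeves in SLEEVES_BY_SIZE.items():
    print(f"{size}: {sleeves} sleeves -> ${price_with_sleeve_tax(sleeves):.0f}")
```

Just as with per-core licensing, the final price is driven by a count (sleeves, cores) rather than by the thing you actually wanted to buy.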

Luckily, the cloud providers have a way to help you out here. But it kind of sucks…

Constrained / Optimized VM Sizes

If you need large amounts of memory or I/O, the chances are you will have to pick a VM type which has a larger number of cores. But if you don’t want to buy database licenses for these additional cores (because you don’t need the extra CPU power), you can choose to restrict the VM instance so that it only uses a subset of the total available cores. This is similar to the concept of logical partitioning which you may already have used on-prem. Here are two examples of this practice from the big hyperscalers:

Microsoft Azure: Constrained vCPU capable VM sizes

Amazon Web Services: Introducing Optimize CPUs for Amazon EC2 Instances

As you can see, Microsoft and AWS have different names for this concept, but the idea is the same. You provision, let’s say, a 128 vCPU instance and then you restrict it to only using, for example, 32 vCPUs. Boom – you’ve dropped your database license requirement to 25% of the total number of vCPUs. OK, so you only get 25% of the compute performance too, but that’s still a big win on the license cost… right?

Well, yes, but…

There’s a snag. You still have to pay the full cost of the virtual machine despite only using a fraction of its resources. The monthly cost from the cloud provider is the same as if you were using the whole machine!
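To put numbers on the trade-off, here is a minimal sketch comparing a full 128 vCPU VM with the same VM constrained to 32 active vCPUs. The monthly compute price and the per-vCPU license price are hypothetical placeholders, and real licensing rules (core factors, hyperthreading, minimums) vary by vendor, so treat this as an illustration of the shape of the costs only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMOption:
    name: str
    total_vcpus: int        # vCPUs in the underlying VM size
    active_vcpus: int       # vCPUs left enabled (equal to total_vcpus if unconstrained)
    monthly_compute: float  # hypothetical VM price; constraining vCPUs does not reduce it


def monthly_costs(vm: VMOption, license_per_vcpu: float) -> dict:
    """Split the monthly bill into compute and (simplified) per-vCPU database licensing."""
    return {
        "licensed_fraction": vm.active_vcpus / vm.total_vcpus,
        "compute": vm.monthly_compute,                  # full VM price either way
        "license": vm.active_vcpus * license_per_vcpu,  # licenses follow active vCPUs
        "compute_per_active_vcpu": vm.monthly_compute / vm.active_vcpus,
    }


full = VMOption("full", total_vcpus=128, active_vcpus=128, monthly_compute=10_000.0)
constrained = VMOption("constrained", total_vcpus=128, active_vcpus=32, monthly_compute=10_000.0)

for vm in (full, constrained):
    print(vm.name, monthly_costs(vm, license_per_vcpu=500.0))
```

With these placeholder numbers the license line drops from $64,000 to $16,000 a month, while the compute bill stays at $10,000 and the compute cost per usable vCPU rises fourfold, from $78.125 to $312.50.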

To quote Amazon (emphasis mine):

Please note that CPU optimized instances will have the same price as full-sized EC2 instances of the same size.

Or to quote the slightly longer version from Microsoft (emphasis mine):

The licensing fees charged for SQL Server or Oracle are constrained to the new vCPU count, and other products should be charged based on the new vCPU count. This results in a 50% to 75% increase in the ratio of the VM specs to active (billable) vCPUs. These new VM sizes allow customer workloads to use the same memory, storage, and I/O bandwidth while optimizing their software licensing cost. At this time, the compute cost, which includes OS licensing, remains the same one as the original size.

It’s great to be able to avoid the (potentially astronomical) cost of unnecessary database licences, but this is still a massive compromise – and the cost will add up over each month you are billed for compute cores that you literally cannot use. Again, this is the public cloud demonstrating that inefficiency and overprovisioning are to be accepted as a way of life.

Surely there must be a better way?

Spoiler alert: there IS a better way


Overprovisioning: The Curse Of The Cloud

I want you to imagine that you check in to a nice hotel. You’ve had a good day and you feel like treating yourself, so you decide to order breakfast in your room for the following morning. Why not? You fill out the menu checkboxes… Let’s see now: granola, toast, coffee, some fruit. Maybe a juice. That will do nicely.

You hang the menu on the door outside, but later a knock at the door brings bad news: You can only order a maximum of three items for breakfast. What? That’s crazy… but no amount of arguing will change their rules. Yet you really don’t want to choose just three of your five items. So what do you do? The answer is simple: you pay for a second hotel room so you can order a second breakfast.

Welcome to the world of overprovisioning.

Overprovisioning = Inefficiency

Overprovisioning is the act of deploying – and paying for – resources you don’t need, usually as a compromise to get enough of some other resource. It’s a technical challenge which results in a commercial or financial penalty. More simply, it’s just inefficiency.

The history of Information Technology is full of examples of this as well as technologies to overcome it: virtualization is a solution designed to overcome the inefficiency of deploying multiple physical servers; containerisation overcomes the inefficiency of virtualising a complete operating system many times… it’s all about being more efficient so you don’t have to pay for resources you don’t really need.

In the cloud, the biggest source of overprovisioning is the way that cloud resources like compute, memory, network bandwidth, storage capacity and performance are packaged together. If you need one of these in abundance, the chances are you will need to pay for more of the others regardless of whether they are required or not.
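As an illustration of this bundling, the sketch below picks the smallest VM size from a hypothetical catalogue that satisfies every requirement at once. The instance names and sizes are invented; the point is that a memory-heavy workload drags in vCPUs it doesn’t need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gb: int
    network_gbps: float


# Hypothetical catalogue, ordered smallest (and cheapest) first. Resources scale
# together from one size to the next, mirroring the bundling described above.
CATALOGUE = [
    InstanceType("small", 4, 32, 5.0),
    InstanceType("medium", 8, 64, 10.0),
    InstanceType("large", 16, 128, 20.0),
    InstanceType("xlarge", 32, 256, 40.0),
]


def smallest_fit(vcpus: int, memory_gb: int, network_gbps: float) -> InstanceType:
    """Return the first instance type that satisfies every requirement."""
    for itype in CATALOGUE:
        if (itype.vcpus >= vcpus and itype.memory_gb >= memory_gb
                and itype.network_gbps >= network_gbps):
            return itype
    raise ValueError("no instance type satisfies the requirements")


# A memory-hungry workload that only needs 4 vCPUs.
needed_vcpus = 4
choice = smallest_fit(vcpus=needed_vcpus, memory_gb=200, network_gbps=5.0)
print(choice.name, "unused vCPUs:", choice.vcpus - needed_vcpus)
```

Needing 200GB of memory forces the largest size, so 28 of its 32 vCPUs are paid for but never needed.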

Overprovisioning = Compromise

As an example, at the time of writing, Google Cloud Platform’s pd-balanced block storage options provide 6 read IOPS and 6 write IOPS per GB of capacity:

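[Table: Google Cloud pd-balanced persistent disk performance, including read/write IOPS per GB and read IOPS per instance. Footnote: Persistent disk IOPS and throughput performance depends on disk size, instance vCPU count, and I/O block size, among other factors.]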

Consider a 1TB database with a reasonable requirement of 30,000 read IOPS during peak load. At 6 read IOPS per GB, reaching this figure means provisioning 30,000 ÷ 6 = 5,000GB (i.e. 5TB) of capacity… so 4TB of the 5TB, or 80% of the capacity, is wasted!
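The sizing rule here is simply to provision whichever is larger: the capacity the data needs or the capacity the IOPS target needs. A minimal sketch, assuming the flat 6 read IOPS per GB figure quoted above and ignoring the per-instance limits and the other factors noted under the table:

```python
import math


def provisioned_gb(data_gb: int, required_iops: int, iops_per_gb: float) -> int:
    """Capacity to provision when IOPS scale linearly with provisioned GB."""
    gb_for_iops = math.ceil(required_iops / iops_per_gb)
    # Provision whichever is larger: the space the data needs or the space the IOPS need.
    return max(data_gb, gb_for_iops)


data_gb = 1_000  # the 1TB database (decimal units, as in the text)
gb = provisioned_gb(data_gb, required_iops=30_000, iops_per_gb=6)
wasted_fraction = (gb - data_gb) / gb
print(gb, f"{wasted_fraction:.0%}")  # 5000 80%
```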

Worse still, the “Read IOPS per instance” row of the table tells us that some of the available GCP instance types may not be able to hit our 30,000 requirement, meaning we may have to (over)provision a larger virtual machine type and pay for cores and RAM that aren’t necessary (by the way, I’m not picking on GCP here; this is common to all public clouds).

But the real sucker punch is that, if this database is licensed by CPU cores (e.g. Oracle, SQL Server) and we are having to overprovision CPU cores to get the required IOPS numbers, we now have to pay for additional, unwanted – and very expensive – database licenses.

Overprovisioning = Overpaying

[Image: My (old) front door]

Let’s not imagine that this is a new phenomenon. If you’ve ever over-specced a server in your data centre (me), if you’ve ever convinced your boss that you need the Enterprise Edition of something because you thought it would be better for your career prospects (also me), or if you’ve ever spent £350 on a thermal imaging camera just so you can win an argument about whether you need a new front door (I neither admit nor deny this) then you have been overprovisioning.

It’s just that the whole nature of cloud computing, with its self-service, on-demand, limitlessly scalable characteristics, makes it so easy to overprovision things all the time. So while the amounts may seem small when shown on the cloud provider’s price-per-hour list, when you multiply them by the number of VMs, the number of regions and the number of hours in a year, they start to look massive on your bill.
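A quick sketch of how a small hourly overspend compounds. The $0.10 per VM-hour of waste, 200 VMs per region and three regions are hypothetical figures chosen purely for illustration:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_waste(waste_per_vm_hour: float, vms_per_region: int, regions: int) -> float:
    """Scale a per-VM hourly overspend across VMs, regions and a full year."""
    return waste_per_vm_hour * vms_per_region * regions * HOURS_PER_YEAR


total = annual_waste(0.10, vms_per_region=200, regions=3)
print(f"${total:,.0f}")
```

Ten cents an hour per VM sounds trivial, but with these inputs it comes to $525,600 a year.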

And when you consider the knock-on effects on database licensing, things really get painful. But let’s save that for the next blog post.

Understanding Disk: Over-Provisioning

Image courtesy of Google Inc.

Storage for DBAs: In a recent news article in the UK, supermarket giant Tesco said it threw away almost 30,000 tonnes of food in the first half of 2013. That’s about 33,000 tons for those of you who can’t cope with the metric system. The story caused a lot of debate about the way in which we ignore the issue of wasted food – with Tesco being both criticised for the wastage and praised for publishing the figures. But waste isn’t a problem confined to just the food industry. The chances are it’s happening in your data centre too.

Stranded Capacity

As a simple example, let’s take a theoretical database which requires just under 6TB of storage capacity. To avoid complicating things we are going to ignore concepts such as striping, mirroring, caching and RAID for a moment and just pretend you want to stick a load of disks in a server. How many super-fast 15k RPM disk drives do you need if each one is 600GB? You need about ten, more or less, right? But here’s the thing: the database creates a lot of random I/O so it has a peak requirement for around 20,000 physical IOPS (I/O operations per second). Those 600GB drives can only service 200 IOPS each. So now you need 100 disks to be able to cope with the workload. 100 multiplied by 600GB is of course 60TB, so you will end up deploying sixty terabytes of capacity in order to service a database of six terabytes in size. Welcome to over-provisioning.
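The sizing logic above boils down to taking the larger of two disk counts: one driven by capacity and one driven by IOPS. A minimal sketch, treating “just under 6TB” as 6,000GB and using the same simplifications as the text (no RAID, striping or caching):

```python
import math


def size_disk_array(capacity_gb: int, peak_iops: int, disk_gb: int, disk_iops: int) -> dict:
    """Return how many disks are needed and how much capacity is left stranded."""
    by_capacity = math.ceil(capacity_gb / disk_gb)  # disks needed just to hold the data
    by_iops = math.ceil(peak_iops / disk_iops)      # disks needed to service peak IOPS
    disks = max(by_capacity, by_iops)
    deployed_gb = disks * disk_gb
    return {
        "disks_by_capacity": by_capacity,
        "disks_by_iops": by_iops,
        "disks": disks,
        "deployed_gb": deployed_gb,
        "stranded_gb": deployed_gb - capacity_gb,
        "usable_fraction": capacity_gb / deployed_gb,
    }


result = size_disk_array(capacity_gb=6_000, peak_iops=20_000, disk_gb=600, disk_iops=200)
print(result)
```

The IOPS requirement wins (100 disks versus 10), leaving 54,000GB stranded and only a tenth of the deployed capacity usable.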

Now here’s the real kicker. That remaining 54TB of capacity? You can’t use it. At least, you can’t use it if you want to be able to guarantee the 20,000 IOPS requirement we started out with. Any additional workload you attempt to deploy using the spare capacity will be issuing I/Os against the same disks, resulting in more IOPS. If you were feeling lucky, you could take a gamble on trying to avoid any new workloads being present during the original database’s peak, but gambling is not something most people like to do in production environments. In other words, your spare capacity is stranded. Of the total disk capacity deployed, you can only ever use 10%.

Of course, disk arrays in the real world tend to use concepts such as wide-striping (spreading chunks of data across as many disks as possible to take advantage of all available performance) and caching (staging frequently accessed blocks in faster DRAM) but the underlying principle remains.

Short Stroking

[Image: hard drive short stroking]

If that previous example makes you cringe at the level of waste, prepare yourself for even worse. In my previous article I talked about the mechanical latency associated with disk, which consists of seek time (the disk head moving across the platter) and rotational latency (the rotation of the platter to the correct sector). If latency is critical (which it always, always is), then one method of reducing the latency experienced on a disk system is to limit the movement of the head, thus reducing the seek time. This is known as short stroking. If we only use the outer sectors of the platter (such as those coloured green in the diagram here), the head always stays close to the next sector we require – and note that the outer sectors are preferable because they have a higher transfer rate than the inner sectors (to understand why, see the section on zones in this post). Of course this has a direct consequence: a large portion of the disk is now unused, sometimes up to 90%. In the case of a 600GB disk, short stroking may now result in only 60GB of usable capacity, which means ten times as many disks are necessary to provide the same capacity as a disk which is not short stroked.
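Short stroking changes only the usable capacity per disk, so the capacity-driven disk count scales inversely with it. A minimal sketch, reusing the 6TB requirement from the earlier example and the 60GB short-stroked figure from the text:

```python
def disks_for_capacity(capacity_gb: int, usable_gb_per_disk: int) -> int:
    """Number of disks needed to hold capacity_gb (integer ceiling division)."""
    return -(-capacity_gb // usable_gb_per_disk)


full_stroke = disks_for_capacity(6_000, 600)  # whole 600GB platter surface in use
short_stroke = disks_for_capacity(6_000, 60)  # only the outer ~10% of each disk in use
print(full_stroke, short_stroke, short_stroke // full_stroke)
```

Integer ceiling division is used so that awkward sizes still round up to whole disks without any floating-point surprises.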

Two Types of Capacity

When people talk about disk capacity they tend to be thinking of the storage capacity, i.e. the number of bytes of data that can be stored. However, while every storage device must have a storage capacity, it will also have a performance capacity – a limit to the amount of performance it can deliver, measured in I/Os per second and/or some derivative of bytes per second. And the thing about capacities is that bad things tend to happen when you try to exceed them.


In simplistic terms, performance and storage capacity are linked, with the ratio between them being specific to each type of storage. With disk drives, the performance capacity usually becomes the blocker before the storage capacity, particularly if the I/O is random (which means high numbers of IOPS). This means any overall solution you design must exceed the required storage capacity in order to deliver on performance. In the case of flash memory, the opposite is usually true: by supplying the required storage capacity there will be a surplus of performance capacity. Provide enough space and you shouldn’t need to worry about things like IOPS and bandwidth. (Although I’m not suggesting you should forgo due diligence and just hope everything works out ok…)
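To make the contrast concrete, the sketch below sizes the same workload (6,000GB and 20,000 IOPS, from the earlier example) on two device types and reports which capacity runs out first. The 15k RPM disk figures come from the example above; the flash device’s 1,920GB and 50,000 IOPS are hypothetical round numbers, not a specific product.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    capacity_gb: int
    iops: int


def size_for(device: Device, need_gb: int, need_iops: int) -> dict:
    """Size an array of identical devices and report which capacity is the binding one."""
    by_storage = math.ceil(need_gb / device.capacity_gb)
    by_performance = math.ceil(need_iops / device.iops)
    count = max(by_storage, by_performance)
    if by_performance > by_storage:
        binding = "performance"
    elif by_storage > by_performance:
        binding = "storage"
    else:
        binding = "both"
    return {
        "devices": count,
        "binding": binding,
        "spare_gb": count * device.capacity_gb - need_gb,
        "spare_iops": count * device.iops - need_iops,
    }


hdd = Device("15k RPM disk", capacity_gb=600, iops=200)
flash = Device("hypothetical flash drive", capacity_gb=1_920, iops=50_000)

for dev in (hdd, flash):
    print(dev.name, size_for(dev, need_gb=6_000, need_iops=20_000))
```

The disk array is performance-bound and strands 54,000GB, while the flash array is storage-bound and ends up with 180,000 IOPS of headroom it was never asked for.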

Waste Watching

I opened with a reference to the story about food wastage – was it fair to compare this to wasted disk capacity in the data centre? One is a real-world problem and the other a hypothetical idea taking place somewhere in cyberspace, right? Well, maybe not. Think of all those additional disks that are required to provide the performance capacity you need, resulting in excess storage capacity which is either stranded or (in the case of short stroking) not even addressable. All those spindles require power to keep them spinning – power that mostly comes from power stations burning fossil fuels. The heat that they produce means additional cooling is required, adding to the power draw. And the additional data centre floor space means more real estate, all of which costs money and consumes resources. It’s all waste.

And that’s just the stuff you can measure. What about the end users that have to wait longer for their data because of the higher latency of disk? Those users may be expensive resources in their own right, but they are also probably using computers or smart devices which consume power, accessing your database over a network that consumes more power, via application servers that consume yet more power… all wasting time waiting for their results.

Wasted time, wasted money, wasted resources. The end result of over-provisioning is not something you should under-estimate…