The Biggest Gap In The Clouds? High Performance RDBMS

Over the course of the last few blog posts, we’ve looked at how an increasing number of database workloads are migrating to the cloud, how there is more than one path to get there… and why overprovisioning is one of the biggest challenges to overcome.

We’re talking about business-critical application workloads here: big, complex, demanding, mission-critical, sensitive, performance-hungry… When on-prem, they are almost certainly running on dedicated, high-end infrastructure. And that’s a potential issue when you then migrate them to run on ā€œsomeone else’s computerā€.

As we’ve discussed before, the cloud is really a big pool of discrete resources and services, all of which are available on demand. You want a managed PostgreSQL instance? Click! It’s yours. You want three hundred virtual machines on which you can install your own software? Clickety-click! Off you go. If you’ve got the budget, the cloud has got a way for you to spend it. But underneath it all, whether you are using PaaS databases supplied by the cloud provider or installing the database software on IaaS systems, you are sharing that infrastructure – and the available performance – with the rest of the world.

Cloud Outcomes: Optimization versus Modernization

For some database workloads moving to the cloud, the modernization path will be the best fit, which means they will likely move to Platform-as-a-Service solutions where the day-to-day management of the database, operating system and infrastructure is taken care of by the cloud provider. Some examples of this path: on-prem SQL Server databases moving to Azure SQL Managed Instances; Oracle Databases moving to AWS’s Oracle RDS solution, etc.

But there is usually a certain class of database workload which doesn’t easily fit into these pre-packaged PaaS solutions: the big, the complex, the gnarly… the monsters of your data centre. And they inevitably end up in Infrastructure-as-a-Service… or stuck on-prem. For customers choosing the IaaS route (the “optimization path” in cloud-speak), the cloud provider manages the infrastructure but the customer is still responsible for the database and operating system.

Obviously, IaaS has a higher management overhead than PaaS, but the journey to IaaS is often simpler (essentially more of a lift-and-shift approach), while PaaS solutions often require a more complex migration. This is especially true with some cloud providers, where the recommended PaaS solution is actually a different database product entirely (for example, Oracle customers moving to Google Cloud or Microsoft Azure will be steered by those cloud providers towards Cloud SQL and Managed PostgreSQL respectively).

I/O Performance Is The Biggest Challenge

My view is that PaaS solutions are the best path for all appropriate workloads, but there will always be some outliers which need to move to IaaS. Almost by definition, those are the most high-profile, demanding, expensive, revenue-affecting… in fact… the most interesting workloads. And in all the cases I’ve seen*, I/O performance has been the limiting factor.

It’s relatively easy to get a lot of compute power in the cloud. But as soon as you start ramping up the amount of data you need to read and write, or demanding that those reads and writes have very fast, predictable response times, you hit problems. In other words, if latency, IOPS or throughput are your metrics of choice, you’d better be ready to start doing unnatural things.

And it’s not necessarily the case that your required level of performance cannot be achieved. Often, it’s more correct to say that your required levels of performance cannot be achieved at an acceptable cost. Because it turns out that the following statement is just as true in the cloud as it ever was on-prem:

Performance and Cost are two sides of the same coin…

This is why I believe that the biggest gap in the cloud providers’ product portfolios today is in the area of high performance relational databases: primarily Oracle Database and Microsoft SQL Server. The PaaS solutions are designed for the average workloads, not the high-end. A complex database running on, for example, Oracle Exadata will struggle to run on a vanilla IaaS deployment – while the refactoring required to take that database and migrate it to Managed PostgreSQL is almost unimaginable.

And so, now, we have finally come to the point: the Silk Platform. I have spent the last few years at Silk bringing to market a solution specifically for this problem. Over the next few posts, I’m going to explain what Silk is and how we built our solution to deliver very high performance for databases in the cloud.

But you won’t have to rely on just my word for it. I have backup šŸ™‚

* I work primarily with customers of Microsoft Azure and Google Cloud

Cloud Compromises: Constrained and Optimized CPUs

Imagine the scenario where you wander into a clothing store to buy a t-shirt. You find a design you like in size ā€œMediumā€ but it’s too tight (I guess #lockdown has been unkind to us all…) so you ask for the next size up. But when it arrives, you notice something bizarre: the ā€œLargeā€ is not only wider and longer, it also has an extra arm hole. Yes, there are enough holes for three arms as well as your head. Even more bizarrely, the ā€œXLā€ size has four sets of sleeves, while the ā€œSmallā€ has only one and the ā€œXSā€ none at all!

Surprisingly, this analogy is very applicable to cloud computing, where properties like compute power, memory, network bandwidth, capacity and performance are often tied together. As we saw in the previous post, a requirement for a certain amount of read I/O Operations Per Second (IOPS) can result in the need to overprovision unwanted capacity and possibly even unnecessary amounts of compute power.

But there is one situation where this causes extra levels of pain: when the workload in question is database software which is licensable by CPU cores (e.g. Oracle Database, Microsoft SQL Server).

To extend the opening analogy into total surrealism, imagine that the above clothing store exists in a state which collects a Sleeve Tax of 100% of the item value per sleeve. Now, your chosen t-shirt might be $40 but the Medium size will cost you $120, the Large $160 and the XXXXXL (suitable for octopods) a massive $360.

Luckily, the cloud providers have a way to help you out here. But it kind of sucks…

Constrained / Optimized VM Sizes

If you need large amounts of memory or I/O, the chances are you will have to pick a VM type which has a larger number of cores. But if you don’t want to buy database licenses for these additional cores (because you don’t need the extra CPU power), you can choose to restrict the VM instance so that it only uses a subset of the total available cores. This is similar to the concept of logical partitioning which you may already have used on-prem. Here are two examples of this practice from the big hyperscalers:

Microsoft Azure: Constrained vCPU capable VM sizes

Amazon Web Services: Introducing Optimize CPUs for Amazon EC2 Instances

As you can see, Microsoft and AWS have different names for this concept, but the idea is the same. You provision, let’s say, a 128 vCPU instance and then you restrict it to only using, for example, 32 vCPUs. Boom – you’ve dropped your database license requirement to 25% of the total number of vCPUs. Ok, so you also only get 25% of the compute performance, but that’s still a big win on the license cost… right?

Well yes but…

There’s a snag. You still have to pay the full cost of the virtual machine despite only using a fraction of its resources. The monthly cost from the cloud provider is the same as if you were using the whole machine!

To quote Amazon (emphasis mine):

Please note that CPU optimized instances will have the same price as full-sized EC2 instances of the same size.

Or to quote the slightly longer version from Microsoft (emphasis mine):

The licensing fees charged for SQL Server or Oracle are constrained to the new vCPU count, and other products should be charged based on the new vCPU count. This results in a 50% to 75% increase in the ratio of the VM specs to active (billable) vCPUs. These new VM sizes allow customer workloads to use the same memory, storage, and I/O bandwidth while optimizing their software licensing cost. At this time, the compute cost, which includes OS licensing, remains the same one as the original size.

It’s great to be able to avoid the (potentially astronomical) cost of unnecessary database licences, but this is still a massive compromise – and the cost will add up over each month you are billed for compute cores that you literally cannot use. Again, this is the public cloud demonstrating that inefficiency and overprovisioning are to be accepted as a way of life.

Surely there must be a better way?

Spoiler alert: there IS a better way

AWR Generator

As part of my role at Violin I spend a lot of time profiling customers’ databases to see how their performance varies over time. The easiest way to do this (since I often don’t have remote access) is to ask for lots of AWR reports. One single report covering a large span of time is useless, because all peaks and troughs are averaged out into a meaningless hum of noise, so I always ask for one report per snapshot period (usually an hour) covering many hours or days. And I always ask for text instead of HTML because then I can process them automatically.

That’s all well and good, but generating a hundred AWR reports is a laborious and mind-numbingly dull task. So to make things easier I’ve written a SQL script to do it. I know there are many other scripts out there to do this, but none of them met the criteria I needed – mainly that they were SQL not shell (for portability) and that they didn’t create temporary objects (such as directories).
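For anyone who just wants the gist, the general approach is to drive the documented DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT table function from DBA_HIST_SNAPSHOT so that each consecutive pair of snapshots produces one text report. A minimal sketch of that idea (not the full script, which is linked below):

-- minimal sketch only: stream the text AWR report for each consecutive
-- pair of snapshots via DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT.
-- Assumes contiguous snap_ids and no instance restart within the range;
-- splitting the output into one spool file per report is not shown here.
select r.output
from   dba_hist_snapshot s,
       table(dbms_workload_repository.awr_report_text(
               s.dbid, s.instance_number, s.snap_id - 1, s.snap_id)) r
where  s.snap_id > (select min(snap_id) from dba_hist_snapshot);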

If it is of use to anyone then I offer it up here:

https://flashdba.com/database/useful-scripts/awr-generator/

Likewise if you manage to break it, please let me know! Thanks to Paul for confirming that it works on RAC and Windows systems (you know you love testing my SQL…)

Querying DBA_HIST_SNAPSHOT and DBA_HIST_SYSSTAT

Why is it so hard in Oracle to get a decent answer to the question of how many seconds elapsed between two timestamps?

I’m looking at DBA_HIST_SNAPSHOT and wondering how many seconds each snapshot spans, because later on I want to use this to generate metrics like Redo Size per Second, etc.

SQL> desc dba_hist_snapshot
 Name                                Null?    Type
 ----------------------------------- -------- ------------------------
 SNAP_ID                             NOT NULL NUMBER
 DBID                                NOT NULL NUMBER
 INSTANCE_NUMBER                     NOT NULL NUMBER
 STARTUP_TIME                        NOT NULL TIMESTAMP(3)
 BEGIN_INTERVAL_TIME                 NOT NULL TIMESTAMP(3)
 END_INTERVAL_TIME                   NOT NULL TIMESTAMP(3)

So surely I can just subtract the begin time from the end time, right?

SQL> select SNAP_ID, END_INTERVAL_TIME - BEGIN_INTERVAL_TIME as elapsed
  2  from DBA_HIST_SNAPSHOT
  3  where SNAP_ID < 6
  4 order by 1;
   SNAP_ID ELAPSED
---------- ------------------------------
         1 +000000000 00:00:44.281
         2 +000000000 00:04:32.488
         3 +000000000 00:51:39.969
         4 +000000000 01:00:01.675
         5 +000000000 00:59:01.697

Gaaaah…. It’s given me one of those stupid interval datatypes! I’ve never been a fan of these. I just want to know the value in seconds.

Luckily I can cast a timestamp (the datatype in DBA_HIST_SNAPSHOT) as a good old-fashioned DATE. We love dates: you can treat them as numbers and add, subtract, etc. The integer values represent days, so you just need to multiply by 24 x 60 x 60 = 86400 to get seconds:

SQL> select SNAP_ID, END_INTERVAL_TIME - BEGIN_INTERVAL_TIME as elapsed,
  2  (cast(END_INTERVAL_TIME as date) - cast(BEGIN_INTERVAL_TIME as date))
  3    *86400 as elapsed2
  4  from dba_hist_snapshot
  5  where snap_id < 6
  6  order by 1;
   SNAP_ID ELAPSED                          ELAPSED2
---------- ------------------------------ ----------
         1 +000000000 00:00:44.281                44
         2 +000000000 00:04:32.488               272
         3 +000000000 00:51:39.969              3100
         4 +000000000 01:00:01.675              3602
         5 +000000000 00:59:01.697              3542

That’s much better. But I notice that, in snapshot 1 for example, the elapsed time was 44.281 seconds and in my CAST version it’s only 44 seconds. In casting to the DATE datatype there has been some rounding. Maybe that isn’t an issue, but surely there’s a way to keep that extra accuracy?

Here’s the answer I came up with – using EXTRACT:

SQL> select SNAP_ID, END_INTERVAL_TIME - BEGIN_INTERVAL_TIME as elapsed,
  2  (cast(END_INTERVAL_TIME as date) - cast(BEGIN_INTERVAL_TIME as date))
  3    *86400 as elapsed2,
  4  (extract(day from END_INTERVAL_TIME)-extract(day from BEGIN_INTERVAL_TIME))*86400 +
  5  (extract(hour from END_INTERVAL_TIME)-extract(hour from BEGIN_INTERVAL_TIME))*3600 +
  6  (extract(minute from END_INTERVAL_TIME)-extract(minute from BEGIN_INTERVAL_TIME))*60 +
  7  (extract(second from END_INTERVAL_TIME)-extract(second from BEGIN_INTERVAL_TIME)) as elapsed3
  8  from dba_hist_snapshot
  9  where snap_id < 6
 10  order by 1;
   SNAP_ID ELAPSED                          ELAPSED2   ELAPSED3
---------- ------------------------------ ---------- ----------
         1 +000000000 00:00:44.281                44     44.281
         2 +000000000 00:04:32.488               272    272.488
         3 +000000000 00:51:39.969              3100   3099.969
         4 +000000000 01:00:01.675              3602   3601.675
         5 +000000000 00:59:01.697              3542   3541.697

Not particularly simple, but at least accurate. If there’s an easier way, I’m happy to be told!
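One variant worth considering (a sketch only, so treat it with care) is to EXTRACT directly from the interval produced by the subtraction, which also preserves the fractional seconds:

-- sketch: extract day/hour/minute/second from the INTERVAL DAY TO SECOND
-- value returned by subtracting the two timestamps
select snap_id,
       extract(day    from (end_interval_time - begin_interval_time)) * 86400
     + extract(hour   from (end_interval_time - begin_interval_time)) * 3600
     + extract(minute from (end_interval_time - begin_interval_time)) * 60
     + extract(second from (end_interval_time - begin_interval_time)) as elapsed_secs
from   dba_hist_snapshot
where  snap_id < 6
order  by 1;

Whether that counts as easier is debatable, but it avoids the CAST and keeps the precision.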

Why?

Why am I doing this? Because I am trying to look at the performance of some benchmarks I am working on. The load generation tool creates regular AWR snapshots so I want to look at the peak IO rates for each snapshot to save myself generating a million AWR reports.

I am specifically interested in the statistics redo size, physical reads, and physical writes from DBA_HIST_SYSSTAT. The tests aim to push each of these metrics (independently though, not all at the same time… yet!).
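For clarity, the statistic names the script below actually matches on are 'redo size', 'physical read total bytes' and 'physical write total bytes'. A quick sketch to confirm they are present in your AWR history:

select distinct stat_name
from   dba_hist_sysstat
where  stat_name in ('redo size',
                     'physical read total bytes',
                     'physical write total bytes');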

With that in mind, and with thanks to Ludovico Caldara for some code on which to build, here is the SQL that I am using to view the performance in each snapshot. First the output though, truncated to save some space on the screen:

SNAPSHOTID SNAPSHOTTIME    REDO_MBSEC REDO_GRAPH           READ_MBSEC READ_GRAPH           WRITE_MBSEC WRITE_GRAPH
---------- --------------- ---------- -------------------- ---------- -------------------- ----------- --------------------
       239 30-AUG 13:37:14     384.83 ******                        0 *                            .04 ******
       240 30-AUG 13:38:07     284.27 ****                          0 *                            .03 ****
       241 30-AUG 13:42:14     296.62 ****                          0                              .03 ****
       242 30-AUG 13:43:00    1242.08 ********************          0 *                            .12 ********************
       243 30-AUG 13:47:14     258.75 ****                          0                              .02 ****
       244 30-AUG 13:48:28     866.83 **************                0 *                            .08 *************
       245 30-AUG 13:52:14     456.24 *******                       0                              .04 *******
       246 30-AUG 13:54:43     773.61 ************                  0                              .07 ************
       247 30-AUG 13:57:38     624.23 **********                    0                              .06 *********
       248 30-AUG 14:00:22     613.98 *********                     0                              .05 *********

I’m currently running redo generation tests so I’m only interested in the redo size metric and the calculation of redo per second, i.e. column 3. I can use the graph at column 4 to instantly see which snapshot I need to look at: 242. That’s the one where redo was being generated at over 1.2GB/sec – not bad for a two-socket machine attached to a single Violin 6616 array.

Now for the SQL… I warn you now, it’s a bit dense!

-- for educational use only - use at your own risk!
-- display physical IO statistics from DBA_HIST_SYSSTAT
-- specifically redo size, physical reads and physical writes

set lines 140 pages 45
accept num_days prompt 'Enter the number of days to report on [default is 0.5]: '
set verify off

SELECT redo_hist.snap_id AS SnapshotID
,      TO_CHAR(redo_hist.snaptime, 'DD-MON HH24:MI:SS') as SnapshotTime
,      ROUND(redo_hist.statval/elapsed_time/1048576,2) AS Redo_MBsec
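       -- the *_Graph columns are 20-character ASCII bars, scaled against the
       -- highest per-second rate seen across the reporting window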
,      SUBSTR(RPAD('*', 20 * ROUND ((redo_hist.statval/elapsed_time) / MAX (redo_hist.statval/elapsed_time) OVER (), 2), '*'), 1, 20) AS Redo_Graph
,      ROUND(physical_read_hist.statval/elapsed_time/1048576,2) AS Read_MBsec
,      SUBSTR(RPAD('*', 20 * ROUND ((physical_read_hist.statval/elapsed_time) / MAX (physical_read_hist.statval/elapsed_time) OVER (), 2), '*'), 1, 20) AS Read_Graph
,      ROUND(physical_write_hist.statval/elapsed_time/1048576,2) AS Write_MBsec
,      SUBSTR(RPAD('*', 20 * ROUND ((physical_write_hist.statval/elapsed_time) / MAX (physical_write_hist.statval/elapsed_time) OVER (), 2), '*'), 1, 20) AS Write_Graph
FROM (SELECT s.snap_id
            ,g.value AS stattot
            ,s.end_interval_time AS snaptime
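             -- statval = delta of the cumulative counter since the previous snapshot;
             -- if the counter has gone backwards (e.g. instance restart) the raw value
             -- is used instead, and the first snapshot in the range reports 0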
            ,NVL(DECODE(GREATEST(VALUE, NVL(lag (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                 ORDER BY s.snap_id), 0)), VALUE, VALUE - LAG (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                     ORDER BY s.snap_id), VALUE), 0) AS statval
            ,(EXTRACT(day FROM s.end_interval_time)-EXTRACT(day FROM s.begin_interval_time))*86400 +
             (EXTRACT(hour FROM s.end_interval_time)-EXTRACT(hour FROM s.begin_interval_time))*3600 +
             (EXTRACT(minute FROM s.end_interval_time)-EXTRACT(minute FROM s.begin_interval_time))*60 +
             (EXTRACT(second FROM s.end_interval_time)-EXTRACT(second FROM s.begin_interval_time)) as elapsed_time
        FROM dba_hist_snapshot s,
             dba_hist_sysstat g,
             v$instance i
       WHERE s.snap_id = g.snap_id
         AND s.begin_interval_time >= sysdate-NVL('&num_days', 0.5)
         AND s.instance_number = i.instance_number
         AND s.instance_number = g.instance_number
         AND g.stat_name = 'redo size') redo_hist,
     (SELECT s.snap_id
            ,g.value AS stattot
            ,NVL(DECODE(GREATEST(VALUE, NVL(lag (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                 ORDER BY s.snap_id), 0)), VALUE, VALUE - LAG (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                     ORDER BY s.snap_id), VALUE), 0) AS statval
        FROM dba_hist_snapshot s,
             dba_hist_sysstat g,
             v$instance i
       WHERE s.snap_id = g.snap_id
         AND s.begin_interval_time >= sysdate-NVL('&num_days', 0.5)
         AND s.instance_number = i.instance_number
         AND s.instance_number = g.instance_number
         AND g.stat_name = 'physical read total bytes') physical_read_hist,
     (SELECT s.snap_id
            ,g.value AS stattot
            ,NVL(DECODE(GREATEST(VALUE, NVL(lag (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                 ORDER BY s.snap_id), 0)), VALUE, VALUE - LAG (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                     ORDER BY s.snap_id), VALUE), 0) AS statval
        FROM dba_hist_snapshot s,
             dba_hist_sysstat g,
             v$instance i
       WHERE s.snap_id = g.snap_id
         AND s.begin_interval_time >= sysdate-NVL('&num_days', 0.5)
         AND s.instance_number = i.instance_number
         AND s.instance_number = g.instance_number
         AND g.stat_name = 'physical write total bytes') physical_write_hist
WHERE redo_hist.snap_id = physical_read_hist.snap_id
  AND redo_hist.snap_id = physical_write_hist.snap_id
ORDER BY 1;