What Every CIO Wants

Some weeks ago I was fortunate enough to read a preview copy of Stephen O’Donnell’s book What Every CIO Wants: A Guide for Global Technology Sales People. Steve is, amongst other things, the Chairman of the Industry Advisory Board for Violin Memory, as well as being the former Global Head of Data Centre Operations for BT (a company where I spent many years working when I was in Oracle Advanced Customer Services).

Needless to say I think it’s an excellent book (or I wouldn’t be blogging about it!). Aimed at sales people who need to pitch at C-level in the technology industry, it provides an insight into the mindset of Chief Information Officers and other C-level executives. In fact, I wonder if Steve has sold himself short, in that it could almost be a guide for budding CIOs, explaining how they should be thinking and what their priorities are.

Without wanting to give away any of the content, I strongly recommend reading and re-reading the section on ARC – Agility, Risk and Cost. This is absolutely critical to understanding the way the business of IT is run. In fact, sales people are not the only ones who need to understand the mindset of CIOs and CTOs – any DBA looking to propose an investment would do well to read this section, understand the concept of ARC, and frame their arguments accordingly.


SLOB on Violin 3000 Series with Infiniband

Last week I invited Martin Bach to the Violin Memory EMEA headquarters to do some testing on both our 3000 and 6000 series arrays. Martin was very interested to see how the Violin flash memory arrays performed, having already had some experience with a PCIe-based flash card vendor.

There are a few problems with PCIe flash cards, but perhaps the two most critical are that a) the storage cannot be shared, meaning it isn’t highly available; and b) replacing a PCIe card requires the server to be taken offline.

Violin’s approach is fundamentally different because the flash memory is contained in a separate unit which can then be presented over one of a number of connections: PCIe direct-attached, Fibre Channel, iSCSI… and now Infiniband. All of those, with the exception of PCIe, allow for the storage to be shared and highly-available. So why do we still provide PCIe?

There are two answers. The first and most simple is flexibility – the design of the arrays makes it simple to provide multiple connectivity options, so why not? The second and more important (in terms of performance) is latency. The overhead of adding Fibre Channel in front of flash memory is only of the order of one or two hundred microseconds, but consider that the 6216 SLC array has read and write latencies of 90 and 25 microseconds respectively: add, say, a 150-microsecond fabric hop and a 90-microsecond read becomes something like 240 microseconds – quite an additional overhead.

The new and exciting addition to these options is therefore Infiniband, which delivers extremely low latencies while avoiding the pitfalls of PCIe around sharing and HA.

To demonstrate the latency figures achievable through a 3205 SLC array connected via Infiniband, Martin and I ran a series of SLOB physical IO tests and monitored the latency. The tests consisted of gradually ramping up the number of readers to see how the latency fared as the number of IOPS increased – we always kept the number of writers as zero. As usual the database block size was 8k. Here are the results:

Filename      Event                          Waits  Time(s)  Lat(µs)       IOPS
------------- ------------------------ ------------ -------- ------- ----------
awr_0_1.txt   db file sequential read        9,999        1     100     2,063.8
awr_0_4.txt   db file sequential read       29,992        5     166     5,998.8
awr_0_8.txt   db file sequential read       39,965        6     150     8,285.5
awr_0_16.txt  db file sequential read       79,958       15     187    13,897.8
awr_0_32.txt  db file sequential read      159,914       43     269    18,133.9
awr_0_64.txt  db file sequential read   21,595,919    6,035     280   115,461.1
awr_0_128.txt db file sequential read   99,762,808   69,007     691   124,907.4

The interesting thing to note is how gently the latency climbs: the IOPS figure increases by two orders of magnitude, yet the average read latency stays under 300 microseconds until the server itself becomes the bottleneck. The tests were performed on a 2s8c16t (two-socket, eight-core, 16-thread) Supermicro server with 2x QDR Infiniband connections via a switch to the array. The Supermicro starts having trouble driving the IO once we get beyond 32 readers – and by the time we get to 128 the load average is so high on the machine that even logging on is hard work. I guess it’s time to ask for a bigger server in the lab…
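For anyone wanting to reproduce this sort of ramp-up, here is a minimal sketch – it assumes the classic SLOB harness, where runit.sh takes the number of writers followed by the number of readers and leaves an awr.txt report behind after each run:

#!/bin/bash
# Ramp up the SLOB reader count with zero writers, keeping each
# AWR report for later comparison (runit.sh usage is an assumption)
for readers in 1 4 8 16 32 64 128; do
    ./runit.sh 0 $readers
    cp awr.txt awr_0_${readers}.txt
done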

Exadata Re-Racking Service

I’ve heard from a few sources now that Oracle is offering a new Exadata Re-racking service for quarter and half racks. The idea, as I understand it, is that if you have your own rack equipment in your data centre and don’t want to use the rack that Exadata comes preinstalled in, you can pay an extra fee for Oracle’s Advanced Customer Services engineers (a fine bunch of people I must say!) to re-rack it. It appears that the machine is delivered to your data centre and then ACS will disassemble it at your site and reassemble it in your rack.

There appear to be some caveats, such as a pre-installation survey to check that your rack kit is suitable, and a ban on putting anything else in the same rack. Also, since the service is not available for full racks, I presume it would preclude upgrading to a full machine in the future – or at least make that impossible without relocating the kit, which I guess means downtime. I must stress that I don’t have the exact details, so talk to your friendly local Exadata sales rep if you want to know more.

What I will say is that in all my time at Oracle the idea that customers could not re-rack the Exadata component servers was one of the few rules which was set in stone. Many customers asked, but all were told no. So what’s changed?

If you ask Oracle I am sure they would say that they are “listening to customer demand” and being “flexible”. On the other hand, surely there will be some who see this as a simple case of abandoning a principle in order to increase the attraction of Exadata and get more sales.

I’d love to know what happens to the empty Exadata rack once the kit has been moved. I’ll start checking to see if they appear on eBay…

SLOB testing on Violin and Exadata

I love SLOB, the Silly Little Oracle Benchmark introduced to me by Kevin Closson in his blog.

I love it because it’s so simple to set up and use. Benchmarking tools such as Hammerora have their place of course, but let’s say you’ve just got your hands on an Exadata X2-8 machine and want to see what sort of level of physical IO it can drive… what’s the quickest way to do that?

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
exadataX2-8.vmem Linux x86 64-bit                  128    64       8    1009.40

Anyone who knows their Exadata configuration details will spot that this is one of the older X2-8s, as it “only” has eight-core Beckton processors instead of the ten-core Westmeres buzzing away in today’s boxes. But for the purposes of generating physical IO this shouldn’t be a major problem.

Running with a small buffer cache recycle pool and calling SLOB with 256 readers (and zero writers) gives:

Load Profile              Per Second
~~~~~~~~~~~~         ---------------
  Physical reads:          138,010.5

So that’s 138k read IOPS at an 8k database block size. Not bad eh? I tried numerous values for readers and 256 gave me the best result.
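In case the phrase “small buffer cache recycle pool” sounds cryptic: the idea is simply to make the cache too small for SLOB’s reads to be satisfied logically, forcing physical IO. Here is a sketch of the kind of settings involved – the size is an illustrative assumption, and it assumes the SLOB tables are assigned to the RECYCLE pool:

sqlplus / as sysdba <<'EOF'
-- Shrink the recycle pool so SLOB reads cannot be satisfied from
-- cache (the 64M figure is an illustrative assumption)
alter system set db_recycle_cache_size = 64M scope=spfile;
shutdown immediate
startup
EOF
./runit.sh 0 256     # zero writers, 256 readers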

Now let’s try it on the Violin 3000 series flash memory array I have here in the lab. I don’t have anything like the monster Sun Fire X4800 servers in the X2-8 with their 1TB of RAM and proliferation of 14 IB-connected storage cells. All I have is a Supermicro server with two quad-core E5530 Gainestown processors and under 100GB RAM:

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
oel57            Linux x86 64-bit                   16     8       2      11.74

You can probably guess from the hostname that I’ve installed Oracle Linux 5 Update 7. I’m also running the Oracle Unbreakable Enterprise Kernel (v1) and using Oracle 11.2.0.3 database and Grid Infrastructure in order to take advantage of the raw performance of Violin LUNs on ASM. For each of the 8x100GB LUNs I have set the IO scheduler to use noop, as described in the installation cookbook.
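For reference, the scheduler change itself is trivial – something along these lines, where the device names are placeholders for however your Violin LUNs are presented:

#!/bin/bash
# Set the noop IO elevator on each Violin LUN (device names are
# placeholders; note this setting does not persist across reboots)
for dev in sdb sdc sdd sde sdf sdg sdh sdi; do
    echo noop > /sys/block/$dev/queue/scheduler
done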

So let’s see what happens when we run SLOB with the same small buffer cache recycle pool and 16 readers (zero writers):

Load Profile              Per Second
~~~~~~~~~~~~         ---------------
  Physical reads:          159,183.9

That’s 159k read IOPS at an 8k database block size. I’m getting almost exactly 20k IOPS per core, which funnily enough is what Kevin told me to expect as a rough limit.

The thing is, my Supermicro has four dual-port 8Gb Fibre Channel cards in it, but only two of them have connections to the Violin array I’m testing here. The other two are connected to an identical 3000 series array, so maybe I should present another 8 LUNs from that and add them to my ASM diskgroup.
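Something like this should do it – a sketch only, since the diskgroup name and the discovery string are assumptions for illustration:

sqlplus / as sysasm <<'EOF'
-- Grow the diskgroup with the second array's LUNs and rebalance
-- (diskgroup name and disk path pattern are illustrative)
alter diskgroup DATA add disk '/dev/mapper/violin2_*' rebalance power 11;
EOF

With the extra LUNs in place, rerunning SLOB with the same 16 readers / 0 writers gives: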

Load Profile              Per Second
~~~~~~~~~~~~         ---------------
  Physical reads:          236,486.7

Again this is an 8k blocksize so I’ve achieved 236k read IOPS. That’s nearly 30k IOPS per core!

I haven’t run this set of tests as a marketing exercise or even as an attempt to make Violin look good. I was genuinely interested in seeing how the two configurations compared – and I’m blown away by the performance of the Violin Memory arrays. I should probably spend some more time investigating these 3000 arrays to see whether I can better that value, but like a kid with a new toy I have one eye on the single 6000 series array which has just arrived in the lab. I wonder what I can get that to deliver with SLOB?

The History of Exadata

I’ve been working on a timeline for the history of Exadata, starting with the HP Oracle Database Machine and working through to the X2 series.

It’s interesting to see how Oracle’s presentation of the product has changed over time, particularly the marketing messages.

Also, if you didn’t know better you would probably think that Engineered Systems were something Oracle had been planning for years. But the original plans for the Oracle Database Machine were to allow multiple vendors and ports of the storage software, basically an open architecture.

Things have changed a lot since then…

Oracle minimises the Exadata minimal pack

As of Exadata Storage Software version 11.2.3.1 released in March 2012 the “minimal pack” has now been deprecated. This is a component of the storage server software patch which is actually applied to the database servers in order to bring them up to the same image version.

Those who have been patching Exadata for a while now may remember the days when the database servers were patched using the ironically-named “convenience pack”. At some point in 2011 that was renamed to be the minimal pack. Well now it is gone entirely, to be replaced with a yum channel on the Unbreakable Linux Network.

There appears to be a channel per software version, e.g. exadata_dbserver_11.2.3.1.0_x86_64_base.

In a way that sounds like a better solution – but it does of course mean some logistical changes if you are going to do it the way Oracle suggests. For a start you will need the database servers to have direct network access to the repositories on the ULN. Or failing that you may need to create your own mirror repositories somewhere on the internal network and point the Exadata machines at those.
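As a sketch of that second option, the wiring on each database server need be nothing more than a repo file – the channel label below comes from the patch documentation, but the baseurl is a hypothetical internal mirror:

cat > /etc/yum.repos.d/exadata_dbserver.repo <<'EOF'
# Point the database server at an internal mirror of the ULN
# channel (the baseurl host is hypothetical)
[exadata_dbserver_11.2.3.1.0_x86_64_base]
name=Exadata database server 11.2.3.1.0 base
baseurl=http://yum.example.internal/exadata_dbserver_11.2.3.1.0_x86_64_base
gpgcheck=1
enabled=1
EOF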

One thing which isn’t made explicitly clear in the patch readme for 11.2.3.1 is that this will update the kernel on the X2-2 to 2.6.18-274… meaning your database servers are effectively moving from Oracle Linux 5 Update 6 to Update 7. The X2-8 on the other hand updates to 2.6.32-300.

It’s also interesting to note that Oracle is still persisting with the 2.6.18 Red Hat compatible kernel on the X2-2 database servers despite the 2.6.32 Oracle Unbreakable Enterprise Kernel (UEK) being out for years. In fact there’s even a UEKv2 out now.

Another thing I notice is that those customers who were brave enough to run their Exadata database servers on Solaris 11 Express have now been served a desupport notice and have six months to upgrade to Solaris 11 proper. It’s not a drastically difficult upgrade to perform, but I’m surprised by that six-month limit – it seems a little unfair considering the one-year grace period customers usually get with database patchsets.

ASM Metadata Utilities

One of the things I meant to write about when I started this blog was the undocumented stuff in Oracle that is publicly available. Since I used to spend a lot of time working with ASM I had an idea that I would write an article about kfed, the kernel file editor used to query (and in desperate circumstances actually change) the mysterious dark matter known as ASM Metadata.

I say mysterious – it isn’t actually that unfathomable – but I have heard a lot of people get confused between the ASM Metadata which resides at the start of each ASM disk (and contains structures such as the Partner Status Table) and the ASM “metadata” that can be backed up and restored using the commands md_backup and md_restore (essentially just information about the directory structure, aliases and so on in the diskgroup). As usual, Oracle’s naming convention does not make things completely clear.
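To make the distinction concrete, here is roughly how you would touch each kind – the device path and diskgroup name are assumptions for illustration:

# Physical ASM Metadata: dump the disk header straight off the
# device with kfed (device path is an illustrative assumption)
$ORACLE_HOME/bin/kfed read /dev/mapper/asm_disk1 | head

# Logical "metadata": directory structure, aliases and templates,
# backed up with asmcmd (11.2 syntax; diskgroup name assumed)
asmcmd md_backup /tmp/data_dg.mdb -G DATA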

After a quick bit of Google-fu, however, I’ve realised that I will have to scrap the whole idea anyway, because my ex-Oracle colleague Bane Radulović has written a great article all about kfed and then added insult to injury by eloquently explaining all about ASM Metadata.

Race you to write an article about AMDU then Bane…

Oh too late.