SLOB testing on Violin and Exadata

I love SLOB, the Silly Little Oracle Benchmark introduced to me by Kevin Closson in his blog.

I love it because it’s so simple to setup and use. Benchmarking tools such as Hammerora have their place of course, but let’s say you’ve just got your hands on an Exadata X2-8 machine and want to see what sort of level of physical IO it can drive… what’s the quickest way to do that?

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
exadataX2-8.vmem Linux x86 64-bit                  128    64       8    1009.40

Anyone who knows their Exadata configuration details will spot that this is one of the older X2-8’s as it “only” has eight-core Beckton processors instead of the ten-core Westmeres buzzing away in today’s boxes. But for the purposes of creating physical I/O this shouldn’t be a major problem.

Running with a small buffer cache recycle pool and calling SLOB with 256 readers (and zero writers) gives:

Load Profile              Per Second
~~~~~~~~~~~~         ---------------
  Physical reads:          138,010.5

So that’s 138k read IOPS at an 8k database block size. Not bad eh? I tried numerous values for readers and 256 gave me the best result.
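For anyone who wants to reproduce this, the setup is minimal. Below is a rough sketch of the general shape of it, assuming the classic SLOB kit in which runit.sh takes the number of writer sessions followed by the number of reader sessions; the recycle pool size shown is illustrative rather than the exact value used here.

# Give the SLOB tables a deliberately small recycle pool so that almost every
# read has to be satisfied by physical I/O rather than the buffer cache
# (assumes the SLOB tables were created with BUFFER_POOL RECYCLE storage).
echo "ALTER SYSTEM SET db_recycle_cache_size = 100M SCOPE=BOTH;" | sqlplus -S / as sysdba

# Classic SLOB invocation: first argument is writers, second is readers
cd ~/SLOB
./runit.sh 0 256    # zero writers, 256 readers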

Now let’s try it on the Violin 3000 series flash memory array I have here in the lab. I don’t have anything like the monster Sun Fire X4800 servers in the X2-8 with their 1TB of RAM and proliferation of 14 IB-connected storage cells. All I have is a Supermicro server with two quad-core E5530 Gainestown processors and under 100GB RAM:

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
oel57            Linux x86 64-bit                   16     8       2      11.74

You can probably guess from the hostname that I’ve installed Oracle Linux 5 Update 7. I’m also running the Oracle Unbreakable Enterprise Kernel (v1) and using Oracle 11.2.0.3 database and Grid Infrastructure in order to take advantage of the raw performance of Violin LUNs on ASM. For each of the 8x100GB LUNs I have set the IO scheduler to use noop, as described in the installation cookbook.
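Switching the scheduler is a one-liner per device; here is a minimal sketch, with device names that are illustrative rather than the actual LUN names in my lab:

# As root: set the noop scheduler for each of the Violin LUNs (names illustrative)
for dev in sdb sdc sdd sde sdf sdg sdh sdi; do
    echo noop > /sys/block/${dev}/queue/scheduler
done

# Verify: the active scheduler is shown in square brackets, e.g. [noop]
cat /sys/block/sdb/queue/scheduler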

So let’s see what happens when we run SLOB with the same small buffer cache recycle pool and 16 readers (zero writers):

Load Profile              Per Second
~~~~~~~~~~~~         ---------------
  Physical reads:          159,183.9

That’s 159k read IOPS at an 8k database block size. I’m getting almost exactly 20k IOPS per core, which funnily enough is what Kevin told me to expect as a rough limit.

The thing is, my Supermicro has four dual-port 8Gb fibre-channel cards in it, but only two of them have connections to the Violin array I’m testing here. The other two are connected to an identical 3000 series array, so maybe I should present another 8 LUNs from that and add them to my ASM diskgroup… Let’s see what happens when I rerun SLOB with the same 16 readers / 0 writers:

Load Profile              Per Second
~~~~~~~~~~~~         ---------------
  Physical reads:          236,486.7

Again this is an 8k blocksize so I’ve achieved 236k read IOPS. That’s nearly 30k IOPS per core!
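For anyone wondering what "adding them to my ASM diskgroup" looks like in practice, here is a rough sketch, assuming a diskgroup called DATA and multipath device names that are purely illustrative:

# Add the second array's eight LUNs to the existing diskgroup and let ASM
# rebalance the data across all sixteen disks (diskgroup name and paths illustrative)
export ORACLE_SID=+ASM
echo "ALTER DISKGROUP DATA ADD DISK
  '/dev/mapper/violin2_lun1', '/dev/mapper/violin2_lun2',
  '/dev/mapper/violin2_lun3', '/dev/mapper/violin2_lun4',
  '/dev/mapper/violin2_lun5', '/dev/mapper/violin2_lun6',
  '/dev/mapper/violin2_lun7', '/dev/mapper/violin2_lun8'
  REBALANCE POWER 11;" | sqlplus -S / as sysasm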

I haven’t run this set of tests as a marketing exercise or even an attempt to make Violin look good. I was genuinely interested in seeing how the two configurations compared – and I’m blown away by the performance of the Violin Memory arrays. I should probably spend some more time investigating these 3000 arrays to see whether I can better that value, but like a kid with a new toy I have one eye on the single 6000 series array which has just arrived in the lab here. I wonder what I can get that to deliver with SLOB?


20 Responses to SLOB testing on Violin and Exadata

  1. Hello says:

    Your results and/or the test that is being run here are incorrect. An X2-8 rack is rated at 1.5 million physical reads, so your test should read at least 750,000 per node. Given that this test doesn't seem to do this right and I have seen much higher physical reads, I can only tell you that this data point is wrong. You should check your AWRs. This test is lying and wrong.

    • flashdba says:

      Hello @Hello

      Your statement that “This test is lying and wrong” appears to be backed up only by the reference to the Exadata X2-8 datasheet (which states that “Up to 1,500,000 database flash IOPS” can be delivered by an X2-8) as well as a claim that you have seen much higher physical reads. You also say that I should check my AWRs. I’m pretty sure that I can cut and paste an AWR without accidentally modifying the results, but ok.

      So let’s address the idea that the test is wrong. Well, I’m running SLOB, a freely-available and frequently-used benchmarking tool. In case you aren’t familiar with it, the README is now available on Kevin Closson’s blog here. I ran these tests using a small buffer cache recycle pool to drive physical IO. Given that there is hardly anything to change when installing and running SLOB, I do not see how this could be “wrong” by any definition of the word, other than because it doesn’t match the Oracle datasheet. However, if you want to try the same test on an X2-8 and send me the results I am happy to publish them underneath my own.

      Now for the idea that the test is “lying”. What does that actually mean? The output of SLOB is an AWR report; how can that be lying? Unless you suspect that either a) the report output is wrong because of some bug, or b) the output has been artificially modified? I’m happy to send you the AWR output if you want to “check” it.

      And finally, the statement that I find rather bizarre: “this data point is wrong”. It isn’t. It cannot possibly be. This post was not entitled “Testing the maximum possible performance of an X2-8”. It is “SLOB testing on Violin and Exadata”, in which I ran SLOB on both systems. It is a record of my findings. This is what happened and these are the results.

      If it takes me three hours to drive my car from A to B then it doesn’t matter how fast the car is capable of going, that three hours is a fact, it’s what actually happened on that specific occasion.

  2. karlarao says:

    Nice! I’m interested to see the comparison of SLOB on the same test case environments with 256 writers (and zero readers) :)

    • flashdba says:

      Always happy to oblige!

      Unfortunately the WordPress blogging system isn’t very nice for posting AWR reports into replies, so apologies if the formatting is a little rough.

      Load Profile              Per Second    Per Transaction
      ~~~~~~~~~~~~         ---------------    ---------------
            DB Time(s):               13.4                0.8
             DB CPU(s):                2.0                0.1
             Redo size:       24,273,456.1        1,396,735.5
         Logical reads:            9,750.4              561.1
         Block changes:           10,927.9              628.8
        Physical reads:            5,782.1              332.7
       Physical writes:            6,148.2              353.8

      Basically we have 5,782 RIOPS and 6,148 WIOPS. And to confirm, that was the same X2-8 with 256 writers and 0 readers. If you are wondering what happened, I’ll show you:

      Event                                Waits     Time(s)   (ms)   time Wait Class
      ------------------------------ ------------ ----------- ------ ------ ----------
      flashback buf free by RVWR           19,358       2,211    114   47.5 Configurat
      cell single block physical rea    2,003,269       1,238      1   26.6 User I/O
      DB CPU                                              695          14.9
      latch free                          135,055         377      3    8.1 Other
      free buffer waits                    17,982         135      8    2.9 Configurat

      Flashback didn’t like it at all, did it? I need to re-run this test with flashback disabled, but right now I’m having trouble getting time on the machine.
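
      In case anyone wants to repeat the write test without that overhead, turning flashback off is trivial; a minimal sketch (in 11.2 this can be done with the database open):

      # Turn flashback database off and confirm it is disabled
      echo 'ALTER DATABASE FLASHBACK OFF;' | sqlplus -S / as sysdba
      echo 'SELECT flashback_on FROM v$database;' | sqlplus -S / as sysdba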

      • karlarao says:

        Thanks! Is this for Exadata? If yes, can you also post the Violin run of 256 writers (and zero readers)?

        -Karl

        • flashdba says:

          That was for Exadata yes. Don’t forget that I only have a 2-socket 8-core server for my Violin SLOB testing so running with 256 readers or writers isn’t going to be an option just now. And as it happens, I can’t test anything right now. I took some days off last week and when I came back someone had liberated my test server for something else. Tomorrow I need to go and reclaim it by force.

    • karlarao says:

      I’m still looking forward to that test case comparison of SLOB on the “same test case environments” with 256 writers (and zero readers)…

      -Karl

      • flashdba says:

        You might be waiting a while. I only have 8 cores on my lab kit, which makes running 256 readers and/or writers virtually pointless. I’ve put in a request for some new kit but it isn’t going to have the 80 cores that an X2-8 compute node has, at least not unless my boss suddenly has a moment of benevolence and splashes the entire IT budget on me.

        Of course one of my customers (many of whom use DL980s due to our reseller agreement with HP) may be able to give me some SLOB time so don’t lose hope entirely…

      • karlarao says:

        I don’t think that 8 cores from your Violin box would stop you from executing the SLOB test case. I’ve seen an X2-8 Exadata do 60K write IOPS on just one node with only about 10-20% CPU utilization. Check it out here http://goo.gl/v60EK

        You can set the init.ora parameter cpu_count=1 for the SLOB run to make an apples-to-apples write comparison.

        -Karl
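
        For example (a rough sketch; cpu_count is a dynamic parameter, and SCOPE=BOTH assumes an spfile is in use):

        # Throttle the instance to a single CPU for the apples-to-apples run
        echo "ALTER SYSTEM SET cpu_count = 1 SCOPE=BOTH;" | sqlplus -S / as sysdba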

  3. karlarao says:

    BTW you might be interested in this, I’ve seen this image before by Greg Rahn doing 1 million IOPS / 9GB/s http://twitter.com/#!/GregRahn/media/slideshow?url=http%3A%2F%2Ftwitpic.com%2F2tq9xu

  4. cohenjo says:

    Hi, first of all I enjoy reading your blog :)

    We were also thinking of looking into Violin against Exadata performance.

    Your Exadata results seemed a little low, so I ran SLOB against our lab’s quarter rack (V2).

    I used 64 readers (although 32 gave similar results). From the awr.txt:
    Host Name        Platform                         CPUs Cores Sockets Memory(GB)
    ---------------- -------------------------------- ---- ----- ------- ----------
                     Linux x86 64-bit                   16     8       2      70.60

    Load Profile              Per Second
    ~~~~~~~~~~~~         ---------------
      Physical reads:          156,519.1

    I think you could get better results from your Exadata…

    It was interesting to see your Violin result as your server is similar to the V2 server…

    Make sure to blog the results you get from the 6000 Violin :)

    • flashdba says:

      Thank you for posting your SLOB results; the more we have, the more clearly we can see how these products perform.

      I ran the 6000 testing this week with Martin Bach as my “impartial observer”, so I just need to find the time to write it up. We also had a chance to test the 3000 series but using Infiniband connectivity… the latency figures were astonishing. I need to write that up too… hopefully soon!

  5. Alex says:

    Hello,

    Interesting.. just curious: wouldn’t having the Violin array connected via FC instead of a PCIe cable (I think that’s what it was called) somewhat reduce the performance of your Violin results?
    SLOB is a nice benchmark, I agree with that, but it is really OLTP oriented. Personally I think what Exadata is good at is DW/OLAP (insert here any big data/analytical terms in fashion lately), since what you really buy with Exadata is the “cell offloading”.
    Also it would be interesting to see how Violin compares to something like EMC VFCache with an EMC VMAX backend. Thanks

    Regards,
    Alex

    • flashdba says:

      Hi Alex

      You are correct, yes: adding the FC layer will increase the latency compared to direct-attached PCIe, probably by around 200 microseconds. However, the FC attachment allows for the use of a Violin gateway, which means the array can be shared amongst multiple servers (e.g. for RAC). It also allows for multiple arrays to be managed as a cluster.

      You are probably thinking that it would be nice to have the performance of the PCIe direct attachment version along with the high availability and manageability of the FC version… well this is now possible, because I have now got an Infiniband-connected array in my lab. I will blog about it as soon as I can but for now let me tell you that I was getting 100 microseconds latency from a 3000 connected via IB. Amazing stuff!

  6. Pascal Phillip says:

    Hi,
    The Exadata numbers seem to be a little small. I just tested this on a single-instance 11.2.0.3 DB on OEL 6.4 and 3PAR storage.

    Cache Sizes                       Begin        End
    ~~~~~~~~~~~                  ---------- ----------
                   Buffer Cache:     5,056M     5,056M  Std Block Size:        8K
               Shared Pool Size:     5,024M     5,024M      Log Buffer:   16,528K

    Load Profile                    Per Second   Per Transaction    Per Exec   Per Call
    ~~~~~~~~~~~~               ---------------   ---------------   ---------  ---------
                 DB Time(s):              79.5           2,184.2        0.01      14.51
                  DB CPU(s):              31.7             870.3        0.00       5.78
                  Redo size:           8,040.1         220,858.9
              Logical reads:       3,262,027.4      89,607,002.5
              Block changes:              44.4           1,219.5
             Physical reads:         586,285.5      16,105,101.7
            Physical writes:              30.3             832.3
                 User calls:               5.5             150.6
                     Parses:               5.7             156.4
                Hard parses:               0.6              15.7
          W/A MB processed:                0.2               5.7
                     Logons:               0.0               0.9
                   Executes:          12,680.0         348,316.2
                  Rollbacks:               0.0               0.0
               Transactions:               0.0

    • flashdba says:

      That post is nearly two years old now – and the Exadata model in question was two generations previous.

      Your SLOB results are interesting but the buffer cache is very large at 5GB. Also, the really interesting value is missing – we need to see the number of waits and total wait time for the wait event db file sequential read so we can calculate the average latency.

  7. Pascal Phillip says:

    Hi,
    I also think Exadata X4 will exceed 500K IOPS, but I have not tested SLOB2 on it yet.

    Below are my wait events:

                                                                Avg
                                             %Time Total Wait   wait     Waits   % DB
    Event                            Waits   -outs   Time (s)   (ms)      /txn   time
    -------------------------- ------------ ------ ---------- ------ --------- ------
    db file sequential read     177,142,915      0      4,804      0  1.61E+07   20.0
    latch: cache buffers chain      124,865      0      1,442     12  1.14E+04    6.0
    latch free                       16,703      0        226     14   1,518.5     .9
    library cache lock                   79      0         16    202       7.2     .1
    log file sync                       167      0          1      6      15.2     .0

  8. Pascal Phillip says:

    And here are the top 5 timed events:

    Top 5 Timed Foreground Events
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                               Avg
                                                              wait   % DB
    Event                                 Waits     Time(s)   (ms)   time Wait Class
    ------------------------------ ------------ ----------- ------ ------ ----------
    DB CPU                                            9,573          39.8
    db file sequential read         177,142,915       4,804      0   20.0 User I/O
    latch: cache buffers chains         124,865       1,442     12    6.0 Concurrenc
    latch free                           16,703         226     14     .9 Other
    library cache lock                       79          16    202     .1 Concurrenc

    Host CPU (CPUs: 32 Cores: 32 Sockets: 8)
    ~~~~~~~~         Load Average
                   Begin       End     %User   %System      %WIO     %Idle
               --------- --------- --------- --------- --------- ---------
                   23.16     75.62      79.7      19.7       0.0       0.6
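
    Working out the average db file sequential read latency that was asked for, from the figures above (4,804 seconds of wait time over 177,142,915 waits):

    # average single-block read latency = total wait time / number of waits
    awk 'BEGIN { printf "%.1f microseconds\n", 4804 / 177142915 * 1e6 }'
    # => roughly 27 microseconds per single-block read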
