SLOB testing on Violin and Exadata

I love SLOB, the Silly Little Oracle Benchmark introduced to me by Kevin Closson in his blog.

I love it because it’s so simple to set up and use. Benchmarking tools such as Hammerora have their place of course, but let’s say you’ve just got your hands on an Exadata X2-8 machine and want to see what sort of level of physical IO it can drive… what’s the quickest way to do that?

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
exadataX2-8.vmem Linux x86 64-bit                  128    64       8    1009.40

Anyone who knows their Exadata configuration details will spot that this is one of the older X2-8’s as it “only” has eight-core Beckton processors instead of the ten-core Westmeres buzzing away in today’s boxes. But for the purposes of creating physical I/O this shouldn’t be a major problem.

Running with a small buffer cache recycle pool and calling SLOB with 256 readers (and zero writers) gives:

Load Profile              Per Second
~~~~~~~~~~~~         ---------------
  Physical reads:          138,010.5

So that’s 138k read IOPS at an 8k database block size. Not bad eh? I tried numerous values for readers and 256 gave me the best result.
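
For anyone wanting to reproduce the setup, the “small buffer cache recycle pool” approach comes down to keeping the cache tiny so that almost every SLOB read has to go to storage. A minimal sketch of the relevant init.ora parameters follows; the sizes shown are illustrative assumptions, not the values used in this test.

```
# Illustrative init.ora fragment -- sizes are assumptions, not test values
db_cache_size         = 1G      # keep the default pool small
db_recycle_cache_size = 64M     # tiny recycle pool so reads miss the cache
```

The SLOB tables then need to be assigned to the recycle pool, e.g. with ALTER TABLE … STORAGE (BUFFER_POOL RECYCLE), so that their blocks age out of the cache almost immediately.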

Now let’s try it on the Violin 3000 series flash memory array I have here in the lab. I don’t have anything like the monster Sun Fire X4800 servers in the X2-8 with their 1TB of RAM and proliferation of 14 IB-connected storage cells. All I have is a Supermicro server with two quad-core E5530 Gainestown processors and under 100GB RAM:

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
oel57            Linux x86 64-bit                   16     8       2      11.74

You can probably guess from the hostname that I’ve installed Oracle Linux 5 Update 7. I’m also running the Oracle Unbreakable Enterprise Kernel (v1) and using Oracle 11.2.0.3 database and Grid Infrastructure in order to take advantage of the raw performance of Violin LUNs on ASM. For each of the 8x100GB LUNs I have set the IO scheduler to use noop, as described in the installation cookbook.

So let’s see what happens when we run SLOB with the same small buffer cache recycle pool and 16 readers (zero writers):

Load Profile              Per Second
~~~~~~~~~~~~         ---------------
  Physical reads:          159,183.9

That’s 159k read IOPS at an 8k database block size. I’m getting almost exactly 20k IOPS per core, which funnily enough is what Kevin told me to expect as a rough limit.
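
That per-core figure is straightforward arithmetic; as a quick sanity check using the numbers quoted above:

```python
# Read IOPS per core on the 8-core Supermicro, using the AWR figure above
physical_reads_per_sec = 159_183.9
cores = 8
iops_per_core = physical_reads_per_sec / cores
print(f"{iops_per_core:,.0f}")  # ~19,898 -- almost exactly 20k per core
```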

The thing is, my Supermicro has four dual-port 8Gb fibre-channel cards in it, but only two of them have connections to the Violin array I’m testing here. The other two are connected to an identical 3000 series array, so maybe I should present another 8 LUNs from that and add them to my ASM diskgroup… Let’s see what happens when I rerun SLOB with the same 16 readers / 0 writers:

Load Profile              Per Second
~~~~~~~~~~~~         ---------------
  Physical reads:          236,486.7

Again this is an 8k blocksize so I’ve achieved 236k read IOPS. That’s nearly 30k IOPS per core!

I haven’t run this set of tests as a marketing exercise or even as an attempt to make Violin look good. I was genuinely interested in seeing how the two configurations compared – and I’m blown away by the performance of the Violin Memory arrays. I should probably spend some more time investigating these 3000 arrays to see whether I can better that value, but like a kid with a new toy I have one eye on the single 6000 series array which has just arrived in the lab here. I wonder what I can get that to deliver with SLOB?


20 Responses to SLOB testing on Violin and Exadata

  1. Hello says:

    Your results and/or the test that is being run here are incorrect. An X2-8 rack is rated at 1.5MM physical reads, so your test should at least read 750,000 per node. Given that this test doesn’t seem to do this right and I have seen much higher physical reads, I can only tell you that this data point is wrong. You should check your AWRs. This test is lying and wrong.

    • flashdba says:

      Hello @Hello

      Your statement that “This test is lying and wrong” appears to be backed up only by the reference to the Exadata X2-8 datasheet (which states that “Up to 1,500,000 database flash IOPS” can be delivered by an X2-8), as well as a claim that you have seen much higher physical reads. You also say that I should check my AWRs. I’m pretty sure that I can cut and paste an AWR without accidentally modifying the results, but ok.

      So let’s address the idea that the test is wrong. Well, I’m running SLOB, a freely-available and frequently-used benchmarking tool. In case you aren’t familiar with it, the README is now available on Kevin Closson’s blog here. I ran these tests using a small buffer cache recycle pool to drive physical IO. Given that there is hardly anything to change when installing and running SLOB, I do not see how this could be “wrong” by any definition of the word, other than because it doesn’t match the Oracle datasheet. However, if you want to try the same test on an X2-8 and send me the results I am happy to publish them underneath my own.

      Now for the idea that the test is “lying”. What does that actually mean? The output of SLOB is an AWR report, so how can that be lying? Unless you suspect that either a) the report output is wrong because of some bug, or b) the output has been artificially modified? I’m happy to send you the AWR output if you want to “check” it.

      And finally, the statement that I find rather bizarre, which is “this data point is wrong”. It isn’t. It cannot possibly be. This post was not entitled “Testing the maximum possible performance of an X2-8”. It is “SLOB testing on Violin and Exadata”, in which I ran SLOB on both systems. It is a documentary of my findings. This is what happened and these are the results.

      If it takes me three hours to drive my car from A to B then it doesn’t matter how fast the car is capable of going, that three hours is a fact, it’s what actually happened on that specific occasion.

  2. karlarao says:

    Nice! I’m interested to see the comparison of SLOB on the same test case environments with 256 writers (and zero readers) 🙂

    • flashdba says:

      Always happy to oblige!

      Unfortunately the WordPress blogging system isn’t very kind to AWR reports posted into replies, so apologies for any formatting quirks.

      Load Profile              Per Second    Per Transaction
      ~~~~~~~~~~~~         ---------------    ---------------
            DB Time(s):               13.4                0.8
             DB CPU(s):                2.0                0.1
             Redo size:       24,273,456.1        1,396,735.5
         Logical reads:            9,750.4              561.1
         Block changes:           10,927.9              628.8
        Physical reads:            5,782.1              332.7
       Physical writes:            6,148.2              353.8

      Basically we have 5,782 RIOPS and 6,148 WIOPS. And to confirm, that was the same X2-8 with 256 writers and 0 readers. If you are wondering what happened, I’ll show you:

      Event                                 Waits     Time(s)   (ms)   time Wait Class
      ------------------------------ ------------ ----------- ------ ------ ----------
      flashback buf free by RVWR           19,358       2,211    114   47.5 Configurat
      cell single block physical rea    2,003,269       1,238      1   26.6 User I/O
      DB CPU                                              695          14.9
      latch free                          135,055         377      3    8.1 Other
      free buffer waits                    17,982         135      8    2.9 Configurat

      Flashback didn’t like it at all did it? I need to re-run this test with flashback disabled, but right now I’m having trouble getting time on the machine.

      • karlarao says:

        Thanks! Is this for Exadata? If yes, can you also post the Violin run of 256 writers (and zero readers)?

        -Karl

        • flashdba says:

          That was for Exadata yes. Don’t forget that I only have a 2-socket 8-core server for my Violin SLOB testing so running with 256 readers or writers isn’t going to be an option just now. And as it happens, I can’t test anything right now. I took some days off last week and when I came back someone had liberated my test server for something else. Tomorrow I need to go and reclaim it by force.

    • karlarao says:

      I’m still looking forward to that test case comparison of SLOB on the “same test case environments” with 256 writers (and zero readers)…

      -Karl

      • flashdba says:

        You might be waiting a while. I only have 8 cores on my lab kit, which makes running 256 readers and/or writers virtually pointless. I’ve put in a request for some new kit but it isn’t going to have the 80 cores that an X2-8 compute node has, at least not unless my boss suddenly has a moment of benevolence and splashes the entire IT budget on me.

        Of course one of my customers (many of whom use DL980s due to our reseller agreement with HP) may be able to give me some SLOB time so don’t lose hope entirely…

      • karlarao says:

        I don’t think that the 8 cores in your Violin box would stop you from executing the SLOB test case. I’ve seen an X2-8 Exadata do a 60K write IOPS workload on just one node with only about 10-20% CPU utilization. Check it out here http://goo.gl/v60EK

        You can set the init.ora parameter cpu_count=1 for SLOB to make an apples-to-apples write comparison.

        -Karl

  3. karlarao says:

    BTW, you might be interested in this: I’ve seen this image before from Greg Rahn showing 1 million IOPS / 9GB/s http://twitter.com/#!/GregRahn/media/slideshow?url=http%3A%2F%2Ftwitpic.com%2F2tq9xu

  4. cohenjo says:

    Hi! First of all, I enjoy reading your blog 🙂

    We were also thinking of looking into Violin against Exadata performance.

    Your Exadata results seemed a little low, so I ran SLOB against our lab’s quarter rack (V2).

    I used 64 readers (although 32 gave similar results). From the awr.txt:

    Host Name        Platform                         CPUs Cores Sockets Memory(GB)
    ---------------- -------------------------------- ---- ----- ------- ----------
                     Linux x86 64-bit                   16     8       2      70.60

    Load Profile              Per Second
    ~~~~~~~~~~~~         ---------------
      Physical reads:          156,519.1

    I think you could get better results from your Exadata…

    It was interesting to see your Violin result, as your server is similar to the V2 server…

    Make sure to blog the results you get from the 6000 Violin 🙂

    • flashdba says:

      Thank you for posting your SLOB results; the more we have, the clearer we can see how these products perform.

      I ran the 6000 testing this week with Martin Bach as my “impartial observer”, so I just need to find the time to write it up. We also had a chance to test the 3000 series but using Infiniband connectivity… the latency figures were astonishing. I need to write that up too… hopefully soon!

  5. Alex says:

    Hello,

    Interesting… just curious: wouldn’t having the Violin array connected via FC instead of a PCIe cable (I think that’s what it was called) somewhat reduce the performance of your Violin results?
    SLOB is a nice benchmark, I agree with that, but it is really OLTP-oriented. Personally I think where Exadata is good is DW/OLAP (insert here any big data/analytical terms in fashion lately), since what you really buy with Exadata is the “cell offloading”.
    Also it would be interesting to see how Violin compares to something like EMC VF Cache with an EMC VMAX backend. Thanks

    Regards,
    Alex

    • flashdba says:

      Hi Alex

      You are correct, yes: adding the FC layer will increase the latency compared to direct-attached PCIe, probably by around 200 microseconds. However, the FC attachment allows for the use of a Violin gateway, which then means the array can be shared amongst multiple servers (e.g. for RAC). It also allows for multiple arrays to be managed as a cluster.

      You are probably thinking that it would be nice to have the performance of the PCIe direct attachment version along with the high availability and manageability of the FC version… well this is now possible, because I have now got an Infiniband-connected array in my lab. I will blog about it as soon as I can but for now let me tell you that I was getting 100 microseconds latency from a 3000 connected via IB. Amazing stuff!

  6. Pascal Phillip says:

    Hi,
    The Exadata numbers seem to be a little low. I just tested this on a single instance 11.2.0.3 DB on OEL 6.4 with 3PAR storage.

    Cache Sizes                       Begin        End
    ~~~~~~~~~~~                  ---------- ----------
        Buffer Cache:     5,056M     5,056M  Std Block Size:        8K
    Shared Pool Size:     5,024M     5,024M      Log Buffer:   16,528K

    Load Profile              Per Second    Per Transaction    Per Exec    Per Call
    ~~~~~~~~~~~~         ---------------    ---------------   ---------- ----------
          DB Time(s):               79.5            2,184.2        0.01       14.51
           DB CPU(s):               31.7              870.3        0.00        5.78
           Redo size:            8,040.1          220,858.9
       Logical reads:        3,262,027.4       89,607,002.5
       Block changes:               44.4            1,219.5
      Physical reads:          586,285.5       16,105,101.7
     Physical writes:               30.3              832.3
          User calls:                5.5              150.6
              Parses:                5.7              156.4
         Hard parses:                0.6               15.7
    W/A MB processed:                0.2                5.7
              Logons:                0.0                0.9
            Executes:           12,680.0          348,316.2
           Rollbacks:                0.0                0.0
        Transactions:                0.0

    • flashdba says:

      That post is nearly two years old now – and the Exadata model in question was two generations previous.

      Your SLOB results are interesting but the buffer cache is very large at 5GB. Also, the really interesting value is missing – we need to see the number of waits and total wait time for the wait event db file sequential read so we can calculate the average latency.
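
      For reference, the average latency calculation is simply total wait time divided by the number of waits. Using the db file sequential read figures that appear later in this thread (177,142,915 waits, 4,804 seconds of total wait time):

```python
# Average wait latency = total wait time / number of waits
waits = 177_142_915
total_wait_s = 4_804
avg_latency_ms = total_wait_s / waits * 1_000
print(f"{avg_latency_ms:.3f} ms")  # ~0.027 ms, i.e. roughly 27 microseconds
```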

  7. Pascal Phillip says:

    Hi,
    I also think Exadata X4 will exceed 500K IOPS, but I have not tested SLOB2 on it as yet.

    Below are my wait events:

                                                                  Avg
                                            %Time Total Wait    wait    Waits   % DB
    Event                             Waits -outs   Time (s)    (ms)     /txn   time
    -------------------------- ------------ ----- ---------- ------- -------- ------
    db file sequential read     177,142,915     0      4,804       0 1.61E+07   20.0
    latch: cache buffers chain      124,865     0      1,442      12 1.14E+04    6.0
    latch free                       16,703     0        226      14  1,518.5     .9
    library cache lock                   79     0         16     202      7.2     .1
    log file sync                       167     0          1       6     15.2     .0

  8. Pascal Phillip says:

    And here are the top 5 timed events:

    Top 5 Timed Foreground Events
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                  Avg
                                                                 wait   % DB
    Event                                 Waits     Time(s)      (ms)   time Wait Class
    ------------------------------ ------------ ----------- ------ ------ ----------
    DB CPU                                            9,573          39.8
    db file sequential read         177,142,915       4,804      0   20.0 User I/O
    latch: cache buffers chains         124,865       1,442     12    6.0 Concurrenc
    latch free                           16,703         226     14     .9 Other
    library cache lock                       79          16    202     .1 Concurrenc

    Host CPU (CPUs: 32 Cores: 32 Sockets: 8)
    ~~~~~~~~              Load Average
                Begin       End     %User   %System      %WIO     %Idle
              --------- --------- --------- --------- --------- ---------
                  23.16     75.62      79.7      19.7       0.0       0.6
