Using SLOB to Test Physical I/O

slob ghostFor some time now I’ve been using the Silly Little Oracle Benchmark (SLOB) tool to drive physical I/O against the various high performance flash memory arrays I have in my lab (one of the benefits of working for Violin Memory is a lab stuffed with flash arrays!)

I wrote a number of little scripts and tools to automate this process and always intended to publish some of them to the community. However, time flies and I never seem to get around to finishing them off or making them presentable. I’ve come to the conclusion now that this will probably never happen, so I’ve published them in all their nasty, badly-written and uncommented glory. I can only apologise in advance.

You can find them here.

Advertisement

AWR Generator

tools

As part of my role at Violin I spend a lot of time profiling customer’s databases to see how their performance varies over time. The easiest way to do this (since I often don’t have remote access) is to ask for lots of AWR reports. One single report covering a large span of time is useless, because all peaks and troughs are averaged out into a meaningless hum of noise, so I always ask for one report per snapshot period (usually an hour) covering many hours or days. And I always ask for text instead of HTML because then I can process them automatically.

That’s all well and good, but generating a hundred AWR reports is a laborious and mind-numbingly dull task. So to make things easier I’ve written a SQL script to do it. I know there are many other scripts out there to do this, but none of them met the criteria I needed – mainly that they were SQL not shell (for portability) and that they didn’t create temporary objects (such as directories).

If it is of use to anyone then I offer it up here:

https://flashdba.com/database/useful-scripts/awr-generator/

Likewise if you manage to break it, please let me know! Thanks to Paul for confirming that it works on RAC and Windows systems (you know you love testing my SQL…)

Querying DBA_HIST_SNAPSHOT and DBA_HIST_SYSSTAT

Why is it so hard in Oracle to get a decent answer to the question of how many seconds elapsed between two timestamps?

I’m looking at DBA_HIST_SNAPSHOT and wondering how many seconds each snapshot spans, because later on I want to use this to generate metrics like Redo Size per Second, etc.

SQL> desc dba_hist_snapshot
 Name                                Null?    Type
 ----------------------------------- -------- ------------------------
 SNAP_ID                             NOT NULL NUMBER
 DBID                                NOT NULL NUMBER
 INSTANCE_NUMBER                     NOT NULL NUMBER
 STARTUP_TIME                        NOT NULL TIMESTAMP(3)
 BEGIN_INTERVAL_TIME                 NOT NULL TIMESTAMP(3)
 END_INTERVAL_TIME                   NOT NULL TIMESTAMP(3)

So surely I can just subtract the begin time from the end time, right?

SQL> select SNAP_ID, END_INTERVAL_TIME - BEGIN_INTERVAL_TIME as elapsed
  2  from DBA_HIST_SNAPSHOT
  3  where SNAP_ID < 6
  4 order by 1;
   SNAP_ID ELAPSED
---------- ------------------------------
         1 +000000000 00:00:44.281
         2 +000000000 00:04:32.488
         3 +000000000 00:51:39.969
         4 +000000000 01:00:01.675
         5 +000000000 00:59:01.697

Gaaaah…. It’s given me one of those stupid interval datatypes! I’ve never been a fan of these. I just want to know the value in seconds.

Luckily I can cast a timestamp (the datatype in DBA_HIST_SNAPSHOT) as a good old fashioned DATE. We love dates, you can treat them as numbers and add, subtract etc. The integer values represent days, so you just need to multiply by 24 x 60 x 60 = 86400 to get seconds:

SQL> select SNAP_ID, END_INTERVAL_TIME - BEGIN_INTERVAL_TIME as elapsed,
  2  (cast(END_INTERVAL_TIME as date) - cast(BEGIN_INTERVAL_TIME as date))
  3    *86400 as elapsed2
  4  from dba_hist_snapshot
  5  where snap_id < 6
  6  order by 1;
   SNAP_ID ELAPSED                          ELAPSED2
---------- ------------------------------ ----------
         1 +000000000 00:00:44.281                44
         2 +000000000 00:04:32.488               272
         3 +000000000 00:51:39.969              3100
         4 +000000000 01:00:01.675              3602
         5 +000000000 00:59:01.697              3542

That’s much better. But I notice that, in snapshot 1 for example, the elapsed time was 44.281 seconds and in my CAST version it’s only 44 seconds. In casting to the DATA datatype there has been some rounding. Maybe that isn’t an issue, but surely there’s a way to keep that extra accuracy?

Here’s the answer I came up with – using EXTRACT:

SQL> select SNAP_ID, END_INTERVAL_TIME - BEGIN_INTERVAL_TIME as elapsed,
  2  (cast(END_INTERVAL_TIME as date) - cast(BEGIN_INTERVAL_TIME as date))
  3    *86400 as elapsed2,
  4  (extract(day from END_INTERVAL_TIME)-extract(day from BEGIN_INTERVAL_TIME))*86400 +
  5  (extract(hour from END_INTERVAL_TIME)-extract(hour from BEGIN_INTERVAL_TIME))*3600 +
  6  (extract(minute from END_INTERVAL_TIME)-extract(minute from BEGIN_INTERVAL_TIME))*60 +
  7  (extract(second from END_INTERVAL_TIME)-extract(second from BEGIN_INTERVAL_TIME)) as elapsed3
  8  from dba_hist_snapshot
  9  where snap_id < 6
 10  order by 1;
   SNAP_ID ELAPSED                          ELAPSED2   ELAPSED3
---------- ------------------------------ ---------- ----------
         1 +000000000 00:00:44.281                44     44.281
         2 +000000000 00:04:32.488               272    272.488
         3 +000000000 00:51:39.969              3100   3099.969
         4 +000000000 01:00:01.675              3602   3601.675
         5 +000000000 00:59:01.697              3542   3541.697

Not particularly simple, but at least accurate. I’m happy to be told that there’s an easier way?

Why?

Why am I doing this? Because I am trying to look at the performance of some benchmarks I am working on. The load generation tool creates regular AWR snapshots so I want to look at the peak IO rates for each snapshot to save myself generating a million AWR reports.

I am specifically interested in the statistics redo sizephysical reads, and physical writes from DBA_HIST_SYSSTAT. The tests aim to push each of these metrics (independently though, not all at the same time… yet!)

With that in mind, and with thanks to Ludovico Caldara for some code on which to build, here is the SQL that I am using to view the performance in each snapshot. First the output though, truncated to save some space on the screen:

SNAPSHOTID SNAPSHOTTIME    REDO_MBSEC REDO_GRAPH           READ_MBSEC READ_GRAPH           WRITE_MBSEC WRITE_GRAPH
---------- --------------- ---------- -------------------- ---------- -------------------- ----------- --------------------
       239 30-AUG 13:37:14     384.83 ******                        0 *                            .04 ******
       240 30-AUG 13:38:07     284.27 ****                          0 *                            .03 ****
       241 30-AUG 13:42:14     296.62 ****                          0                              .03 ****
       242 30-AUG 13:43:00    1242.08 ********************          0 *                            .12 ********************
       243 30-AUG 13:47:14     258.75 ****                          0                              .02 ****
       244 30-AUG 13:48:28     866.83 **************                0 *                            .08 *************
       245 30-AUG 13:52:14     456.24 *******                       0                              .04 *******
       246 30-AUG 13:54:43     773.61 ************                  0                              .07 ************
       247 30-AUG 13:57:38     624.23 **********                    0                              .06 *********
       248 30-AUG 14:00:22     613.98 *********                     0                              .05 *********

I’m currently running redo generation tests so I’m only interested in the redo size metric and the calculation of redo per second, i.e. column 3. I can use the graph at column 4 to instantly see which snapshot I need to look at: 243. That’s the one where redo was being generated at over 1.2Gb/sec – not bad for a two-socket machine attached to a single Violin 6616 array.

Now for the SQL… I warn you now, it’s a bit dense!

-- for educational use only - use at your own risk!
-- display physical IO statistics from DBA_HIST_SYSSTAT
-- specifically redo size, physical reads and physical writes

set lines 140 pages 45
accept num_days prompt 'Enter the number of days to report on [default is 0.5]: '
set verify off

SELECT redo_hist.snap_id AS SnapshotID
,      TO_CHAR(redo_hist.snaptime, 'DD-MON HH24:MI:SS') as SnapshotTime
,      ROUND(redo_hist.statval/elapsed_time/1048576,2) AS Redo_MBsec
,      SUBSTR(RPAD('*', 20 * ROUND ((redo_hist.statval/elapsed_time) / MAX (redo_hist.statval/elapsed_time) OVER (), 2), '*'), 1, 20) AS Redo_Graph
,      ROUND(physical_read_hist.statval/elapsed_time/1048576,2) AS Read_MBsec
,      SUBSTR(RPAD('*', 20 * ROUND ((physical_read_hist.statval/elapsed_time) / MAX (physical_read_hist.statval/elapsed_time) OVER (), 2), '*'), 1, 20) AS Read_Graph
,      ROUND(physical_write_hist.statval/elapsed_time/1048576,2) AS Write_MBsec
,      SUBSTR(RPAD('*', 20 * ROUND ((physical_write_hist.statval/elapsed_time) / MAX (physical_write_hist.statval/elapsed_time) OVER (), 2), '*'), 1, 20) AS Write_Graph
FROM (SELECT s.snap_id
            ,g.value AS stattot
            ,s.end_interval_time AS snaptime
            ,NVL(DECODE(GREATEST(VALUE, NVL(lag (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                 ORDER BY s.snap_id), 0)), VALUE, VALUE - LAG (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                     ORDER BY s.snap_id), VALUE), 0) AS statval
            ,(EXTRACT(day FROM s.end_interval_time)-EXTRACT(day FROM s.begin_interval_time))*86400 +
             (EXTRACT(hour FROM s.end_interval_time)-EXTRACT(hour FROM s.begin_interval_time))*3600 +
             (EXTRACT(minute FROM s.end_interval_time)-EXTRACT(minute FROM s.begin_interval_time))*60 +
             (EXTRACT(second FROM s.end_interval_time)-EXTRACT(second FROM s.begin_interval_time)) as elapsed_time
        FROM dba_hist_snapshot s,
             dba_hist_sysstat g,
             v$instance i
       WHERE s.snap_id = g.snap_id
         AND s.begin_interval_time >= sysdate-NVL('&num_days', 0.5)
         AND s.instance_number = i.instance_number
         AND s.instance_number = g.instance_number
         AND g.stat_name = 'redo size') redo_hist,
     (SELECT s.snap_id
            ,g.value AS stattot
            ,NVL(DECODE(GREATEST(VALUE, NVL(lag (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                 ORDER BY s.snap_id), 0)), VALUE, VALUE - LAG (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                     ORDER BY s.snap_id), VALUE), 0) AS statval
        FROM dba_hist_snapshot s,
             dba_hist_sysstat g,
             v$instance i
       WHERE s.snap_id = g.snap_id
         AND s.begin_interval_time >= sysdate-NVL('&num_days', 0.5)
         AND s.instance_number = i.instance_number
         AND s.instance_number = g.instance_number
         AND g.stat_name = 'physical read total bytes') physical_read_hist,
     (SELECT s.snap_id
            ,g.value AS stattot
            ,NVL(DECODE(GREATEST(VALUE, NVL(lag (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                 ORDER BY s.snap_id), 0)), VALUE, VALUE - LAG (VALUE) OVER (PARTITION BY s.dbid, s.instance_number, g.stat_name
                     ORDER BY s.snap_id), VALUE), 0) AS statval
        FROM dba_hist_snapshot s,
             dba_hist_sysstat g,
             v$instance i
       WHERE s.snap_id = g.snap_id
         AND s.begin_interval_time >= sysdate-NVL('&num_days', 0.5)
         AND s.instance_number = i.instance_number
         AND s.instance_number = g.instance_number
         AND g.stat_name = 'physical write total bytes') physical_write_hist
WHERE redo_hist.snap_id = physical_read_hist.snap_id
  AND redo_hist.snap_id = physical_write_hist.snap_id
ORDER BY 1;