Oracle AWR Reports: When IOStats Lie

graph

If you’ve been unfortunate enough to follow my dithering on Twitter recently you’ll know that I’ve been lurching between thinking that there is and isn’t a problem with Oracle’s tracking of I/O statistics in its AWR reports.

I’m now convinced there is a problem, but I can’t work out what causes it… so step 1 is to describe the problem here, after which step 2 will probably be to sit back and hope someone far more intelligent than me will solve it.

But first some background:

AWR I/O Statistics

I’ve written about the I/O statistics contained in Oracle AWR Reports before, so I won’t repeat myself too much other than to highlight two critical areas which we’ll focus on today: Instance Statistics and IOStat Summaries. By the way, the format of AWR reports changed in 11.2.0.4 and 12c to include a new IO Profile section, but today we’re covering reports from 11.2.0.3.

First of all, here are some sensible statistics. I’m going to show you the IOStat by Function Summary section of a report from a real life database:

IOStat by Function summary
-> 'Data' columns suffixed with M,G,T,P are in multiples of 1024
    other columns suffixed with K,M,G,T,P are in multiples of 1000
-> ordered by (Data Read + Write) desc

                Reads:   Reqs   Data    Writes:  Reqs   Data    Waits:    Avg  
Function Name   Data    per sec per sec Data    per sec per sec Count    Tm(ms)
--------------- ------- ------- ------- ------- ------- ------- ------- -------
Buffer Cache Re   14.9G   504.1 4.22491      0M     0.0      0M 1690.3K     0.0
Direct Reads      12.2G     3.5 3.45612      0M     0.0      0M       0     N/A
DBWR                 0M     0.0      0M    7.9G   186.6 2.25336       0     N/A
Others             4.2G    12.1 1.18110    3.6G     2.5 1.03172   41.9K     0.1
LGWR                 1M     0.0 .000277    5.8G    11.9 1.65642   17.4K     1.0
Direct Writes        0M     0.0      0M      4M     0.0 .001110       0     N/A
TOTAL:            31.2G   519.7 8.86241   17.4G   201.0 4.94263 1749.6K     0.0
          -------------------------------------------------------------

This section of the report is breaking down all I/O operations into the functions that caused them (e.g. buffer cache reads, direct path reads or writes, redo log writes by LGWR, the writing of dirty buffers by DBWR and so on). Let’s ignore that level of detail now and just focus on the TOTAL row at the bottom.

To try and make this simple to describe, I’ve gone a bit crazy with the colours. In green I’ve highlighted the labels for reads and writes – and now let’s walk through the columns:

  1. The first column is the function name, but we’re just focussing on TOTAL
  2. The second column is the total amount of reads that happened in this snapshot: 31.2GB
  3. The third column is the average number of read requests per second, i.e. 519.7 read IOPS
  4. The fourth column is the average volume of data read per second, i.e. 8.86 MB/sec read throughput
  5. The fifth column is the total amount of writes that happened in this snapshot: 17.4GB
  6. The sixth column is the average number of write requests per second, i.e. 201 write IOPS
  7. The seventh column is the average volume of data written per second, i.e. 4.94 MB/sec write throughput
  8. The eighth and ninth columns are not of interest to us here

The whole section is based on the DBA_HIST_IOSTAT_FUNCTION view. What we care about today is the IOPS figures (shown in red) and the Throughput figures (shown in blue). Pay attention to the comments in the view header which show that Data columns (including throughput) are multiples of 1024 while other columns (including IOPS) are multiples of 1000. It’s interesting that the two throughput values are obviously measured in MB/sec and yet are missing the “M” suffix – I assume this “falls off the end” of the column because of the number of decimal places displayed.

Now that we have these figures explained, let’s compare them to what we see in the Instance Activity Stats section of the same AWR report:

Instance Activity Stats
-> Ordered by statistic name

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
physical read IO requests                 1,848,576          513.3         121.5
physical read bytes                  29,160,300,544    8,096,201.8   1,917,053.5
physical read total IO requests           1,892,366          525.4         124.4
physical read total bytes            33,620,594,176    9,334,578.5   2,210,281.7
physical read total multi block              17,096            4.8           1.1
physical reads                            3,559,607          988.3         234.0

physical write IO requests                  671,728          186.5          44.2
physical write bytes                  8,513,257,472    2,363,660.5     559,677.7
physical write total IO requests            723,348          200.8          47.6
physical write total bytes           18,657,198,080    5,180,071.5   1,226,559.6
physical write total multi block             26,192            7.3           1.7
physical writes                           1,039,216          288.5          68.3

For this section we care about the per Second column because both IOPS and Throughput are measured using units per second. For both reads and writes there are two sets of statistics: those with the word total in them and those without. You can find the full description of 11gR2 statistics in the documentation, but the difference between the two is best summed up by this snippet which describes physical read total bytes:

Total size in bytes of disk reads by all database instance activity including application reads, backup and recovery, and other utilities. The difference between this value and “physical read bytes” gives the total read size in bytes by non-application workload.

I’ve underlined non-application workload here because this is critical. If you merely look at the Load Profile section at the top of an AWR report you will only see values for “application workload” I/O but this does not include stuff like RMAN backups, archive logging and so on… important stuff if you care about the actual I/O workload. For this reason, we only care about the following four statistics:

  1. physical read total IO requests (per second) = the average number of read IOPS
  2. physical read total bytes (per second) = the average read throughput in bytes per second
  3. physical write total IO requests (per second) = the average number of write IOPS
  4. physical write total bytes (per second) = the average write throughput in bytes per second

Again I’ve coloured the IOPS figures in red and the throughput figures in blue.

Tying It Together

Now that we have our two sets of values, let’s just compare them to make sure they align. The IOPS figures do not require any conversion but the throughput figures do: the values in the Instance Activity Stats section are in bytes/second and we want them to be in MB/second so we need to divide by 1048576 (i.e. 1024 * 1024).

IOStat by Function

Instance Activity Stats

Error Percentage

Read IOPS

519.7

525.4

1.09%

Write IOPS

201.0

200.8

0.10%

Read Throughput

8.86241 MB/sec

9,334,578.5 bytes/sec

8.90215 MB/sec

0.45%

Write Throughput

4.94263 MB/sec

5,180,071.5 bytes/sec

4.94010 MB/sec

0.05%

I’ve calculated the error percentages here to see how far the figures vary. It is my assumption that the Instance Activity Stats are accurate and that any margin of error in the IOStat figures comes as a result of sampling issues. The highest error percentage we see here is just over 1%, which is hardly a problem in my opinion.

Don’t Believe The Stats

So far I have no complaints about the matching of statistics in the AWR report. But now let me introduce you to the AWR report that has been puzzling me recently:

IOStat by Function summary
-> 'Data' columns suffixed with M,G,T,P are in multiples of 1024
    other columns suffixed with K,M,G,T,P are in multiples of 1000
-> ordered by (Data Read + Write) desc

                Reads:   Reqs   Data    Writes:  Reqs   Data    Waits:    Avg
Function Name   Data    per sec per sec Data    per sec per sec Count    Tm(ms)
--------------- ------- ------- ------- ------- ------- ------- ------- -------
Direct Reads       3.8T 2.7E+04    3.2G    405M     3.6 .335473       0     N/A
Direct Writes        0M     0.0      0M  164.5G  1159.0 139.519       0     N/A
Buffer Cache Re   63.2G  3177.6 53.6086      0M     0.0      0M 3292.5K    23.4
DBWR                 0M     0.0      0M      2G    45.3 1.71464       0     N/A
LGWR                 0M     0.0      0M    485M    19.2 .401739   19.7K     0.9
Others             220M    12.3 .182232     26M     0.7 .021536   15.6K    34.5
Streams AQ           0M     0.0      0M      0M     0.0      0M       1     9.0
TOTAL:             3.9T 3.0E+04    3.3G  167.4G  1227.8 141.992 3327.8K    23.3
          -------------------------------------------------------------

Instance Activity Stats
-> Ordered by statistic name

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
physical read total IO requests          35,889,568       29,728.4       3,239.4
physical read total bytes         4,261,590,852,608 3.52999864E+09 3.8465483E+08

physical write total IO requests          1,683,381        1,394.4         151.9
physical write total bytes          205,090,714,624  169,882,555.1  18,511,663.0

Again I’ve coloured the IOPS measurements in red and the throughput measurements in blue. And as before we need to convert the bytes per second values shown in the Instance Activity Stats section to MB/sec as shown in the IOStat by Function Summary section.

IOStat by Function

Instance Activity Stats

Error Percentage

Read IOPS

30,000

29,728.4

0.91%

Write IOPS

1227.8

1,394.4

11.95%

Read Throughput

3.3 GB/sec

3,379.2 MB/sec

3,529,998,640 bytes/sec

3,366.469 MB/sec

0.38%

Write Throughput

141.992 MB/sec

169,882,555.1 bytes/sec

162.01263 MB/sec

12.36%

Do you see what’s bugging me here? The write values for both IOPS and throughput are out by over 10% when I compare the values in IOStat by Function against the Instance Activity Stats. Ten percent is a massive margin of error at this level – we’re talking 20MB/sec. To translate that into something easier to understand, if 20MB/sec were sustained over a 24 hour period it would amount to over 1.6TB of data. I’ve seen smaller data warehouses.

So why is this happening? Unfortunately I don’t have access to the system where this AWR report was created, so I cannot tell if, for example, there was an instance restart between the start and end snapshots (although the elapsed time of the report was just 20 minutes so it seems unlikely).

The truth is I don’t know. Which is why I’m writing about it here… if you think you have the answer, or just as importantly if you see the same behaviour, let me know!

Advertisements

Oracle AWR Reports: When Averages Lie

truth-sign

I was recently contacted by a customer who runs a busy ERP system on their Oracle database. Their users have been screaming for some time about unpredictable performance, where application screens which would usually open in under a second suddenly take an age. ERP systems are legendary for this, of course, because as each screen loads the application tends to fire off a glut of queries to populate all the various fields, graphs and so on. When performance problems strike, the elapsed time for each query rises, which usually results in “stuff going slowly”. And unfortunately, because of something known to psychologists as the availability heuristic, users tend not to remember the times when performance was acceptable… just the occasions when they were left frustrated.

Initial Investigation

The first thing I ask for in situations like this is a set of AWR reports. Customers are generally happy to part with such information, whereas the running of bespoke SQL scripts to harvest data can often cause delays as the scripts have to be investigated and tested. AWR reports, I’m sure you know, contain information about the behaviour and performance of the database between two points in time – and much of the data is averaged. For this reason, I always always ask for a set of multiple, consecutive reports covering the problematic period where each report covers no more than one hour. I’d prefer less than one hour per report, but the default configuration of Oracle results in hourly snapshots so that’s usually the limit… as it was for this customer.

As a first pass, I like to run the AWR reports through my awr-parser.sh script. This parses them all and produces a CSV file as output. This can then be easily used to graph information such the amount of time spent working on CPU versus time spent waiting on I/O requests to complete. Often this is enough to get a basic idea of what’s happening….

awr-averages-lie-work-vs-wait

In the above graph, each column shows data from a single (hourly) AWR report; blue represents DB CPU while red represents User I/O wait time. If we are looking to increase the user experience with low-latency storage then we would hope to find two things: a) lots of User I/O wait time, and b) average latency which is higher than we would expect a flash array to deliver.

Unfortunately, the first requirement isn’t really there in the above graph. There just isn’t enough red, except for peaks towards the end of the day where there appears to be some batch job or backup taking place. A peak at the AWR reports shows that this is in fact a scheduled job which performs an online rebuild of an index. With more capable storage we could make this go a lot faster, so would that perhaps help the customer?

Sadly, the answer was no, it wouldn’t. The customer specifically stated that it’s the online day where they struggle and not the index rebuild. Hmmm.

More Graphs

Let’s have a look at some of the other information that came out of the AWR reports to see if anything interesting shows up:

awr-averages-lie-cpu-utilization

Plenty of spare CPU capacity here. In fact, if you think about how much Oracle licenses cost per CPU – and how much CPU is not being used… it looks like this database server represents a significant over-investment.

awr-averages-lie-throughput

Not much in the way of I/O… there are spikes here and there, including those for the index rebuilds, but overall the I/O rates are very low and for many samples we barely scratch 10MiB/sec.

So what have we got? A system which hardly performs any I/O and doesn’t seem to spend any time waiting on I/O during the period the users are complaining about. Time to go back to the customer and ask for some more detail…

Second Attempt

After a plea for more details, I received another batch of AWR reports covering a single 24 hour period, as well as the crucial information that complaints were received between 5am and 6am. Perfect. Let’s have a look at that throughput graph again:

awr-averages-lie-2nd-throughput

Again, the throughput is very low – including around the critical 5am-6am period where complaints were received – which might fool us into thinking any problems are not related to the storage. But what about if we plot a graph of I/O Operations Per Second (IOPS)?

awr-averages-lie-2nd-iops

Bingo! There’s a big spike of read IOPS at exactly the time where problems were reported, which we can put down to a large amount of random physical reads. We know from the customer that they have a multi-controller monolithic disk array – the top of the range product from the world’s biggest disk array vendor. But it’s still a disk array, which means random I/O frequently results in high latency as somewhere a disk head has to move across the platter and then the platter has to rotate to the correct sector.

So we should be able to just check the average wait time for random physical reads (the wait event db file sequential read) and it will reveal itself as the problem. Right? We’ll expect to see waits in the region of 4-8ms or perhaps even worse if it’s really struggling.

Here’s a graph of the average user I/O wait time seen across each AWR report – don’t forget we’re especially interested in the troublesome 5am-6am period:

awr-averages-lie-average-wait

Not what we expected. The 5am-6am slot is showing the best latency figures for the entire day! And this is when users are complaining?

Averages Lie

A disk array like this usually has a reasonably large DRAM cache and quite possibly a set of SSDs being used for further caching or tiering of data. If a physical read can be satisfied from the cache, it will likely be serviced in under a millisecond. But if we experience a cache miss, we’re forced to pay the penalty of mechanical latency and wait for many milliseconds. Thus there are two ranges of response times: sub-millisecond and many milliseconds.

Average wait times won’t show this because they are simply the average of all waits over the period of the AWR report (an entire hour in this case). What if many of the reads are cache hits but some still miss? Maybe it’s possible that there are enough misses to cause users to complain, but enough hits to make the average look acceptable?

Luckily, there is another section in the AWR report which can help here: the Wait Event Histogram. Here’s the histograms covering db file sequential read waits (random physical reads) for our 24 hour period:

$ cat header.txt; for file in `ls -1`; do
grep -A 40 "Wait Event Histogram" $file | grep "db file sequential read" | head -1
done
                                                    % of Waits                  
                                 -----------------------------------------------
                           Total                                                
Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----  
db file sequential read     3329  99.2    .8    .0    .0    .0    .0    .0    .0
db file sequential read     3487  99.4    .6    .0    .0
db file sequential read    35.4K  21.0   1.2   2.5  23.2  39.0  11.9   1.1
db file sequential read    82.9K  34.6   1.8   3.4  15.9  31.9  11.2   1.2
db file sequential read    428.6  67.4   2.1   2.5  10.1  15.2   2.4    .3
db file sequential read    4440.  92.6   1.3   1.2   1.9   2.6    .4    .0
db file sequential read    2013K  91.7   1.4   1.2   2.1   3.0    .5    .0
db file sequential read    131.7  45.7   1.1   2.8  15.6  29.1   5.4    .3
db file sequential read    76.3K  20.7   1.7   2.8  21.7  44.0   8.7    .4
db file sequential read    83.7K  31.1   1.1   2.4  16.6  38.1  10.0    .7
db file sequential read    430.4  79.6   1.5   2.0   4.4   9.3   2.9    .3
db file sequential read    106.6  32.3   1.7   3.0  15.4  36.1  10.3   1.2
db file sequential read      94K  23.8   1.8   2.7  16.6  41.7  12.1   1.2
db file sequential read    248.6  36.0   1.8   2.9  14.8  34.5   8.9   1.1
db file sequential read    202.5  31.9   1.4   2.0   9.0  38.0  16.1   1.6
db file sequential read    237.7  20.2   1.9   2.9  17.7  43.5  12.5   1.2
db file sequential read    49.5K   9.4   1.0   2.7  24.1  48.0  13.6   1.1
db file sequential read    75.6K  42.4   1.5   3.0  16.9  29.9   5.9    .5
db file sequential read    88.7K  61.4   1.3   2.5  12.4  18.2   3.8    .4
db file sequential read    644.3  84.3   1.0   1.5   4.3   7.3   1.5    .2
db file sequential read    61.3K  34.9   6.9   4.4  11.0  31.2  10.5   1.1
db file sequential read      330  96.7   3.1    .2    .0    .0
db file sequential read      12K  88.7   9.5   1.2    .4    .2    .1    .0    .0

Now the problem becomes clear. Although many reads are hitting the cache (<1ms bucket), there are an awful lot that are missing. The <2ms and <4ms buckets are virtually empty and then around <8ms we start to see big numbers again. Those are cache misses which incur the penalty of seek time on a hard disk drive somewhere in the array.

What’s more, it’s clear that the array is under some pressure because in many of the AWR reports there are substantial waits in the <16ms and above buckets. I’ve highlighted in red the occasions where over one third of read requests took over 8ms. That’s exactly the sort of behaviour that is making people replace disks with flash.

Let’s visualise the above data. I’ve gone back to the older set of AWR reports and graphed the histogram buckets to see if we can get a feel for the distribution of waits:

awr-averages-lie-random-read-histogram

As you can see, I’ve tried to go for a heat map-style colour scheme, with green indicating cache hits on the storage array and the colours moving from yellow through orange to red indicating cache misses which result in disk drive seek time. There are also some real outliers in black where requests took over one thousand milliseconds to be serviced (ouch).

Another way to look at it is to make the picture clearer by reducing the detail. In the graph below I’ve simply assumed that everything sub-millisecond is a storage array cache hit and everything else is a miss:

awr-averages-lie-cache-hits-and-misses

Clearly there are a lot of physical reads paying the price of mechanical latency here.

Conclusion

What this customer does next is up for debate – there are more than just technical issues to consider. But the point of this post is to illustrate that while average wait times are great for getting a rough feel of how a system is behaving, they also offer huge scope to lose details. And sometimes those details are important. The devil, as they say, is truly in the detail…

Update: January 2015 – I have now added the ability to read Wait Event Histograms to my awr-parser.sh script which you can find on my Useful Scripts page…

New AWR Report Format: Oracle 11.2.0.4 and 12c

statistics-and-graphs

This is a post about Oracle Automatic Workload Repository (AWR) Reports. If you are an Oracle professional you doubtless know what these are – and if you have to perform any sort of performance tuning as part of your day job it’s likely you spend a lot of time immersed in them. Goodness knows I do – a few weeks ago I had to analyse 2,304 of them in one (long) day. But for anyone else, they are (huge) reports containing all sorts of information about activities that happened between two points of time on an Oracle instance. If that doesn’t excite you now, please move along – there is nothing further for you here.

truthAWR Reports have been with us since the introduction of the Automatic Workload Repository back in 10g and can be considered a replacement for the venerable Statspack tool. Through each major incremental release the amount of information contained in an AWR Report has grown; for instance, the 10g reports didn’t even show the type of operating system, but 11g reports do. More information is of course a good thing, but sometimes it feels like there is so much data now it’s hard to find the truth hidden among all the distractions.

I recently commented in another post about the change in AWR report format introduced in 11.2.0.4. This came as a surprise to me because I cannot previously remember report formatting changing mid-release, especially given the scale of the change. Not only that, but I’m sure I’ve seen reports from 11.2.0.3 in the new format too (implying it was added via a patch set update), although I can’t find the evidence now so am forced to concede I may have imagined it. The same new format also continues into 12.1.0.1 incidentally.

The 11.2.0.4 New Features document doesn’t mention anything about a new report format. I can’t find anything about it on My Oracle Support (but then I can never find anything about anything I’m looking for on MOS these days). So I’m taking it upon myself to document the new format and the changes introduced – as well as point out a nasty little issue that’s caught me out a couple of times already.

Comparing Old and New Formats

From what I can tell, all of the major changes except one have taken place in the Report Summary section at the start of the AWR report. Oracle appears to have re-ordered the subsections and added a couple of new ones:

  • Wait Classes by Total Wait Time
  • IO Profile

The new Wait Classes section is interesting because there is already a section called Foreground Wait Class down in the Wait Event Statistics section of the Main Report, but the additional section appears to include background waits as well. The IO Profile section is especially useful for people like me who work with storage – and I’ve already blogged about it here.

In addition, the long-serving Top 5 Timed Foreground Events section has been renamed and extended to become Top 10 Foreground Events by Total Wait Time.

Here are the changes in tabular format:

Old Format

New Format

Cache Sizes

Load Profile

Instance Efficiency Percentages

Shared Pool Statistics

Top 5 Timed Foreground Events

Host CPU

Instance CPU

Memory Statistics

Time Model Statistics

Load Profile

Instance Efficiency Percentages

Top 10 Foreground Events by Total Wait Time

Wait Classes by Total Wait Time

Host CPU

Instance CPU

IO Profile

Memory Statistics

Cache Sizes

Shared Pool Statistics

Time Model Statistics

I also said there was one further change outside of the Report Summary section. It’s the long-standing Instance Activity Stats section, which has now been divided into two:

Old Format

New Format

Instance Activity Stats

Key Instance Activity Stats

Other Instance Activity Stats

I don’t really understand the point of that change, nor why a select few statistics are deemed to be more “key” than others. But hey, that’s the mystery of Oracle, right?

Tablespace / Filesystem IO Stats

Another, more minor change, is the addition of some cryptic-looking “1-bk” columns to the two sections Tablespace IO Stats and File IO Stats:

Tablespace
------------------------------
          Av       Av     Av      1-bk  Av 1-bk          Writes   Buffer  Av Buf
  Reads   Rds/s  Rd(ms) Blks/Rd   Rds/s  Rd(ms)  Writes   avg/s    Waits  Wt(ms)
------- ------- ------- ------- ------- ------- ------- ------- -------- -------
UNDOTBS1
8.4E+05      29     0.7     1.0 6.3E+06    29.2       1     220    1,054     4.2
SYSAUX
 95,054       3     0.8     1.0  11,893     3.3       1       0        1    60.0
SYSTEM
    745       0     0.0     1.0   1,055     0.0       0       0       13     0.8
USERS
    715       0     0.0     1.0     715     0.0       0       0        0     0.0
TEMP
      0       0     0.0     N/A       7     0.0       0       0        0     0.0

I have to confess it took me a while to figure out what they meant – in the end I had to consult the documentation for the view DBA_HIST_FILESTATXS:

Column Datatype NULL Description
SINGLEBLKRDS NUMBER Number of single block reads
SINGLEBLKRDTIM NUMBER Cumulative single block read time (in hundredths of a second)

Aha! So the AWR report is now giving us the number of single block reads (SINGLEBLKRDS) and the average read time for them (SINGLEBLKRDTIM / SINGLEBLKRDS). That’s actually pretty useful information for testing storage, since single block reads tell no lies. [If you want to know what I mean by that, visit Frits Hoogland’s blog and download his white paper on multiblock reads…]

Top 10: Don’t Believe The Stats

One thing you might want to be wary about is the new Top 10 section… Here are the first two lines from mine after running a SLOB test:

Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            Tota    Wait   % DB
Event                                 Waits Time Avg(ms)   time Wait Class
------------------------------ ------------ ---- ------- ------ ----------
db file sequential read        3.379077E+09 2527       1   91.4 User I/O
DB CPU                                      318.           11.5

Now, normally when I run SLOB and inspect the post-run awr.txt file I work out the average wait time for db file sequential read so I can work out the latency. Since AWR reports do not have enough decimal places for the sort of storage I use (the wait shows simply as 0 or 1), I have to divide the total wait time by the number of waits. But in the report above, the total wait time of 2,527 divided by 3,379,077,000 waits gives me an average of 0.000747 microseconds. Huh? Looking back at the numbers above it’s clear that the Total Time column has been truncated and some of the digits are missing. That’s bad news for me, as I regularly use scripts to strip this information out and parse it.

This is pretty poor in my opinion, because there is no warning and the number is just wrong. I assume this is an edge case because the number of waits contains so many digits, but for extended SLOB tests that’s not unlikely. Back in the good old Top 5 days it looked like this, which worked fine:

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                              Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
db file sequential read         119,835,425      50,938      0   84.0 User I/O
latch: cache buffers lru chain   20,051,266       6,221      0   10.3 Other

Unfortunately, in the new 11.2.0.4 and above Top 10 report, the Total Time column simply isn’t wide enough. Instead, I have to scan down to the Foreground Wait Events section to get my true data:

                                                             Avg
                                        %Time Total Wait    wait    Waits   % DB
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file sequential read    3.379077E+09     0  2,527,552       1     11.3   91.4

This is something worth looking out for, especially if you also use scripts to fetch data from AWR files. Of course, the HTML reports don’t suffer from this problem, which just makes it even more annoying as I can’t parse HTML reports automatically (and thus I despise them immensely).

12.1.0.2 AWR Reports

One final thing to mention is the AWR report format of 12.1.0.2 (which was just released at the time of writing). There aren’t many changes from 12.1.0.1 but just a few extra lines have crept in, which I’ll highlight here. In the main, they are related to the new In Memory Database option.

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               0.3               4.8      0.00      0.02
              DB CPU(s):               0.1               1.2      0.00      0.00
      Background CPU(s):               0.0               0.1      0.00      0.00
      Redo size (bytes):          50,171.6         971,322.7
  Logical read (blocks):             558.6          10,814.3
          Block changes:             152.2           2,947.0
 Physical read (blocks):              15.1             292.0
Physical write (blocks):               0.2               4.7
       Read IO requests:              15.1             292.0
      Write IO requests:               0.2               3.3
           Read IO (MB):               0.1               2.3
          Write IO (MB):               0.0               0.0
           IM scan rows:               0.0               0.0
Session Logical Read IM:
             User calls:              16.1             312.0
           Parses (SQL):              34.0             658.0
      Hard parses (SQL):               4.6              88.0
     SQL Work Area (MB):               0.9              17.2
                 Logons:               0.1               1.7
         Executes (SQL):              95.4           1,846.0
              Rollbacks:               0.0               0.0
           Transactions:               0.1

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   97.55       Redo NoWait %:  100.00
            Buffer  Hit   %:   97.30    In-memory Sort %:  100.00
            Library Hit   %:   81.75        Soft Parse %:   86.63
         Execute to Parse %:   64.36         Latch Hit %:   96.54
Parse CPU to Parse Elapsd %:   19.45     % Non-Parse CPU:   31.02
          Flash Cache Hit %:    0.00

<snip!>

Cache Sizes                       Begin        End
~~~~~~~~~~~                  ---------- ----------
               Buffer Cache:       960M       960M  Std Block Size:         8K
           Shared Pool Size:     4,096M     4,096M      Log Buffer:   139,980K
             In-Memory Area:         0M         0M

One other thing of note is that the Top 10 section now (finally) displays average wait times to two decimal places. This took a surprising amount of time to arrive, but it’s most welcome:

Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                           Total Wait       Wait   % DB Wait
Event                                Waits Time (sec)    Avg(ms)   time Class
------------------------------ ----------- ---------- ---------- ------ --------
db file parallel read               63,157      828.6      13.12   86.1 User I/O
DB CPU                                          234.2              24.3
db file sequential read            113,786       67.8       0.60    7.0 User I/O

Oracle AWR Reports: Understanding I/O Statistics

truth

One consequence of my job is that I spend a lot of time looking at Oracle Automatic Workload Repository reports, specifically at information about I/O. I really do mean a lot of time (honestly, I’m not kidding, I have had dreams about AWR reports). One thing that comes up very frequently is the confusion relating to how the measurements of IOPS and throughput are displayed in the AWR report Load Profile section. The answer, is that they aren’t. Well, not exactly… let me explain.

Physical Read and Write I/O

Right at the top of an AWR report, just after the Database and Host details, you’ll find the familiar Load Profile section. Until recently, it had changed very little through the releases of Oracle since its introduction in 10g. Here’s a sample from 11g Release 2:

Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
      DB Time(s):               44.1                0.4       0.07       1.56
       DB CPU(s):                1.6                0.0       0.00       0.06
       Redo size:      154,034,644.3        1,544,561.0
    Logical read:          154,436.1            1,548.6
   Block changes:           82,491.9              827.2
  Physical reads:              150.6                1.5
 Physical writes:           18,135.2              181.9
      User calls:               28.3                0.3
          Parses:              142.7                1.4
     Hard parses:                7.5                0.1
W/A MB processed:                2.1                0.0
          Logons:                0.1                0.0
        Executes:              607.7                6.1
       Rollbacks:                0.0                0.0
    Transactions:               99.7

In my role I have to look at the amount of I/O being driven by a database, so I can size a solution based on flash memory. This means knowing two specific metrics: the number of I/Os per second (IOPS) and the throughput (typically measured in MB/sec). I need to know these values for both read and write I/O so that I can understand the ratio. I also want to understand things like the amount of random versus sequential I/O, but that’s beyond the scope of this post.

The first thing to understand is that none of this information is shown above. There are values for Physical reads and Physical writes but these are actually measured in database blocks. Even if we knew the block size (which we don’t because Oracle databases can have multiple block sizes) we do not know how many I/Os were required. Ten Oracle blocks could be written in one sequential I/O or ten individual “random” I/Os, completely changing the IOPS measurement. To find any of this information we have to descend into the depths of the AWR report to find the Instance Activity Stats section.

In Oracle 12c, the format of the AWR report changed, especially the AWR Load Profile section, which was modified to show the units that each measurement uses. It also includes some new lines such as Read/Write IO Requests and Read/Write IO. Here’s a sample from a 12c database (taken during a 30 second run of SLOB):

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):              44.1               0.4      0.07      1.56
              DB CPU(s):               1.6               0.0      0.00      0.06
      Redo size (bytes):     154,034,644.3       1,544,561.0
  Logical read (blocks):         154,436.1           1,548.6
          Block changes:          82,491.9             827.2
 Physical read (blocks):             150.6               1.5
Physical write (blocks):          18,135.2             181.9
       Read IO requests:             150.3               1.5
      Write IO requests:          15,096.9             151.4
           Read IO (MB):               1.2               0.0
          Write IO (MB):             141.7               1.4
             User calls:              28.3               0.3
           Parses (SQL):             142.7               1.4
      Hard parses (SQL):               7.5               0.1
     SQL Work Area (MB):               2.1               0.0
                 Logons:               0.1               0.0
         Executes (SQL):             607.7               6.1
              Rollbacks:               0.0               0.0
           Transactions:              99.7

Now, you might be forgiven for thinking that the values highlighted in red and blue above tell me the very IOPS and throughput information I need. If this were the case, we could say that this system performed 150 physical read IOPS and 15k write IOPS, with throughput of 1.2 MB/sec reads and 141.7 MB/sec writes. Right?

But that isn’t the case – and to understand why, we need to page down five thousand times through the increasingly-verbose AWR report until we eventually find the Other Instance Activity Stats section (or just Instance Activity Stats in pre-12c reports) and see this information (edited for brevity):

Other Instance Activity Stats                  DB/Inst: ORCL/orcl  Snaps: 7-8
-> Ordered by statistic name

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
physical read IO requests                     5,123          150.3           1.5
physical read bytes                      42,049,536    1,233,739.3      12,371.2
physical read total IO requests              37,162        1,090.3          10.9
physical read total bytes            23,001,900,544  674,878,987.9   6,767,255.2
physical read total multi block              21,741          637.9           6.4
....
physical write IO requests                  514,547       15,096.9         151.4
physical write bytes                  5,063,483,392  148,563,312.9   1,489,698.0
physical write total IO requests            537,251       15,763.0         158.1
physical write total bytes           18,251,309,056  535,495,967.4   5,369,611.4
physical write total multi block             18,152          532.6           5.3

The numbers in red and blue match up with those above, albeit with the throughput values using different units of bytes/sec instead of MB/sec. But the problem is, these aren’t the “total” values – which are highlighted in green. So what are those total values showing us?

Over to the Oracle Database 12c Reference Guide:

physical read IO requests:  Number of read requests for application activity (mainly buffer cache and direct load operation) which read one or more database blocks per request. This is a subset of “physical read total IO requests” statistic.

physical read total IO requests: Number of read requests which read one or more database blocks for all instance activity including application, backup and recovery, and other utilities. The difference between this value and “physical read total multi block requests” gives the total number of single block read requests.

The values that don’t have the word total in them, i.e. the values shown in the AWR Profile section at the start of a report, are only showing what Oracle describes as “application activity“. That’s all very well, but it’s meaningless if you want to know how much your database is driving your storage. This is why the values with total in the name are the ones you should consider. And in the case of my sample report above, there is a massive discrepancy between the two: for example, the read throughput value for application activity is just 1.2 MB/sec while the total value is actually 644 MB/sec – over 500x higher! That extra non-application activity is definitely worth knowing about. (In fact, I was running a highly parallelised RMAN backup during the test just to make the point…)

[Note: There was another section here detailing how to find and include the I/O generated by redo into the totals, but after consultation with guru and legend Tanel Poder it’s come to my attention that this is incorrect. In fact, reads and writes to redo logs are included in the physical read/write total statistics…]

Oracle 12c IO Profile Section

Luckily, Oracle 12c now has a new section which presents all the information in one table. Here’s a sample extracted from the same report as above:

IO Profile                  Read+Write/Second     Read/Second    Write/Second
~~~~~~~~~~                  ----------------- --------------- ---------------
            Total Requests:          16,853.4         1,090.3        15,763.0
         Database Requests:          15,247.2           150.3        15,096.9
        Optimized Requests:               0.1             0.0             0.0
             Redo Requests:             517.5             1.2           516.3
                Total (MB):           1,154.3           643.6           510.7
             Database (MB):             142.9             1.2           141.7
      Optimized Total (MB):               0.0             0.0             0.0
                 Redo (MB):             295.7             0.0           295.7
         Database (blocks):          18,285.8           150.6        18,135.2
 Via Buffer Cache (blocks):          18,282.1           150.0        18,132.0
           Direct (blocks):               3.7             0.6             3.1

Suddenly life is more simple. You want to know the total IOPS and throughput? It’s all in one place. You want to calculate the ratio of reads to writes? Just compare the read and write columns. Happy days.

One word of warning though: there are other database processes driving I/O which may not be tracked in these statistics. I see no evidence for control file reads and writes being shown, although these are insignificant in magnitude. More significant would be I/O from the archiver process for databases running in archive log mode, as each redo log must be sequentially read and re-written out as an archive log. Are these included? Yet another possibility would be the Recovery Writer (RVWR) process which is responsible for writing flashback logs when database flashback logging is enabled. [Discussions with Jonathan Lewis suggest these stats are all included – and let’s face it, he wrote the book on the subject…!]  It all adds up… Oracle really needs to provide better clarity on what these statistics are measuring.

Conclusion

If you want to know how much I/O is being driven by your database, do not use the information in the Load Profile section of an AWR report. Use the I/O Profile section if available, or otherwise skip to the Instance Activity Stats section and look at the total values for physical reads and writes (and redo). Everything else is just lies, damned lies and (I/O) statistics.

AWR Generator

tools

As part of my role at Violin I spend a lot of time profiling customer’s databases to see how their performance varies over time. The easiest way to do this (since I often don’t have remote access) is to ask for lots of AWR reports. One single report covering a large span of time is useless, because all peaks and troughs are averaged out into a meaningless hum of noise, so I always ask for one report per snapshot period (usually an hour) covering many hours or days. And I always ask for text instead of HTML because then I can process them automatically.

That’s all well and good, but generating a hundred AWR reports is a laborious and mind-numbingly dull task. So to make things easier I’ve written a SQL script to do it. I know there are many other scripts out there to do this, but none of them met the criteria I needed – mainly that they were SQL not shell (for portability) and that they didn’t create temporary objects (such as directories).

If it is of use to anyone then I offer it up here:

https://flashdba.com/database/useful-scripts/awr-generator/

Likewise if you manage to break it, please let me know! Thanks to Paul for confirming that it works on RAC and Windows systems (you know you love testing my SQL…)