SLOB: PL/SQL Commit Optimization
May 20, 2013 10 Comments
I ran some SLOB tests over the weekend using the new SLOBv2 kit and noticed some interesting results. I was using SLOB to generate physical I/O but the “anomaly” is best demonstrated by putting SLOB in “Logical I/O mode”, i.e. by having a large enough buffer cache to satisfy all reads.
I’m calling SLOB with the following configuration parameters and 32 worker processes:
UPDATE_PCT=20 RUN_TIME=30000 WORK_LOOP=1000 SCALE=10000 WORK_UNIT=256 REDO_STRESS=HEAVY LOAD_PARALLEL_DEGREE=8 SHARED_DATA_MODULUS=0
Notice the WORK_LOOP value is non-zero and the RUN_TIME is fairly large – I’m choosing to run a specific set of SLOBops rather than use elapsed time to define each test length. With WORK_LOOP at 1,000 and 32 worker processes that should generate 32,000 SLOBops. Since UPDATE_PCT is 20% I would expect to see around (32,000 * 20%) = 6,400 update statements. So let’s have a look at a couple of interesting statistics in the AWR report generated from this run:
Statistic Total per Second per Trans -------------------------------- ------------------ -------------- ------------- redo synch writes 97 2.0 0.0 user commits 6,400 134.5 1.0
That’s exactly the number of user commits we expected. But the number of redo synch writes is interesting…
Redo Synch Writes
When a session places a commit record into the log buffer it posts the log writer process and then puts itself into a log file sync wait until LGWR notifies it that the record has been written to persistent storage. Actually there are times when the session will not post LGWR (because it can see via a flag that LGWR is already writing) but one thing it always does is increment the counter redo synch writes. So in the above AWR output we would expect to see a matching number of redo synch writes to user commits… yet we don’t. Why?
There’s a little-known optimization in Oracle PL/SQL which means that Oracle will not always wait for the log buffer flush to complete, but will instead carry on processing – effectively sacrificing the D (durability) of ACID compliance. This is best explained by Jonathan Lewis in Chapter 6 of his excellent book Oracle Core – if you haven’t read it, consider putting it at the top of your reading list.
Because SLOB’s engine is a PL/SQL block containing a WHILE … LOOP, Oracle decides that the concept of durability can be rather loosely defined to be at the level of the PL/SQL block rather than the transactions being created within it. According to Jonathan, one way of persuading Oracle not to use this optimization is to use a database link; so let’s modify the slob.sql update statement to include the use of a loopback database link and see if the number of redo synch writes now rises to around 6,400:
Statistic Total per Second per Trans -------------------------------- ------------------ -------------- ------------- redo synch writes 6,564 302.3 0.5 user commits 12,811 590.0 1.0
Indeed it does… but now the number of user commits has doubled, presumably as the result of Oracle performing a two-phase commit (Oracle doesn’t know the loopback database link points to the same database so assumes the transactions are distributed).
Conclusion
I blogged this because I found it interesting, rather than because I had a point I was trying to prove. However, if there were to be any conclusions to this entry they would be the following:
- SLOB is a great tool for experimenting with the behaviour of Oracle under load
- Jonathan’s Oracle Core book is essential reading for anyone who wants to understand Oracle to a deeper level
It’s probably also worth keeping in mind that SLOB’s use of PL/SQL blocks may result in slightly different behaviour from the log writer than you might see from alternative load generation tools or applications which generate I/O.
Check out the wait/nowait option in COMMIT. You can use this to control this within PLSQL.
Thanks Connor. it’s a pleasure to have you visiting my blog. I saw you present at the UKOUG once where you hatched a plan with the audience to confuse late-comers with a joke that didn’t make sense. Half way through your presentation you told this half-joke punchline and of course those of us that were “in” on it burst out laughing while the late arrivals looked utterly confused. You could see them thinking, “That’s not funny, so why is everyone laughing?”
I don’t know whether you do that regularly or not but it sticks in my mind as one of the most entertaining Oracle talks I’ve ever been to.
I have tried some variations on commit syntax, even commit work write immediate nowait;
But I can’t get log file sync to look ‘normal’ to me. Even ‘user commits’ look really low from my slob tests. Any ideas.
Hi – you’ll have to give me some more information. What sort of SLOB test are you running – and which version of SLOB? What parameters are you using? What do you see in the AWR reports? My email address is in the About Me section if you want to send me anything.
thanks for the reply. I am using slob 2, 2013.05.05.slob2.tar. I am running 100 threads with runit.sh and I changed slob.sql after both the ‘heavy’ and the ‘light’ update statement to commit this way on my most recent test:
commit work write immediate nowait;
And the slob.conf, except for my tnsnames stuff is this:
UPDATE_PCT=5
RUN_TIME=180
WORK_LOOP=0
SCALE=500000
WORK_UNIT=256
REDO_STRESS=HEAVY
LOAD_PARALLEL_DEGREE=32
SHARED_DATA_MODULUS=0
In AWR I was hoping to see transactions committed, user commit, to be the same as the execution count for the update. And log file sync would be the same also, to be more similar to the lgwr behavior we usually see from our applications. But in AWR, with this and a few other commit syntax attempts I get lower counts for commits and log file sync. This last run showed that UPDATE executed 9,928 times. But AWR shows 1,107 user commits and only 134 log file syncs.
Thanks again
100 threads with a SCALE value of half a million is a very large workload. Does your host have the CPU capacity to run this without being saturated? How many processors do you have and of what type? What does the run queue look like during the test?
What is it that you are aiming to test by setting the NOWAIT option for commits?
sure, we have plenty of cpu capacity in the RAC cluster. I am watching that. I am aiming to test the suggestion made above by Connor regarding wait/nowait. I was thinking the implication in your blog conversation was that some commit syntax change along those lines might force log file sync for every commit that slob’s pl/sql script performs.thanks
Ok. Well, you are specifying immediate nowait on your commits, which means the processes will not post the log writer upon a commit and then wait on a log file sync event. You should be able to confirm this by checking the statistic “messages sent”, which will be substantially lower than for a similar test without altered commit behaviour. You should also see a much lower value for “redo synch writes”, which is all expected behaviour for COMMIT WRITE IMMEDIATE NOWAIT.
Duh, yeah, I realize nowait looks counterintuitive, but like I said I have tried various syntax combinations. That’s just what I had in the script when I posted here.
I have tried wait also. commit with wait gives me more log file syncs. But curiously, if update/commit executes 37000 times, I get about 3700 user commits and log file syncs with wait. So ‘plain’ commit; gives very, very few log file syncs and commit wait gives more log file syncs, but still only for about 10% of the commits.
could this still be the pl/sql optimization that your blog discusses above? I am just looking for a way to ‘turn off’ that optimization and hoping you had found one.
I think it probably is the optimization that Jonathan Lewis discusses, although if you want to be sure you should probably ask the master himself. I certainly don’t know of a way that this optimization can be “turned off”…