Oracle SLOB On Solaris
April 26, 2014 5 Comments
This is another guest post from my buddy Nate Fuzi, who performs the same role as me for Violin but is based in the US instead of EMEA. Nate believes that all English people live in the Dickensian London of the 19th century and speak in Cockney rhyming slang. I hate to disappoint, so have a butcher’s below and feast your mince pies on his attempts to make SLOB work on Solaris without going chicken oriental. Over to you Nate, me old china plate.
Note: The Silly Little Oracle Benchmark, or SLOB, is a Linux-only tool designed and released for the community by Kevin Closson. There are no ports for other operating systems – and Kevin has always advised that the solution for testing on another platform is to use a Linux VM and connect via TNS. The purpose of this post is simply to show what happens when you have no other choice but to try and get SLOB working natively on Solaris…
I wrestled with SLOB 2 for a couple hours last week for a demo build we had in-house to show our capabilities to a prospective customer. I should mention I’ve had great success—and ease!—with SLOB 2 previously. But that was on Linux. This was on Solaris 10—to mimic the setup the customer has in-house. No problem, I thought; there’s some C files to compile, but then there’s just shell scripts to drive the thing. What could go wrong?
Well, it would seem Kevin Closson, the creator of SLOB and SLOB 2, did his development on an OS with a better sense of humor than Solaris. The package unzipped, and the setup.sh script appeared to run successfully, but runit.sh would load up the worker threads and wait several seconds before launching them—and then immediately call it “done” and bail out, having executed on the database only a couple seconds. Huh? I had my slob.conf set to execute for 300 seconds.
I had two databases created: one with 4K blocks and one with 8K blocks. I had a tablespace created for SLOB data called SLOB4K and SLOB8K, respectively. I ran setup.sh SLOB4K 128, and the log file showed no errors. All good, I thought. Now run runit.sh 12, and it stops as quickly as it starts. Oof.
It took Bryan Wood, a much better shell script debugger (hey, I said DEbugger) than myself, to figure out all the problems.
First, there was this interesting line of output from the runit.sh command:
NOTIFY: Connecting users 1 2 3 Usage: mpstat [-aq] [-p | -P processor_set] [interval [count]] 4 5 6 7 8 9 10
Seems Solaris doesn’t like mpstat –P ALL. However it seems that on Solaris 10 the mpstat command shows all processors even without the -P option.
Next, Solaris doesn’t like Kevin’s “sleep .5” command inside runit.sh; it wants whole numbers only. That raises the question in my mind why he felt the need to check for running processes every half second rather than just letting it wait a full second between checks, but fine. Modify the command in the wait_pids() function to sleep for a full second, and that part is happy.
But it still kicks out immediately and kills the OS level monitoring commands, even though there are active SQL*Plus sessions out there. It seems on Solaris the ps –p command to report status on a list of processes requires the list of process IDs to be escaped where Linux does not. IE:
-bash-3.2$ ps -p 1 2 3 usage: ps [ -aAdeflcjLPyZ ] [ -o format ] [ -t termlist ] [ -u userlist ] [ -U userlist ] [ -G grouplist ] [ -p proclist ] [ -g pgrplist ] [ -s sidlist ] [ -z zonelist ] 'format' is one or more of: user ruser group rgroup uid ruid gid rgid pid ppid pgid sid taskid ctid pri opri pcpu pmem vsz rss osz nice class time etime stime zone zoneid f s c lwp nlwp psr tty addr wchan fname comm args projid project pset
But with quotes:
-bash-3.2$ ps -p "1 2 3" PID TTY TIME CMD 1 ? 0:02 init 2 ? 0:00 pageout 3 ? 25:03 fsflush
After some messing about, Bryan had the great idea to simply replace the command:
while ( ps -p $pids > /dev/null 2>&1 )
while ( ps -p "$pids" > /dev/null 2>&1 )
Just thought I might save someone else some time and hair pulling by sharing this info… Here are the finished file diffs:
-bash-3.2$ diff runit.sh runit.sh.original 31c30 < while ( ps -p "$pids" > /dev/null 2>&1 ) --- > while ( ps -p $pids > /dev/null 2>&1 ) 33c32 < sleep 1 --- > sleep .5 219c218 < ( mpstat 3 > mpstat.out ) & --- > ( mpstat -P ALL 3 > mpstat.out ) &