Oracle SLOB On Solaris
April 26, 2014
Guest Post
This is another guest post from my buddy Nate Fuzi, who performs the same role as me for Violin but is based in the US instead of EMEA. Nate believes that all English people live in the Dickensian London of the 19th century and speak in Cockney rhyming slang. I hate to disappoint, so have a butcher’s below and feast your mince pies on his attempts to make SLOB work on Solaris without going chicken oriental. Over to you Nate, me old china plate.
Note: The Silly Little Oracle Benchmark, or SLOB, is a Linux-only tool designed and released for the community by Kevin Closson. There are no ports for other operating systems – and Kevin has always advised that the solution for testing on another platform is to use a Linux VM and connect via TNS. The purpose of this post is simply to show what happens when you have no other choice but to try and get SLOB working natively on Solaris…
I wrestled with SLOB 2 for a couple hours last week for a demo build we had in-house to show our capabilities to a prospective customer. I should mention I’ve had great success—and ease!—with SLOB 2 previously. But that was on Linux. This was on Solaris 10—to mimic the setup the customer has in-house. No problem, I thought; there’s some C files to compile, but then there’s just shell scripts to drive the thing. What could go wrong?
Well, it would seem Kevin Closson, the creator of SLOB and SLOB 2, did his development on an OS with a better sense of humor than Solaris. The package unzipped, and the setup.sh script appeared to run successfully, but runit.sh would load up the worker threads and wait several seconds before launching them—and then immediately call it “done” and bail out, having executed on the database only a couple seconds. Huh? I had my slob.conf set to execute for 300 seconds.
I had two databases created: one with 4K blocks and one with 8K blocks. I had a tablespace created for SLOB data called SLOB4K and SLOB8K, respectively. I ran setup.sh SLOB4K 128, and the log file showed no errors. All good, I thought. Now run runit.sh 12, and it stops as quickly as it starts. Oof.
It took Bryan Wood, a much better shell script debugger (hey, I said DEbugger) than myself, to figure out all the problems.
First, there was this interesting line of output from the runit.sh command:
NOTIFY: Connecting users 1 2 3
Usage: mpstat [-aq] [-p | -P processor_set] [interval [count]]
4 5 6 7 8 9 10
Seems Solaris doesn’t like mpstat -P ALL. However, on Solaris 10 the mpstat command shows all processors even without the -P option.
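One way to avoid hand-editing the mpstat call is to choose the arguments from the output of uname -s. This is only a sketch: the helper name mpstat_args is made up for illustration, it is not part of SLOB, and it assumes that any platform other than Linux is happy with just the interval, as Solaris 10 was here.

```shell
# Hypothetical helper (not part of SLOB): pick mpstat arguments by platform.
# Assumption: "-P ALL" is passed only on Linux; everywhere else mpstat gets
# just the 3-second interval, which reported every CPU on Solaris 10.
mpstat_args() {
    # Optional first argument overrides the detected OS name (handy for testing).
    os_name=${1:-$(uname -s)}
    case "$os_name" in
        Linux) echo "-P ALL 3" ;;
        *)     echo "3" ;;
    esac
}

# Usage inside a script such as runit.sh. The substitution is left unquoted
# on purpose so the arguments split into separate words:
#   ( mpstat $(mpstat_args) > mpstat.out 2>&1 ) &
```

Solaris reports itself as SunOS from uname -s, so it falls into the default branch.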
Next, Solaris doesn’t like Kevin’s “sleep .5” command inside runit.sh; it wants whole numbers only. That raises the question in my mind why he felt the need to check for running processes every half second rather than just letting it wait a full second between checks, but fine. Modify the command in the wait_pids() function to sleep for a full second, and that part is happy.
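If you would rather keep the half-second poll on platforms that support it, a fallback along these lines might work. It is a sketch with a made-up function name (poll_sleep), and it assumes that a sleep which rejects fractional arguments exits with a non-zero status, which I have not verified on every Solaris release.

```shell
# Hypothetical helper (not part of SLOB): sleep half a second where the
# platform's sleep accepts fractions, otherwise fall back to a whole second.
# Assumption: a sleep that rejects "0.5" exits non-zero without sleeping.
poll_sleep() {
    sleep 0.5 2>/dev/null || sleep 1
}
```

The sleep .5 call inside wait_pids() would then become poll_sleep.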
But it still kicks out immediately and kills the OS level monitoring commands, even though there are active SQL*Plus sessions out there. It seems that on Solaris the ps -p command, which reports status on a list of processes, requires the list of process IDs to be quoted as a single argument, whereas Linux does not. For example:
-bash-3.2$ ps -p 1 2 3
usage: ps [ -aAdeflcjLPyZ ] [ -o format ] [ -t termlist ]
        [ -u userlist ] [ -U userlist ] [ -G grouplist ]
        [ -p proclist ] [ -g pgrplist ] [ -s sidlist ] [ -z zonelist ]
  'format' is one or more of:
        user ruser group rgroup uid ruid gid rgid pid ppid pgid sid taskid ctid
        pri opri pcpu pmem vsz rss osz nice class time etime stime zone zoneid
        f s c lwp nlwp psr tty addr wchan fname comm args projid project pset
But with quotes:
-bash-3.2$ ps -p "1 2 3"
   PID TTY         TIME CMD
     1 ?           0:02 init
     2 ?           0:00 pageout
     3 ?          25:03 fsflush
After some messing about, Bryan had the great idea to simply replace the command:
while ( ps -p $pids > /dev/null 2>&1 )
With:
while ( ps -p "$pids" > /dev/null 2>&1 )
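For anyone who wants to see the quoted form in context, here is a minimal stand-in for the wait loop. It is not Kevin's wait_pids() function, only a sketch of the same idea with a made-up name (wait_for_pids) and the one-second sleep from the fix above.

```shell
# Hypothetical stand-in for the wait loop (not the real wait_pids from SLOB).
# Returns once ps can no longer find any of the given process IDs.
wait_for_pids() {
    # Join all arguments into one space-separated string.
    pids="$*"
    # The list is passed to ps as ONE quoted argument, which Solaris 10
    # needed here and which Linux ps also accepts.
    while ps -p "$pids" > /dev/null 2>&1
    do
        sleep 1
    done
}

# Example: start two background jobs and block until both are gone.
#   sleep 2 & first=$!
#   sleep 3 & second=$!
#   wait_for_pids "$first" "$second"
```

ps -p exits with success as long as at least one of the listed processes still exists, so the loop only ends when every session has finished.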
Just thought I might save someone else some time and hair pulling by sharing this info… Here are the finished file diffs:
-bash-3.2$ diff runit.sh runit.sh.original
31c30
< while ( ps -p "$pids" > /dev/null 2>&1 )
---
> while ( ps -p $pids > /dev/null 2>&1 )
33c32
< sleep 1
---
> sleep .5
219c218
< ( mpstat 3 > mpstat.out ) &
---
> ( mpstat -P ALL 3 > mpstat.out ) &
Every issue you hit was due to the fact that you are not running ported software.
If you want to do SLOB testing of instances on NT, Sol, AIX, HP-UX or whatever else there might be, I recommend you use a Linux VM on a laptop, set the SQL*Net parameters in slob.conf, and away you go.
Good exercise in script debugging though I guess. 🙂
Spoken like a software vendor! I know what you are saying and I agree, but … sometimes you can persuade a customer to run a script when you can’t persuade them to install and configure a hypervisor… (or allow you to connect your own laptop to their network)
Well… if I had an interest in Solaris living beyond its shelf life (expired about 2001 or so) I’d port it 🙂
Jokes aside… There’s no surprise you hit bumps. Small changes… I suppose I can roll them in if Linux is not detected.
The half-second snoop on dead people is important… no time to explain now though.
Thanks for this very helpful blog post. I also found on my Solaris 10 system that the Makefile could not find “cc”, for example:
dwtst1:oracle@ny-dw-db1:~/SLOB/wait_kit
> make all
rm -fr *.o mywait trigger create_sem
cc -c mywait.c
/usr/ucb/cc: language optional software package not installed
*** Error code 1
make: Fatal error: Command failed for target `mywait.o'
dwtst1:oracle@ny-dw-db1:~/SLOB/wait_kit
In many cases, on such Solaris 10 systems, “gcc” will be available in /usr/sfw/bin. The fix is therefore to do the following:
Check if gcc is available in /usr/sfw/bin
Check if /usr/sfw/bin is in the current $PATH and if not, add it as shown below:
-bash-3.00$ which gcc
no gcc in /usr/bin /usr/ucb/ /usr/local/bin
-bash-3.00$ export PATH=$PATH:/usr/sfw/bin
-bash-3.00$ which gcc
/usr/sfw/bin/gcc
-bash-3.00$ cd SLOB
-bash-3.00$ cd wait_kit
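The two checks above can also be scripted. The following is a sketch only: the function name add_to_path is made up for illustration, it assumes a bash-like shell such as the one in the session above, and /usr/sfw/bin is simply the location where gcc happened to live on this system.

```shell
# Hypothetical helper: append a directory to PATH only if it exists and is
# not already listed, so repeated calls do not grow PATH.
add_to_path() {
    dir=$1
    case ":$PATH:" in
        *":$dir:"*) ;;                          # already on PATH, nothing to do
        *) [ -d "$dir" ] && PATH="$PATH:$dir" ;;
    esac
    export PATH
}

# /usr/sfw/bin is where gcc was found on the Solaris 10 box in this comment.
add_to_path /usr/sfw/bin

# Report (on stderr) if gcc still cannot be found.
command -v gcc > /dev/null 2>&1 || echo "gcc not found on PATH" >&2
```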
The original Makefile used “cc”. Shown below is the version I edited to use “gcc” instead.
-bash-3.00$ more Makefile
CC=gcc

all: clean mywait trigger create_sem install clean_again

mywait: mywait.o
	gcc -o mywait mywait.o

trigger: trigger.o
	gcc -o trigger trigger.o

create_sem: create_sem.o
	gcc -o create_sem create_sem.o

install:
	cp mywait trigger create_sem ../

clean:
	rm -fr *.o mywait trigger create_sem

clean_again:
	rm -fr *.o
-bash-3.00$
After making the changes described in the flashdba blog post, and then the changes in this comment, “make all” ran just fine, as shown below.
> make all
rm -fr *.o mywait trigger create_sem
gcc -c mywait.c
gcc -o mywait mywait.o
gcc -c trigger.c
gcc -o trigger trigger.o
gcc -c create_sem.c
gcc -o create_sem create_sem.o
cp mywait trigger create_sem ../
rm -fr *.o
dwtst1:oracle@ny-dw-db1:~/SLOB/wait_kit
>
HTH,
Gilbert Standen
Immensely helpful information here. Big thanks to Gilbert above for helping with enabling gcc use on the wait_kit.
For the issues that Nate worked through above, I ran into those as well, but the “sleep .5” looks like it’s been changed by Kevin, so that wasn’t needed. I also ran into one other ps statement that needed quotes in the same process monitoring loop. My diffs now look like this:
$ diff runit.sh runit.sh.orig
114c114
< while ( ps -p "$pids" > /dev/null 2>&1 )
---
> while ( ps -p $pids > /dev/null 2>&1 )
122c122
< ps -fp "$pids"
---
> ps -fp $pids
506c506
< ( mpstat 3 > mpstat.out 2>&1) &
---
> ( mpstat -P ALL 3 > mpstat.out 2>&1) &
I am however still getting a problem with the ps monitoring loop, and also have no awr.txt being generated at the end, although it says otherwise. I can always generate the AWR manually though so it’s not a show stopper.
Thanks for all of the help!