Using SLOB2 for PIO Testing
Working for Violin Memory means I get to play with a lot of high performance storage. This is a Good Thing. But with so much cool stuff available to mess around with, how do I go about differentiating the various products and versions in my lab? For example, if I get hold of a new Violin Memory MLC flash memory array, how hard can I drive it using Oracle? There are many load generation and benchmarking tools available out there, but I’m not interested in raw I/O performance – I can get that from the datasheet. I want to view the array through the lens of an Oracle database instance.
Even more importantly, I want to test the differences in performance as I change various parts of the equation: operating system, kernel, multipathing parameters, UDEV parameters, ASM configuration… What simple but effective tool can I use for this? Why I’m glad you asked. It’s the Silly Little Oracle Benchmark, also known as SLOB – and it’s currently available in version 2, AKA SLOB2.
I’m not going to explain any more about SLOB here, because that task has already been performed by Kevin. If you don’t already know what SLOB is, I highly recommend testing it out. The README and the download is located via the SLOB2 page on Kevin’s blog. You can also find more useful SLOB information from Yury, Brian and Karl, among others.
The reason I like SLOB is that I can use it to test physical I/O. I can run SLOB with a defined set of workers and then use the resultant AWR report to calculate the number of read/write IOPS and the average latency of those reads/writes. I can also set parameters to modify the percentage of calls which are modify DML and the elapsed time for each test. This allows me to create graphs of the target system’s performance characteristics.
One of the foibles of Oracle is that sometimes it decides it knows better than you about the size of various memory structures. This is true of the buffer cache – and one side effect of this behaviour is that sometimes it’s difficult to make Oracle perform physical I/O instead of logical I/O. For example, if I set my buffer cache to be 12M and then start up my instance I see a buffer cache of size 64M. Why? Well, from what I can make out based on limited testing and blatant ransacking of Paul Till’s work, it’s because Oracle appears to be setting the minimum buffer cache size to be the number of CPUs multiplied by the SGA granule size:
Min Buffer Cache = CPU_COUNT * _ksmg_granule_size
This needs further testing, but it definitely means the number of processor cores on my system is impeding me from pushing physical I/O. Here’s what happens when I call SLOB with 0 writers and 2 readers on the instance with the 64M buffer cache:
Logical reads: 9,773.7 Physical reads: 6,327.4 Buffer Hit %: 35.3
Now let’s drop the CPU_COUNT parameter down to 1 so that my buffer cache is actually confirming to the 12M setting in the parameter file:
Logical reads: 6,900.1 Physical reads: 6,547.0 Buffer Hit %: 5.1
I’m now driving much more physical I/O – the buffer cache hit ratio is down to 5%.
I could mess around with this more – and as I increase the number of processes it becomes less of a problem, but I’m quite happy sticking with CPU_COUNT set to 1 for now. The most important requirements I have are lots of physical I/O and a consistent configuration across all tests.
This is the Oracle parameter file that I like to use when performing SLOB tests with the SLOB Harness:
*._db_block_prefetch_limit=0 *._db_block_prefetch_quota=0 *._db_file_noncontig_mblock_read_count=0 *._disk_sector_size_override=TRUE #*.control_files='+DATA/orcl/controlfile/blahblah','+RECO/orcl/controlfile/blahblah' *.cpu_count=1 *.db_block_size=8192 *.db_cache_size=10M *.db_create_file_dest='+DATA' *.db_name='orcl' *.db_files=2000 *.db_recovery_file_dest='+RECO' *.db_recovery_file_dest_size=200G *.diagnostic_dest='/u01/app/oracle' *.filesystemio_options='setall' *.log_buffer=134217728 *.parallel_max_servers=0 *.pga_aggregate_target=1G *.processes=500 *.recyclebin=OFF *.resource_manager_plan='' *.remote_login_passwordfile='EXCLUSIVE' *.shared_pool_size=1G *.undo_tablespace='UNDOTBS1'
Obviously the control_file parameter needs to be updated prior to deployment…as well as any other obvious changes like the db_name and ASM diskgroup names. If you are curious about the underscore parameters then you need to read this blog post by Yury.
Now that I have SLOB configured ready to go, I want to run a series of SLOB tests and record the results. What I want to see is the affect of running consecutive tests but with the number of readers or writers steadily increasing from 1 up to 256. I will perform two complete sets of tests: read only and read / write mixed workload. Note that for the mixed workload I will keep the number of readers set to zero, because each write operation will also cause a read as the target block is read from storage into the buffer cache prior to being updated. Thus calling SLOB with zero readers and n writers is (approximately) creating a 50:50 read to write mix.
For this purpose I created a “SLOB Harness” script to run call SLOB using parameters fed from a configuration file. You can find both the shell script and a sample configuration file here:
I should point out that this script is hardly the best piece of coding I’ve ever created. It was something that developed organically without me ever finding the time to go back and clean it up to make it presentable. I originally intended to spend more time on it but realised that this might never happen. In the meantime, someone who shall remain nameless applied some gentle pressure to get it out into the community. So here it is!
What To Do
This is the process I follow:
- Build database “orcl” using DBCA or tool of preference. Alternatively use SLOB’s create_database_kit (in the misc directory)
- Ensure the redo logs are large enough (I use 5 log groups with members 1GB each. I also use a 4k blocksize for redo because my storage is 4k-based)
- Create a tablespace called IOPS size 15GB (capacity is not an issue when you work for a storage vendor!)
- Extract SLOB into ~oracle/SLOB, change directory to ./wait_kit and execute: make all
- Install SLOB into database using: ./setup.sh IOPS 128
- Disable the scheduler and resource manager so they don’t interfere (see note 786346.1)
- Edit slob.ora to use correct control_files parameter
- Restart database using pfile=slob.ora
- Add the slob2-harness.sh and slob2-harness.inf files to the SLOB directory and make the .sh file executable. Examine the .inf file to ensure you are happy with the test criteria
- Run slob harness: nohup ./slob2-harness.sh > slob2-harness.out 2>&1 &
- Wait a long time (many hours) until job completes. Tail the output file to see the percentage completion (if you use the default slob.conf settings each test will take 5 minutes): tail -100f slob2-harness.out
- Create CSV file using slob2-analyze.sh (see next section)
Once the harness has completed you will be left with a large number of AWR reports in the form of text files. What do you want to do with them? In my case, I want to parse them all to read the number of IOPS (the “Physical reads” and “Physical writes” values). I then want to calculate the latency. Unfortunately, in AWR reports Oracle only calculates the average wait time in milliseconds – and for flash storage this often means a value of zero. So instead, I need to manually calculate the average latency by dividing the values for total time waited by the number of waits:
Top 5 Timed Foreground Events ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Avg wait % DB Event Waits Time(s) (ms) time Wait Class ------------------------------ ------------ ----------- ------ ------ ---------- db file sequential read 119,835,425 50,938 0 84.0 User I/O
So here, for example, I want to calculate the random read latency (hence I am looking at the wait event for db file sequential read. The AWR report shows an average wait of 0ms (thanks Oracle), but if I divide the total time (shown in blue) by the number of waits (shown in red) I get a value of 0.000425 seconds, which (multiplying by one million) translates into 425 microseconds.
I’m also interested in the number of IOPS and average latency for writes from the database buffer (db file parallel write) as well as redo writes (log file parallel write). Calculating these values by hand for each file is a pain, so again I wrote a shell script to spit the values out for me. In the spirit of sharing I’ll also post this here:
This is a fairly straightforward script which parses AWR files using sed and grep to get the required values. It’s not elegant – and to be honest I have a much better-written version I now use which I cannot share (as it is Violin Memory intellectual property) – but it does the job. The output is a CSV file which can be loaded into spreadsheet software such as Microsoft Excel.
Once you have the date in a spreadsheet you can do what you want with it – I like to graph latency versus IOPS and plot lines for various combinations of operating systems, kernels etc to see what gives the best performance. Here are some examples to get you in the mood:
Read Only Workload