I’ve run into a few customers recently who have had problems with their ASM rebalance operations running too slowly. Surprisingly, there were some simple concepts being overlooked – and once these were understood, the rebalance times were dramatically improved. For that reason, I’m documenting the solutions here… I hope that somebody, somewhere benefits…
1. Don’t Overbalance
Every time you run an ALTER DISKGROUP REBALANCE operation you initiate a large amount of I/O workload as Oracle ASM works to evenly stripe data across all available ASM disks (i.e. LUNs). The most common cause of rebalance operations running slowly that I see (and I’m constantly surprised how much I see this) is to overbalance, i.e. cause ASM to perform more I/O than is necessary.
It almost always goes like this. The customer wants to migrate some data from one set of ASM disks to another, so they first add the new disks:
alter diskgroup data add disk 'ORCL:NEWDATA1','ORCL:NEWDATA2','ORCL:NEWDATA3','ORCL:NEWDATA4', 'ORCL:NEWDATA5','ORCL:NEWDATA6','ORCL:NEWDATA7','ORCL:NEWDATA8' rebalance power 11 wait;
Then they drop the old disks like this:
alter diskgroup data drop disk 'DATA1','DATA2','DATA3','DATA4', 'DATA5','DATA6','DATA7','DATA8' rebalance power 11 wait;
Well guess what? That causes double the amount of I/O that is actually necessary to migrate, because Oracle evenly stripes across all disks and then has to rebalance a second time once the original disks are dropped.
This is how it should be done – in one single operation:
alter diskgroup data add disk 'ORCL:NEWDATA1','ORCL:NEWDATA2','ORCL:NEWDATA3','ORCL:NEWDATA4', 'ORCL:NEWDATA5','ORCL:NEWDATA6','ORCL:NEWDATA7','ORCL:NEWDATA8' drop disk 'DATA1','DATA2','DATA3','DATA4', 'DATA5','DATA6','DATA7','DATA8' rebalance power 11 wait;
A customer of mine tried this earlier this week and reported back that their ASM rebalance time had reduced by a factor of five!
By the way, the WAIT keyword means the SQL prompt doesn’t return until the rebalance has finished. To have the rebalance essentially run in the background you can simply change this to NOWAIT. Also, you could run the ADD and DROP commands separately if you specified REBALANCE POWER 0 on the first command, as this would hold off the rebalance until the second command kicks it off.
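As a rough sketch of that two-step variant: the statements below reuse the disk group and disk names from the examples above, trimmed to two disks per command for brevity. The power value of 11 is simply the one used earlier in this post, and the final query assumes you are connected to the ASM instance.

```sql
-- Step 1: add the new disks but hold the rebalance (POWER 0 starts no data movement)
ALTER DISKGROUP data ADD DISK 'ORCL:NEWDATA1', 'ORCL:NEWDATA2'
  REBALANCE POWER 0;

-- Step 2: drop the old disks; this one rebalance covers both changes
ALTER DISKGROUP data DROP DISK 'DATA1', 'DATA2'
  REBALANCE POWER 11 NOWAIT;

-- With NOWAIT the prompt returns immediately, so check progress separately
SELECT group_number, operation, state, power, sofar, est_work, est_minutes
FROM   v$asm_operation;
```

While the rebalance is running, V$ASM_OPERATION returns a row for the disk group; once it returns no rows the operation has finished.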
2. Power Limit Goes Up To 1024
Simple one this, but easily forgotten. From the early days of ASM, the maximum power limit for rebalance operations was 11. See here if you don’t know why.
From 11.2.0.2, if the COMPATIBLE.ASM disk group attribute is set to 11.2.0.2 or higher, the limit is now 1024. That means 11 really isn’t going to cut it anymore. If you are asking for full power, make sure you know what number that is.
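A quick way to check where you stand might look like this sketch. It assumes a disk group called data, as in the earlier examples, and a session on the ASM instance; 1024 is shown only because it is the maximum, not because it is the right value for your storage.

```sql
-- Check the ASM compatibility setting of each disk group
SELECT name, compatibility
FROM   v$asm_diskgroup;

-- With a sufficiently high COMPATIBLE.ASM, power values above 11 are accepted
ALTER DISKGROUP data REBALANCE POWER 1024;
```

If you omit the POWER clause altogether, the rebalance runs at whatever the ASM_POWER_LIMIT initialization parameter is set to, so it is worth checking that value too.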
3. Avoid The Compact Phase (for Flash Storage Systems)
An ASM rebalance operation comprises three phases, where the third one is the compact phase. This attempts to move data as close as possible to the outer tracks of the disks ASM is using.
Did you spot the issue there? Disks. This I/O-heavy phase is completely pointless on a flash system, where I/O is served evenly from any logical address within a LUN.
You can therefore avoid that potentially-massive I/O hit by disabling the compact phase, using the underscore parameter _DISABLE_REBALANCE_COMPACT=TRUE. Remember that you need to get Oracle Support’s permission before setting underscore parameters! Point your SR in the direction of the following My Oracle Support note:
What is ASM rebalance compact Phase and how it can be disabled (Doc ID 1902001.1)
Unfortunately it appears the parameter was deprecated in 12c, so from now on you have to set the ASM diskgroup attribute “_rebalance_compact” to FALSE (note the opposite value to that set at the instance level!), for example:
ALTER DISKGROUP data SET ATTRIBUTE '_rebalance_compact'='FALSE';
If you want to know more about this topic (for example, what the first two rebalance phases are), or indeed anything about ASM in general, I highly recommend the legendary ASM blogger that is Bane Radulovic a.k.a. ASM Support Guy.
An ASM rebalance potentially creates a lot of I/O, which means you may need to wait for a long time before it finishes. For that reason, make sure you understand what you are doing and make every effort to perform only as much I/O as you actually need. Don’t forget you can use the EXPLAIN WORK command to gauge in advance how much work is required.
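For example, an estimate for the single-step migration could be requested along these lines. This is a sketch under some assumptions: it needs 12c or later, the statement ID 'migrate_data' is just a label I made up, and I am assuming EXPLAIN WORK accepts the combined ADD/DROP form in the same way as the simpler statements shown in the Oracle documentation.

```sql
-- Ask ASM to estimate the work for a planned operation without running it
EXPLAIN WORK SET STATEMENT_ID = 'migrate_data' FOR
  ALTER DISKGROUP data
    ADD DISK 'ORCL:NEWDATA1', 'ORCL:NEWDATA2'
    DROP DISK 'DATA1', 'DATA2';

-- Read the estimate back (EST_WORK is the estimated number of allocation units to move)
SELECT statement_id, est_work
FROM   v$asm_estimate
WHERE  statement_id = 'migrate_data';
```

Running the same estimate for the separate ADD and DROP statements and comparing the EST_WORK figures is one way to see how much extra I/O a two-pass approach would cost before you commit to it.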