Oracle ASM and Thin Provisioning – How To Reclaim Space

It came to my attention last November that I had crossed the one year anniversary since my last post on flashdba.com. I was so surprised that I immediately decided to write a new post, which took another three months. There are reasons why I’m no longer posting technical blogs about databases and flash, but I’ll cover them in a later post. No, not that late – I hope.

In the meantime, I thought I’d write a note on this subject because I’ve lost count of the number of times I’ve been asked questions on the topic of Oracle ASM and Thin Provisioning. Normally, I’m asked by customers or prospects who think there is an issue with their storage system… whereas, in fact, the problem is entirely storage-agnostic.

But first, some background.

Thin Provisioning

Thin Provisioning (TP) is used to describe the overcommitment of storage capacity. Your host may think it’s been allocated 10TB of capacity and is currently using 2TB, but the storage platform has only really allocated the 2TB used and the remaining 8TB may not even exist. Why would you want this? Because in a multi-host environment (where hosts could be virtual or physical), the amount of allocated-but-unused capacity could be significant. Without TP, serious amounts of capacity would need to be provisioned which may never be used, but with TP all the hosts can be “fooled” into thinking they have been allocated what they want while the actual utilised capacity is only the sum of what each host has used.
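
To put some numbers on that, here is a minimal sketch in Python. The host names and most of the figures are invented for illustration (only the 10TB/2TB host echoes the example above), and the function name is just something I made up.

```python
# Hypothetical hosts: (capacity the host believes it owns, capacity actually written), in TB.
hosts = {
    "host_a": (10, 2),
    "host_b": (10, 4),
    "host_c": (20, 5),
}


def thin_provisioning_summary(hosts):
    """Return (provisioned, used, saved) totals in TB across all hosts."""
    provisioned = sum(alloc for alloc, _ in hosts.values())
    used = sum(written for _, written in hosts.values())
    return provisioned, used, provisioned - used


provisioned, used, saved = thin_provisioning_summary(hosts)
print(f"Hosts think they own {provisioned}TB; the array has really allocated {used}TB")
print(f"Capacity that did not need to be provisioned up front: {saved}TB")
```

With these made-up figures the hosts collectively believe they own 40TB, while the array only has to find 11TB of real capacity.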

Where things can get a bit complicated with TP is that many layers in your stack may be thin provisioning storage to the layers above them. Most storage arrays are capable of TP (or indeed mandate its use), but hypervisors often have thin provisioning options too. Meanwhile, some applications which create data store structures have options which can help or hinder the use of TP. For example, VMware has the ability to create virtual disks which are thin, thick (lazy zeroed) or thick (eager zeroed). As a result, it isn’t always obvious to the underlying storage whether a particular set of allocated blocks are really in use or not. Won’t somebody think of the poor storage array?

Trim and Unmap

Consider the situation where a large file is created and then deleted in a filesystem on a typical operating system. Commonly, the deletion process doesn’t really delete anything other than the metadata telling the filesystem where the file resided. Thus the underlying file data remains until such time as something else comes along and overwrites it. This is beneficial because it is faster and requires less work than trying to overwrite the file with (for example) zeros. But if the filesystem resides on a storage array which uses TP, how will the storage array know that the space allocated to the file is now free? It can’t – unless the filesystem has a way of telling it.

For this purpose there exists a set of OS calls known as trim commands – and for the SCSI protocol (used by most block storage devices such as SANs) the command is known as UNMAP. Issuing one of these commands allows the calling layer (the filesystem, or perhaps a volume manager) to notify the storage platform that a specific set of blocks are no longer in use and can be “unmapped”, freeing space. As a side note, large calls to UNMAP can often have temporary but unexpected consequences on storage performance, as large amounts of metadata may need to be updated.
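
The difference between a metadata-only delete and a delete followed by UNMAP can be sketched with a toy model. This is not how any real filesystem or array is implemented: the class names, the `discard` flag and the block counts are all assumptions made purely for illustration.

```python
class ThinVolume:
    """Toy thin-provisioned volume: the array only tracks blocks that have been mapped."""

    def __init__(self, size_blocks):
        self.size_blocks = size_blocks
        self.mapped = set()                      # blocks the array has really allocated

    def write(self, blocks):
        self.mapped.update(blocks)               # the first write to a block maps it

    def unmap(self, blocks):
        self.mapped.difference_update(blocks)    # what a trim/UNMAP call asks the array to do

    @property
    def used(self):
        return len(self.mapped)


class ToyFilesystem:
    """Toy filesystem: deleting a file drops metadata, and optionally notifies the volume."""

    def __init__(self, volume, discard=False):
        self.volume = volume
        self.discard = discard
        self.files = {}                          # file name -> blocks it occupies

    def create(self, name, blocks):
        self.files[name] = blocks
        self.volume.write(blocks)

    def delete(self, name):
        blocks = self.files.pop(name)            # metadata-only delete
        if self.discard:
            self.volume.unmap(blocks)            # tell the array the blocks are free


def demo(discard):
    """Create and delete a 500-block file, then report what the array still sees as used."""
    volume = ThinVolume(size_blocks=1000)
    fs = ToyFilesystem(volume, discard=discard)
    fs.create("bigfile", range(500))
    fs.delete("bigfile")
    return volume.used


for discard in (False, True):
    print(f"discard={discard}: array still sees {demo(discard)} blocks in use")
```

Without the notification the array carries on believing that all 500 blocks are in use, which is exactly the situation the rest of this post is about.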

Oracle ASM: Unmap is for Wimps

Let’s get straight to the point here: Oracle’s Automatic Storage Management doesn’t natively use UNMAP commands. Quelle surprise. But there are still ways to free up space back to thin provisioned arrays. Two in fact: let’s call them the bad way and the good way. First though, let’s set up the scenario:

Test Scenario

Consider the situation where an Oracle ASM diskgroup is created on a 10TB volume group presented from a thin provisioning All-Flash storage array. The DBA then creates a large “bigfile” tablespace in the diskgroup, with a 5TB datafile (the rest of the database resides elsewhere). Anyone who has sat waiting for the CREATE TABLESPACE command for any period of time will be aware that, during the datafile creation process, Oracle likes to fill the whole file with empty blocks. From Oracle’s perspective, this has the advantage of ensuring that the entire datafile capacity has been marked as used by the storage array. In other words, it’s not “fake” thin provisioned space which may or may not be available, but real available capacity which now belongs to Oracle. (You may also recall that Oracle no longer takes this approach with tempfiles, instead using the faster “sparse” allocation method.)

At this point, what will the volumes on the storage array be showing? We know that 10TB has been allocated, of which 5TB has been used. So shouldn’t that leave 5TB free? Probably not, because almost every All-Flash storage array uses data reduction technologies such as compression, deduplication and zero-detect. Since each block in the tablespace contains a unique block number, deduplication isn’t going to add any value (which is why arrays like the Kaminario allow dedupe to be disabled on a per-volume basis), but compression is going to have great fun with all the emptiness inside each Oracle block so the storage array will probably show significantly less than 5TB used.

Next, our enterprising DBA watches a Connor McDonald video about DBMS_RANDOM and gets a little overexcited, then fills the entire tablespace with random data to the point that the storage array can hardly achieve any compression. The outcome? Allocated = 10TB, Used = 5TB, Free = 5TB.

Finally, after watching a video of Larry Ellison explaining that the Oracle Autonomous Database needs “no human intervention” and thus fearing for his job, the DBA deletes the tablespace and goes home. Back to 10TB free? No.
The tablespace deletion command does a number of things, including notifying Oracle ASM that the file’s allocation units are no longer in use and removing the datafile from the database’s controlfile. But at no point does anybody bother to tell the storage array that the used space is now free, so the array’s capacity statistics remain: Allocated = 10TB, Used = 5TB, Free = 5TB.

ASRU: The Bad Way

ASRU is Oracle’s ASM Reclamation Utility, a PERL script developed in conjunction with 3PAR (a storage array now owned by HPE) and designed to free up space from scenarios such as the one above. It is, in my personal opinion, a terrible botched solution which was created to serve a purpose which no longer exists – although, interestingly, many storage vendors still seem to recommend it by default (for example, Pure Storage still describe it as the only solution for reclaiming unused space with Oracle ASM).
ASRU doesn’t issue UNMAP commands. Instead, it takes advantage of the fact that most modern storage platforms (including 3PAR, Pure Storage and Kaminario) treat blocks full of zeros as free space (a feature known as zero detect). Thus what ASRU does – when manually run by a DBA (presumably during a change window in the middle of the night while rubbing a lucky rabbit’s foot and praying to the gods of all major religions) – is compact the remaining data in any diskgroup toward the start of the volume and then write zeros above the high watermark where this compacted data ends.
In our example above, this should return the capacity statistics to approximately: Allocated = 10TB, Used = 0TB, Free = 10TB. However, because zero detect is often considered to be a type of data reduction, some arrays then show horribly-skewed data reduction ratios as a result of ASRU.
Don’t get me wrong, many people have successfully used ASRU – and in some situations it may be your only choice. But there is another way…
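
To see why writing zeros has the same effect on the capacity statistics as an UNMAP would, here is a toy model of the test scenario on an array with zero detect. It is only a sketch under my own assumptions: it works in 1TB chunks, the class and variable names are made up, and it bears no relation to the actual ASRU code.

```python
class ZeroDetectArray:
    """Toy array: a chunk made up entirely of zeros is treated as free space."""

    def __init__(self, allocated_tb):
        self.allocated_tb = allocated_tb
        self.mapped = {}                         # 1TB chunk number -> stored payload

    def write(self, chunk, payload):
        if any(payload):                         # at least one non-zero byte: store it
            self.mapped[chunk] = payload
        else:                                    # all zeros: zero detect frees the chunk
            self.mapped.pop(chunk, None)

    def stats(self):
        used = len(self.mapped)
        return {"allocated": self.allocated_tb, "used": used,
                "free": self.allocated_tb - used}


array = ZeroDetectArray(allocated_tb=10)

# The DBA fills the 5TB datafile with incompressible data (chunks 0 to 4).
for chunk in range(5):
    array.write(chunk, b"\xde\xad\xbe\xef")

# Dropping the tablespace only changes ASM and controlfile metadata,
# so no I/O reaches the array and its statistics do not move.
after_drop = array.stats()

# ASRU-style reclaim: nothing is left to compact, so the high watermark is 0
# and every chunk above it is overwritten with zeros.
high_watermark = 0
for chunk in range(high_watermark, array.allocated_tb):
    array.write(chunk, b"\x00\x00\x00\x00")
after_reclaim = array.stats()

print("after drop:   ", after_drop)
print("after reclaim:", after_reclaim)
```

The price, of course, is that the host really does have to write all of those zeros, which is why I call it the bad way.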

ASM Filter Driver: The Good Way

Since Oracle Database version 12.1.0.2, the option has been available to install a piece of software called ASMFD, the ASM Filter Driver. ASMFD is a kernel module which resides in the I/O path of Oracle ASM disks – and is the natural successor to the Linux-only ASMLib kernel driver. Unlike ASMLib, or indeed native ASM, the ASMFD module contains support for SCSI UNMAP commands, which really is the missing piece of the jigsaw. Providing you use ASMFD, the deletion of files from within ASM will result in the storage array being notified as allocation units are freed up, resulting in the correct recalculation of Free and Used Capacity statistics – and without the unnecessary hack of writing zeros all over the place. It really is a no brainer.
Unless, of course, you’ve already installed your database and ASM and are now looking for some way to return freed capacity. In which case, installing ASMFD on an existing system may seem even more challenging than running ASRU. But you know what they say: it’s better to do it right first time than to be constantly forced into bodging it with PERL scripts.

TL;DR

If you want Oracle ASM to correctly free space back to your thin provisioned storage array, you need to choose between the correct method of using ASM Filter Driver or the botched method of running the ASRU reclamation tool, which comes in the form of a PERL script. Either way, it’s nothing to do with the storage platform, so don’t blame the storage guy…


Oracle’s ASM Filter Driver Revisited


Almost exactly a year ago I published a post covering my first impressions of the ASM Filter Driver (ASMFD) released in Oracle 12.1.0.2, followed swiftly by a second post showing that it didn’t work with 4k native devices.

When I wrote that first post I was about to start my summer holidays, so I’m afraid to admit that I was a little sloppy and made some false assumptions toward the end – assumptions which were quickly overturned by eagle-eyed readers in the comments section. So I need to revisit that at some point in this post.

But first, some background.

Some Background

If you don’t know what ASMFD is, let me just quote from the 12.1 documentation:

Oracle ASM Filter Driver (Oracle ASMFD) is a kernel module that resides in the I/O path of the Oracle ASM disks. Oracle ASM uses the filter driver to validate write I/O requests to Oracle ASM disks.

The Oracle ASMFD simplifies the configuration and management of disk devices by eliminating the need to rebind disk devices used with Oracle ASM each time the system is restarted.

The Oracle ASM Filter Driver rejects any I/O requests that are invalid. This action eliminates accidental overwrites of Oracle ASM disks that would cause corruption in the disks and files within the disk group. For example, the Oracle ASM Filter Driver filters out all non-Oracle I/Os which could cause accidental overwrites.

This is interesting, because ASMFD is considered a replacement for Oracle ASMLib, yet the documentation for ASMFD doesn’t make all of the same claims that Oracle makes for ASMLib. Both ASMFD and ASMLib claim to simplify the configuration and management of disk devices, but ASMLib’s documentation also claims that it “greatly reduces kernel resource usage”. Doesn’t ASMFD have this effect too? What is definitely a new feature for ASMFD is the ability to reject invalid (i.e. non-Oracle) I/O operations to ASMFD devices – and that’s what I got wrong last time.

However, before we can revisit that, I need to install ASMFD on a brand new system.

Installing ASMFD

Last time I tried this I made the mistake of installing 12.1.0.2.0 with no patch set updates. Thanks to a reader called terry, I now know that the PSU is a very good idea, so this time I’m using 12.1.0.2.3. First let’s do some preparation.

Preparing To Install

I’m using an Oracle Linux 6 Update 5 system running the Oracle Unbreakable Enterprise Kernel v3:

[root@server4 ~]# cat /etc/oracle-release
Oracle Linux Server release 6.5
[root@server4 ~]# uname -r
3.8.13-26.2.3.el6uek.x86_64

As usual I have taken all of the necessary pre-installation steps to make the Oracle Universal Installer happy. I have disabled selinux and iptables, plus I’ve configured device mapper multipathing. I have two sets of 8 LUNs from my Violin storage: 8 using 512e emulation mode (512 byte logical block size but 4k physical block size) and 8 using 4kN native mode (4k logical and physical block size). If you have any doubts about what that means, read here.

[root@server4 ~]# ls -l /dev/mapper
total 0
crw-rw---- 1 root root 10, 236 Jul 20 16:52 control
lrwxrwxrwx 1 root root       7 Jul 20 16:53 mpatha -> ../dm-0
lrwxrwxrwx 1 root root       7 Jul 20 16:53 mpathap1 -> ../dm-1
lrwxrwxrwx 1 root root       7 Jul 20 16:53 mpathap2 -> ../dm-2
lrwxrwxrwx 1 root root       7 Jul 20 16:53 mpathap3 -> ../dm-3
lrwxrwxrwx 1 root root       7 Jul 20 16:53 v4kdata1 -> ../dm-6
lrwxrwxrwx 1 root root       7 Jul 20 16:53 v4kdata2 -> ../dm-7
lrwxrwxrwx 1 root root       7 Jul 20 16:53 v4kdata3 -> ../dm-8
lrwxrwxrwx 1 root root       7 Jul 20 16:53 v4kdata4 -> ../dm-9
lrwxrwxrwx 1 root root       8 Jul 20 16:53 v4kdata5 -> ../dm-10
lrwxrwxrwx 1 root root       8 Jul 20 16:53 v4kdata6 -> ../dm-11
lrwxrwxrwx 1 root root       8 Jul 20 16:53 v4kdata7 -> ../dm-12
lrwxrwxrwx 1 root root       8 Jul 20 16:53 v4kdata8 -> ../dm-13
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data1 -> ../dm-14
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data2 -> ../dm-15
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data3 -> ../dm-16
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data4 -> ../dm-17
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data5 -> ../dm-18
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data6 -> ../dm-19
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data7 -> ../dm-20
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data8 -> ../dm-21
lrwxrwxrwx 1 root root       8 Jul 20 16:53 vg_halfserver4-lv_home -> ../dm-22
lrwxrwxrwx 1 root root       7 Jul 20 16:53 vg_halfserver4-lv_root -> ../dm-4
lrwxrwxrwx 1 root root       7 Jul 20 16:53 vg_halfserver4-lv_swap -> ../dm-5

The 512e devices are the v512data* entries and the 4k devices are the v4kdata* entries. The other devices here can be ignored as they are related to the default filesystem layout of the operating system.
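
If you want to check which mode a device is really in, the kernel exposes the logical and physical block sizes under /sys/block/<device>/queue. Here is a small sketch in Python which reads them; the function names are my own, and the dm-14 and dm-6 names are simply the devices behind v512data1 and v4kdata1 in the listing above.

```python
from pathlib import Path


def block_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) block size in bytes for a kernel device name like 'dm-14'."""
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical


def classify(logical, physical):
    """Name the sector format implied by a logical/physical block size pair."""
    if (logical, physical) == (512, 4096):
        return "512e"      # 512 byte logical, 4k physical: emulation mode
    if (logical, physical) == (4096, 4096):
        return "4kN"       # 4k logical and physical: native mode
    if (logical, physical) == (512, 512):
        return "512n"      # traditional 512 byte devices
    return "other"


for device in ("dm-14", "dm-6"):
    # Skip devices which do not exist on the host running this sketch.
    if (Path("/sys/block") / device / "queue").exists():
        print(device, classify(*block_sizes(device)))
```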

Installing Oracle 12.1.0.2.3 Grid Infrastructure (software only)

This is where the first challenge comes. When you perform a standard install of Oracle 12c Grid Infrastructure you are asked for storage devices on which you can locate items such as the ASM SPFILE, OCR and voting disks. In the old days of using ASMLib you would have prepared these in advance, because ASMLib is a separate kernel module located outside of the Oracle GI home. But ASMFD is part of the Oracle Home and so doesn’t exist prior to installation. Thus we have a chicken and egg situation.

Even worse, I know from bitter experience that I need to install some patches prior to labelling my disks, but I can’t install patches without installing the Oracle home either.

So the only thing for it is to perform a Software Only installation from the Oracle Universal Installer, then apply the PSU, then create an ASM instance and finally label the LUNs with ASMFD. It’s all very long winded. It wouldn’t be a problem if I was migrating from an existing ASMLib setup, but this is a clean install. Such is the price of progress.

To save this post from becoming longer and more unreadable than a 12c AWR report, I’ve captured the entire installation and configuration of 12.1.0.2.3 GI and ASM on a separate installation cookbook page, here:

Installing 12.1.0.2.3 Grid Infrastructure with Oracle Linux 6 Update 5

It’s simpler that way. If you don’t want to go and read it, just take it from me that we now have a working ASM instance which currently has no devices under its control. The PSU has been applied so we are ready to start labelling.

Using ASM Filter Driver to Label Devices

The next step is to start labelling my LUNs with ASMFD. I’m using what the documentation describes as an “Oracle Grid Infrastructure Standalone (Oracle Restart) Environment”, so I’m following this set of steps in the documentation.

Step one tells me to run a dsget command and then a dsset command to add a diskstring of ‘AFD:*’. Ok:

[oracle@server4 ~]$ asmcmd dsget
parameter:
profile:++no-value-at-resource-creation--never-updated-through-ASM++
[oracle@server4 ~]$ asmcmd dsset 'AFD:*'
[oracle@server4 ~]$ asmcmd dsget
parameter:AFD:*
profile:AFD:*

Next I need to stop CRS (I’m using a standalone config so actually it’s HAS):

[root@server4 ~]# crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'server4'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'server4'
CRS-2673: Attempting to stop 'ora.asm' on 'server4'
CRS-2673: Attempting to stop 'ora.evmd' on 'server4'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'server4' succeeded
CRS-2677: Stop of 'ora.evmd' on 'server4' succeeded
CRS-2677: Stop of 'ora.asm' on 'server4' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'server4'
CRS-2677: Stop of 'ora.cssd' on 'server4' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'server4' has completed
CRS-4133: Oracle High Availability Services has been stopped.

And then I need to run the afd_configure command (all as the root user). Before and after doing so I will check for any loaded kernel modules with oracle in the name, to see what changes:

[root@server4 ~]# lsmod | grep oracle
oracleacfs           3308260  0
oracleadvm            508030  0
oracleoks             506741  2 oracleacfs,oracleadvm
[root@server4 ~]# asmcmd afd_configure
Connected to an idle instance.
AFD-627: AFD distribution files found.
AFD-636: Installing requested AFD software.
AFD-637: Loading installed AFD drivers.
AFD-9321: Creating udev for AFD.
AFD-9323: Creating module dependencies - this may take some time.
AFD-9154: Loading 'oracleafd.ko' driver.
AFD-649: Verifying AFD devices.
AFD-9156: Detecting control device '/dev/oracleafd/admin'.
AFD-638: AFD installation correctness verified.
Modifying resource dependencies - this may take some time.
[root@server4 ~]# lsmod | grep oracle
oracleafd             211540  0
oracleacfs           3308260  0
oracleadvm            508030  0
oracleoks             506741  2 oracleacfs,oracleadvm
[root@server4 ~]# asmcmd afd_state
Connected to an idle instance.
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'ENABLED' on host 'server4'

Notice the new kernel module called oracleafd. Also, AFD is showing that “filtering is enabled” – I guess this relates to the protection against invalid writes.
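
If you wanted to check this state from a script rather than by eye, something like the following would do. It is only a sketch: the regular expression is written against the single ASMCMD-9526 line shown above, so the function name and the assumption that other states follow the same wording are mine.

```python
import re

# Pattern based on the one ASMCMD-9526 message shown in the transcript above.
STATE_RE = re.compile(
    r"AFD state is '(?P<state>[^']+)' and filtering is '(?P<filtering>[^']+)' "
    r"on host '(?P<host>[^']+)'"
)


def parse_afd_state(output):
    """Return a dict with state, filtering and host, or None if the line is not found."""
    match = STATE_RE.search(output)
    return match.groupdict() if match else None


sample = """Connected to an idle instance.
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'ENABLED' on host 'server4'"""

print(parse_afd_state(sample))
```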

Time to start up HAS or CRS again:

[root@server4 ~]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.

Ok, let’s start labelling those devices.

Labelling (Incorrectly)

Now remember that I am testing with two sets of devices here: 512e and 4k. The 512e devices are emulating a 512 byte blocksize, so they should result in ASM creating diskgroups with a blocksize of 512 bytes – thus avoiding all the tedious bugs from which Oracle suffers when using 4096 byte diskgroups.

So let’s just test a 512e LUN to see what happens when I label it and present it to ASM. First, the label is created using the afd_label command:

[oracle@server4 ~]$ ls -l /dev/mapper/v512data1
lrwxrwxrwx 1 root root 8 Jul 24 10:30 /dev/mapper/v512data1 -> ../dm-14
[oracle@server4 ~]$ ls -l /dev/dm-14
brw-rw---- 1 oracle dba 252, 14 Jul 24 10:30 /dev/dm-14
[oracle@server4 ~]$ asmcmd afd_label v512data1 /dev/mapper/v512data1
[oracle@server4 ~]$ asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
V512DATA1                   ENABLED   /dev/sdpz

Well, it worked.. sort of. The path we can see in the lsdsk output does not show the /dev/mapper/v512data1 multipath device I specified… instead it’s one of the non-multipath /dev/sd* devices. Why?

Even worse, look what happens when I check the SECTOR_SIZE column of the v$asm_diskgroup view in ASM:

SQL> select group_number, name, sector_size, block_size, state
  2  from v$asm_diskgroup;

GROUP_NUMBER NAME	SECTOR_SIZE BLOCK_SIZE STATE
------------ ---------- ----------- ---------- -----------
	   1 V512DATA	       4096	  4096 MOUNTED

Even though my LUNs are presented as 512e, ASM has chosen to see them as 4096 byte. That’s not what I want. Gaah!

Labelling (Correctly)

To fix this I need to unlabel that LUN so that AFD has nothing under its control, then update the oracleafd_use_logical_block_size parameter via the special SYSFS files under /sys/module/oracleafd:

[root@server4 ~]# cd /sys/module/oracleafd
[root@server4 oracleafd]# ls -l
total 0
-r--r--r-- 1 root root 4096 Jul 20 14:43 coresize
drwxr-xr-x 2 root root    0 Jul 20 14:43 holders
-r--r--r-- 1 root root 4096 Jul 20 14:43 initsize
-r--r--r-- 1 root root 4096 Jul 20 14:43 initstate
drwxr-xr-x 2 root root    0 Jul 20 14:43 notes
drwxr-xr-x 2 root root    0 Jul 20 14:43 parameters
-r--r--r-- 1 root root 4096 Jul 20 14:43 refcnt
drwxr-xr-x 2 root root    0 Jul 20 14:43 sections
-r--r--r-- 1 root root 4096 Jul 20 14:43 srcversion
-r--r--r-- 1 root root 4096 Jul 20 14:43 taint
--w------- 1 root root 4096 Jul 20 14:43 uevent
[root@server4 oracleafd]# cd parameters
[root@server4 parameters]# ls -l
total 0
-rw-r--r-- 1 root root 4096 Jul 20 14:43 oracleafd_use_logical_block_size
[root@server4 parameters]# cat oracleafd_use_logical_block_size
0
[root@server4 parameters]# echo 1 > oracleafd_use_logical_block_size
[root@server4 parameters]# cat oracleafd_use_logical_block_size
1

After making this change, AFD will present the logical blocksize of 512 bytes to ASM rather than the physical blocksize of 4096 bytes. So let’s now label those disks again:

[root@server4 mapper]# for lun in 1 2 3 4 5 6 7 8; do
> asmcmd afd_label v512data$lun /dev/mapper/v512data$lun
> done
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
[root@server4 mapper]# asmcmd afd_lsdsk
Connected to an idle instance.
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
V512DATA1                   ENABLED   /dev/mapper/v512data1
V512DATA2                   ENABLED   /dev/mapper/v512data2
V512DATA3                   ENABLED   /dev/mapper/v512data3
V512DATA4                   ENABLED   /dev/mapper/v512data4
V512DATA5                   ENABLED   /dev/mapper/v512data5
V512DATA6                   ENABLED   /dev/mapper/v512data6
V512DATA7                   ENABLED   /dev/mapper/v512data7
V512DATA8                   ENABLED   /dev/mapper/v512data8

Note the correct multipath devices (“/dev/mapper/*”) are now being shown in the lsdsk command output. If I now create an ASM diskgroup on these LUNs, it will have a 512 byte sector size:

SQL> get afd.sql
  1  CREATE DISKGROUP V512DATA EXTERNAL REDUNDANCY
  2  DISK 'AFD:V512DATA1', 'AFD:V512DATA2',
  3	  'AFD:V512DATA3', 'AFD:V512DATA4',
  4	  'AFD:V512DATA5', 'AFD:V512DATA6',
  5	  'AFD:V512DATA7', 'AFD:V512DATA8'
  6  ATTRIBUTE
  7	  'compatible.asm' = '12.1',
  8*	  'compatible.rdbms' = '12.1'
SQL> /

Diskgroup created.

SQL> select disk_number, mount_status, header_status, state, sector_size, path
  2  from v$asm_disk;

DISK_NUMBER MOUNT_S HEADER_STATU STATE	  SECTOR_SIZE PATH
----------- ------- ------------ -------- ----------- --------------------
	  0 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA1
	  1 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA2
	  2 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA3
	  3 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA4
	  4 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA5
	  5 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA6
	  6 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA7
	  7 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA8

8 rows selected.

SQL> select group_number, name, sector_size, block_size, state
  2  from v$asm_diskgroup;

GROUP_NUMBER NAME	SECTOR_SIZE BLOCK_SIZE STATE
------------ ---------- ----------- ---------- -----------
	   1 V512DATA		512	  4096 MOUNTED

Phew.
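
Since my first labelling attempt quietly picked up a /dev/sd* path instead of the multipath device, it might be worth scripting a sanity check on the afd_lsdsk output. Here is a rough sketch in Python; the function names are hypothetical and the parsing assumes the three-column layout shown in the listings above.

```python
def parse_afd_lsdsk(output):
    """Return {label: path} from afd_lsdsk output in the layout shown above."""
    disks = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly three columns: label, filtering state, device path.
        if len(fields) == 3 and fields[2].startswith("/dev/"):
            disks[fields[0]] = fields[2]
    return disks


def non_multipath(disks, prefix="/dev/mapper/"):
    """Return the labels whose path is not a device mapper multipath device."""
    return sorted(label for label, path in disks.items() if not path.startswith(prefix))


first_attempt = """Label                     Filtering   Path
================================================================================
V512DATA1                   ENABLED   /dev/sdpz"""

second_attempt = """Connected to an idle instance.
Label                     Filtering   Path
================================================================================
V512DATA1                   ENABLED   /dev/mapper/v512data1
V512DATA2                   ENABLED   /dev/mapper/v512data2"""

print(non_multipath(parse_afd_lsdsk(first_attempt)))
print(non_multipath(parse_afd_lsdsk(second_attempt)))
```

The first call flags V512DATA1, while the second returns an empty list.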

Failing To Label 4kN Devices

So what about my 4k native mode devices, the ones with a 4096 byte logical block size? What happens if I try to label them?

[root@server4 ~]# asmcmd afd_label V4KDATA1 /dev/mapper/v4kdata1
Connected to an idle instance.
ASMCMD-9513: ASM disk label set operation failed.

Yeah, that didn’t work out did it? Let’s look in the trace file:

[root@server4 ~]# tail -5 /u01/app/oracle/log/diag/asmcmd/user_root/server4/alert/alert.log
24-Jul-15 12:38 ASMCMD (PID = 8695) Given command - afd_label V4KDATA1 '/dev/mapper/v4kdata1'
24-Jul-15 12:38 NOTE: Verifying AFD driver state : loaded
24-Jul-15 12:38 NOTE: afdtool -add '/dev/mapper/v4kdata1' 'V4KDATA1'
24-Jul-15 12:38 NOTE:
24-Jul-15 12:38 ASMCMD-9513: ASM disk label set operation failed.

I’ve struggled to find any more meaningful message, even when I manually run the afdtool command shown in the log – but it seems pretty likely that this is failing due to the device being 4kN. I therefore assume that AFD still isn’t 4kN ready. I do wish Oracle would make some meaningful progress on its support of 4kN devices…

I/O Filter Protection

So now let’s investigate this protection that ASMFD claims to have against non-Oracle I/Os. First of all, what do those files in /dev/oracleafd/disks actually contain?

[root@server4 ~]# cd /dev/oracleafd/disks
[root@server4 disks]# ls -l
total 32
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA1
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA2
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA3
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA4
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA5
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA6
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA7
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA8
[root@server4 disks]# cat V512DATA1
/dev/mapper/v512data1

Aha. This is what I got wrong in my original post last year, because – keen as I was to start my summer vacation – I didn’t spot that these files are simply pointers to the relevant multipath device in /dev/mapper. So let’s follow the pointers this time.

Let’s remind ourselves that the files in /dev/mapper are actually symbolic links to /dev/dm-* devices:

[root@server4 disks]# ls -l /dev/mapper/v512data1
lrwxrwxrwx 1 root root 8 Jul 24 12:34 /dev/mapper/v512data1 -> ../dm-14
[root@server4 disks]# ls -l /dev/dm-14
brw-rw---- 1 oracle dba 252, 14 Jul 24 12:34 /dev/dm-14

So it’s these /dev/dm-* devices that are at the end of the trail we just followed. If we dump the first 64 bytes of this /dev/dm-14 device, we should be able to see the AFD label:

[root@server4 disks]# od -c -N 64 /dev/dm-14
0000000                           (   o   u   t
0000020                                
0000040   O   R   C   L   D   I   S   K   V   5   1   2   D   A   T   A
0000060   1

There it is. We can also read it with kfed to see what ASM thinks of it:

[root@server4 ~]# kfed read /dev/dm-14
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                  1953853224 ; 0x00c: 0x74756f28
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
000000000 00000000 00000000 00000000 74756F28  [............(out]
000000010 00000000 00000000 00000000 00000000  [................]
000000020 4C43524F 4B534944 32313556 41544144  [ORCLDISKV512DATA]
000000030 00000031 00000000 00000000 00000000  [1...............]
000000040 00000000 00000000 00000000 00000000  [................]
  Repeat 251 times
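
If the hex rows look scrambled next to the od output, that is because kfed prints each group of four bytes as a 32-bit number, which on little-endian x86 reverses the byte order. Here is a small Python sketch which rebuilds the raw bytes from the rows above and pulls out the label. The function names are mine, and the idea that the label is NUL-terminated is an assumption based on the zero padding in the dump.

```python
import struct


def kfed_rows_to_bytes(rows):
    """Rebuild raw bytes from kfed hex rows: an offset followed by four 32-bit words."""
    raw = b""
    for row in rows:
        words = row.split()[1:5]                 # skip the offset, ignore the ASCII column
        # Each word was printed as a number, so pack it back as little-endian.
        raw += b"".join(struct.pack("<I", int(word, 16)) for word in words)
    return raw


def read_label(header, marker=b"ORCLDISK", marker_offset=32):
    """Return the label which follows the marker, or None if the marker is absent."""
    end = marker_offset + len(marker)
    if header[marker_offset:end] != marker:
        return None
    # Assumption: the label is NUL-terminated.
    return header[end:].split(b"\x00", 1)[0].decode("ascii")


rows = [
    "000000000 00000000 00000000 00000000 74756F28  [............(out]",
    "000000010 00000000 00000000 00000000 00000000  [................]",
    "000000020 4C43524F 4B534944 32313556 41544144  [ORCLDISKV512DATA]",
    "000000030 00000031 00000000 00000000 00000000  [1...............]",
]

header = kfed_rows_to_bytes(rows)
print(header[12:16])          # the four bytes behind kfbh.check: b'(out'
print(read_label(header))     # V512DATA1
```

To run this against a real device you would read the first 64 bytes of the block device (as root) instead of building `header` from the kfed rows.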

So what happens if I overwrite it, as the root user, with some zeros? And maybe some text too just for good luck?

[root@server4 ~]# dd if=/dev/zero of=/dev/dm-14 bs=4k count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 0.00570833 s, 735 MB/s
[root@server4 ~]# echo CORRUPTION > /dev/dm-14
[root@server4 ~]# od -c -N 64 /dev/dm-14
0000000   C   O   R   R   U   P   T   I   O   N  \n          
0000020

It looks like it’s changed. I also see that if I dump it from another session which opens a fresh file descriptor. Yet in the /var/log/messages file there is now a new entry:

F 4626129.736/150724115533 flush-252:14[1807]  afd_mkrequest_fn: write IO on ASM managed device (major=252/minor=14)  not supported i=0 start=0 seccnt=8  pstart=0  pend=41943040
Jul 24 12:55:33 server4 kernel: quiet_error: 1015 callbacks suppressed
Jul 24 12:55:33 server4 kernel: Buffer I/O error on device dm-14, logical block 0
Jul 24 12:55:33 server4 kernel: lost page write due to I/O error on dm-14

Hmm. It seems like ASMFD has intervened to stop the write, yet when I query the device I see the “new” data. Where’s the old data gone? Well, let’s use kfed again:

[root@server4 ~]# kfed read /dev/dm-14
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                  1953853224 ; 0x00c: 0x74756f28
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
000000000 00000000 00000000 00000000 74756F28  [............(out]
000000010 00000000 00000000 00000000 00000000  [................]
000000020 4C43524F 4B534944 32313556 41544144  [ORCLDISKV512DATA]
000000030 00000031 00000000 00000000 00000000  [1...............]
000000040 00000000 00000000 00000000 00000000  [................]
  Repeat 251 times

The label is still there! Magic.

I have to confess, I don’t really know how ASM does this. Indeed, I struggled to get the system back to a point where I could manually see the label using the od command. In the end, the only way I managed it was to reboot the server – yet ASM worked fine all along and the diskgroup was never affected:

SQL> alter diskgroup V512DATA check all;
Mon Jul 20 16:46:23 2015
NOTE: starting check of diskgroup V512DATA
Mon Jul 20 16:46:23 2015
GMON querying group 1 at 5 for pid 7, osid 9255
GMON checking disk 0 for group 1 at 6 for pid 7, osid 9255
GMON querying group 1 at 7 for pid 7, osid 9255
GMON checking disk 1 for group 1 at 8 for pid 7, osid 9255
GMON querying group 1 at 9 for pid 7, osid 9255
GMON checking disk 2 for group 1 at 10 for pid 7, osid 9255
GMON querying group 1 at 11 for pid 7, osid 9255
GMON checking disk 3 for group 1 at 12 for pid 7, osid 9255
GMON querying group 1 at 13 for pid 7, osid 9255
GMON checking disk 4 for group 1 at 14 for pid 7, osid 9255
GMON querying group 1 at 15 for pid 7, osid 9255
GMON checking disk 5 for group 1 at 16 for pid 7, osid 9255
GMON querying group 1 at 17 for pid 7, osid 9255
GMON checking disk 6 for group 1 at 18 for pid 7, osid 9255
GMON querying group 1 at 19 for pid 7, osid 9255
GMON checking disk 7 for group 1 at 20 for pid 7, osid 9255
Mon Jul 20 16:46:23 2015
SUCCESS: check of diskgroup V512DATA found no errors
Mon Jul 20 16:46:23 2015
SUCCESS: alter diskgroup V512DATA check all

So there you go. ASMFD: it does what it says on the tin. Just don’t try using it with 4kN devices…

ASM Rebalance Too Slow? 3 Tips To Improve Rebalance Times

I’ve run into a few customers recently who have had problems with their ASM rebalance operations running too slowly. Surprisingly, there were some simple concepts being overlooked – and once these were understood, the rebalance times were dramatically improved. For that reason, I’m documenting the solutions here… I hope that somebody, somewhere benefits…

1. Don’t Overbalance

Every time you run an ALTER DISKGROUP REBALANCE operation you initiate a large amount of I/O workload as Oracle ASM works to evenly stripe data across all available ASM disks (i.e. LUNs). The most common cause of rebalance operations running slowly that I see (and I’m constantly surprised how much I see this) is to overbalance, i.e. cause ASM to perform more I/O than is necessary.

It almost always goes like this. The customer wants to migrate some data from one set of ASM disks to another, so they first add the new disks:

alter diskgroup data
add disk  'ORCL:NEWDATA1','ORCL:NEWDATA2','ORCL:NEWDATA3','ORCL:NEWDATA4',
          'ORCL:NEWDATA5','ORCL:NEWDATA6','ORCL:NEWDATA7','ORCL:NEWDATA8'
rebalance power 11 wait;

Then they drop the old disks like this:

alter diskgroup data
drop disk 'DATA1','DATA2','DATA3','DATA4',
          'DATA5','DATA6','DATA7','DATA8'
rebalance power 11 wait;

Well guess what? That causes double the amount of I/O that is actually necessary to migrate, because Oracle evenly stripes across all disks and then has to rebalance a second time once the original disks are dropped.

This is how it should be done – in one single operation:

alter diskgroup data
add disk  'ORCL:NEWDATA1','ORCL:NEWDATA2','ORCL:NEWDATA3','ORCL:NEWDATA4',
          'ORCL:NEWDATA5','ORCL:NEWDATA6','ORCL:NEWDATA7','ORCL:NEWDATA8'
drop disk 'DATA1','DATA2','DATA3','DATA4',
          'DATA5','DATA6','DATA7','DATA8'
rebalance power 11 wait;

A customer of mine tried this earlier this week and reported back that their ASM rebalance time had reduced by a factor of five!

By the way, the WAIT keyword means the prompt doesn’t return until the rebalance has finished. To have it essentially run in the background you can simply change this to NOWAIT. Also, you could run the ADD and DROP commands separately if you used REBALANCE POWER 0 for the first command, as this would pause the rebalance and then the second command would kick it off.
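As a rough sketch of that two-step variant, the shell helper below just prints the two statements so they can be reviewed before being handed to SQL*Plus. The function names quote_list and staged_migration_sql are made up for this example, and it assumes disk names contain no commas or quotes:

```shell
#!/bin/sh
# quote_list: turn "A,B,C" into "'A','B','C'" for use in a SQL disk list.
# (Hypothetical helper, not part of any Oracle tooling.)
quote_list() {
    printf '%s\n' "$1" | sed "s/,/','/g; s/^/'/; s/\$/'/"
}

# staged_migration_sql DISKGROUP NEW_DISKS OLD_DISKS POWER
# Prints the add-then-drop migration as two statements with one rebalance.
staged_migration_sql() {
    dg=$1
    new=$(quote_list "$2")
    old=$(quote_list "$3")
    power=$4
    # Step 1: add the new disks but hold the rebalance (power 0).
    printf 'alter diskgroup %s add disk %s rebalance power 0;\n' "$dg" "$new"
    # Step 2: drop the old disks; this kicks off the single rebalance.
    printf 'alter diskgroup %s drop disk %s rebalance power %s wait;\n' "$dg" "$old" "$power"
}
```

Calling staged_migration_sql data 'ORCL:NEWDATA1,ORCL:NEWDATA2' 'DATA1,DATA2' 11 prints one ADD DISK statement ending in "rebalance power 0;" followed by one DROP DISK statement ending in "rebalance power 11 wait;". Nothing is executed, which is deliberate: check the output before running it against ASM.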

2. Power Limit Goes Up To 1024

Simple one this, but easily forgotten. From the early days of ASM, the maximum power limit for rebalance operations was 11. See here if you don’t know why.

From 11.2.0.2, if the COMPATIBLE.ASM disk group attribute is set to 11.2.0.2 or higher the limit is now 1024. That means 11 really isn’t going to cut it anymore. If you are asking for full power, make sure you know what number that is.
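To make that rule concrete, here is a small sketch that maps a COMPATIBLE.ASM value to the maximum power limit described above. The function name max_rebalance_power is invented for this example, and it assumes the version string is purely numeric and dot-separated:

```shell
#!/bin/sh
# max_rebalance_power VERSION
# Prints 1024 if VERSION is 11.2.0.2 or higher, otherwise 11.
# (Hypothetical helper; missing version fields are treated as 0.)
max_rebalance_power() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split "a.b.c.d" into positional parameters
    IFS=$old_ifs
    a=${1:-0}; b=${2:-0}; c=${3:-0}; d=${4:-0}
    # Compare against 11.2.0.2 field by field.
    if [ "$a" -gt 11 ] ||
       { [ "$a" -eq 11 ] && [ "$b" -gt 2 ]; } ||
       { [ "$a" -eq 11 ] && [ "$b" -eq 2 ] && [ "$c" -gt 0 ]; } ||
       { [ "$a" -eq 11 ] && [ "$b" -eq 2 ] && [ "$c" -eq 0 ] && [ "$d" -ge 2 ]; }; then
        echo 1024
    else
        echo 11
    fi
}
```

The version string itself would normally come from the ASM instance, for example by querying the compatible.asm row in v$asm_attribute for the diskgroup in question.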

3. Avoid The Compact Phase (for Flash Storage Systems)

An ASM rebalance operation comprises three phases, where the third one is the compact phase. This attempts to move data as close as possible to the outer tracks of the disks ASM is using.

Did you spot the issue there? Disks. This I/O-heavy phase is completely pointless on a flash system, where I/O is served evenly from any logical address within a LUN.

You can therefore avoid that potentially-massive I/O hit by disabling the compact phase, using the underscore parameter _DISABLE_REBALANCE_COMPACT=TRUE. Remember that you need to get Oracle Support’s permission before setting underscore parameters! Point your SR in the direction of the following My Oracle Support note:

What is ASM rebalance compact Phase and how it can be disabled (Doc ID 1902001.1)

Unfortunately it appears the parameter was deprecated in 12c, so from now on you have to set the ASM diskgroup attribute “_rebalance_compact” to FALSE (note the opposite value to that set at the instance level!), for example:

ALTER DISKGROUP <diskgroup_name> SET ATTRIBUTE '_rebalance_compact'='FALSE';

If you want to know more about this topic (for example, what the first two rebalance phases are), or indeed anything about ASM in general, I highly recommend the legendary ASM blogger that is Bane Radulovic a.k.a. ASM Support Guy.

Conclusion

An ASM rebalance potentially creates a lot of I/O, which means you may need to wait for a long time before it finishes. For that reason, make sure you understand what you are doing and make every effort to perform only as much I/O as you actually need. Don’t forget you can use the EXPLAIN WORK command to gauge in advance how much work is required.
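For anyone who hasn’t used EXPLAIN WORK, the sketch below prints the pair of statements involved: one to record the estimate and one to read it back from v$asm_estimate. The function name explain_work_sql and the statement id are made up for this example, and the output is only printed, never run:

```shell
#!/bin/sh
# explain_work_sql STATEMENT_ID "ALTER DISKGROUP ..."
# Prints an EXPLAIN WORK statement for the given operation, followed by
# the query that reads the estimate back. (Hypothetical helper.)
explain_work_sql() {
    id=$1
    stmt=$2
    printf "explain work set statement_id='%s' for %s;\n" "$id" "$stmt"
    printf "select est_work from v\$asm_estimate where statement_id='%s';\n" "$id"
}
```

For example, explain_work_sql mig1 "alter diskgroup data drop disk DATA1" prints the two statements ready to paste into a 12c or later ASM instance. As I understand it, est_work is expressed in allocation units to be moved, so it is most useful for comparing one plan against another.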

Happy rebalancing!

Viewing ASM trace files in VIM: Which Way Do You Use?

A couple of people have asked me recently about a classic problem that most DBAs know: how to view ASM trace files in the VIM editor when the filenames start with a + character. To my surprise, there are actually quite a few different ways of doing it. Since it’s come up, I thought I’d list a few of them here… If you have another one to add, feel free to comment. I know that most people reading this already have an answer, I’m just interested in who uses the most efficient one…

The Problem

VIM is a text editor used in many different operating systems. You know the one, it’s incredibly powerful, utterly incomprehensible to the newcomer… and will forever have more options than you can remember. I mean, just check out any VIM cheat sheet.

People love or hate vim (I love it), but it’s often used on Linux systems simply because it’s always there. The problem comes when you want to look at ASM trace files, because they have a silly name:

[oracle@server3 trace]$ pwd
/u01/app/oracle/diag/asm/+asm/+ASM/trace
[oracle@server3 trace]$ ls -l +ASM_ora_27425*
-rw-r----- 1 oracle oinstall 20625 Aug 20 15:42 +ASM_ora_27425.trc
-rw-r----- 1 oracle oinstall   528 Aug 20 15:42 +ASM_ora_27425.trm

Oracle trace files tend to have names in the format <oracle-sid>_<process-name>_<process-id>.trc, which is fine until the Oracle SID is that of the Automatic Storage Management instance, i.e. “+ASM”.

It’s that “+” prefix character that does it:

[oracle@server3 trace]$ vim +ASM_ora_27425.trc

Error detected while processing command line:
E492: Not an editor command: ASM_ora_27425.trc
Press ENTER or type command to continue

Why does this happen? Well, because among the extensive options of vim are the following:

[oracle@server3 trace]$ man vim
...
OPTIONS
       The  options may be given in any order, before or after filenames.  Options without an argument can be combined after a
       single dash.

       +[num]      For the first file the cursor will be positioned on line "num".  If "num" is missing, the  cursor  will  be
                   positioned on the last line.

       +/{pat}     For  the first file the cursor will be positioned on the first occurrence of {pat}.  See ":help search-pat-
                   tern" for the available search patterns.

       +{command}
...

So… the plus character is actually being interpreted by VIM as an option. Surely we can just escape it then, right?

[oracle@server3 trace]$ vim \+ASM_ora_27425.trc

Error detected while processing command line:
E492: Not an editor command: ASM_ora_27425.trc
Press ENTER or type command to continue

Nope. And neither single nor double quotes around the filename work either, because the shell strips the escaping before VIM ever sees the argument. So what are the options?

Solution 1: Make Sure The “+” Isn’t The Prefix

Simple, but effective. If the + character isn’t leading the filename, VIM won’t try to interpret it. So instead of a relative filename, I could use the absolute:

[oracle@server3 trace]$ vi /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_27425.trc

Or even just use a ./ to denote the current directory:

[oracle@server3 trace]$ vi ./+ASM_ora_27425.trc
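If you’d rather not think about it each time, the same trick can be wrapped in a tiny shell function. This is only a sketch: the name edit_trace and the TRACE_EDITOR variable are made up for this example, and it handles one filename at a time:

```shell
#!/bin/sh
# edit_trace FILE
# Opens FILE in an editor, prefixing "./" to relative names so that a
# leading "+" (or "-") is never mistaken for an option.
# TRACE_EDITOR is a hypothetical override; the default is vim.
edit_trace() {
    editor="${TRACE_EDITOR:-vim}"
    case "$1" in
        /*|./*|../*) "$editor" "$1" ;;    # already anchored to a directory
        *)           "$editor" "./$1" ;;  # make the relative path explicit
    esac
}
```

With that in your profile, edit_trace +ASM_ora_27425.trc hands ./+ASM_ora_27425.trc to the editor. Because it relies on the path prefix rather than an editor-specific flag, it should behave the same for any pager or editor you set TRACE_EDITOR to.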

Solution 2: Double Dash

Even simpler, but less well known (I think?) is the double-dash or hyphen option. If you browse the VIM man page a little further on, you’ll find this:

[oracle@server3 trace]$ man vim
...
 --          Denotes  the end of the options.  Arguments after this will be handled as a file name.  This can be used to
                   edit a filename that starts with a ’-’.
...

And it works perfectly:

[oracle@server3 trace]$ vi -- +ASM_ora_27425.trc

Solution 3: Use Find and -Exec

Another, slightly messy option is to use the find command to send the file to VIM. I know people who still do this, despite it being more work than the other options – sometimes a lazy hack can become unconscious habit:

[oracle@server3 trace]$ find . -name +ASM_ora_27425.trc -exec vi {} \;

In fact, I actually know somebody who used to look up the file’s inode number and then pass that into find:

[oracle@server3 trace]$ ls -li +ASM_ora_27425*
138406 -rw-r----- 1 oracle oinstall 20625 Aug 20 15:42 +ASM_ora_27425.trc
138407 -rw-r----- 1 oracle oinstall   528 Aug 20 15:42 +ASM_ora_27425.trm
[oracle@server3 trace]$ find . -inum 138406 -exec vi {} \;

Luckily nobody will ever know who that somebody is*.

Solution 4: Rename It

My least favourite option, but it’s actually quite efficient. Simply create a copy of the file with a new name that doesn’t contain a plus – luckily the cp command doesn’t care about the + prefix:

[oracle@server3 trace]$ cp +ASM_ora_27425.trc me.trc
[oracle@server3 trace]$ vi me.trc

Of course, you’ll want to tidy up that new file afterwards and not just leave it lying around… won’t you?

Less Is More

Maybe you’re not the sort of person that likes to use VIM. Maybe you prefer the more basic OS tools like cat (which works fine on ASM trace files), or more (which doesn’t), or even less.

In fact, less has pretty much the same options as VIM, which means you can use all of the above solutions with it. If you are using more, you cannot pass it a double dash but the others will work. And if you’re using cat, good luck to you… I hope you have a big screen.

* Yes, of course, it was me.

Oracle 12.1.0.2 ASM Filter Driver: Advanced Format Fail

[Please note that a more up-to-date post on this subject can be found here]

In my previous post on the subject of the new ASM Filter Driver (AFD) feature introduced in Oracle’s 12.1.0.2 patchset, I installed the AFD to see how it fulfilled its promise that it “filters out all non-Oracle I/Os which could cause accidental overwrites”. However, because I was ten minutes away from my summer vacation at the point of finishing that post, I didn’t actually get round to writing about what happens when you try and create ASM diskgroups on the devices it presents.

Obviously I’ve spent the intervening period constantly worrying about this oversight – indeed, it was only through the judicious application of good food and drink plus some committed relaxation in the sun that I was able to pull through. However, I’m back now and it seems like time to rectify that mistake. So here goes.

Creating ASM Diskgroups with the ASM Filter Driver

It turns out I need not have worried, because it doesn’t work right now… at least, not for me. Here’s why:

First of all, I installed Oracle 12.1.0.2 Grid Infrastructure. I then labelled some block devices presented from my Violin storage array. As I’ve already pasted all the output from those two steps in the previous post, I won’t repeat myself.

The next step is therefore to create a diskgroup. Since I’ve only just come back from holiday and am still half brain-dead, I’ll choose the simple route and fire up the ASM Configuration Assistant (ASMCA) so that I don’t have to look up any of that nasty SQL. Here goes:

[Screenshot: ASMCA create diskgroup dialog]

But guess what happened when I hit the OK button? It failed, big time. Here’s the alert log – if you don’t like huge amounts of meaningless text I suggest you skip down… a lot… (although thinking about it, my entire blog could be described as meaningless text):

SQL> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY  DISK 'AFD:DATA1' SIZE 72704M ,
'AFD:DATA2' SIZE 72704M ,
'AFD:DATA3' SIZE 72704M ,
'AFD:DATA4' SIZE 72704M ,
'AFD:DATA5' SIZE 72704M ,
'AFD:DATA6' SIZE 72704M ,
'AFD:DATA7' SIZE 72704M ,
'AFD:DATA8' SIZE 72704M  ATTRIBUTE 'compatible.asm'='12.1.0.0.0','au_size'='1M' /* ASMCA */
Fri Jul 25 16:25:33 2014
WARNING: Library 'AFD Library - Generic , version 3 (KABI_V3)' does not support advanced format disks
Fri Jul 25 16:25:33 2014
NOTE: Assigning number (1,0) to disk (AFD:DATA1)
NOTE: Assigning number (1,1) to disk (AFD:DATA2)
NOTE: Assigning number (1,2) to disk (AFD:DATA3)
NOTE: Assigning number (1,3) to disk (AFD:DATA4)
NOTE: Assigning number (1,4) to disk (AFD:DATA5)
NOTE: Assigning number (1,5) to disk (AFD:DATA6)
NOTE: Assigning number (1,6) to disk (AFD:DATA7)
NOTE: Assigning number (1,7) to disk (AFD:DATA8)
NOTE: initializing header (replicated) on grp 1 disk DATA1
NOTE: initializing header (replicated) on grp 1 disk DATA2
NOTE: initializing header (replicated) on grp 1 disk DATA3
NOTE: initializing header (replicated) on grp 1 disk DATA4
NOTE: initializing header (replicated) on grp 1 disk DATA5
NOTE: initializing header (replicated) on grp 1 disk DATA6
NOTE: initializing header (replicated) on grp 1 disk DATA7
NOTE: initializing header (replicated) on grp 1 disk DATA8
NOTE: initializing header on grp 1 disk DATA1
NOTE: initializing header on grp 1 disk DATA2
NOTE: initializing header on grp 1 disk DATA3
NOTE: initializing header on grp 1 disk DATA4
NOTE: initializing header on grp 1 disk DATA5
NOTE: initializing header on grp 1 disk DATA6
NOTE: initializing header on grp 1 disk DATA7
NOTE: initializing header on grp 1 disk DATA8
NOTE: Disk 0 in group 1 is assigned fgnum=1
NOTE: Disk 1 in group 1 is assigned fgnum=2
NOTE: Disk 2 in group 1 is assigned fgnum=3
NOTE: Disk 3 in group 1 is assigned fgnum=4
NOTE: Disk 4 in group 1 is assigned fgnum=5
NOTE: Disk 5 in group 1 is assigned fgnum=6
NOTE: Disk 6 in group 1 is assigned fgnum=7
NOTE: Disk 7 in group 1 is assigned fgnum=8
NOTE: initiating PST update: grp = 1
Fri Jul 25 16:25:33 2014
GMON updating group 1 at 1 for pid 7, osid 16745
NOTE: group DATA: initial PST location: disk 0000 (PST copy 0)
NOTE: set version 1 for asmCompat 12.1.0.0.0
Fri Jul 25 16:25:33 2014
NOTE: PST update grp = 1 completed successfully
NOTE: cache registered group DATA 1/0xD9B6AE8D
NOTE: cache began mount (first) of group DATA 1/0xD9B6AE8D
NOTE: cache is mounting group DATA created on 2014/07/25 16:25:33
NOTE: cache opening disk 0 of grp 1: DATA1 label:DATA1
NOTE: cache opening disk 1 of grp 1: DATA2 label:DATA2
NOTE: cache opening disk 2 of grp 1: DATA3 label:DATA3
NOTE: cache opening disk 3 of grp 1: DATA4 label:DATA4
NOTE: cache opening disk 4 of grp 1: DATA5 label:DATA5
NOTE: cache opening disk 5 of grp 1: DATA6 label:DATA6
NOTE: cache opening disk 6 of grp 1: DATA7 label:DATA7
NOTE: cache opening disk 7 of grp 1: DATA8 label:DATA8
NOTE: cache creating group 1/0xD9B6AE8D (DATA)
NOTE: cache mounting group 1/0xD9B6AE8D (DATA) succeeded
WARNING: cache read a corrupt block: group=1(DATA) dsk=0 blk=1 disk=0 (DATA1) incarn=3493224069 au=0 blk=1 count=1
Fri Jul 25 16:25:33 2014
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc:
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc
WARNING: cache read (retry) a corrupt block: group=1(DATA) dsk=0 blk=1 disk=0 (DATA1) incarn=3493224069 au=0 blk=1 count=1
Fri Jul 25 16:25:33 2014
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc:
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
WARNING: cache read (retry) a corrupt block: group=1(DATA) dsk=0 blk=1 disk=0 (DATA1) incarn=3493224069 au=11 blk=1 count=1
Fri Jul 25 16:25:33 2014
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc:
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc
WARNING: cache read (retry) a corrupt block: group=1(DATA) dsk=0 blk=1 disk=0 (DATA1) incarn=3493224069 au=11 blk=1 count=1
Fri Jul 25 16:25:33 2014
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc:
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ERROR: cache failed to read group=1(DATA) dsk=0 blk=1 from disk(s): 0(DATA1) 0(DATA1)
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]

NOTE: cache initiating offline of disk 0 group DATA
NOTE: process _user16745_+asm (16745) initiating offline of disk 0.3493224069 (DATA1) with mask 0x7e in group 1 (DATA) with client assisting
NOTE: initiating PST update: grp 1 (DATA), dsk = 0/0xd0365e85, mask = 0x6a, op = clear
Fri Jul 25 16:25:34 2014
GMON updating disk modes for group 1 at 2 for pid 7, osid 16745
ERROR: disk 0(DATA1) in group 1(DATA) cannot be offlined because the disk group has external redundancy.
Fri Jul 25 16:25:34 2014
ERROR: too many offline disks in PST (grp 1)
Fri Jul 25 16:25:34 2014
ERROR: no read quorum in group: required 1, found 0 disks
ERROR: Could not read PST for grp 1. Force dismounting the disk group.
Fri Jul 25 16:25:34 2014
NOTE: halting all I/Os to diskgroup 1 (DATA)
Fri Jul 25 16:25:34 2014
ERROR: no read quorum in group: required 1, found 0 disks
ASM Health Checker found 1 new failures
Fri Jul 25 16:25:36 2014
ERROR: no read quorum in group: required 1, found 0 disks
Fri Jul 25 16:25:36 2014
ERROR: Could not read PST for grp 1. Force dismounting the disk group.
Fri Jul 25 16:25:36 2014
ERROR: no read quorum in group: required 1, found 0 disks
ERROR: Could not read PST for grp 1. Force dismounting the disk group.
Fri Jul 25 16:25:36 2014
ERROR: no read quorum in group: required 1, found 0 disks
ERROR: Could not read PST for grp 1. Force dismounting the disk group.
Fri Jul 25 16:25:37 2014
NOTE: AMDU dump of disk group DATA initiated at /u01/app/oracle/diag/asm/+asm/+ASM/trace
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc  (incident=3257):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA1" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
Incident details in: /u01/app/oracle/diag/asm/+asm/+ASM/incident/incdir_3257/+ASM_ora_16745_i3257.trc
Fri Jul 25 16:25:37 2014
Sweep [inc][3257]: completed
Fri Jul 25 16:25:37 2014
SQL> alter diskgroup DATA check
System State dumped to trace file /u01/app/oracle/diag/asm/+asm/+ASM/incident/incdir_3257/+ASM_ora_16745_i3257.trc
NOTE: erasing header (replicated) on grp 1 disk DATA1
NOTE: erasing header (replicated) on grp 1 disk DATA2
NOTE: erasing header (replicated) on grp 1 disk DATA3
NOTE: erasing header (replicated) on grp 1 disk DATA4
NOTE: erasing header (replicated) on grp 1 disk DATA5
NOTE: erasing header (replicated) on grp 1 disk DATA6
NOTE: erasing header (replicated) on grp 1 disk DATA7
NOTE: erasing header (replicated) on grp 1 disk DATA8
NOTE: erasing header on grp 1 disk DATA1
NOTE: erasing header on grp 1 disk DATA2
NOTE: erasing header on grp 1 disk DATA3
NOTE: erasing header on grp 1 disk DATA4
NOTE: erasing header on grp 1 disk DATA5
NOTE: erasing header on grp 1 disk DATA6
NOTE: erasing header on grp 1 disk DATA7
NOTE: erasing header on grp 1 disk DATA8
Fri Jul 25 16:25:37 2014
NOTE: cache dismounting (clean) group 1/0xD9B6AE8D (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 16745, image: oracle@server3.local (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: LGWR not being messaged to dismount
NOTE: cache dismounted group 1/0xD9B6AE8D (DATA)
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0xd9b6ae8d
NOTE: cache deleting context for group DATA 1/0xd9b6ae8d
Fri Jul 25 16:25:37 2014
GMON dismounting group 1 at 3 for pid 7, osid 16745
Fri Jul 25 16:25:37 2014
NOTE: Disk DATA1 in mode 0x7f marked for de-assignment
NOTE: Disk DATA2 in mode 0x7f marked for de-assignment
NOTE: Disk DATA3 in mode 0x7f marked for de-assignment
NOTE: Disk DATA4 in mode 0x7f marked for de-assignment
NOTE: Disk DATA5 in mode 0x7f marked for de-assignment
NOTE: Disk DATA6 in mode 0x7f marked for de-assignment
NOTE: Disk DATA7 in mode 0x7f marked for de-assignment
NOTE: Disk DATA8 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not created
ORA-15018: diskgroup cannot be created
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
Fri Jul 25 16:25:37 2014
ORA-15032: not all alterations performed
ORA-15066: offlining disk "DATA1" in group "DATA" may result in a data loss
ORA-15001: diskgroup "DATA" does not exist or is not mounted
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]

Now then. First of all, thanks for making it this far – I promise not to do that again in this post. Secondly, in case you really did just hit page down *a lot* you might want to skip back up and look for the warning near the top of the log. Specifically, this bit:

WARNING: Library 'AFD Library - Generic , version 3 (KABI_V3)' does not support advanced format disks

Many modern storage platforms use Advanced Format – if you want to know what that means, read here. The idea that AFD doesn’t support advanced format is somewhat alarming – and indeed incorrect, according to interactions I have subsequently had with Oracle’s ASM Product Management people. From what I understand, the problem is tracked as bug 19297177 (currently unpublished) and is caused by AFD incorrectly checking the physical block size of the storage device (4k) instead of the logical block size (which was 512 bytes). I currently have a request open with Oracle Support for the patch, so when that arrives I will re-test and add another blog article.
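If you want to see which of the two sizes your own devices report, Linux exposes both under /sys/block/<device>/queue. The sketch below classifies a device from those two files; the function name sector_format is made up for this example, and the labels it prints are just the usual Advanced Format shorthand rather than anything ASM or AFD reports:

```shell
#!/bin/sh
# sector_format QUEUE_DIR
# QUEUE_DIR is a block device's sysfs queue directory, e.g.
# /sys/block/dm-3/queue. Prints 512n, 512e or 4Kn based on the
# logical and physical block sizes found there. (Hypothetical helper.)
sector_format() {
    q=$1
    logical=$(cat "$q/logical_block_size")
    physical=$(cat "$q/physical_block_size")
    case "$logical/$physical" in
        512/512)   echo "512n" ;;   # native 512-byte sectors
        512/4096)  echo "512e" ;;   # 4k physical, 512-byte emulation
        4096/4096) echo "4Kn" ;;    # native 4k sectors
        *)         echo "unknown ($logical/$physical)" ;;
    esac
}
```

A device like the one described here, with a 4k physical and a 512 byte logical block size, would come back as 512e.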

Until then, I guess I might as well take another well-earned vacation?

Oracle 12.1.0.2 ASM Filter Driver: First Impressions

This is a very quick post, because I’m about to log off and take an extended summer holiday (or vacation as my crazy American friends call it… but then they call football “soccer” too). Before I go, I wanted to document my initial findings with the new ASM Filter Driver feature introduced in this week’s 12.1.0.2 patchset. [For a more recent post on this topic, read here]

Currently a Linux-only feature, the ASM Filter Driver (or AFD) is a replacement for ASMLib and is described by Oracle as follows:

Oracle ASM Filter Driver (Oracle ASMFD) is a kernel module that resides in the I/O path of the Oracle ASM disks. Oracle ASM uses the filter driver to validate write I/O requests to Oracle ASM disks.

The Oracle ASMFD simplifies the configuration and management of disk devices by eliminating the need to rebind disk devices used with Oracle ASM each time the system is restarted.

The Oracle ASM Filter Driver rejects any I/O requests that are invalid. This action eliminates accidental overwrites of Oracle ASM disks that would cause corruption in the disks and files within the disk group. For example, the Oracle ASM Filter Driver filters out all non-Oracle I/Os which could cause accidental overwrites.

Interesting, eh? So let’s find out how that works.

Installation

I found this a real pain as you need to have 12.1.0.2 installed before the AFD is available to label your disks, yet the default OUI mode wants to create an ASM diskgroup… and you cannot do that without any labelled disks.

The only solution I could come up with was to perform a software-only install, which in itself is a pain. I’ll spare you the numerous screenshots of that part though and skip straight to the bit where I have 12.1.0.2 Grid Infrastructure installed.

I’m following these instructions because I am using a single-instance Oracle Restart system rather than a true cluster.

First of all we need to do this:

[oracle@server3 ~]$ $ORACLE_HOME/bin/asmcmd dsset 'AFD:*'

[oracle@server3 ~]$ $ORACLE_HOME/bin/asmcmd dsget
parameter:AFD:*
profile:AFD:*
[oracle@server3 ~]$ srvctl config asm
ASM home: 
Password file:
ASM listener: LISTENER
Spfile: /u01/app/oracle/admin/+ASM/pfile/spfile+ASM.ora
ASM diskgroup discovery string: AFD:*

Then we need to stop HAS and run the AFD_CONFIGURE command:

[root@server3 ~]# $ORACLE_HOME/bin/crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'server3'
CRS-2673: Attempting to stop 'ora.asm' on 'server3'
CRS-2673: Attempting to stop 'ora.evmd' on 'server3'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'server3'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'server3' succeeded
CRS-2677: Stop of 'ora.evmd' on 'server3' succeeded
CRS-2677: Stop of 'ora.asm' on 'server3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'server3'
CRS-2677: Stop of 'ora.cssd' on 'server3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'server3' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@server3 ~]# $ORACLE_HOME/bin/asmcmd afd_configure
Connected to an idle instance.
AFD-627: AFD distribution files found.
AFD-636: Installing requested AFD software.
AFD-637: Loading installed AFD drivers.
AFD-9321: Creating udev for AFD.
AFD-9323: Creating module dependencies - this may take some time.
AFD-9154: Loading 'oracleafd.ko' driver.
AFD-649: Verifying AFD devices.
AFD-9156: Detecting control device '/dev/oracleafd/admin'.
AFD-638: AFD installation correctness verified.
Modifying resource dependencies - this may take some time.
ASMCMD-9524: AFD configuration failed 'ERROR: OHASD start failed'

Er… that’s not really what I had in mind. But hey, let’s carry on regardless:

[root@server3 oracleafd]# $ORACLE_HOME/bin/asmcmd afd_state
Connected to an idle instance.
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'DEFAULT' on host 'server3.local'

[root@server3 oracleafd]# $ORACLE_HOME/bin/crsctl start has
CRS-4123: Oracle High Availability Services has been started.

Ok it seems to be working. I wonder what it’s done?

Investigation

The first thing I notice is some Oracle kernel modules have been loaded:

[root@server3 ~]# lsmod | grep ora
oracleafd             208499  1
oracleacfs           3307969  0
oracleadvm            506254  0
oracleoks             505749  2 oracleacfs,oracleadvm

I also see that, just like ASMLib, a driver has been plonked into the /opt/oracle/extapi directory:

[root@server3 1]# find /opt/oracle/extapi -ls
2752765    4 drwxr-xr-x   3 root     root         4096 Jul 25 15:15 /opt/oracle/extapi
2752766    4 drwxr-xr-x   3 root     root         4096 Jul 25 15:15 /opt/oracle/extapi/64
2753508    4 drwxr-xr-x   3 root     root         4096 Jul 25 15:15 /opt/oracle/extapi/64/asm
2756532    4 drwxr-xr-x   3 root     root         4096 Jul 25 15:15 /opt/oracle/extapi/64/asm/orcl
2756562    4 drwxr-xr-x   2 root     root         4096 Jul 25 15:15 /opt/oracle/extapi/64/asm/orcl/1
2756578  268 -rwxr-xr-x   1 oracle   dba        272513 Jul 25 15:15 /opt/oracle/extapi/64/asm/orcl/1/libafd12.so

And again, just like ASMLib, there is a new directory under /dev called /dev/oracleafd (whereas for ASMLib it’s called /dev/oracleasm):

[root@server3 ~]# ls -la /dev/oracleafd/
total 0
drwxrwx---  3 oracle dba      80 Jul 25 15:15 .
drwxr-xr-x 21 root   root  15820 Jul 25 15:15 ..
brwxrwx---  1 oracle dba  249, 0 Jul 25 15:15 admin
drwxrwx---  2 oracle dba      40 Jul 25 15:15 disks

The disks directory is currently empty. Maybe I should create some AFD devices and see what happens?

Labelling

So let’s look at my Violin devices and see if I can label them:

[root@server3 mapper]# ls -l /dev/mapper
total 0
crw-rw---- 1 root root 10, 236 Jul 11 16:52 control
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data1 -> ../dm-3
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data2 -> ../dm-4
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data3 -> ../dm-5
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data4 -> ../dm-6
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data5 -> ../dm-7
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data6 -> ../dm-8
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data7 -> ../dm-9
lrwxrwxrwx 1 root root       8 Jul 25 15:49 data8 -> ../dm-10
lrwxrwxrwx 1 root root       7 Jul 11 16:53 VolGroup-lv_home -> ../dm-2
lrwxrwxrwx 1 root root       7 Jul 11 16:53 VolGroup-lv_root -> ../dm-0
lrwxrwxrwx 1 root root       7 Jul 11 16:52 VolGroup-lv_swap -> ../dm-1

The documentation appears to be incorrect here, when it says to use the command $ORACLE_HOME/bin/afd_label. It’s actually $ORACLE_HOME/bin/asmcmd with the first parameter afd_label. I’m going to label the devices called /dev/mapper/data*:

[root@server3 mapper]# for lun in 1 2 3 4 5 6 7 8; do
> asmcmd afd_label DATA$lun /dev/mapper/data$lun
> done
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.

[root@server3 mapper]# asmcmd afd_lsdsk
Connected to an idle instance.
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
DATA1                       ENABLED   /dev/mapper/data1
DATA2                       ENABLED   /dev/mapper/data2
DATA3                       ENABLED   /dev/mapper/data3
DATA4                       ENABLED   /dev/mapper/data4
DATA5                       ENABLED   /dev/mapper/data5
DATA6                       ENABLED   /dev/mapper/data6
DATA7                       ENABLED   /dev/mapper/data7
DATA8                       ENABLED   /dev/mapper/data8

That seemed to work ok. So what’s going on in the /dev/oracleafd/disks directory now?

[root@server3 ~]# ls -l /dev/oracleafd/disks/
total 32
-rw-r--r-- 1 root root 26 Jul 25 15:52 DATA1
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA2
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA3
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA4
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA5
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA6
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA7
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA8

There they are, just like with ASMLib. But look at the permissions: they are all owned by root with read-only privs for other users. In an ASMLib environment these devices are owned by oracle:dba, which means non-Oracle processes can write to them and corrupt them in some situations. Is this how Oracle claims the AFD protects devices?

I haven’t had time to investigate further but I assume that the database will access the devices via this mysterious block device:

[oracle@server3 oracleafd]$ ls -l /dev/oracleafd/admin
brwxrwx--- 1 oracle dba 249, 0 Jul 25 16:25 /dev/oracleafd/admin

It will be interesting to find out.

Destruction

Of course, if you are logged in as root you aren’t going to be protected from any crazy behaviour:

[root@server3 ~]# cd /dev/oracleafd/disks
[root@server3 disks]# ls -l
total 496
-rw-r--r-- 1 root root 475877 Jul 25 16:40 DATA1
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA2
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA3
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA4
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA5
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA6
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA7
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA8
[root@server3 disks]# od -c -N 256 DATA8
0000000   /   d   e   v   /   m   a   p   p   e   r   /   d   a   t   a
0000020   8  \n
0000032
[root@server3 disks]# dmesg >> DATA8
[root@server3 disks]# od -c -N 256 DATA8
0000000   /   d   e   v   /   m   a   p   p   e   r   /   d   a   t   a
0000020   8   \n   z   r   d   b   t   e   2  l   I   n   i   t   i   a
0000040   l   i   z   i   n   g       c   g   r   o   u   p       s   u
0000060   b   s   y   s       c   p   u   s   e   t  \n   I   n   i   t
0000100   i   a   l   i   z   i   n   g       c   g   r   o   u   p
0000120   s   u   b   s   y   s       c   p   u  \n   L   i   n   u   x
0000140       v   e   r   s   i   o   n       3   .   8   .   1   3   -
0000160   2   6   .   2   .   3   .   e   l   6   u   e   k   .   x   8
0000200   6   _   6   4       (   m   o   c   k   b   u   i   l   d   @
0000220   c   a   -   b   u   i   l   d   4   4   .   u   s   .   o   r
0000240   a   c   l   e   .   c   o   m   )       (   g   c   c       v
0000260   e   r   s   i   o   n       4   .   4   .   7       2   0   1
0000300   2   0   3   1   3       (   R   e   d       H   a   t       4
0000320   .   4   .   7   -   3   )       (   G   C   C   )       )
0000340   #   2       S   M   P       W   e   d       A   p   r       1
0000360   6       0   2   :   5   1   :   1   0       P   D   T       2
0000400

Proof, if ever you need it, that root access is still the fastest and easiest route to total disaster…

[Update July 2015: OK, so look. I was wrong in this post: these /dev/oracleafd/disks devices are simply pointers to devices in /dev/dm-*, and thus I was only overwriting the pointer. For a more accurate post on the subject, please read here]
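If those files really are just pointers, a short shell sketch can show what each one points at. This assumes, based on the od output above, that the first line of each file holds the device path, and that the path is a symlink which readlink can follow to the underlying dm device. The function names afd_target and afd_show_targets are made up for illustration and are not part of any Oracle tooling.

```shell
# Print the device path recorded in a label file (first line only).
afd_target() {
    head -n 1 "$1"
}

# Show "label -> recorded path -> resolved device" for every file in a directory.
afd_show_targets() {
    dir=${1:-/dev/oracleafd/disks}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        target=$(afd_target "$f")
        # readlink -f follows symlinks; print "?" if the path cannot be resolved
        resolved=$(readlink -f "$target" 2>/dev/null || echo "?")
        printf '%s -> %s -> %s\n' "$(basename "$f")" "$target" "$resolved"
    done
}
```

Because only the first line is read, junk appended after it (as in the dmesg experiment above) would not change the reported target.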

New My Oracle Support note on Advanced Format (4k) storage

In the past I have been a little critical of Oracle’s support notes and documentation regarding the use of Advanced Format 4k storage devices. I must now take that back, as my new friends in Oracle ASM Development and Product Management very kindly offered to let me write a new support note, which they have just published on My Oracle Support. It’s only supposed to be high level, but it does confirm that the _DISK_SECTOR_SIZE_OVERRIDE parameter can be safely set in database instances when using 512e storage and attempting to create 4k online redo logs.

The new support note is:

Using 4k Redo Logs on Flash and SSD-based Storage (Doc ID 1681266.1)

Don’t forget that you can read all about the basics of using Oracle with 4k sector storage here. And if you really feel up to it, I have a 4k deep dive page here.
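To check whether a given Linux block device presents itself as 512e in the first place, something like the following sketch may help. It relies on the logical_block_size and physical_block_size files that the kernel exposes under /sys/block/<device>/queue; the function names classify_sectors and sector_type are hypothetical, and the labels (512n, 512e, 4kn) are simply the usual Advanced Format shorthand.

```shell
# Classify a device from its logical and physical sector sizes (in bytes).
classify_sectors() {
    logical=$1
    physical=$2
    if [ "$logical" -eq 512 ] && [ "$physical" -eq 512 ]; then
        echo "512n"        # native 512-byte sectors
    elif [ "$logical" -eq 512 ] && [ "$physical" -eq 4096 ]; then
        echo "512e"        # 4k physical sectors with 512-byte emulation
    elif [ "$logical" -eq 4096 ] && [ "$physical" -eq 4096 ]; then
        echo "4kn"         # native 4k sectors
    else
        echo "unknown"
    fi
}

# Read the sizes the kernel reports for a block device name such as sda or dm-3.
sector_type() {
    q=/sys/block/$1/queue
    classify_sectors "$(cat "$q/logical_block_size")" "$(cat "$q/physical_block_size")"
}
```

For example, sector_type dm-3 would report the type for that device-mapper device. Note that this only reflects what the storage advertises to the host, which is not necessarily what the array does internally.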