Oracle’s ASM Filter Driver Revisited


Almost exactly a year ago I published a post covering my first impressions of the ASM Filter Driver (ASMFD) released in Oracle 12.1.0.2, followed swiftly by a second post showing that it didn’t work with 4k native devices.

When I wrote that first post I was about to start my summer holidays, so I’m afraid to admit that I was a little sloppy and made some false assumptions toward the end – assumptions which were quickly overturned by eagle-eyed readers in the comments section. So I need to revisit that at some point in this post.

But first, some background.

Some Background

If you don’t know what ASMFD is, let me just quote from the 12.1 documentation:

Oracle ASM Filter Driver (Oracle ASMFD) is a kernel module that resides in the I/O path of the Oracle ASM disks. Oracle ASM uses the filter driver to validate write I/O requests to Oracle ASM disks.

The Oracle ASMFD simplifies the configuration and management of disk devices by eliminating the need to rebind disk devices used with Oracle ASM each time the system is restarted.

The Oracle ASM Filter Driver rejects any I/O requests that are invalid. This action eliminates accidental overwrites of Oracle ASM disks that would cause corruption in the disks and files within the disk group. For example, the Oracle ASM Filter Driver filters out all non-Oracle I/Os which could cause accidental overwrites.

This is interesting, because ASMFD is considered a replacement for Oracle ASMLib, yet the documentation for ASMFD doesn’t make all of the same claims that Oracle makes for ASMLib. Both ASMFD and ASMLib claim to simplify the configuration and management of disk devices, but ASMLib’s documentation also claims that it “greatly reduces kernel resource usage”. Doesn’t ASMFD have this effect too? What is definitely a new feature for ASMFD is the ability to reject invalid (i.e. non-Oracle) I/O operations to ASMFD devices – and that’s what I got wrong last time.

However, before we can revisit that, I need to install ASMFD on a brand new system.

Installing ASMFD

Last time I tried this I made the mistake of installing 12.1.0.2.0 with no patch set updates. Thanks to a reader called terry, I now know that the PSU is a very good idea, so this time I’m using 12.1.0.2.3. First let’s do some preparation.

Preparing To Install

I’m using an Oracle Linux 6 Update 5 system running the Oracle Unbreakable Enterprise Kernel v3:

[root@server4 ~]# cat /etc/oracle-release
Oracle Linux Server release 6.5
[root@server4 ~]# uname -r
3.8.13-26.2.3.el6uek.x86_64

As usual I have taken all of the necessary pre-installation steps to make the Oracle Universal Installer happy. I have disabled selinux and iptables, plus I’ve configured device mapper multipathing. I have two sets of 8 LUNs from my Violin storage: 8 using 512e emulation mode (512 byte logical block size but 4k physical block size) and 8 using 4kN native mode (4k logical and physical block size). If you have any doubts about what that means, read here.

[root@server4 ~]# ls -l /dev/mapper
total 0
crw-rw---- 1 root root 10, 236 Jul 20 16:52 control
lrwxrwxrwx 1 root root       7 Jul 20 16:53 mpatha -> ../dm-0
lrwxrwxrwx 1 root root       7 Jul 20 16:53 mpathap1 -> ../dm-1
lrwxrwxrwx 1 root root       7 Jul 20 16:53 mpathap2 -> ../dm-2
lrwxrwxrwx 1 root root       7 Jul 20 16:53 mpathap3 -> ../dm-3
lrwxrwxrwx 1 root root       7 Jul 20 16:53 v4kdata1 -> ../dm-6
lrwxrwxrwx 1 root root       7 Jul 20 16:53 v4kdata2 -> ../dm-7
lrwxrwxrwx 1 root root       7 Jul 20 16:53 v4kdata3 -> ../dm-8
lrwxrwxrwx 1 root root       7 Jul 20 16:53 v4kdata4 -> ../dm-9
lrwxrwxrwx 1 root root       8 Jul 20 16:53 v4kdata5 -> ../dm-10
lrwxrwxrwx 1 root root       8 Jul 20 16:53 v4kdata6 -> ../dm-11
lrwxrwxrwx 1 root root       8 Jul 20 16:53 v4kdata7 -> ../dm-12
lrwxrwxrwx 1 root root       8 Jul 20 16:53 v4kdata8 -> ../dm-13
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data1 -> ../dm-14
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data2 -> ../dm-15
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data3 -> ../dm-16
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data4 -> ../dm-17
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data5 -> ../dm-18
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data6 -> ../dm-19
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data7 -> ../dm-20
lrwxrwxrwx 1 root root       8 Jul 20 17:00 v512data8 -> ../dm-21
lrwxrwxrwx 1 root root       8 Jul 20 16:53 vg_halfserver4-lv_home -> ../dm-22
lrwxrwxrwx 1 root root       7 Jul 20 16:53 vg_halfserver4-lv_root -> ../dm-4
lrwxrwxrwx 1 root root       7 Jul 20 16:53 vg_halfserver4-lv_swap -> ../dm-5

The 512e devices are the v512data* entries and the 4k devices are the v4kdata* entries. The other devices here can be ignored as they are related to the default filesystem layout of the operating system.
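To confirm which format a given LUN actually presents, the logical and physical block sizes can be compared. The following is a minimal sketch: the function names are hypothetical helpers made up for illustration, and it assumes the util-linux blockdev tool is available.

```shell
# Hypothetical helper: classify a device from its logical and physical
# block sizes (both in bytes). Prints 512n, 512e, 4kN or unknown.
classify_sector_format() {
    local logical=$1 physical=$2
    case "$logical/$physical" in
        512/512)   echo "512n" ;;
        512/4096)  echo "512e" ;;
        4096/4096) echo "4kN" ;;
        *)         echo "unknown" ;;
    esac
}

# Hypothetical helper: look up both sizes for a block device and classify it.
# blockdev --getss reports the logical sector size, --getpbsz the physical one.
device_sector_format() {
    local dev=$1
    classify_sector_format "$(blockdev --getss "$dev")" "$(blockdev --getpbsz "$dev")"
}
```

Run as root, something like device_sector_format /dev/mapper/v512data1 would be expected to report 512e on the system described here, and the v4kdata devices 4kN.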

Installing Oracle 12.1.0.2.3 Grid Infrastructure (software only)

This is where the first challenge comes. When you perform a standard install of Oracle 12c Grid Infrastructure you are asked for storage devices on which you can locate items such as the ASM SPFILE, OCR and voting disks. In the old days of using ASMLib you would have prepared these in advance, because ASMLib is a separate kernel module located outside of the Oracle GI home. But ASMFD is part of the Oracle Home and so doesn’t exist prior to installation. Thus we have a chicken and egg situation.

Even worse, I know from bitter experience that I need to install some patches prior to labelling my disks, but I can’t install patches without installing the Oracle home either.

So the only thing for it is to perform a Software Only installation from the Oracle Universal Installer, then apply the PSU, then create an ASM instance and finally label the LUNs with ASMFD. It’s all very long winded. It wouldn’t be a problem if I was migrating from an existing ASMLib setup, but this is a clean install. Such is the price of progress.

To save this post from becoming longer and more unreadable than a 12c AWR report, I’ve captured the entire installation and configuration of 12.1.0.2.3 GI and ASM on a separate installation cookbook page, here:

Installing 12.1.0.2.3 Grid Infrastructure with Oracle Linux 6 Update 5

It’s simpler that way. If you don’t want to go and read it, just take it from me that we now have a working ASM instance which currently has no devices under its control. The PSU has been applied so we are ready to start labelling.

Using ASM Filter Driver to Label Devices

The next step is to start labelling my LUNs with ASMFD. I’m using what the documentation describes as an “Oracle Grid Infrastructure Standalone (Oracle Restart) Environment”, so I’m following this set of steps in the documentation.

Step one tells me to run a dsget command and then a dsset command to add a diskstring of ‘AFD:*’. Ok:

[oracle@server4 ~]$ asmcmd dsget
parameter:
profile:++no-value-at-resource-creation--never-updated-through-ASM++
[oracle@server4 ~]$ asmcmd dsset 'AFD:*'
[oracle@server4 ~]$ asmcmd dsget
parameter:AFD:*
profile:AFD:*
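If this step ends up in a script, it may be worth checking the dsget output before calling dsset. Here is a small sketch: the function name is hypothetical, and it assumes that multiple discovery strings appear comma-separated on the parameter line, which is not shown in the output above.

```shell
# Hypothetical helper: read "asmcmd dsget" output on stdin and succeed
# only if the parameter line already contains an AFD:* entry.
diskstring_has_afd() {
    # Keep the text after "parameter:", drop spaces, split on commas
    # (comma separation is an assumption), then look for an exact match.
    sed -n 's/^parameter://p' | tr -d ' ' | tr ',' '\n' | grep -qxF 'AFD:*'
}
```

It could be used as: asmcmd dsget | diskstring_has_afd || asmcmd dsset 'AFD:*'. Bear in mind that dsset as used here replaces the whole diskstring, so any existing entries would need to be included in the new value.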

Next I need to stop CRS (I’m using a standalone config so actually it’s HAS):

[root@server4 ~]# crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'server4'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'server4'
CRS-2673: Attempting to stop 'ora.asm' on 'server4'
CRS-2673: Attempting to stop 'ora.evmd' on 'server4'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'server4' succeeded
CRS-2677: Stop of 'ora.evmd' on 'server4' succeeded
CRS-2677: Stop of 'ora.asm' on 'server4' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'server4'
CRS-2677: Stop of 'ora.cssd' on 'server4' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'server4' has completed
CRS-4133: Oracle High Availability Services has been stopped.

And then I need to run the afd_configure command (all as the root user). Before and after doing so I will check for any loaded kernel modules with oracle in the name, to see what changes:

[root@server4 ~]# lsmod | grep oracle
oracleacfs           3308260  0
oracleadvm            508030  0
oracleoks             506741  2 oracleacfs,oracleadvm
[root@server4 ~]# asmcmd afd_configure
Connected to an idle instance.
AFD-627: AFD distribution files found.
AFD-636: Installing requested AFD software.
AFD-637: Loading installed AFD drivers.
AFD-9321: Creating udev for AFD.
AFD-9323: Creating module dependencies - this may take some time.
AFD-9154: Loading 'oracleafd.ko' driver.
AFD-649: Verifying AFD devices.
AFD-9156: Detecting control device '/dev/oracleafd/admin'.
AFD-638: AFD installation correctness verified.
Modifying resource dependencies - this may take some time.
[root@server4 ~]# lsmod | grep oracle
oracleafd             211540  0
oracleacfs           3308260  0
oracleadvm            508030  0
oracleoks             506741  2 oracleacfs,oracleadvm
[root@server4 ~]# asmcmd afd_state
Connected to an idle instance.
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'ENABLED' on host 'server4'

Notice the new kernel module called oracleafd. Also, AFD is showing that “filtering is enabled” – I guess this relates to the protection against invalid writes.
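For scripted checks, the two values in that ASMCMD-9526 message can be pulled out with sed. This is only a sketch based on the single message format shown above; the function name is hypothetical and other message variants are not handled.

```shell
# Hypothetical helper: read "asmcmd afd_state" output on stdin and print
# the driver state and filtering state in a compact form.
afd_filtering_state() {
    # Capture the quoted values after "AFD state is" and "filtering is".
    sed -n "s/.*AFD state is '\([^']*\)' and filtering is '\([^']*\)'.*/state=\1 filtering=\2/p"
}
```

Piping the afd_state output above through it (asmcmd afd_state | afd_filtering_state) should give a single line that is easy to test in a script.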

Time to start up HAS or CRS again:

[root@server4 ~]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.

Ok, let’s start labelling those devices.

Labelling (Incorrectly)

Now remember that I am testing with two sets of devices here: 512e and 4k. The 512e devices are emulating a 512 byte blocksize, so they should result in ASM creating diskgroups with a blocksize of 512 bytes – thus avoiding all the tedious bugs from which Oracle suffers when using 4096 byte diskgroups.

So let’s just test a 512e LUN to see what happens when I label it and present it to ASM. First, the label is created using the afd_label command:

[oracle@server4 ~]$ ls -l /dev/mapper/v512data1
lrwxrwxrwx 1 root root 8 Jul 24 10:30 /dev/mapper/v512data1 -> ../dm-14
[oracle@server4 ~]$ ls -l /dev/dm-14
brw-rw---- 1 oracle dba 252, 14 Jul 24 10:30 /dev/dm-14
[oracle@server4 ~]$ asmcmd afd_label v512data1 /dev/mapper/v512data1
[oracle@server4 ~]$ asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
V512DATA1                   ENABLED   /dev/sdpz

Well, it worked.. sort of. The path we can see in the lsdsk output does not show the /dev/mapper/v512data1 multipath device I specified… instead it’s one of the non-multipath /dev/sd* devices. Why?

Even worse, look what happens when I check the SECTOR_SIZE column of the v$asm_diskgroup view in ASM:

SQL> select group_number, name, sector_size, block_size, state
  2  from v$asm_diskgroup;

GROUP_NUMBER NAME	SECTOR_SIZE BLOCK_SIZE STATE
------------ ---------- ----------- ---------- -----------
	   1 V512DATA	       4096	  4096 MOUNTED

Even though my LUNs are presented as 512e, ASM has chosen to see them as 4096 byte. That’s not what I want. Gaah!

Labelling (Correctly)

To fix this I need to unlabel that LUN so that AFD has nothing under its control, then update the oracleafd_use_logical_block_size parameter via the special SYSFS files in /sys/module/oracleafd:

[root@server4 ~]# cd /sys/module/oracleafd
[root@server4 oracleafd]# ls -l
total 0
-r--r--r-- 1 root root 4096 Jul 20 14:43 coresize
drwxr-xr-x 2 root root    0 Jul 20 14:43 holders
-r--r--r-- 1 root root 4096 Jul 20 14:43 initsize
-r--r--r-- 1 root root 4096 Jul 20 14:43 initstate
drwxr-xr-x 2 root root    0 Jul 20 14:43 notes
drwxr-xr-x 2 root root    0 Jul 20 14:43 parameters
-r--r--r-- 1 root root 4096 Jul 20 14:43 refcnt
drwxr-xr-x 2 root root    0 Jul 20 14:43 sections
-r--r--r-- 1 root root 4096 Jul 20 14:43 srcversion
-r--r--r-- 1 root root 4096 Jul 20 14:43 taint
--w------- 1 root root 4096 Jul 20 14:43 uevent
[root@server4 oracleafd]# cd parameters
[root@server4 parameters]# ls -l
total 0
-rw-r--r-- 1 root root 4096 Jul 20 14:43 oracleafd_use_logical_block_size
[root@server4 parameters]# cat oracleafd_use_logical_block_size
0
[root@server4 parameters]# echo 1 > oracleafd_use_logical_block_size
[root@server4 parameters]# cat oracleafd_use_logical_block_size
1
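The same change can be wrapped in a small idempotent helper. This is a sketch only and the function name is hypothetical; it simply writes the value if the file holds something else and then reads it back.

```shell
# Hypothetical helper: set a sysfs-style parameter file to the wanted value
# only when it currently holds something else, then verify the result.
ensure_param() {
    local param_file=$1 wanted=$2 current
    current=$(cat "$param_file") || return 1
    if [ "$current" != "$wanted" ]; then
        echo "$wanted" > "$param_file" || return 1
    fi
    # Succeed only if the value actually stuck.
    [ "$(cat "$param_file")" = "$wanted" ]
}
```

As root it would be called as: ensure_param /sys/module/oracleafd/parameters/oracleafd_use_logical_block_size 1. Values written through sysfs normally last only until the module is reloaded. The usual Linux way to make a module parameter persistent is an options line in a file under /etc/modprobe.d (for example "options oracleafd oracleafd_use_logical_block_size=1"), but whether the AFD startup honours that is an assumption I have not verified here.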

After making this change, AFD will present the logical blocksize of 512 bytes to ASM rather than the physical blocksize of 4096 bytes. So let’s now label those disks again:

[root@server4 mapper]# for lun in 1 2 3 4 5 6 7 8; do
> asmcmd afd_label v512data$lun /dev/mapper/v512data$lun
> done
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
[root@server4 mapper]# asmcmd afd_lsdsk
Connected to an idle instance.
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
V512DATA1                   ENABLED   /dev/mapper/v512data1
V512DATA2                   ENABLED   /dev/mapper/v512data2
V512DATA3                   ENABLED   /dev/mapper/v512data3
V512DATA4                   ENABLED   /dev/mapper/v512data4
V512DATA5                   ENABLED   /dev/mapper/v512data5
V512DATA6                   ENABLED   /dev/mapper/v512data6
V512DATA7                   ENABLED   /dev/mapper/v512data7
V512DATA8                   ENABLED   /dev/mapper/v512data8

Note the correct multipath devices (“/dev/mapper/*”) are now being shown in the lsdsk command output. If I now create an ASM diskgroup on these LUNs, it will have a 512 byte sector size:

SQL> get afd.sql
  1  CREATE DISKGROUP V512DATA EXTERNAL REDUNDANCY
  2  DISK 'AFD:V512DATA1', 'AFD:V512DATA2',
  3	  'AFD:V512DATA3', 'AFD:V512DATA4',
  4	  'AFD:V512DATA5', 'AFD:V512DATA6',
  5	  'AFD:V512DATA7', 'AFD:V512DATA8'
  6  ATTRIBUTE
  7	  'compatible.asm' = '12.1',
  8*	  'compatible.rdbms' = '12.1'
SQL> /

Diskgroup created.

SQL> select disk_number, mount_status, header_status, state, sector_size, path
  2  from v$asm_disk;

DISK_NUMBER MOUNT_S HEADER_STATU STATE	  SECTOR_SIZE PATH
----------- ------- ------------ -------- ----------- --------------------
	  0 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA1
	  1 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA2
	  2 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA3
	  3 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA4
	  4 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA5
	  5 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA6
	  6 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA7
	  7 CACHED  MEMBER	 NORMAL 	  512 AFD:V512DATA8

8 rows selected.

SQL> select group_number, name, sector_size, block_size, state
  2  from v$asm_diskgroup;

GROUP_NUMBER NAME	SECTOR_SIZE BLOCK_SIZE STATE
------------ ---------- ----------- ---------- -----------
	   1 V512DATA		512	  4096 MOUNTED

Phew.

Failing To Label 4kN Devices

So what about my 4k native mode devices, the ones with a 4096 byte logical block size? What happens if I try to label them?

[root@server4 ~]# asmcmd afd_label V4KDATA1 /dev/mapper/v4kdata1
Connected to an idle instance.
ASMCMD-9513: ASM disk label set operation failed.

Yeah, that didn’t work out did it? Let’s look in the trace file:

[root@server4 ~]# tail -5 /u01/app/oracle/log/diag/asmcmd/user_root/server4/alert/alert.log
24-Jul-15 12:38 ASMCMD (PID = 8695) Given command - afd_label V4KDATA1 '/dev/mapper/v4kdata1'
24-Jul-15 12:38 NOTE: Verifying AFD driver state : loaded
24-Jul-15 12:38 NOTE: afdtool -add '/dev/mapper/v4kdata1' 'V4KDATA1'
24-Jul-15 12:38 NOTE:
24-Jul-15 12:38 ASMCMD-9513: ASM disk label set operation failed.

I’ve struggled to find any more meaningful message, even when I manually run the afdtool command shown in the log – but it seems pretty likely that this is failing due to the device being 4kN. I therefore assume that AFD still isn’t 4kN ready. I do wish Oracle would make some meaningful progress on its support of 4kN devices…

I/O Filter Protection

So now let’s investigate this protection that ASMFD claims to have against non-Oracle I/Os. First of all, what do those files in /dev/oracleafd/disks actually contain?

[root@server4 ~]# cd /dev/oracleafd/disks
[root@server4 disks]# ls -l
total 32
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA1
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA2
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA3
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA4
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA5
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA6
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA7
-rw-r--r-- 1 root root 22 Jul 24 12:34 V512DATA8
[root@server4 disks]# cat V512DATA1
/dev/mapper/v512data1

Aha. This is what I got wrong in my original post last year, because – keen as I was to start my summer vacation – I didn’t spot that these files are simply pointers to the relevant multipath device in /dev/mapper. So let’s follow the pointers this time.

Let’s remind ourselves that the files in /dev/mapper are actually symbolic links to /dev/dm-* devices:

[root@server4 disks]# ls -l /dev/mapper/v512data1
lrwxrwxrwx 1 root root 8 Jul 24 12:34 /dev/mapper/v512data1 -> ../dm-14
[root@server4 disks]# ls -l /dev/dm-14
brw-rw---- 1 oracle dba 252, 14 Jul 24 12:34 /dev/dm-14
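Following the trail by hand gets tedious with eight disks, so here is a sketch of a helper that does both hops at once. The function name is hypothetical, and it assumes each file under /dev/oracleafd/disks holds a single path, as the cat output above suggests.

```shell
# Hypothetical helper: given a file from /dev/oracleafd/disks, read the
# path stored inside it and resolve any symbolic links to the real device.
resolve_afd_disk() {
    local disk_file=$1 target
    target=$(cat "$disk_file") || return 1
    # readlink -f follows the /dev/mapper symlink through to /dev/dm-*.
    readlink -f "$target"
}
```

On this system, resolve_afd_disk /dev/oracleafd/disks/V512DATA1 would be expected to end up at /dev/dm-14.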

So it’s these /dev/dm-* devices that are at the end of the trail we just followed. If we dump the first 64 bytes of this /dev/dm-14 device, we should be able to see the AFD label:

[root@server4 disks]# od -c -N 64 /dev/dm-14
0000000                           (   o   u   t
0000020                                
0000040   O   R   C   L   D   I   S   K   V   5   1   2   D   A   T   A
0000060   1

There it is. We can also read it with kfed to see what ASM thinks of it:

[root@server4 ~]# kfed read /dev/dm-14
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                  1953853224 ; 0x00c: 0x74756f28
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
000000000 00000000 00000000 00000000 74756F28  [............(out]
000000010 00000000 00000000 00000000 00000000  [................]
000000020 4C43524F 4B534944 32313556 41544144  [ORCLDISKV512DATA]
000000030 00000031 00000000 00000000 00000000  [1...............]
000000040 00000000 00000000 00000000 00000000  [................]
  Repeat 251 times
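Both dumps put the ORCLDISK marker at byte offset 32 (octal 40, hex 0x20) with the label text straight after it. A rough sketch of pulling the label out with dd follows; the function name is hypothetical, and the 24 bytes read for the label is my assumption about its maximum length rather than anything documented above.

```shell
# Hypothetical helper: print the label stored in a device header, or fail
# if the ORCLDISK marker is not found at byte offset 32.
read_disk_label() {
    local dev=$1 marker
    marker=$(dd if="$dev" bs=1 skip=32 count=8 2>/dev/null | tr -d '\0')
    [ "$marker" = "ORCLDISK" ] || return 1
    # The label starts at offset 40; 24 bytes is an assumed maximum length.
    # The NUL padding after the label is stripped.
    dd if="$dev" bs=1 skip=40 count=24 2>/dev/null | tr -d '\0'
    echo
}
```

Against the header shown above, read_disk_label /dev/dm-14 should print the V512DATA1 label.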

So what happens if I overwrite it, as the root user, with some zeros? And maybe some text too just for good luck?

[root@server4 ~]# dd if=/dev/zero of=/dev/dm-14 bs=4k count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 0.00570833 s, 735 MB/s
[root@server4 ~]# echo CORRUPTION > /dev/dm-14
[root@server4 ~]# od -c -N 64 /dev/dm-14
0000000   C   O   R   R   U   P   T   I   O   N  \n          
0000020

It looks like it’s changed. I also see that if I dump it from another session which opens a fresh file descriptor. Yet in the /var/log/messages file there is now a new entry:

F 4626129.736/150724115533 flush-252:14[1807]  afd_mkrequest_fn: write IO on ASM managed device (major=252/minor=14)  not supported i=0 start=0 seccnt=8  pstart=0  pend=41943040
Jul 24 12:55:33 server4 kernel: quiet_error: 1015 callbacks suppressed
Jul 24 12:55:33 server4 kernel: Buffer I/O error on device dm-14, logical block 0
Jul 24 12:55:33 server4 kernel: lost page write due to I/O error on dm-14
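To see at a glance which devices the filter has been protecting, the afd_mkrequest_fn lines can be summarised. This sketch relies only on the message format shown in the log entry above; the function name is hypothetical.

```shell
# Hypothetical helper: list the distinct major:minor pairs for which AFD
# logged a rejected write. Reads the named files, or stdin if none given.
afd_rejected_devices() {
    grep 'afd_mkrequest_fn: write IO on ASM managed device' "$@" |
        sed -n 's|.*(major=\([0-9]*\)/minor=\([0-9]*\)).*|\1:\2|p' |
        sort -u
}
```

Running afd_rejected_devices /var/log/messages after the test above should list the 252:14 pair, which matches the major and minor numbers of /dev/dm-14.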

Hmm. It seems like ASMFD has intervened to stop the write, yet when I query the device I see the “new” data. Where’s the old data gone? Well, let’s use kfed again:

[root@server4 ~]# kfed read /dev/dm-14
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                  1953853224 ; 0x00c: 0x74756f28
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
000000000 00000000 00000000 00000000 74756F28  [............(out]
000000010 00000000 00000000 00000000 00000000  [................]
000000020 4C43524F 4B534944 32313556 41544144  [ORCLDISKV512DATA]
000000030 00000031 00000000 00000000 00000000  [1...............]
000000040 00000000 00000000 00000000 00000000  [................]
  Repeat 251 times

The label is still there! Magic.

I have to confess, I don’t really know how ASM does this. Indeed, I struggled to get the system back to a point where I could manually see the label using the od command. In the end, the only way I managed it was to reboot the server – yet ASM worked fine all along and the diskgroup was never affected:

SQL> alter diskgroup V512DATA check all;
Mon Jul 20 16:46:23 2015
NOTE: starting check of diskgroup V512DATA
Mon Jul 20 16:46:23 2015
GMON querying group 1 at 5 for pid 7, osid 9255
GMON checking disk 0 for group 1 at 6 for pid 7, osid 9255
GMON querying group 1 at 7 for pid 7, osid 9255
GMON checking disk 1 for group 1 at 8 for pid 7, osid 9255
GMON querying group 1 at 9 for pid 7, osid 9255
GMON checking disk 2 for group 1 at 10 for pid 7, osid 9255
GMON querying group 1 at 11 for pid 7, osid 9255
GMON checking disk 3 for group 1 at 12 for pid 7, osid 9255
GMON querying group 1 at 13 for pid 7, osid 9255
GMON checking disk 4 for group 1 at 14 for pid 7, osid 9255
GMON querying group 1 at 15 for pid 7, osid 9255
GMON checking disk 5 for group 1 at 16 for pid 7, osid 9255
GMON querying group 1 at 17 for pid 7, osid 9255
GMON checking disk 6 for group 1 at 18 for pid 7, osid 9255
GMON querying group 1 at 19 for pid 7, osid 9255
GMON checking disk 7 for group 1 at 20 for pid 7, osid 9255
Mon Jul 20 16:46:23 2015
SUCCESS: check of diskgroup V512DATA found no errors
Mon Jul 20 16:46:23 2015
SUCCESS: alter diskgroup V512DATA check all
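One possible reading of the od versus kfed mismatch, offered as an assumption rather than anything confirmed here: the dd and echo writes landed in the Linux page cache, the kernel messages show the writeback to dm-14 being refused ("lost page write"), and a buffered reader such as od keeps seeing the cached page while a reader that bypasses the cache sees what is really on disk. Whether kfed actually reads that way is not something I have verified. A sketch for comparing the two views, with a hypothetical function name:

```shell
# Hypothetical helper: dump the first bytes of a device, either through
# the page cache (default) or bypassing it with O_DIRECT.
# Usage: dump_head DEVICE [BYTES] [direct]
dump_head() {
    local dev=$1 bytes=${2:-64} mode=${3:-buffered}
    if [ "$mode" = "direct" ]; then
        # Direct reads must be block-aligned, so read one 4k block and
        # let od trim the output to the requested length.
        dd if="$dev" bs=4096 count=1 iflag=direct 2>/dev/null | od -c -N "$bytes"
    else
        od -c -N "$bytes" "$dev"
    fi
}
```

If the theory holds, dump_head /dev/dm-14 64 would show the overwritten bytes while dump_head /dev/dm-14 64 direct would still show the ORCLDISK label, with no reboot needed.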

So there you go. ASMFD: it does what it says on the tin. Just don’t try using it with 4kN devices…

Oracle 12.1.0.2 ASM Filter Driver: Advanced Format Fail


[Please note that a more up-to-date post on this subject can be found here]

In my previous post on the subject of the new ASM Filter Driver (AFD) feature introduced in Oracle’s 12.1.0.2 patchset, I installed the AFD to see how it fulfilled its promise that it “filters out all non-Oracle I/Os which could cause accidental overwrites”. However, because I was ten minutes away from my summer vacation at the point of finishing that post, I didn’t actually get round to writing about what happens when you try and create ASM diskgroups on the devices it presents.

Obviously I’ve spent the intervening period constantly worrying about this oversight – indeed, it was only through the judicious application of good food and drink plus some committed relaxation in the sun that I was able to pull through. However, I’m back now and it seems like time to rectify that mistake. So here goes.

Creating ASM Diskgroups with the ASM Filter Driver

It turns out I need not have worried, because it doesn’t work right now… at least, not for me. Here’s why:

First of all, I installed Oracle 12.1.0.2 Grid Infrastructure. I then labelled some block devices presented from my Violin storage array. As I’ve already pasted all the output from those two steps in the previous post, I won’t repeat myself.

The next step is therefore to create a diskgroup. Since I’ve only just come back from holiday and so I’m still half brain-dead, I’ll choose the simple route and fire up the ASM Configuration Assistant (ASMCA) so that I don’t have to look up any of that nasty SQL. Here goes:

[Screenshot: creating the diskgroup in ASMCA]

But guess what happened when I hit the OK button? It failed, bigtime. Here’s the alert log – if you don’t like huge amounts of meaningless text I suggest you skip down… a lot… (although thinking about it, my entire blog could be described as meaningless text):

SQL> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY  DISK 'AFD:DATA1' SIZE 72704M ,
'AFD:DATA2' SIZE 72704M ,
'AFD:DATA3' SIZE 72704M ,
'AFD:DATA4' SIZE 72704M ,
'AFD:DATA5' SIZE 72704M ,
'AFD:DATA6' SIZE 72704M ,
'AFD:DATA7' SIZE 72704M ,
'AFD:DATA8' SIZE 72704M  ATTRIBUTE 'compatible.asm'='12.1.0.0.0','au_size'='1M' /* ASMCA */
Fri Jul 25 16:25:33 2014
WARNING: Library 'AFD Library - Generic , version 3 (KABI_V3)' does not support advanced format disks
Fri Jul 25 16:25:33 2014
NOTE: Assigning number (1,0) to disk (AFD:DATA1)
NOTE: Assigning number (1,1) to disk (AFD:DATA2)
NOTE: Assigning number (1,2) to disk (AFD:DATA3)
NOTE: Assigning number (1,3) to disk (AFD:DATA4)
NOTE: Assigning number (1,4) to disk (AFD:DATA5)
NOTE: Assigning number (1,5) to disk (AFD:DATA6)
NOTE: Assigning number (1,6) to disk (AFD:DATA7)
NOTE: Assigning number (1,7) to disk (AFD:DATA8)
NOTE: initializing header (replicated) on grp 1 disk DATA1
NOTE: initializing header (replicated) on grp 1 disk DATA2
NOTE: initializing header (replicated) on grp 1 disk DATA3
NOTE: initializing header (replicated) on grp 1 disk DATA4
NOTE: initializing header (replicated) on grp 1 disk DATA5
NOTE: initializing header (replicated) on grp 1 disk DATA6
NOTE: initializing header (replicated) on grp 1 disk DATA7
NOTE: initializing header (replicated) on grp 1 disk DATA8
NOTE: initializing header on grp 1 disk DATA1
NOTE: initializing header on grp 1 disk DATA2
NOTE: initializing header on grp 1 disk DATA3
NOTE: initializing header on grp 1 disk DATA4
NOTE: initializing header on grp 1 disk DATA5
NOTE: initializing header on grp 1 disk DATA6
NOTE: initializing header on grp 1 disk DATA7
NOTE: initializing header on grp 1 disk DATA8
NOTE: Disk 0 in group 1 is assigned fgnum=1
NOTE: Disk 1 in group 1 is assigned fgnum=2
NOTE: Disk 2 in group 1 is assigned fgnum=3
NOTE: Disk 3 in group 1 is assigned fgnum=4
NOTE: Disk 4 in group 1 is assigned fgnum=5
NOTE: Disk 5 in group 1 is assigned fgnum=6
NOTE: Disk 6 in group 1 is assigned fgnum=7
NOTE: Disk 7 in group 1 is assigned fgnum=8
NOTE: initiating PST update: grp = 1
Fri Jul 25 16:25:33 2014
GMON updating group 1 at 1 for pid 7, osid 16745
NOTE: group DATA: initial PST location: disk 0000 (PST copy 0)
NOTE: set version 1 for asmCompat 12.1.0.0.0
Fri Jul 25 16:25:33 2014
NOTE: PST update grp = 1 completed successfully
NOTE: cache registered group DATA 1/0xD9B6AE8D
NOTE: cache began mount (first) of group DATA 1/0xD9B6AE8D
NOTE: cache is mounting group DATA created on 2014/07/25 16:25:33
NOTE: cache opening disk 0 of grp 1: DATA1 label:DATA1
NOTE: cache opening disk 1 of grp 1: DATA2 label:DATA2
NOTE: cache opening disk 2 of grp 1: DATA3 label:DATA3
NOTE: cache opening disk 3 of grp 1: DATA4 label:DATA4
NOTE: cache opening disk 4 of grp 1: DATA5 label:DATA5
NOTE: cache opening disk 5 of grp 1: DATA6 label:DATA6
NOTE: cache opening disk 6 of grp 1: DATA7 label:DATA7
NOTE: cache opening disk 7 of grp 1: DATA8 label:DATA8
NOTE: cache creating group 1/0xD9B6AE8D (DATA)
NOTE: cache mounting group 1/0xD9B6AE8D (DATA) succeeded
WARNING: cache read a corrupt block: group=1(DATA) dsk=0 blk=1 disk=0 (DATA1) incarn=3493224069 au=0 blk=1 count=1
Fri Jul 25 16:25:33 2014
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc:
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc
WARNING: cache read (retry) a corrupt block: group=1(DATA) dsk=0 blk=1 disk=0 (DATA1) incarn=3493224069 au=0 blk=1 count=1
Fri Jul 25 16:25:33 2014
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc:
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
WARNING: cache read (retry) a corrupt block: group=1(DATA) dsk=0 blk=1 disk=0 (DATA1) incarn=3493224069 au=11 blk=1 count=1
Fri Jul 25 16:25:33 2014
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc:
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc
WARNING: cache read (retry) a corrupt block: group=1(DATA) dsk=0 blk=1 disk=0 (DATA1) incarn=3493224069 au=11 blk=1 count=1
Fri Jul 25 16:25:33 2014
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc:
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ERROR: cache failed to read group=1(DATA) dsk=0 blk=1 from disk(s): 0(DATA1) 0(DATA1)
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]

NOTE: cache initiating offline of disk 0 group DATA
NOTE: process _user16745_+asm (16745) initiating offline of disk 0.3493224069 (DATA1) with mask 0x7e in group 1 (DATA) with client assisting
NOTE: initiating PST update: grp 1 (DATA), dsk = 0/0xd0365e85, mask = 0x6a, op = clear
Fri Jul 25 16:25:34 2014
GMON updating disk modes for group 1 at 2 for pid 7, osid 16745
ERROR: disk 0(DATA1) in group 1(DATA) cannot be offlined because the disk group has external redundancy.
Fri Jul 25 16:25:34 2014
ERROR: too many offline disks in PST (grp 1)
Fri Jul 25 16:25:34 2014
ERROR: no read quorum in group: required 1, found 0 disks
ERROR: Could not read PST for grp 1. Force dismounting the disk group.
Fri Jul 25 16:25:34 2014
NOTE: halting all I/Os to diskgroup 1 (DATA)
Fri Jul 25 16:25:34 2014
ERROR: no read quorum in group: required 1, found 0 disks
ASM Health Checker found 1 new failures
Fri Jul 25 16:25:36 2014
ERROR: no read quorum in group: required 1, found 0 disks
Fri Jul 25 16:25:36 2014
ERROR: Could not read PST for grp 1. Force dismounting the disk group.
Fri Jul 25 16:25:36 2014
ERROR: no read quorum in group: required 1, found 0 disks
ERROR: Could not read PST for grp 1. Force dismounting the disk group.
Fri Jul 25 16:25:36 2014
ERROR: no read quorum in group: required 1, found 0 disks
ERROR: Could not read PST for grp 1. Force dismounting the disk group.
Fri Jul 25 16:25:37 2014
NOTE: AMDU dump of disk group DATA initiated at /u01/app/oracle/diag/asm/+asm/+ASM/trace
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_16745.trc  (incident=3257):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA1" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]
Incident details in: /u01/app/oracle/diag/asm/+asm/+ASM/incident/incdir_3257/+ASM_ora_16745_i3257.trc
Fri Jul 25 16:25:37 2014
Sweep [inc][3257]: completed
Fri Jul 25 16:25:37 2014
SQL> alter diskgroup DATA check
System State dumped to trace file /u01/app/oracle/diag/asm/+asm/+ASM/incident/incdir_3257/+ASM_ora_16745_i3257.trc
NOTE: erasing header (replicated) on grp 1 disk DATA1
NOTE: erasing header (replicated) on grp 1 disk DATA2
NOTE: erasing header (replicated) on grp 1 disk DATA3
NOTE: erasing header (replicated) on grp 1 disk DATA4
NOTE: erasing header (replicated) on grp 1 disk DATA5
NOTE: erasing header (replicated) on grp 1 disk DATA6
NOTE: erasing header (replicated) on grp 1 disk DATA7
NOTE: erasing header (replicated) on grp 1 disk DATA8
NOTE: erasing header on grp 1 disk DATA1
NOTE: erasing header on grp 1 disk DATA2
NOTE: erasing header on grp 1 disk DATA3
NOTE: erasing header on grp 1 disk DATA4
NOTE: erasing header on grp 1 disk DATA5
NOTE: erasing header on grp 1 disk DATA6
NOTE: erasing header on grp 1 disk DATA7
NOTE: erasing header on grp 1 disk DATA8
Fri Jul 25 16:25:37 2014
NOTE: cache dismounting (clean) group 1/0xD9B6AE8D (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 16745, image: oracle@server3.local (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: LGWR not being messaged to dismount
NOTE: cache dismounted group 1/0xD9B6AE8D (DATA)
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0xd9b6ae8d
NOTE: cache deleting context for group DATA 1/0xd9b6ae8d
Fri Jul 25 16:25:37 2014
GMON dismounting group 1 at 3 for pid 7, osid 16745
Fri Jul 25 16:25:37 2014
NOTE: Disk DATA1 in mode 0x7f marked for de-assignment
NOTE: Disk DATA2 in mode 0x7f marked for de-assignment
NOTE: Disk DATA3 in mode 0x7f marked for de-assignment
NOTE: Disk DATA4 in mode 0x7f marked for de-assignment
NOTE: Disk DATA5 in mode 0x7f marked for de-assignment
NOTE: Disk DATA6 in mode 0x7f marked for de-assignment
NOTE: Disk DATA7 in mode 0x7f marked for de-assignment
NOTE: Disk DATA8 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not created
ORA-15018: diskgroup cannot be created
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
Fri Jul 25 16:25:37 2014
ORA-15032: not all alterations performed
ORA-15066: offlining disk "DATA1" in group "DATA" may result in a data loss
ORA-15001: diskgroup "DATA" does not exist or is not mounted
ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [2147483648] [1] [0 != 1]

Now then. First of all, thanks for making it this far – I promise not to do that again in this post. Secondly, in case you really did just hit page down *a lot* you might want to skip back up and look for the bits I’ve conveniently highlighted in red. Specifically, this bit:

WARNING: Library 'AFD Library - Generic , version 3 (KABI_V3)' does not support advanced format disks

Many modern storage platforms use Advanced Format – if you want to know what that means, read here. The idea that AFD doesn’t support advanced format is somewhat alarming – and indeed incorrect, according to interactions I have subsequently had with Oracle’s ASM Product Management people. From what I understand, the problem is tracked as bug 19297177 (currently unpublished) and is caused by AFD incorrectly checking the physical blocksize of the storage device (4k) instead of the logical block size (which was 512 bytes). I currently have a request open with Oracle Support for the patch, so when that arrives I will re-test and add another blog article.

Until then, I guess I might as well take another well-earned vacation?

Oracle 12.1.0.2 ASM Filter Driver: First Impressions

This is a very quick post, because I’m about to log off and take an extended summer holiday (or vacation as my crazy American friends call it… but then they call football “soccer” too). Before I go, I wanted to document my initial findings with the new ASM Filter Driver feature introduced in this week’s 12.1.0.2 patchset. [For a more recent post on this topic, read here]

Currently a Linux-only feature, the ASM Filter Driver (or AFD) is a replacement for ASMLib and is described by Oracle as follows:

Oracle ASM Filter Driver (Oracle ASMFD) is a kernel module that resides in the I/O path of the Oracle ASM disks. Oracle ASM uses the filter driver to validate write I/O requests to Oracle ASM disks.

The Oracle ASMFD simplifies the configuration and management of disk devices by eliminating the need to rebind disk devices used with Oracle ASM each time the system is restarted.

The Oracle ASM Filter Driver rejects any I/O requests that are invalid. This action eliminates accidental overwrites of Oracle ASM disks that would cause corruption in the disks and files within the disk group. For example, the Oracle ASM Filter Driver filters out all non-Oracle I/Os which could cause accidental overwrites.

Interesting, eh? So let’s find out how that works.

Installation

I found this a real pain as you need to have 12.1.0.2 installed before the AFD is available to label your disks, yet the default OUI mode wants to create an ASM diskgroup… and you cannot do that without any labelled disks.

The only solution I could come up with was to perform a software-only install, which in itself is a pain. I’ll skip the numerous screenshots of that part though and jump straight to the bit where I have 12.1.0.2 Grid Infrastructure installed.

I’m following these instructions because I am using a single-instance Oracle Restart system rather than a true cluster.

First of all we need to do this:

[oracle@server3 ~]$ $ORACLE_HOME/bin/asmcmd dsset 'AFD:*'

[oracle@server3 ~]$ $ORACLE_HOME/bin/asmcmd dsget
parameter:AFD:*
profile:AFD:*
[oracle@server3 ~]$ srvctl config asm
ASM home: 
Password file:
ASM listener: LISTENER
Spfile: /u01/app/oracle/admin/+ASM/pfile/spfile+ASM.ora
ASM diskgroup discovery string: AFD:*

Then we need to stop HAS and run the AFD_CONFIGURE command:

[root@server3 ~]# $ORACLE_HOME/bin/crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'server3'
CRS-2673: Attempting to stop 'ora.asm' on 'server3'
CRS-2673: Attempting to stop 'ora.evmd' on 'server3'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'server3'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'server3' succeeded
CRS-2677: Stop of 'ora.evmd' on 'server3' succeeded
CRS-2677: Stop of 'ora.asm' on 'server3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'server3'
CRS-2677: Stop of 'ora.cssd' on 'server3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'server3' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@server3 ~]# $ORACLE_HOME/bin/asmcmd afd_configure
Connected to an idle instance.
AFD-627: AFD distribution files found.
AFD-636: Installing requested AFD software.
AFD-637: Loading installed AFD drivers.
AFD-9321: Creating udev for AFD.
AFD-9323: Creating module dependencies - this may take some time.
AFD-9154: Loading 'oracleafd.ko' driver.
AFD-649: Verifying AFD devices.
AFD-9156: Detecting control device '/dev/oracleafd/admin'.
AFD-638: AFD installation correctness verified.
Modifying resource dependencies - this may take some time.
ASMCMD-9524: AFD configuration failed 'ERROR: OHASD start failed'

Er… that’s not really what I had in mind. But hey, let’s carry on regardless:

[root@server3 oracleafd]# $ORACLE_HOME/bin/asmcmd afd_state
Connected to an idle instance.
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'DEFAULT' on host 'server3.local'

[root@server3 oracleafd]# $ORACLE_HOME/bin/crsctl start has
CRS-4123: Oracle High Availability Services has been started.

Ok it seems to be working. I wonder what it’s done?

Investigation

The first thing I notice is some Oracle kernel modules have been loaded:

[root@server3 ~]# lsmod | grep ora
oracleafd             208499  1
oracleacfs           3307969  0
oracleadvm            506254  0
oracleoks             505749  2 oracleacfs,oracleadvm

I also see that, just like ASMLib, a driver has been plonked into the /opt/oracle/extapi directory:

[root@server3 1]# find /opt/oracle/extapi -ls
2752765    4 drwxr-xr-x   3 root     root         4096 Jul 25 15:15 /opt/oracle/extapi
2752766    4 drwxr-xr-x   3 root     root         4096 Jul 25 15:15 /opt/oracle/extapi/64
2753508    4 drwxr-xr-x   3 root     root         4096 Jul 25 15:15 /opt/oracle/extapi/64/asm
2756532    4 drwxr-xr-x   3 root     root         4096 Jul 25 15:15 /opt/oracle/extapi/64/asm/orcl
2756562    4 drwxr-xr-x   2 root     root         4096 Jul 25 15:15 /opt/oracle/extapi/64/asm/orcl/1
2756578  268 -rwxr-xr-x   1 oracle   dba        272513 Jul 25 15:15 /opt/oracle/extapi/64/asm/orcl/1/libafd12.so

And again, just like ASMLib, there is a new directory under /dev called /dev/oracleafd (whereas for ASMLib it’s called /dev/oracleasm):

[root@server3 ~]# ls -la /dev/oracleafd/
total 0
drwxrwx---  3 oracle dba      80 Jul 25 15:15 .
drwxr-xr-x 21 root   root  15820 Jul 25 15:15 ..
brwxrwx---  1 oracle dba  249, 0 Jul 25 15:15 admin
drwxrwx---  2 oracle dba      40 Jul 25 15:15 disks

The disks directory is currently empty. Maybe I should create some AFD devices and see what happens?

Labelling

So let’s look at my Violin devices and see if I can label them:

[root@server3 mapper]# ls -l /dev/mapper
total 0
crw-rw---- 1 root root 10, 236 Jul 11 16:52 control
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data1 -> ../dm-3
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data2 -> ../dm-4
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data3 -> ../dm-5
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data4 -> ../dm-6
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data5 -> ../dm-7
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data6 -> ../dm-8
lrwxrwxrwx 1 root root       7 Jul 25 15:49 data7 -> ../dm-9
lrwxrwxrwx 1 root root       8 Jul 25 15:49 data8 -> ../dm-10
lrwxrwxrwx 1 root root       7 Jul 11 16:53 VolGroup-lv_home -> ../dm-2
lrwxrwxrwx 1 root root       7 Jul 11 16:53 VolGroup-lv_root -> ../dm-0
lrwxrwxrwx 1 root root       7 Jul 11 16:52 VolGroup-lv_swap -> ../dm-1

The documentation appears to be incorrect here, when it says to use the command $ORACLE_HOME/bin/afd_label. It’s actually $ORACLE_HOME/bin/asmcmd with the first parameter afd_label. I’m going to label the devices called /dev/mapper/data*:

[root@server3 mapper]# for lun in 1 2 3 4 5 6 7 8; do
> asmcmd afd_label DATA$lun /dev/mapper/data$lun
> done
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.

[root@server3 mapper]# asmcmd afd_lsdsk
Connected to an idle instance.
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
DATA1                       ENABLED   /dev/mapper/data1
DATA2                       ENABLED   /dev/mapper/data2
DATA3                       ENABLED   /dev/mapper/data3
DATA4                       ENABLED   /dev/mapper/data4
DATA5                       ENABLED   /dev/mapper/data5
DATA6                       ENABLED   /dev/mapper/data6
DATA7                       ENABLED   /dev/mapper/data7
DATA8                       ENABLED   /dev/mapper/data8
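
If you have more than a handful of LUNs, it can be handy to print the labelling commands first and eyeball them before running anything as root. Here is a minimal sketch of that idea. The function name afd_label_plan is my own invention, and it assumes your device names end in a number, as mine do. It only echoes the asmcmd afd_label commands, it does not run them:

```shell
#!/bin/sh
# Print (but do not run) one "asmcmd afd_label" command per device.
# Usage: afd_label_plan PREFIX DEVICE...
# The label is PREFIX plus the trailing digits of the device name,
# e.g. /dev/mapper/data3 with prefix DATA gives label DATA3.
# afd_label_plan is a hypothetical helper, not part of any Oracle tooling.
afd_label_plan() {
    prefix=$1
    shift
    for dev in "$@"; do
        # Strip everything up to the last non-digit to keep the numeric suffix.
        num=$(printf '%s\n' "$dev" | sed 's/.*[^0-9]//')
        if [ -z "$num" ]; then
            echo "skipping $dev: no numeric suffix" >&2
            continue
        fi
        echo "asmcmd afd_label ${prefix}${num} ${dev}"
    done
}

# Example (dry run): afd_label_plan DATA /dev/mapper/data*
```

Once the output looks right, the printed lines can be pasted into a root shell one at a time.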

That seemed to work ok. So what’s going on in the /dev/oracleafd/disks directory now?

[root@server3 ~]# ls -l /dev/oracleafd/disks/
total 32
-rw-r--r-- 1 root root 26 Jul 25 15:52 DATA1
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA2
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA3
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA4
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA5
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA6
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA7
-rw-r--r-- 1 root root 26 Jul 25 15:49 DATA8

There they are, just like with ASMLib. But look at the permissions, they are all owned by root with read-only privs for other users. In an ASMLib environment these devices are owned by oracle:dba, which means non-Oracle processes can write to them and corrupt them in some situations. Is this how Oracle claims the AFD protects devices?

I haven’t had time to investigate further but I assume that the database will access the devices via this mysterious block device:

[oracle@server3 oracleafd]$ ls -l /dev/oracleafd/admin
brwxrwx--- 1 oracle dba 249, 0 Jul 25 16:25 /dev/oracleafd/admin

It will be interesting to find out.

Destruction

Of course, if you are logged in as root you aren’t going to be protected from any crazy behaviour:

[root@server3 ~]# cd /dev/oracleafd/disks
[root@server3 disks]# ls -l
total 496
-rw-r--r-- 1 root root 475877 Jul 25 16:40 DATA1
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA2
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA3
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA4
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA5
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA6
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA7
-rw-r--r-- 1 root root     26 Jul 25 15:49 DATA8
[root@server3 disks]# od -c -N 256 DATA8
0000000   /   d   e   v   /   m   a   p   p   e   r   /   d   a   t   a
0000020   8  \n
0000032
[root@server3 disks]# dmesg >> DATA8
[root@server3 disks]# od -c -N 256 DATA8
0000000   /   d   e   v   /   m   a   p   p   e   r   /   d   a   t   a
0000020   8   \n   z   r   d   b   t   e   2  l   I   n   i   t   i   a
0000040   l   i   z   i   n   g       c   g   r   o   u   p       s   u
0000060   b   s   y   s       c   p   u   s   e   t  \n   I   n   i   t
0000100   i   a   l   i   z   i   n   g       c   g   r   o   u   p
0000120   s   u   b   s   y   s       c   p   u  \n   L   i   n   u   x
0000140       v   e   r   s   i   o   n       3   .   8   .   1   3   -
0000160   2   6   .   2   .   3   .   e   l   6   u   e   k   .   x   8
0000200   6   _   6   4       (   m   o   c   k   b   u   i   l   d   @
0000220   c   a   -   b   u   i   l   d   4   4   .   u   s   .   o   r
0000240   a   c   l   e   .   c   o   m   )       (   g   c   c       v
0000260   e   r   s   i   o   n       4   .   4   .   7       2   0   1
0000300   2   0   3   1   3       (   R   e   d       H   a   t       4
0000320   .   4   .   7   -   3   )       (   G   C   C   )       )
0000340   #   2       S   M   P       W   e   d       A   p   r       1
0000360   6       0   2   :   5   1   :   1   0       P   D   T       2
0000400

Proof, if ever you need it, that root access is still the fastest and easiest route to total disaster…

[Update July 2015: Ok, so look. I was wrong in this post – these /dev/oracleafd/disks devices are simply pointers to devices in /dev/dm-* and thus I was only overwriting the pointer. To read a more accurate post on the subject, please read here]
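
Since the entries under /dev/oracleafd/disks turn out to be small text files holding the path of the real device, a few lines of shell are enough to show what each label points at. This is only a sketch: afd_list_targets is a name I made up, and it assumes the first line of each file is the device path, which is what the od output above suggests:

```shell
# Print "LABEL -> device path" for every entry in an AFD disks directory.
# Each entry is assumed to be a small text file whose first line is the
# path of the real device, as the od output earlier in the post shows.
# afd_list_targets is a hypothetical helper, not an Oracle command.
afd_list_targets() {
    dir=${1:-/dev/oracleafd/disks}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Read only the first line; anything after it is ignored here.
        target=
        IFS= read -r target < "$f" || true
        printf '%s -> %s\n' "$(basename "$f")" "$target"
    done
}

# Example: afd_list_targets /dev/oracleafd/disks
```

It only reads the files, so unlike my dmesg experiment it is safe to run as root.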

New My Oracle Support note on Advanced Format (4k) storage

In the past I have been a little critical of Oracle’s support notes and documentation regarding the use of Advanced Format 4k storage devices. I must now take that back, as my new friends in Oracle ASM Development and Product Management very kindly offered to let me write a new support note, which they have just published on My Oracle Support. It’s only supposed to be high level, but it does confirm that the _DISK_SECTOR_SIZE_OVERRIDE parameter can be safely set in database instances when using 512e storage and attempting to create 4k online redo logs.

The new support note is:

Using 4k Redo Logs on Flash and SSD-based Storage (Doc ID 1681266.1)

Don’t forget that you can read all about the basics of using Oracle with 4k sector storage here. And if you really feel up to it, I have a 4k deep dive page here.

The Ultimate Guide To Oracle with Advanced Format 4k

It’s a brave thing, calling something the “Ultimate Guide To …” as it can leave you open to criticism that it’s anything but. However, this topic – of how Oracle runs on Advanced Format storage systems and which choices have which consequences – is one I’ve been learning for two years now, so this really is everything I know. And from my desperate searching of the internet, plus discussions with people who are usually much more knowledgeable than me, I’ve come to the conclusion that nobody else really understands it.

In fact, you could say that it’s a topic full of confusion – and if you browsed the support notes on My Oracle Support you’d definitely come to that conclusion. Part of that confusion is unfortunately FUD, spread by storage vendors who do not (yet) support Advanced Format and therefore benefit from scaring customers away from it. Be under no illusions: with the likes of Western Digital, HGST and Seagate all signed up to Advanced Format, plus Violin Memory and EMC’s XtremIO both using it, it’s something you should embrace rather than avoid.

However, to try and lessen the opportunity for those competitors to point and say “Look how complicated it is!”, I’ve split my previous knowledge repository into two: a high-level page and an Oracle on 4k deep dive. It’s taken me years to work all this stuff out – and days to write it all down, so I sincerely hope it saves someone else a lot of time and effort…!

Advanced Format with 4k Sectors

Advanced Format: Oracle on 4k Deep Dive

More Problems with Oracle’s Support of 4k Devices

This is going to be another one of those posts, a bit like this one, that discuss the use of Oracle’s database product with Advanced Format devices. I wish there weren’t so many of these posts, but it seems that Oracle has a lot of issues with its implementation of 4k support.

(Before reading on, if you aren’t sure what I’m talking about here then please have a read of this page…)

In the last post I built a database which used Oracle ASM (and the Linux ASMLib kernel driver) but found that if the database used an SPFILE which was located on a 4k device (within an ASM diskgroup) it didn’t work. Today, I’m going to forego ASM and use a filesystem instead (something I would never do in real life).

Building a 4k Filesystem

Let’s start with a single 4k LUN being presented from my Violin array. I’ve already configured the Linux device mapper multipathing so that it presents itself as a nicely-named device in the /dev/mapper directory:

[oracle@half-server4 ~]$ ls -l /dev/mapper/fs4ktest 
lrwxrwxrwx 1 root root 7 Feb 25 15:53 /dev/mapper/fs4ktest -> ../dm-7
[oracle@half-server4 ~]$ fdisk -l /dev/mapper/fs4ktest

Note: sector size is 4096 (not 512)

Disk /dev/mapper/fs4ktest: 215.8 GB, 215822106624 bytes
255 heads, 63 sectors/track, 3279 cylinders
Units = cylinders of 16065 * 4096 = 65802240 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 524288 bytes

We can see that this is indeed a 4k device, i.e. it has not only a 4096 byte physical blocksize, but a 4096 byte logical blocksize too. The fdisk command has even taken the time to print a special “Note” to ensure we see the sector size is not the usual 512 bytes. The next thing to do is format it with a filesystem so I’m going to use ext4:

[root@half-server4 ~]# mkfs.ext4 /dev/mapper/fs4ktest 
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=1 blocks, Stripe width=128 blocks
2097152 inodes, 8388608 blocks
419430 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
256 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Now it needs to be mounted. I’m just going to stick it on a mount point in a new top-level directory called /fstest:

[root@half-server4 ~]# mkdir -p /fstest/fs4ktest
[root@half-server4 ~]# chown -R oracle:oinstall /fstest
[root@half-server4 ~]# mount /dev/mapper/fs4ktest /fstest/fs4ktest

Finally, I am going to create an Oracle database using this filesystem. I’m not going to cut and paste all the output for that, because it’s all a bit dull… so let’s just skip to the bit where DBCA has completed and the database is open.

Oracle On A 4k Filesystem

So the good news is, it worked. The database is up and running and the datafiles are located on the 4k filesystem:

[oracle@half-server4 fstest]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.1.0 Production on Mon Mar 3 21:30:02 2014
Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select name from v$datafile;

NAME
---------------------------------------------------------------------------------------
/fstest/fs4ktest/oracle/oradata/FSTEST/datafile/o1_mf_system_9k9wxmw6_.dbf
/fstest/fs4ktest/oracle/oradata/FSTEST/datafile/o1_mf_sysaux_9k9ww6r6_.dbf
/fstest/fs4ktest/oracle/oradata/FSTEST/datafile/o1_mf_undotbs1_9k9wz257_.dbf
/fstest/fs4ktest/oracle/oradata/FSTEST/datafile/o1_mf_users_9k9wz123_.dbf

Cool. We can end this post here then, right? Well, no… because there is a bit of a problem with this database. Let’s just have a quick check of the FILESYSTEMIO_OPTIONS parameter:

SQL> show parameter filesystem

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options		     string	 none

This parameter controls the way that I/O is performed for files located on filesystems. It isn’t relevant for databases using Oracle ASM (for which the DISK_ASYNCH_IO parameter exists instead), but here it’s making a massive difference. According to the Oracle documentation, it has four possible options:

  • ASYNCH: enable asynchronous I/O on file system files, which has no timing requirement for transmission.
  • DIRECTIO: enable direct I/O on file system files, which bypasses the buffer cache.
  • SETALL: enable both asynchronous and direct I/O on file system files.
  • NONE: disable both asynchronous and direct I/O on file system files.

Normally, when I see filesystem-based databases, I find this parameter set to SETALL. This means asynchronous and direct I/O, but here it is set to NONE which means neither. And it’s the DIRECTIO that we are interested in.
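
To keep the four values straight, here is the same list written as a tiny lookup. It is only an illustration of the documentation bullets above: fsio_describe is a hypothetical helper of mine, not anything Oracle ships, and it matches the value case-insensitively on the assumption that case doesn't matter for this parameter:

```shell
# Map a FILESYSTEMIO_OPTIONS value to the two behaviours it switches on.
# Values are matched case-insensitively (an assumption of this sketch).
# fsio_describe is a hypothetical helper for illustration only.
fsio_describe() {
    opt=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case $opt in
        ASYNCH)   echo "async=yes direct=no"  ;;
        DIRECTIO) echo "async=no direct=yes"  ;;
        SETALL)   echo "async=yes direct=yes" ;;
        NONE)     echo "async=no direct=no"   ;;
        *)        echo "unknown value: $1" >&2; return 1 ;;
    esac
}

# Example: fsio_describe setall
```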

One Buffer Cache Is Enough

As you are no doubt aware, Oracle databases have a buffer cache which is used to cache copies of database blocks. However, the Linux operating system also has its own buffer cache for filesystems. Most people would consider it ineffective to use two levels of cache – and if that is the case, it will obviously be the Oracle buffer cache that needs to be used. So let’s set the parameter to use direct I/O and then restart the database (as the parameter is not dynamic):

SQL> alter system set filesystemio_options='directIO' scope=spfile;

System altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1.3896E+10 bytes
Fixed Size		    4663568 bytes
Variable Size		 2751465200 bytes
Database Buffers	 1.1107E+10 bytes
Redo Buffers		   33673216 bytes
ORA-00205: error in identifying control file, check alert log for more info

Oh dear. What happened?

[oracle@half-server4 ~]$ tail /u01/app/oracle/diag/rdbms/fstest/fstest/trace/alert_fstest.log 
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/fstest/fs4ktest/oracle/oradata/FSTEST/controlfile/o1_mf_9k9wzp31_.ctl'
ORA-27047: unable to read the header block of file
Linux-x86_64 Error: 22: Invalid argument
Additional information: 1
ORA-205 signalled during: ALTER DATABASE   MOUNT...

The answer, which you can find in My Oracle Support note 1133713.1, is that Oracle does not support 4k devices with direct I/O. This has been the case for a long time – I remember first discovering this nearly two years ago, on 11.2.0.2, yet there is no sign of it being fixed. According to the note, “It is not yet known in which version this support will be available.” Pah.
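
For what it’s worth, error 22 (Invalid argument) is what Linux generally returns when a direct I/O request is not aligned to the logical block size of the underlying device. I am assuming that is what is biting Oracle here, since the support note does not spell out the mechanism. The check itself is simple arithmetic, sketched below with a made-up function name (direct_io_aligned):

```shell
# Return success if an I/O of IO_SIZE bytes is a whole multiple of the
# device's logical block size, which direct I/O on Linux generally requires.
# direct_io_aligned is a hypothetical helper; whether Oracle trips over
# exactly this rule is an assumption, not something the support note says.
direct_io_aligned() {
    io_size=$1
    logical_block_size=$2
    [ $(( io_size % logical_block_size )) -eq 0 ]
}

# One way to try it against a real file (not run here; the path is illustrative):
#   dd if=/fstest/fs4ktest/somefile of=/dev/null bs=512 count=1 iflag=direct
```

So direct_io_aligned 512 4096 fails while direct_io_aligned 4096 4096 succeeds, which is the difference between a 512 byte request and a 4k request on a device like this one.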

There’s More: Diagnostic Destination on 4k

And then there’s the diagnostic destination. How about if I choose to locate this on a 4k filesystem?

SQL> show parameter diagnostic_dest

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
diagnostic_dest 		     string	 /u01/app/oracle

SQL> alter system set diagnostic_dest='/fstest/fs4ktest/oracle' scope=spfile;

System altered.

I’ll give it a few minutes and then go and look in some of the files… guess what I see?

ORA-48101: error encountered when attempting to read a file [block] [/fstest/fs4ktest/oracle/diag/rdbms/fstest/fstest/metadata/INCIDENT.ams] [0]
ORA-27072: File I/O error
Linux-x86_64 Error: 22: Invalid argument
Additional information: 4
Additional information: 1
Additional information: -1

Look familiar? (This is nothing to do with direct I/O by the way, I disabled that again before this test.)

So let’s be honest, things aren’t going all that well here. There are still a lot of things that do not appear to work properly when using 4k devices. Luckily, my Violin array can present storage as 512 byte to avoid this sort of issue, but really I feel that Oracle needs to get cracking on its Advanced Format support. This is not just a flash memory thing: pretty much every major disk vendor is making Advanced Format devices now, from Western Digital through HGST to Seagate.

Time to get with the programme?

Oracle ASMLib: Physical and Logical Blocksize

This article is about the use of Advanced Format devices on Oracle’s ASMLib kernel library for Linux. For background, read this page on 4k sector sizes first, otherwise it might all sound like nonsense. Mind you, it might sound like nonsense anyway, I can’t guarantee anything here. By the way, a big hello to my buddy Nate who asked for this information: you rock, dude.

In more recent versions of ASMLib, Oracle introduced a new parameter into the /etc/sysconfig/oracleasm file:

[root@half-server4 mapper]# tail -5 /etc/sysconfig/oracleasm
# ORACLEASM_USE_LOGICAL_BLOCK_SIZE: 'true' means use the logical block size
# reported by the underlying disk instead of the physical. The default
# is 'false'
ORACLEASM_USE_LOGICAL_BLOCK_SIZE=false

To understand what this parameter does, consider this device which I am presenting from a Violin array:

[root@half-server4 ~]# ls -l /dev/mapper/testlun
lrwxrwxrwx 1 root root 8 Feb 27 15:33 /dev/mapper/testlun -> ../dm-19
[root@half-server4 ~]# fdisk -l /dev/mapper/testlun

Disk /dev/mapper/testlun: 34.4 GB, 34359738368 bytes
255 heads, 63 sectors/track, 4177 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 524288 bytes

The important bit there is the “Sector size” line. This device has a 4k physical blocksize (as all Violin devices do, as well as many other modern storage systems) but has a 512 byte logical blocksize. Essentially, this LUN is pretending to be a 512 byte device.
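
If you would rather not squint at fdisk output, the same two numbers are exposed by the kernel under /sys/block/<device>/queue. The sketch below reads them and prints the usual shorthand (512n, 512e or 4Kn). The function name sector_format and the optional second argument are my own additions for illustration:

```shell
# Classify a block device from its sysfs queue attributes.
# Usage: sector_format DEVNAME [SYSFS_ROOT]   (SYSFS_ROOT defaults to /sys/block)
# sector_format is a hypothetical helper; the sysfs files it reads are
# logical_block_size and physical_block_size under <device>/queue.
sector_format() {
    root=${2:-/sys/block}
    q="$root/$1/queue"
    logical=$(cat "$q/logical_block_size") || return 1
    physical=$(cat "$q/physical_block_size") || return 1
    case "$logical/$physical" in
        512/512)   echo "512n (512 logical / 512 physical)" ;;
        512/4096)  echo "512e (512 logical / 4096 physical)" ;;
        4096/4096) echo "4Kn (4096 logical / 4096 physical)" ;;
        *)         echo "other ($logical logical / $physical physical)" ;;
    esac
}

# Example: sector_format dm-19
```

For a multipath device, pass the dm-N name that the /dev/mapper symlink points to.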

Now that’s all well and good. Operating systems and applications that cannot support 4k block sizes (e.g. Red Hat 5 and Oracle Linux 5) will happily use this, because they believe it to be 512 byte. But later versions of ASMLib have started being too clever for their own good.

Don’t Look Behind The Curtain

Let’s create an ASMLib label on this device:

[root@half-server4 ~]# oracleasm createdisk TESTLUN /dev/mapper/testlun 
Writing disk header: done
Instantiating disk: done

And now we can attempt to put an ASM diskgroup on it:

SQL> CREATE DISKGROUP TEST EXTERNAL REDUNDANCY
DISK 'ORCL:TESTLUN'
ATTRIBUTE
     'sector_size'='512',
     'compatible.asm' = '11.2',
     'compatible.rdbms' = '11.2';  
CREATE DISKGROUP TEST EXTERNAL REDUNDANCY
*
ERROR at line 1:
ORA-15018: diskgroup cannot be created
ORA-15038: disk '' mismatch on 'Sector Size' with target disk group [4096]
[512]

What happened? Well, ASMLib has looked behind the smoke and mirrors and decided that this is actually a 4k device. It’s therefore presenting this to Oracle ASM as 4k, which causes the problem (because I explicitly asked for sector size to be 512 byte on this diskgroup).

One possible solution is to change the ASM_DISKSTRING from its default value of NULL (meaning ‘ORCL:*’) to ‘/dev/oracleasm/disks/*’, i.e. the location where ASMLib creates its own block devices. We can test this theory with fdisk:

[oracle@half-server4 ~]$ ls -l /dev/oracleasm/disks/TESTLUN 
brw-rw---- 1 oracle dba 252, 19 Feb 27 15:38 /dev/oracleasm/disks/TESTLUN
[oracle@half-server4 ~]$ fdisk -l /dev/oracleasm/disks/TESTLUN | grep "Sector size"
Sector size (logical/physical): 512 bytes / 4096 bytes

So that would work. But it would lose many of the claimed benefits of ASMLib such as reduced file descriptors and context switching. Also, it feels like a hack.

Setting ORACLEASM_USE_LOGICAL_BLOCK_SIZE

The answer, as you probably guessed, is to set this new parameter. It defaults, wrongly in my opinion, to using the physical block size. We can either edit the value in the file to be true in order to use the logical blocksize, or preferably use the oracleasm configure command:

[root@half-server4 ~]# oracleasm configure -b
Writing Oracle ASM library driver configuration: done
[root@half-server4 ~]# oracleasm configure | grep ORACLEASM_USE_LOGICAL_BLOCK_SIZE
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="true"

It can be set back to using the physical blocksize with the following command:

[root@half-server4 ~]# oracleasm configure -p
Writing Oracle ASM library driver configuration: done
[root@half-server4 ~]# oracleasm configure | grep ORACLEASM_USE_LOGICAL_BLOCK_SIZE
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"

Finally, a word of warning. If you are like me, then you are a bit stupid and can’t follow instructions properly. I set the value of the parameter to TRUE in upper case and then spent hours wondering why it didn’t work. The answer, to my embarrassment, is that it’s case sensitive. TRUE is not a valid value so it defaults to false. Doh.
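
To save anyone else those hours, here is a rough sanity check for the config file. It is a sketch rather than anything official: check_logical_block_flag is a name I made up, and it assumes the setting appears as a single uncommented assignment, with or without double quotes, as in the output shown above:

```shell
# Check that ORACLEASM_USE_LOGICAL_BLOCK_SIZE is exactly "true" or "false".
# Prints the effective setting; flags anything else (such as TRUE) as invalid.
# check_logical_block_flag is a hypothetical helper, not part of oracleasm.
check_logical_block_flag() {
    conf=${1:-/etc/sysconfig/oracleasm}
    # Take the last uncommented assignment and strip optional double quotes.
    value=$(sed -n 's/^ORACLEASM_USE_LOGICAL_BLOCK_SIZE=//p' "$conf" | tail -n 1 | tr -d '"')
    case $value in
        true)  echo "logical block size in use" ;;
        false) echo "physical block size in use" ;;
        *)     echo "invalid value '$value': treated as false" >&2; return 1 ;;
    esac
}

# Example: check_logical_block_flag /etc/sysconfig/oracleasm
```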