More Problems with Oracle’s Support of 4k Devices

This is going to be another one of those posts, a bit like this one, that discuss the use of Oracle’s database product with Advanced Format devices. I wish there weren’t so many of these posts, but it seems that Oracle has a lot of issues with it’s implementation of 4k support.

(Before reading on, if you aren’t sure what I’m talking about here then please have a read of this page…)

In the last post I built a database which used Oracle ASM (and the Linux ASMLib kernel driver) but found that if the database used an SPFILE which was located on a 4k device (within an ASM diskgroup) it didn’t work. Today, I’m going to forego ASM and use a filesystem instead (something I would never do in real life).

Building a 4k Filesystem

Let’s start with a single 4k LUN being presented from my Violin array. I’ve already configured the Linux device mapper multipathing so that it presents itself as a nicely-named device in the /dev/mapper directory:

[oracle@half-server4 ~]$ ls -l /dev/mapper/fs4ktest 
lrwxrwxrwx 1 root root 7 Feb 25 15:53 /dev/mapper/fs4ktest -> ../dm-7
[oracle@half-server4 ~]$ fdisk -l /dev/mapper/fs4ktest

Note: sector size is 4096 (not 512)

Disk /dev/mapper/fs4ktest: 215.8 GB, 215822106624 bytes
255 heads, 63 sectors/track, 3279 cylinders
Units = cylinders of 16065 * 4096 = 65802240 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 524288 bytes

We can see that this is indeed a 4k device, i.e. it has not only a 4096 byte physical blocksize, but a 4096 byte logical blocksize too. The fdisk command has even taken the time to print a special “Note” to ensure we see the sector size is not the usual 512 bytes. The next thing to do is format it with a filesystem so I’m going to use ext4:

[root@half-server4 ~]# mkfs.ext4 /dev/mapper/fs4test 
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=1 blocks, Stripe width=128 blocks
2097152 inodes, 8388608 blocks
419430 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
256 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Now it needs to be mounted. I’m just going to stick it on a mount point in a new top-level directory called /fstest:

[root@half-server4 ~]# mkdir -p /fstest/fs4ktest
[root@half-server4 ~]# chown -R oracle:oinstall /fstest
[root@half-server4 ~]# mount /dev/mapper/fs4ktest /fstest/fs4ktest

Finally, we I am going to create an Oracle database using this filesystem. I’m not going to cut and paste all the output for that, because it’s all a bit dull… so let’s just skip to the bit where DBCA has completed and the database is open.

Oracle On A 4k Filesystem

So the good news is, it worked. The database is up and running and the datafiles are located on the 4k filesystem:

[oracle@half-server4 fstest]$ sqlplus / as sysdba

SQL*Plus: Release Production on Mon Mar 3 21:30:02 2014
Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select name from v$datafile;


Cool. We can end this post here then, right? Well, no… because there is a bit of a problem with this database. Let’s just have a quick check of the FILESYSTEMIO_OPTIONS parameter:

SQL> show parameter filesystem

------------------------------------ ----------- ------------------------------
filesystemio_options		     string	 none

This parameter controls the way that I/O is performed for files located on filesystems. It isn’t relevant for databases using Oracle ASM (for which the DISK_ASYNCH_IO parameter exists instead), but here it’s making a massive difference. According to the Oracle documentation, it has four possible options:

  • ASYNCH: enable asynchronous I/O on file system files, which has no timing requirement for transmission.
  • DIRECTIO: enable direct I/O on file system files, which bypasses the buffer cache.
  • SETALL: enable both asynchronous and direct I/O on file system files.
  • NONE: disable both asynchronous and direct I/O on file system files.

Normally, when I see filesystem-based databases, I find this parameter set to SETALL. This means asynchronous and direct I/O, but here it is set to NONE which means neither. And it’s the DIRECTIO that we are interested in.

One Buffer Cache Is Enough

As you are no doubt aware, Oracle databases have a buffer cache which is used to cache copies of database blocks. However, the Linux operating system also has its own buffer cache for filesystems. Most people would consider it ineffective to use two levels of cache – and if that is the case, it will obviously be the Oracle buffer cache that needs to be used. So let’s set the parameter to use direct I/O and then restart the database (as the parameter is not dynamic):

SQL> alter system set filesystemio_options='directIO' scope=spfile;

System altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1.3896E+10 bytes
Fixed Size		    4663568 bytes
Variable Size		 2751465200 bytes
Database Buffers	 1.1107E+10 bytes
Redo Buffers		   33673216 bytes
ORA-00205: error in identifying control file, check alert log for more info

Oh dear. What happened?

[oracle@half-server4 ~]$ tail /u01/app/oracle/diag/rdbms/fstest/fstest/trace/alert_fstest.log 
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/fstest/fs4ktest/oracle/oradata/FSTEST/controlfile/o1_mf_9k9wzp31_.ctl'
ORA-27047: unable to read the header block of file
Linux-x86_64 Error: 22: Invalid argument
Additional information: 1
ORA-205 signalled during: ALTER DATABASE   MOUNT...

The answer, which you can find in My Oracle Support note 1133713.1, is that Oracle does not support 4k devices with direct I/O. This has been the case for a long time – I remember first discovering this nearly two years ago, on, yet there is no sign of it being fixed. According to the note, “It is not yet known in which version this support will be available.” Pah.

There’s More: Diagnostic Destination on 4k

And then there’s the diagnostic destination. How about if I choose to locate this on a 4k filesystem?

SQL> show parameter diagnostic_dest

------------------------------------ ----------- ------------------------------
diagnostic_dest 		     string	 /u01/app/oracle

SQL> alter system set diagnostic_dest='/fstest/fs4ktest/oracle' scope=spfile;

System altered.

I’ll give it a few minutes and then go and look in some of the files… guess what I see?

ORA-48101: error encountered when attempting to read a file [block] [/fstest/fs4ktest/oracle/diag/rdbms/fstest/fstest/metadata/INCIDENT.ams] [0]
ORA-27072: File I/O error
Linux-x86_64 Error: 22: Invalid argument
Additional information: 4
Additional information: 1
Additional information: -1

Look familiar? (This is nothing to do with direct I/O by the way, I disabled that again before this test.)

So let’s be honest, things aren’t going all that well here. There are still a lot of things that do not appear to work properly when using 4k devices. Luckily, my Violin array can present storage as 512 byte to avoid this sort of issue, but really I feel that Oracle needs to get cracking on its Advanced Format support. This is not just a flash memory thing, pretty much every major disk vendor is making Advanced Format devices now from Western Digital, through HGST to Seagate.

Time to get with the programme?


2 Responses to More Problems with Oracle’s Support of 4k Devices

  1. oranjer says:

    I’m only just about keeping up with the detail of these posts so forgive me if I’ve misunderstood. You say “Luckily, my Violin array can present storage as 512 byte to avoid this sort of issue, but really I feel that Oracle needs to get cracking on its Advanced Format support.” Somehow I don’t think MOS note 1626228.1 is quite what you meant, right? “As of Oracle release 12.1, ASM only supports 512e.”

    • flashdba says:

      I hadn’t see that note before, that’s strange. We know that 4096 sector devices (which cannot possibly be 512e) are supported, because the 12.1 documentation discusses them:

      I wonder if this is meant to say ACFS? We know that ACFS definitely does not support 4k sector devices.

      As it happens, Oracle’s advice in this paper is the same as mine: use 512e and then configure the redo logs to be 4k. This makes things nice and simple and gives great performance, without having to worry about whether Oracle has remembered to write the correct I/O calls in their code.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: