Oracle Linux 6.3 with Oracle 11.2.0.3 single-instance

In this cookbook I am going to build an Oracle 11g Release 2 database running on Oracle Linux 6 and Violin Memory flash storage. Unlike the Advanced Cookbooks which use 4k ASM diskgroups, on this occasion I am keeping it simple and using 512 byte. There is no downside to this, only an upside – which is that the installation is much more simple. Performance will still be excellent…

I have already installed Oracle Linux 6 Update 3 on my IBM X Series server:

[root@half-server4 ~]# cat /etc/oracle-release
Oracle Linux Server release 6.3
[root@half-server4 ~]# uname -s -n -r -m
Linux half-server4 2.6.39-200.24.1.el6uek.x86_64 x86_64

I don’t really need the iptables firewall configured for my testing so I’ve disabled it. If you are wondering why the server is named “half-server4” then it’s because it happens to reside in a half-sized rack. I can assure you that the server itself is entirely whole!

Setup YUM and Oracle Preinstall

In order to install the Oracle Database software I first need to ensure all of the relevant OS packages are installed as well as set various kernel parameters and other configuration settings. In Oracle Linux 5 this was easy to achieve using the oracle-validated package, but in OL6 this no longer exists. Don’t worry though, it hasn’t been removed – only renamed. It’s now called the Oracle Preinstall RPM. If you are installing Oracle Linux 6 then this is the solution – however, if you are installing Red Hat 6, you have been left out in the cold because the Oracle Preinstall package does not work with RHEL6. It’s not that difficult to fix this issue though and I have a step by step guide posted here.

To use Oracle Preinstall, I first need to configure the yum package management tool. Oracle customers with Unbreakable Linux Network (ULN) support contracts can configure yum to use the ULN yum repositories. Those without ULN access can still use yum because Oracle, to its credit, provides public yum repositories (details here).

In previous cookbooks I have used the public yum servers so, for a change, on this occasion I will use the ULN repositories. The first thing to do here is register the system with ULN, which in Oracle Linux 5 we did using the up2date command:

[root@half-server4 ~]# up2date --register
-bash: up2date: command not found

Oh dear. Is this bad? No, it’s just that in Oracle Linux 6 we no longer use up2date… instead we have to use the new uln_register command:

[root@half-server4 ~]# uln_register

This brings up a quasi-graphical screen in which I can enter the details of my Oracle Single-Sign On, password and CSI. It then steps through a load of screens, for most of which I just choose the default answers, then says it’s ready to upload the system details to ULN and register the server. I give it my blessing and (after a lengthy delay) I get the message:

# Note: yum-rhn-plugin has been enabled.
#
# Please review the subscription details below:
#
# Software channel subscriptions:
#  This system will receive updates from the following Unbreakable Linux Network software channels:
#  Oracle Linux 6 Latest (x86_64)
#  Latest Unbreakable Enterprise Kernel for Oracle Linux 6 (x86_64)

This means I can now run yum against the ULN repositories. And to prove it let’s do just that:

[root@half-server4 ~]# yum list \*preinstall\*
Loaded plugins: rhnplugin, security
Available Packages
oracle-rdbms-server-11gR2-preinstall.x86_64       1.0-6.el6      ol6_x86_64_latest

This is the package I need to install, so I will go ahead and run yum with the install option:

[root@half-server4 ~]# yum install oracle-rdbms-server-11gR2-preinstall -y

This creates a huge amount of output as it decides that I need to have 21 new packages installed plus another 2 updated. I won’t bore you by spooling the whole lot here, but this is the positive outcome we want to see (the dependencies may vary but the “Complete!” line remains reassuringly constant):

Installed:
  oracle-rdbms-server-11gR2-preinstall.x86_64 0:1.0-6.el6

Dependency Installed:
  cloog-ppl.x86_64 0:0.15.7-1.2.el6        compat-libcap1.x86_64 0:1.10-1             compat-libstdc++-33.x86_64 0:3.2.3-69.el6
  cpp.x86_64 0:4.4.6-4.el6                 gcc.x86_64 0:4.4.6-4.el6                   gcc-c++.x86_64 0:4.4.6-4.el6
  glibc-devel.x86_64 0:2.12-1.80.el6_3.6   glibc-headers.x86_64 0:2.12-1.80.el6_3.6   kernel-uek-headers.x86_64 0:2.6.32-300.39.2.el6uek
  ksh.x86_64 0:20100621-16.el6             libXmu.x86_64 0:1.0.5-1.el6                libXxf86dga.x86_64 0:1.1.1-1.el6
  libXxf86misc.x86_64 0:1.0.2-1.el6        libaio-devel.x86_64 0:0.3.107-10.el6       libdmx.x86_64 0:1.1.0-1.el6
  libstdc++-devel.x86_64 0:4.4.6-4.el6     mpfr.x86_64 0:2.4.1-6.el6                  ppl.x86_64 0:0.10.2-11.el6
  xorg-x11-utils.x86_64 0:7.4-8.el6        xorg-x11-xauth.x86_64 1:1.0.2-7.1.el6

Dependency Updated:
  glibc.x86_64 0:2.12-1.80.el6_3.6                                  glibc-common.x86_64 0:2.12-1.80.el6_3.6

Complete!

The Oracle preinstallation requirements are now almost complete. I need to set the password of the newly-created oracle user (run “passwd oracle” as the root user) and then it’s time to setup the storage.

Install Device Mapper and Multipathing Daemon

Next I need to make sure the device mapper and multipathing daemon packages are installed:

[root@half-server4 ~]# yum list device-mapper device-mapper-multipath
Loaded plugins: rhnplugin, security
Installed Packages
device-mapper.x86_64              1.02.74-10.el6       @anaconda-OracleLinuxServer.x86_64/6.3
Available Packages
device-mapper.x86_64              1.02.74-10.el6_3.3   ol6_x86_64_latest
device-mapper-multipath.x86_64    0.4.9-56.el6_3.1     ol6_x86_64_latest

The device mapper multipath daemon isn’t installed, so let’s install it (and update the device mapper package while we are there):

[root@half-server4 ~]# yum install device-mapper device-mapper-multipath -y
<...snip!...>
Installed:
  device-mapper-multipath.x86_64 0:0.4.9-56.el6_3.1

Dependency Installed:
  device-mapper-multipath-libs.x86_64 0:0.4.9-56.el6_3.1

Updated:
  device-mapper.x86_64 0:1.02.74-10.el6_3.3

Dependency Updated:
  device-mapper-event.x86_64 0:1.02.74-10.el6_3.3     device-mapper-event-libs.x86_64 0:1.02.74-10.el6_3.3
  device-mapper-libs.x86_64 0:1.02.74-10.el6_3.3      kpartx.x86_64 0:0.4.9-56.el6_3.1

Complete!

Now it’s installed I need to configure it to start when the server boots:

[root@half-server4 ~]# chkconfig --list multipathd
multipathd      0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@half-server4 ~]# chkconfig multipathd on
[root@half-server4 ~]# chkconfig --list multipathd
multipathd      0:off   1:off   2:on    3:on    4:on    5:on    6:off

The next step is to configure the multipath.conf file. At the moment the file doesn’t exist, but I can get a template from /usr/share/doc/device-mapper-multipath-0.4.9 or just start a new file from scratch. I’ll go for the latter, so here is the content I’m adding:

blacklist {
        devnode "*"
}
blacklist_exceptions {
        devnode "sd*"
}
defaults {
        user_friendly_names yes
}

These first three sections deal with the blacklists (devices which should not be used by the multipath daemon) and the defaults (the parameter user_friendly_names results in devices called /dev/mapper/mpath* instead of rather-less friendly WWID-based names). Since my Violin LUNs are all presented over fibre channel they appear as SCSI devices with names in the format /dev/sd*, so this is why the blacklist exception contains an entry for sd*. In this rather basic configuration I am telling the multipath daemon that it should ignore all devices except those with names like /dev/sd*

The next section will contain a device description for Violin, which tells the multipath daemon how to interrogate Violin devices and what to do with the resulting virtual devices that will represent them. If you have other non-Violin devices also present you will probably need extra entries here, but in this setup I don’t:

# ALUA
devices {
    device {
        vendor "VIOLIN"
        product "SAN ARRAY ALUA"
        path_grouping_policy group_by_prio
        getuid_callout "/sbin/scsi_id --whitelisted --replace-whitespace --page=0x80 --device=/dev/%n"
        prio alua
        path_checker tur
        path_selector "round-robin 0"
        hardware_handler "1 alua"
        failback immediate
        rr_weight uniform
        no_path_retry fail
        rr_min_io_rq 4
    }

# non-ALUA
    device {
        vendor "VIOLIN"
        product "SAN ARRAY"
        path_grouping_policy group_by_serial
        getuid_callout "/sbin/scsi_id --whitelisted --replace-whitespace --page=0x80 --device=/dev/%n"
        path_checker tur
        path_selector "round-robin 0"
        hardware_handler "0"
        failback immediate
        rr_weight uniform
        no_path_retry fail
        rr_min_io_rq 4
    }
}

An important fact to notice here is that this is different to the entry we used for Oracle Linux 5. This is partly because some of the multipathing parameters have changed in RHEL6 / OL6 and partly because the syntax of the /sbin/scsi_id command has changed. For an explanation of the multipathing parameters a good place to start is this Guide to DM Multipath from Red Hat. The scsi_id command is used to interrogate devices in order to determine what they are and bring back information about them. Perhaps the best way to explain this is to run it manually with the additional –export option which prints all the details in a list (although in RHEL5 / OL5 the options are different):

[root@half-server4 host1]# /sbin/scsi_id --whitelisted --page=0x80 --device=/dev/sdb --export
ID_SCSI=1
ID_VENDOR=VIOLIN
ID_VENDOR_ENC=VIOLIN\x20\x20
ID_MODEL=SAN_ARRAY
ID_MODEL_ENC=SAN\x20ARRAY\x20\x20\x20\x20\x20\x20\x20
ID_REVISION=551
ID_TYPE=disk
ID_SERIAL_RAW="SVIOLIN  SAN ARRAY      8D4CC077FDB18BB5"
ID_SERIAL=SVIOLIN_SAN_ARRAY_8D4CC077FDB18BB5
ID_SERIAL_SHORT=8D4CC077FDB18BB5

The two most important bits of information here are the vendor and the serial code: the vendor tells the multipathing daemon to treat it using the parameters defined above, while the serial code (which is unique to each LUN) allows the daemon to know which of the many SCSI devices correspond to the same underlying LUN on the array. A LUN presented over multiple paths appears as all sorts of different SCSI devices with random names like /dev/sdb or /dev/sddz. But if these devices have the same serial, the multipathing daemon knows they are the same underlying LUN and can add them both to the same virtual device, e.g. /dev/dm-3.

Now that the multipath.conf file is setup I will start the multipathing daemon, check that it’s loaded in the kernel (not necessary, I’m just doing this as a demonstration) and then list the devices it can see. The latter will create an enormous amount of screen output so I will reduce this by using grep:

[root@half-server4 host1]# service multipathd start
Starting multipathd daemon:                                [  OK  ]
[root@half-server4 host1]# lsmod | grep mult
dm_multipath           18776  11 dm_round_robin
dm_mod                 76415  23 dm_multipath,dm_mirror,dm_log
[root@half-server4 host1]# multipath -ll | grep VIOLIN
mpathe (SVIOLIN_SAN_ARRAY_8D4CC0770414BA85) dm-4 VIOLIN,SAN ARRAY
mpathd (SVIOLIN_SAN_ARRAY_8D4CC077573E3AA9) dm-7 VIOLIN,SAN ARRAY
mpathc (SVIOLIN_SAN_ARRAY_8D4CC077EF0D02F7) dm-5 VIOLIN,SAN ARRAY
mpathb (SVIOLIN_SAN_ARRAY_8D4CC077FDB18BB5) dm-8 VIOLIN,SAN ARRAY
mpathk (SVIOLIN_SAN_ARRAY_8D4CC077A0617977) dm-12 VIOLIN,SAN ARRAY
mpathj (SVIOLIN_SAN_ARRAY_8D4CC077B2DDF035) dm-10 VIOLIN,SAN ARRAY
mpathi (SVIOLIN_SAN_ARRAY_8D4CC07732414DBD) dm-11 VIOLIN,SAN ARRAY
mpathh (SVIOLIN_SAN_ARRAY_8D4CC0777B6A4F7E) dm-9 VIOLIN,SAN ARRAY
mpathg (SVIOLIN_SAN_ARRAY_8D4CC077A6337433) dm-6 VIOLIN,SAN ARRAY
mpathf (SVIOLIN_SAN_ARRAY_8D4CC077C4BA60F9) dm-3 VIOLIN,SAN ARRAY

Prior to this whole exercise I created and presented 10 LUNs from the Violin array: 8 x 20GB LUNs for use in a DATA diskgroup and 2x 160GB LUNs for use in a RECO diskgroup. The sizes are relatively arbitrary, however the decision to use 8 LUNs in DATA is deliberate. Violin recommends using at least 8 LUNs for best performance – and ASM best practice advices using an even number of disks in a diskgroup. The requirement for 8+ LUNS is not actually a direct Violin issue: it’s to do with the I/O device queues in the operating system. We find that the OS can achieve better I/O performance if operations are split over 8 or more queues.

How can I tell which LUN is which from the output of multipath -ll? In this case, since the LUNs for DATA are 20GB and the LUNs for RECO are 160GB it’s relatively simple (the full multipath -ll output shows the size of each LUN). But let’s say I had lots of LUNs all of similar sizes… how could I know which is which?

The answer comes in the form of the excellent sg3utils package which contains a number of useful utilities. For me, every server should have this package installed – just the rescan-scsi-bus.sh utility alone is a good enough reason (when you present new storage to your host it’s a lot simpler to run this than to reboot!). Let’s install it:

[root@half-server4 host1]# yum install sg3_utils -y
<..snip!..>
Installed:
  sg3_utils.x86_64 0:1.28-4.el6

Dependency Installed:
  sg3_utils-libs.x86_64 0:1.28-4.el6

Complete!

The package contains many useful utilities, one of which is sq_inq. I can use this to query the Violin devices and bring back the LUN names as I created them on the array (highlighted in red):

[root@half-server4 ~]# for device in /dev/dm-*; do echo -n "$device = "; sg_inq -i $device | grep "vendor specific:" | cut -d':' -f2-; done
/dev/dm-8 =   41207F00147:SERVER4-DATA1:8D4CC077FDB18BB5
/dev/dm-5 =   41207F00147:SERVER4-DATA2:8D4CC077EF0D02F7
/dev/dm-7 =   41207F00147:SERVER4-DATA3:8D4CC077573E3AA9
/dev/dm-4 =   41207F00147:SERVER4-DATA4:8D4CC0770414BA85
/dev/dm-3 =   41207F00147:SERVER4-DATA5:8D4CC077C4BA60F9
/dev/dm-6 =   41207F00147:SERVER4-DATA6:8D4CC077A6337433
/dev/dm-9 =   41207F00147:SERVER4-DATA7:8D4CC0777B6A4F7E
/dev/dm-11 =  41207F00147:SERVER4-DATA8:8D4CC07732414DBD
/dev/dm-10 =  41207F00147:SERVER4-RECO1:8D4CC077B2DDF035
/dev/dm-12 =  41207F00147:SERVER4-RECO2:8D4CC077A0617977

Now that I have this information I’m going to combine it with the serial numbers from the multipath -ll output and use the results to add entries to the multipath.conf file to give each device a name I recognise. This is done by using the /dev/dm-* device to match the LUN name against the serial number (i.e. SVIOLIN_SAN_ARRAY_*). Here’s a sample of the new section I’ll add to my multipath.conf:

    multipath {
        wwid SVIOLIN_SAN_ARRAY_8D4CC077FDB18BB5
        alias violin_data1
    }
    multipath {
        wwid SVIOLIN_SAN_ARRAY_8D4CC077EF0D02F7
        alias violin_data2
    }
    multipath {
        wwid SVIOLIN_SAN_ARRAY_8D4CC077573E3AA9
        alias violin_data3
    }
...etc...
}

To make the multipathing daemon pick these changes up we need to flush the existing config and then rerun it:

[root@half-server4 ~]# multipath -F
[root@half-server4 ~]# multipath -v1
violin_data5
violin_data4
violin_data2
violin_data6
violin_data3
violin_data1
violin_data7
violin_reco1
violin_data8
violin_reco2
[root@half-server4 ~]# ls -l /dev/mapper/violin*
lrwxrwxrwx. 1 root root 7 Feb  1 12:14 /dev/mapper/violin_data1 -> ../dm-8
lrwxrwxrwx. 1 root root 7 Feb  1 12:14 /dev/mapper/violin_data2 -> ../dm-5
lrwxrwxrwx. 1 root root 7 Feb  1 12:14 /dev/mapper/violin_data3 -> ../dm-7
lrwxrwxrwx. 1 root root 7 Feb  1 12:14 /dev/mapper/violin_data4 -> ../dm-4
lrwxrwxrwx. 1 root root 7 Feb  1 12:14 /dev/mapper/violin_data5 -> ../dm-3
lrwxrwxrwx. 1 root root 7 Feb  1 12:14 /dev/mapper/violin_data6 -> ../dm-6
lrwxrwxrwx. 1 root root 7 Feb  1 12:14 /dev/mapper/violin_data7 -> ../dm-9
lrwxrwxrwx. 1 root root 8 Feb  1 12:14 /dev/mapper/violin_data8 -> ../dm-11
lrwxrwxrwx. 1 root root 8 Feb  1 12:14 /dev/mapper/violin_reco1 -> ../dm-10
lrwxrwxrwx. 1 root root 8 Feb  1 12:14 /dev/mapper/violin_reco2 -> ../dm-12

And there are my nicely-named Violin devices.

Setup UDEV Rules

The next stage on this journey is to configure an additional UDEV rule for the Violin devices. UDEV is the device manager for Linux and is responsible for creating all of the various device files in the /dev directory structure. This includes the /dev/sd* SCSI devices which represent the paths to the Violin array, as well as the /dev/dm-* multipath devices which are what we need to use with Oracle in order to have protection from path failures.

Why do we need to add a new rule? There are a few reasons, some Violin-related and others Oracle-related. The Violin-related reason is that by default all of the SCSI devices presented to the host will be using the deadline I/O scheduler. I’ve written about I/O schedulers before, but to recap they are algorithms which attempt to trade some extra CPU in order to merge or reorder I/Os in order to reduce the time spent waiting on storage (for example by reducing the number of disk head movements). This is all well and good, but with the Violin array we do not need to worry about any of this, so we prefer to set all Violin devices to use the noop scheduler (noop stands for no optimisations) so that we can just get on with the business of reading and writing. Another thing we like to do is tell the Linux kernel that the Violin devices are flash rather than disk – kernels with version 2.6.32 and above have an extra flag called rotational which is set to 0 when the device is not a spinning disk. By default the Violin SCSI devices will tell the kernel to use the correct value, but annoyingly the /dev/dm-* multipath devices based on Violin will need to be set manually. Finally we like to change the OS queue depths on Violin devices to reflect the increased capacity for I/Os. Here’s what a non-Violin device (based on a spinning disk) looks like:

[root@half-server4 ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
[root@half-server4 ~]# cat /sys/block/sda/queue/rotational
1
[root@half-server4 ~]# cat /sys/block/sda/queue/nr_requests
128

For Violin we want noop, rotational = 0 and nr_requests = something like 512 or 1024. What about the Oracle reason I mentioned earlier? Well this is the issue of the device ownership. Oracle runs as the user oracle and group dba (or if you are using ASM with role separation it runs with grid and asmadmin so change the text in red below), so the multipath devices being used for the database / ASM will need to have the correct ownership and permissions in order that they can be accessed. In RHEL5 / OL5 we used to set this in the /etc/multipath.conf file but this is no longer possible in RHEL6 / OL6 so we have to set it in UDEV. It’s no big deal though. Here is the UDEV file required to do all of these things – there are two rules, one for the /dev/sd* devices and the other for the /dev/dm-* devices:

# UDEV rules for RH6 and OL6 with Violin Memory
# Set scheduler and queue depth for Violin SCSI devices
KERNEL=="sd*[!0-9]|sg*", BUS=="scsi", SYSFS{vendor}=="VIOLIN", SYSFS{model}=="SAN ARRAY*", RUN+="/bin/sh -c 'echo noop > /sys/$devpath/queue/scheduler && echo 1024 > /sys/$devpath/queue/nr_requests'"
# Set rotational, scheduler and queue depth for Violin multipath devices
KERNEL=="dm-[0-9]*", ENV{DM_UUID}=="mpath-SVIOLIN*", OWNER:="oracle", GROUP:="dba", MODE:="660", RUN+="/bin/sh -c 'echo 0 > /sys/$devpath/queue/rotational && echo noop > /sys/$devpath/queue/scheduler && echo 1024 > /sys/$devpath/queue/nr_requests'"

It’s important that each rule is contained on one line with no line breaks, otherwise it won’t run properly The file needs to be saved as /etc/udev/rules.d/12-violin.rules in order to run correctly. The UDEV rules directory (/etc/udev/rules.d) contains multiple files with names starting with numbers – they are run in ascending order. Each device processed by UDEV is subjected to all of these rules – unless it hits the magic command OPTIONS=”last_rule” in which case all subsequent rules will be ignored on that device. To ensure the Violin devices are picked up before the last_rule option could be hit, we call the file a name starting with 12.

To pick up the changes we have to reload the rules and then trigger UDEV. I’ll demonstrate this by showing the changes on device /dev/dm-8:

[root@half-server4 etc]# ls -l /dev/dm-8
brw-rw----. 1 root disk 252, 8 Feb  1 12:58 /dev/dm-8
[root@half-server4 etc]# cat /sys/block/dm-8/queue/scheduler
noop [deadline] cfq
[root@half-server4 etc]# cat /sys/block/dm-8/queue/rotational
1
[root@half-server4 etc]# udevadm control --reload-rules
[root@half-server4 etc]# udevadm trigger
[root@half-server4 etc]# ls -l /dev/dm-8
brw-rw----. 1 oracle dba 252, 8 Feb  1 12:58 /dev/dm-8
[root@half-server4 etc]# cat /sys/block/dm-8/queue/scheduler
[noop] deadline cfq
[root@half-server4 etc]# cat /sys/block/dm-8/queue/rotational
0

This is the final step for preparing the Violin devices, so we are now ready to begin installing Oracle.

Not Installing Oracle ASMLib

You’ll notice that we didn’t install Oracle’s ASMLib driver during the previous sections. There is a reason for this. Linux kernels of 2.6.32 and above are much better able to understand the properties of storage devices. Here’s the /sys/block/<device>/queue directory for a 2.6.18 kernel:

[root@half-server3 ~]# ls -l /sys/block/sdk/queue
total 0
drwxr-xr-x 2 root root    0 Mar  4 11:54 iosched
-rw-r--r-- 1 root root 4096 Mar  7 12:17 iostats
-r--r--r-- 1 root root 4096 Mar  7 12:17 max_hw_sectors_kb
-rw-r--r-- 1 root root 4096 Mar  7 12:17 max_sectors_kb
-rw-r--r-- 1 root root    0 Mar  4 11:54 nr_requests
-rw-r--r-- 1 root root 4096 Mar  7 12:17 read_ahead_kb
-rw-r--r-- 1 root root    0 Mar  4 11:54 scheduler

Now here’s the same directory for a 2.6.32 kernel:

[root@half-server4 ~]# ls -l /sys/block/sdf/queue
total 0
-rw-r--r--. 1 root root 4096 Mar  7 15:51 add_random
-r--r--r--. 1 root root 4096 Mar  7 15:51 discard_granularity
-r--r--r--. 1 root root 4096 Mar  7 15:51 discard_max_bytes
-r--r--r--. 1 root root 4096 Mar  7 15:51 discard_zeroes_data
-r--r--r--. 1 root root 4096 Mar  7 15:51 hw_sector_size
drwxr-xr-x. 2 root root    0 Mar  7 15:51 iosched
-rw-r--r--. 1 root root 4096 Mar  7 15:51 iostats
-r--r--r--. 1 root root 4096 Mar  7 15:51 logical_block_size
-r--r--r--. 1 root root 4096 Mar  7 15:51 max_hw_sectors_kb
-r--r--r--. 1 root root 4096 Mar  7 15:51 max_integrity_segments
-rw-r--r--. 1 root root 4096 Mar  7 15:51 max_sectors_kb
-r--r--r--. 1 root root 4096 Mar  7 15:51 max_segments
-r--r--r--. 1 root root 4096 Mar  7 15:51 max_segment_size
-r--r--r--. 1 root root 4096 Mar  7 15:51 minimum_io_size
-rw-r--r--. 1 root root 4096 Mar  7 15:51 nomerges
-rw-r--r--. 1 root root 4096 Mar  7 14:54 nr_requests
-r--r--r--. 1 root root 4096 Mar  7 15:51 optimal_io_size
-r--r--r--. 1 root root 4096 Mar  7 15:51 physical_block_size
-rw-r--r--. 1 root root 4096 Mar  7 15:51 read_ahead_kb
-rw-r--r--. 1 root root 4096 Mar  7 15:51 rotational
-rw-r--r--. 1 root root 4096 Mar  7 15:51 rq_affinity
-rw-r--r--. 1 root root 4096 Mar  7 14:54 scheduler

See all the extra information that the OS now has at its disposal? Let’s look at two particularly interesting properties:

[root@half-server4 ~]# cat /sys/block/sdf/queue/logical_block_size
512
[root@half-server4 ~]# cat /sys/block/sdf/queue/physical_block_size
4096

So in 2.6.32 the kernel is aware that my Violin LUN is actually using a 4k sector as a physical address, but presenting to the OS as a 512 byte logical sector. The 2.6.18 kernel would have no idea and just see the 512 byte address.

What’s this got to do with ASMLib? Well, jumping forward a minute, we have to create ASM diskgroups based on ASM disks. Those ASM disks are either block devices (the /dev/mapper/violin* devices we created earlier) OR they are devices presented via the ASMLib driver (appearing with names like ORCL:VIOLIN*). We haven’t yet installed Oracle Grid Infrastructure, so we can’t create the diskgroups and look at their properties, but using my time machine I’m going to do just that so you can see the problem. Here’s the diskgroups created using the /dev/mapper devices (which is what we are going to use here):

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB
MOUNTED  EXTERN  N         512   4096  1048576    163840   163767
MOUNTED  EXTERN  N         512   4096  1048576    245760   245706

Now here are the same diskgroups created via ASMLib devices:

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB
MOUNTED  EXTERN  N        4096   4096  1048576    163840   162165
MOUNTED  EXTERN  N        4096   4096  1048576    245760   245706

There is a glaring difference, which is the SECTOR_SIZE attribute. If you use ASMLib, Oracle will by default create the diskgroup with a sector size of 4k. This is not the end of the world, but it does mean you need to switch to using the Advanced Cookbooks and stop building ASM and your databases using the GUI tools. If you don’t, the DBCA or OUI will fail during database creation because of this:

ORA-01078: failure in processing system parameters
ORA-15081: failed to submit an I/O operation to a disk
ORA-27091: unable to queue I/O
ORA-17507: I/O request size 512 is not a multiple of logical block size

If you want to stick with the standard, simple method then you need that sector size to be 512 byte. That means avoiding ASMLib.

Install Grid Infrastructure

The next step is the installation of Oracle’s Grid Infrastructure software which includes Clusterware and ASM. In the Advanced Cookbooks this is a complicated process because the aim is to configure each ASM diskgroup to have a sector size of 4k. In this “standard” cookbook we will do no such thing, because a) it’s too complex, and b) it isn’t necessary. We can get perfectly excellent performance whilst pretending that the LUNs are all 512 byte-based.

So, simply install Oracle Grid Infrastructure using the OUI, making sure you set change the Discovery Settings when prompted to point the ASM_DISKSTRING parameter at our block devices:

ASM_DISKSTRING='/dev/mapper/violin*'

Do all the usual stuff, like creating the DATA diskgroup (with external redundancy) on LUNs /dev/mapper/violin_data1 to 8, then run the root.sh etc and the install should finish normally. You can then use asmca to create the RECO diskgroup on LUNs /dev/mapper/violin_reco1 and 2.

Install Oracle Database

Next, install the Oracle Database software using the OUI as usual. If you want to build a database as part of the install, go for it. Alternatively, perform a software-only install and build the database later on using DBCA. Providing you stuck to the instructions above, it should all work just fine. There is however, one more thing you need to do before your work is done…

Recreate Online Redo Logs with 4k Blocksize

The final step to ensure you get the best available performance is to recreate the online redo logs with a 4k blocksize, instead of their default 512 byte blocksize. This will avoid the potential penalty of “partial” or “misaligned” I/O on the array (see here for details).

There are two steps to this process. The first is to set an underscore parameter on the RDBMS instance (or instances if you are using RAC). You don’t need to set this on the ASM instance. As always, remember that in a production environment you should be seeking permission from Oracle Support to set any underscore parameter:

SQL> alter system set "_disk_sector_size_override"=TRUE scope=both;

System altered.

The parameter is dynamic so you do not need to bounce the instance. Now, drop and recreate the online redo logs using the BLOCKSIZE 4096 option:

SQL> select group#, thread#, round(bytes/1024/1024) as mb, blocksize, status from v$log;

    GROUP#    THREAD#         MB  BLOCKSIZE STATUS
---------- ---------- ---------- ---------- ----------------
         1          1         50        512 INACTIVE
         2          1         50        512 CURRENT
         3          1         50        512 INACTIVE

SQL> alter database add logfile group 4 size 1G blocksize 4096;

Database altered.

SQL> alter database add logfile group 5 size 1G blocksize 4096;

Database altered.

SQL> alter system switch logfile;

System altered.

SQL> /

System altered.

SQL> alter system checkpoint;

System altered.

SQL> alter database drop logfile group 1;

Database altered.

SQL> alter database drop logfile group 2;

Database altered.

SQL> alter database drop logfile group 3;

Database altered.

SQL> alter database add logfile group 1 size 1G blocksize 4096;

Database altered.

SQL> alter database add logfile group 2 size 1G blocksize 4096;

Database altered.

SQL>  alter database add logfile group 3 size 1G blocksize 4096;

Database altered.

SQL> select group#, thread#, round(bytes/1024/1024) as mb, blocksize, status from v$log;

    GROUP#    THREAD#         MB  BLOCKSIZE STATUS
---------- ---------- ---------- ---------- ----------------
         1          1       1024       4096 UNUSED
         2          1       1024       4096 UNUSED
         3          1       1024       4096 UNUSED
         4          1       1024       4096 INACTIVE
         5          1       1024       4096 CURRENT

That’s it. The ASM instance(s) use 512 byte, but that doesn’t matter because ASM metadata is always written in 4k blocks anyway. And now the database is using 4k for redo, so assuming you chose a database blocksize that was 4k or higher you have a system which is built ready for high performance.

All you need to do now is test it …