Implementing Linux native multipathing or DM-MPIO together with EMC PowerPath

puzzle

Guest Post

I’m delighted to say that this is another guest post from my good friend Nate Fuzi, who performs the same role as me for Violin but is based in the US instead of EMEA. Because he is American, Nate thinks that scones are called “biscuits”, that chips are called “fries” and that there is nothing – *nothing* – that cannot be improved with the simple addition of bacon. Clearly, something is fundamentally wrong with him – and yet he is like a brother to me. Like the strange, American step-brother I only see a few times a year and whom I cannot understand without the use of a translator. But he’s family all the same. So over to you Nate… and remember: Mom loved me more.

Remember when your parent answering your whiny “but whyyyyyyy???” with “Because I said so” was something you just had to accept? It meant there was no more explanation coming, and it was time for you to move on. Over the years, that answer broke down, and you grew confident you were owed more. And parents agreed: the more you demonstrated your ability to reason, the more reason you got to help you over the denial. It’s a sign of respect that we pay each other in adult life. And it can feel like disrespect if the reason offered feels weak or like it is intended to discourage further inquiry.

I was recently faced with solving what seemed a straightforward problem: take an existing Linux server running EMC’s multipathing software, PowerPath or “PP” as I will refer to it here, to access LUNs presented from that company’s SAN product, the VNX array, and attach and run Violin storage alongside the VNX. PP didn’t then support Violin arrays (still doesn’t at the time of this writing), so what was the client to do when they wanted to try out Violin’s AFA for their database environments? Just run PP and native Linux multipathing, called DM-MPIO, side by side, letting PP manage the VNX LUNs and DM-MPIO manage I/Os bound for Violin, right?

PowerPath versus DM-MPIO

PowerPath versus DM-MPIO

Wrong. Won’t work, I read. PowerPath does something at the HBA layer, I read a seemingly helpful web poster explain, that will corrupt either the VNX data or the Violin data. Well… maybe it will work, suggested another poster, but EMC might not support customers running in such a configuration. Others suggested ominously that PP and DM-MPIO don’t work well together… leaving it to the reader’s imagination what might result. I’m no master Googler, but I couldn’t find where anyone had put aside the rumors and vaguely threatening suggestions and actually tried it. Well, I did it, and I want to write about it so others know it can be done and how to do it because, well, those explanations I read didn’t stand up to question and felt like they were meant to scare me into not trying it. Of course I had to try it! Now, let’s be clear about what I am and am not saying: I am saying I have done this and it works. It’s in production at a customer site, running for months without issue. I am not saying that I have spoken to your EMC support rep and that you’ve been green-lighted to do this in your production environment. I’m not an EMC customer, and I don’t have a buddy in EMC support. So let’s consider this for informational purposes only for the time being.

First off, as several folks rightly pointed out, DM-MPIO could easily manage LUNs from both SAN products. Drop PP, configure DM-MPIO, and done. Well, that just sounds too simple. But it’s true: DM-MPIO has come a long way over the last few years and offers a pretty good set of features for free. PP costs money but is not without added value, as it does have additional configurability for reserve paths that become active in the event of a failure scenario, as well as IO distribution models beyond those offered by DM-MPIO, for example. My customer wanted to keep running PP, so this option was off the table for me.

Next up is the fun fact that PP advises you upon installation that you should “Blacklist all devices in /etc/multipath.conf and stop multipathd service”. The installer doesn’t say what will happen if you don’t do this, only that it is “*** IMPORTANT ***”. Check. Easy enough to ignore if this is the first thing you do. But if DM-MPIO is already running on the system and you try to start PP, it tells you this (verified in 5.7 and 6.0 only):

[root@host] # /etc/init.d/PowerPath start
Starting PowerPath:                                                      [FAILED]
Aborting PowerPath start since DM-Multipath is active.
Refer to PowerPath for Linux Installation and Administration Guide for more information

That’s a bummer. You actually have to stop multipathd and flush its paths before PP will start up. OK, I can do that. And, to be sure, you do NOT want both products attempting to manage IOs for the same device at the same time. That really is a bad thing. As we’ll see shortly, we might even want to segregate traffic across different FC ports, although this is strictly for optimization, not because you can’t mix traffic. But, as soon as we’ve installed the device-mapper-multipath-* packages, let’s honor this restriction right away by blacklisting the EMC devices in /etc/multipath.conf like this:

blacklist {
       devnode "^(control|vg|ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"    # standard stuff
       devnode "^hd[a-z][0-9]*"                                          # this line too
       device {
              vendor "DGC"
              product "*"
       }
       device {
              vendor "EMC"
              product "*"
       }
}

Note that the VNX line grew up in the Clariion company later acquired by EMC and presents a vendor string of “DGC”. Don’t ask me why. [Because the Clariion was a product from Data General Corporation? — flashdba 🙂] It is my understanding that VMAX arrays do present “EMC” as their vendor string. Having done this, we want to explicitly except Violin devices from getting blacklisted:

blacklist_exceptions {
       device {
              vendor "VIOLIN"
              product "*"
       }
}

This isn’t completely necessary, but it does make clear our intentions: don’t manage VNX/EMC devices but do manage Violin devices. Having both entries in the file means that adding some third storage product to the FC SAN won’t cause it to get picked up by DM-MPIO without us consciously making it so. Belt and suspenders, they used to say.

Verify your multipath configuration without actually running it. Do this by adding the “-d” flag to your multipath command:

[root@host] # multipath -v3 -d

The “-v3” flag gives us a verbose parsing of the configuration file so we can see each device and whether, what, and why DM-MPIO is going to do that with device. Make changes ad nauseum, and once you like what you see, run the command without “-d”, and create your multipath devices.

Cool. But remember when PP refused to start up earlier, saying DM-MPIO was found running? Guess what: PP’s inexplicable method of editing your /etc/rc.d/rc.sysinit script to insert its startup lines means it doesn’t attempt to start up until after DM-MPIO gets started on reboot. (Take a look for yourself; it’s there. It also makes you manually start PP if you apply a kernel update that resets the contents of rc.sysinit, at which point it reinserts the startup lines. Sweet.) How to get around this? I’m sure there are lots of solutions. I created a script to flush existing multipaths and start up PP in /etc/init.d and linked it as /etc/rc.d/rc3.d/S86PowerPath. This makes it so PP gets called just prior to DM-MPIO, and each is happy. The later call in /etc/rc.d/rc.sysinit is then redundant but causes no harm. I suppose you could almost as easily edit the rc.sysinit script to remove the check–just remember to make the same edit if/when you update PP.

Now, what was that I said a bit ago about segregating traffic on different HBA ports? This is not required; no magic is happening on the HBA with either product. Each one will discover the devices it is concerned with via its own callout routine and handle that device how you configure it to. But let’s imagine you have 4 FC ports on your host and choose to allow PP and DM-MPIO to each manage devices across all those ports. Neither will be aware what the other is doing in terms of trying to optimize IO distribution across all paths available, and you could well end up shooting yourself in the foot with sub-optimal end results. Segregating traffic also allows you to set different HBA queue depths or optimization settings as recommended by each storage vendor, and we all want to comply with best practices, right?!

Conclusion

None of this is meant to disparage EMC. Well, OK: the part about having the PowerPath startup script insert lines into /etc/rc.d/rc.sysinit is meant to disparage. I think that’s archaic and clunky. I have to believe there’s a more elegant way to do that today. I do hope I save some other soul the frustration I went through determining if this could be done and then how. If anyone has implemented more elegant solutions, I’d love to read about them.

Advertisements

2 Responses to Implementing Linux native multipathing or DM-MPIO together with EMC PowerPath

  1. Faheem says:

    awesome Thread but could you please help by providing ur script or the lines to remove from rc.sysinit in order to get powerpath to start before NMPIO

  2. Nathan Fuzi says:

    Hi Faheem,
    Thanks for reading! And thanks for the memory jog. It’s been a long time since I thought about this particular problem and solution. Here are the contents of my simple PowerPath “booster script”. I am sure true Linux admins out there will find all kinds of opportunities to improve conditions / error checking and handling, but this got the job done, so consider it a minimum working configuration, and you can get creative from here.

    [root@host ~]# cat /etc/rc.d/rc3.d/S86PowerPath
    #!/bin/bash
    #
    # S86PowerPath – Starts EMC PowerPath
    #

    initdir=/etc/rc.d/init.d
    prog=PowerPath
    mpath=/sbin/multipath

    RETVAL=0

    #
    # See how we were called.
    #

    start() {
    test -x $mpath && (echo “Flushing DM-MPIO multipaths…”; /sbin/multipath -F)
    test -x $initdir/$prog && . $initdir/$prog start
    RETVAL=$?
    echo
    }

    stop() {
    echo
    test -x $initdir/$prog && . $initdir/$prog stop
    RETVAL=$?
    echo
    }

    restart() {
    stop
    start
    }

    case “$1″ in
    start)
    start
    ;;
    stop)
    stop
    ;;
    restart)
    restart
    ;;
    *)
    echo $”Usage: $0 {start|stop|restart}”
    RETVAL=2
    esac

    exit $RETVAL

    Also, since you mentioned it, I happen to have the contents of the PowerPath entry in /etc/rc.sysinit that performs the check. You could comment out the check if you want, but be aware that any OS update or PowerPath update may require you to re-edit the change.

    #——————————————————-
    # PowerPath and DM-Multipath cannot co-exist on the host.
    # Do not allow PowerPath to start if DM-Multipath is
    # configured and managing devices.
    #——————————————————-

    check_dm_multipath_config ()
    {
    continue_pp_start=true

    # On SUSE, DM-Multipath will start after PowerPath starts.
    # Hence, if DM-Multipath is configured to start on boot,
    # do not continue PowerPath start.

    if [ -f /etc/SuSE-release -a “$RUNLEVEL” = “S” ]
    then
    if chkconfig multipathd –list 2>/dev/null \
    | grep ‘[35]:on’ >/dev/null 2>&1
    then
    continue_pp_start=false
    fi
    elif lsmod | grep -w ‘^dm_multipath’ >/dev/null
    then
    if [ -f /sbin/multipath -a -f /sbin/multipathd ]
    then
    num=`/sbin/multipath -ll -v1 | grep -c ‘.*’`
    if [ $num -gt 0 ]; then
    continue_pp_start=false
    fi
    fi
    fi

    if [ $continue_pp_start = “false” ]
    then
    echo “`eval_gettext \”Aborting PowerPath start since DM-Multipath is active. Refer to PowerPath for Linux Installation and Administration Guide for more information \”`”
    pp_logger Aborting PowerPath start since DM-Multipath is active. Refer to PowerPath for Linux Installation and Administration Guide for more information
    fi

    /bin/$continue_pp_start
    }

    I hope this helps.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s