Oracle Exadata X4 (Part 2): The All Flash Database Machine?

This article looks at the new Oracle Exadata X4-2 Database Machine from Big Red. In part one I looked at the changes made from the X3 model (more stuff) as well as the implications (more license bills). I also covered some of the bewildering ways Oracle has described the flash capacity of the X4. To recap, here are some of the quotes taken from various pieces of Oracle literature:

| Source | Quote |
|---|---|
| Oracle Exadata X4-2 datasheet | “44.8 TB of raw physical flash memory” |
| Oracle Exadata X4 Press Release | “logical flash cache capacity to 88 TB” |
| Oracle X3 to X4 Changes slide deck | “flash cache compression expands capacity to 88TB (raw)” |
| Oracle Exadata X4-2 datasheet | “effective flash capacity of 440 TB” |

The source of this confusion appears to be the claim that a new feature called Exadata Smart Flash Cache Compression will allow more data to fit into flash. Noticeably absent from the press release and datasheet is the information that this new feature apparently requires the Advanced Compression license, potentially adding over $1m to the list price of a full rack (see slide 22 of this Oracle presentation).

This second part of the article will look at the implications of these changes, but to make things more interesting there’s one specific change I haven’t mentioned until now. And it’s the change that I think gives the biggest insight into Oracle’s thinking.

The Hybrid Database Machine

Picture courtesy of Dennis van Zuijlekom


Right now, in the storage industry, there is a paradigm shift taking place as primary data moves from rusty old spinning disks to semiconductor-based NAND flash storage. Most storage vendors now offer all-flash arrays as part of their product lineup, although one or two still insist on the hybrid approach where data is located on disk but flash is used as a tiering or caching layer to improve performance.

Oracle, despite being one of the early adopters of flash with its Sun Oracle Database Machine (i.e. the Exadata v2), still uses the hybrid approach in Exadata. Each full rack contains 14 storage cells, with each cell containing 12 rotating magnetic disks as well as four PCIe flash cards (made by LSI and then rebranded as Sun). The disks can be bought in two options: high performance or high capacity (known as HP and HC respectively). It’s fair to say that the majority of customers buy the high performance version (* see comments below) – after all, Exadata is a very expensive solution aimed at solving performance problems, so performance is generally high up on a customer’s list of requirements.

Upgrading to Slower Performance?

See if you can spot the most important change to be made since the introduction of flash back in the Sun Oracle v2 (second generation) machine:

| Product | Raw Flash | High Performance Disks | HP Disk Capacity |
|---|---|---|---|
| Sun Oracle Database Machine (v2) | 5.3 TB | 600GB 15,000 RPM | 100 TB |
| Exadata Database Machine X2-2 | 5.3 TB | 600GB 15,000 RPM | 100 TB |
| Exadata Database Machine X3-2 | 22.4 TB | 600GB 15,000 RPM | 100 TB |
| Exadata Database Machine X4-2 | 44.8 TB | 1.2TB 10,000 RPM | 200 TB |

Did you notice? In the X4 model storage cells, the HP disks have now doubled in capacity. That’s not the important bit though, it’s the sacrifice that Oracle had to make to do this: 10k RPM disk drives instead of 15k RPM. In Exadata X4, the high performance disks are slower than in Exadata X3.

How much slower are we talking? Well, a 15k RPM drive takes 4ms to complete a full rotation, which gives an average rotational latency (half a rotation) of 2ms. A 10k RPM drive takes 6ms per rotation, so its average rotational latency is 3ms. That’s an extra 50% average rotational latency. Why on Earth would Oracle make that change? If customers wanted more capacity, couldn’t they just buy the storage expansion racks?
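For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes the usual convention that one rotation takes 60/RPM seconds and that, on average, the head waits half a rotation for the target sector. The function names are my own and seek time and transfer time are deliberately ignored.

```python
def rotation_time_ms(rpm: int) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000 / rpm


def avg_rotational_latency_ms(rpm: int) -> float:
    """Average wait for the target sector: half a rotation (assumed convention)."""
    return rotation_time_ms(rpm) / 2


for rpm in (15_000, 10_000):
    print(
        f"{rpm} RPM: {rotation_time_ms(rpm):.1f} ms per rotation, "
        f"{avg_rotational_latency_ms(rpm):.1f} ms average rotational latency"
    )

# Relative increase when moving from 15k RPM to 10k RPM drives.
increase = avg_rotational_latency_ms(10_000) / avg_rotational_latency_ms(15_000) - 1
print(f"Increase in average rotational latency: {increase:.0%}")
```

Whichever figure you quote (full rotation or half rotation), the ratio between the two drive speeds is the same.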

Design Dilemmas

The answer lies in two of Oracle’s fundamental design choices for the Exadata architecture:

  • the reliance on ASM software mirroring (meaning all data is stored either twice or three times), and
  • the use of flash as cache only (meaning all data in flash is eventually destaged to disk) rather than a tier of storage.

Remember that Oracle claims the Exadata Smart Flash Cache can now contain 88TB of data? But if all data on disk must be mirrored, then with ASM “normal redundancy” (i.e. double mirroring) the usable disk capacity with HP disks is just 90TB, according to the datasheet. If you want to perform zero-downtime upgrades then you need “high redundancy” (i.e. triple mirroring) which means even less capacity. What is the point of having less disk capacity than you have flash cache capacity? Clue: there is no point.
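The comparison can be sketched in a few lines of Python. The 200 TB raw figure, the 90 TB usable figure and the 88 TB flash cache figure come from the sources quoted above; the "raw divided by number of copies" calculation is my own simplification and only gives an upper bound, since it ignores whatever space ASM and the system reserve for themselves.

```python
RAW_HP_DISK_TB = 200             # full rack with HP disks (from the table above)
DATASHEET_USABLE_NORMAL_TB = 90  # usable with normal redundancy, per the datasheet
FLASH_CACHE_LOGICAL_TB = 88      # claimed flash cache capacity with compression


def mirrored_upper_bound_tb(raw_tb: float, copies: int) -> float:
    """Upper bound on usable capacity when every block is stored `copies` times.

    Simplifying assumption: no space is reserved for rebalancing or metadata.
    """
    return raw_tb / copies


normal = mirrored_upper_bound_tb(RAW_HP_DISK_TB, copies=2)
high = mirrored_upper_bound_tb(RAW_HP_DISK_TB, copies=3)

print(f"Normal redundancy: at most {normal:.1f} TB "
      f"(datasheet quotes {DATASHEET_USABLE_NORMAL_TB} TB)")
print(f"High redundancy:   at most {high:.1f} TB")
print(f"Flash cache claim: {FLASH_CACHE_LOGICAL_TB} TB")
print(f"High-redundancy disk smaller than flash cache: {high < FLASH_CACHE_LOGICAL_TB}")
```

Even on this generous estimate, triple mirroring leaves less usable disk capacity than the claimed capacity of the cache sitting in front of it.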

Which is where I finally get to my point. Oracle has taken the decision, almost by stealth, to make the Exadata X4 into an all-flash database machine. Except you still have to pay for the disks…

The All Flash Database Machine

Before we go any further, here’s a quote from Oracle’s Vice President of Product Management, Tim Shetler, discussing the increased flash capacity in Exadata X4:

[Image: quote from Tim Shetler, “exadata-disk-is-the-new-tape”]

Yes, that’s right: on Exadata X4, your entire database is now likely to be in flash. Yet in Exadata flash is only ever used as a cache, so the database in question is also going to be located on disk. And because ASM mirroring is required, it will actually be on disk twice – or, if you need zero-downtime upgrades, three times. Three copies on disk and one on flash? That doesn’t seem like the most efficient way to utilise what is, after all, extremely expensive storage.

What about the “inactive, colder data” that remains solely on disk? Well ok… let’s think about that for a minute. The flash cache, according to the sources in the first table above, holds between 88TB and 440TB of data – but, since it’s a cache, that data must be read from a persistent source somewhere. That source is the disks. If your disks contain “inactive, colder data” which doesn’t enter the cache, exactly how is that cache going to be efficiently populated? Keeping inactive data on Exadata’s disks is not only financially ruinous, it impacts the effect of having such an increased flash cache capacity.

Money Talks

What if Oracle ditched the disks and went for an all-flash architecture, as many storage vendors are now doing? Would that be a win for Oracle and its customers alike?

Whether it would be a win for customers is something that can be debated. What is undeniable though is the commercial problem Oracle would face if it made a technical decision to ditch the disks. Customers buying Oracle Exadata have to pay for Oracle Exadata Storage Software licenses… and guess what the licensing unit is? You license by the disk. Each storage cell has 12 disks and each full rack has 14 cells, meaning a full rack requires 168 storage licenses. These are currently listing at $10,000 per disk, bringing the total list price to $1.68m per rack.
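The sums are simple enough, but here they are as a small Python sketch. The disk count, cell count and $10,000 list price are the figures quoted above; the function name is made up for illustration, and real-world discounts and support costs are not modelled.

```python
DISKS_PER_CELL = 12
CELLS_PER_FULL_RACK = 14
LIST_PRICE_PER_DISK_USD = 10_000  # list price quoted in this article


def storage_software_list_price(cells: int, disks_per_cell: int,
                                price_per_disk: int) -> tuple[int, int]:
    """Return (number of per-disk licenses, total list price in USD)."""
    licenses = cells * disks_per_cell
    return licenses, licenses * price_per_disk


licenses, total = storage_software_list_price(
    CELLS_PER_FULL_RACK, DISKS_PER_CELL, LIST_PRICE_PER_DISK_USD
)
print(f"{licenses} storage licenses, ${total:,} list price per full rack")
```

Set the disk count to zero and the license revenue goes with it, which is rather the point.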

Hmm. Admitting that the disks are no longer necessary could be an expensive problem, couldn’t it?


9 Responses to Oracle Exadata X4 (Part 2): The All Flash Database Machine?

  1. kevinclosson says:

    “Three copies on disk and one on flash?”

    How many copies in flash? What about write-back flash?

    It’s true that only the primary mirror copy of a clean block is stored in cache, but in the case of write-back flash cache one would likely rather not rely on a sole copy of a dirty block in a write-back flash cache.

  2. Erman Arslan says:

    Hi,

    Supposing everything is in flash, because the machine is optimized for flash rather than disk: what will happen if Oracle uses smart scans all the time? Will it go to disk?

    To avoid going to disk for hot data while still using smart scans, should we use CELL_FLASH_CACHE KEEP for all the objects? Or maybe we don’t need CELL_FLASH_CACHE KEEP in new versions of Exadata to do this?

    Thank you

  3. oranjer says:

    Great articles – thanks!

    Could you write a bit more about how the flash area is populated? If it’s a cache, does it need ‘warmed up’ like a traditional cache? What happens in the event of DR to a different Exadata with Data Guard – should there be a dip in performance while the standby cache is populated? And what if you have a different Exadata for performance testing – when you restore your production database to your test Exadata, does it need to perform a few warm-up tests before reliable testing can take place?

    Cheers.
    T

    • flashdba says:

      I’m sorry to say that I’m no longer entirely sure, it’s been a couple of years since I walked away from Exadata. However, since each storage cell has its own flash cache, it seems likely you are correct. The cache resides on PCIe flash cards so is persistent, but it is unlikely to contain cached copies of the data you need at the point when you switch over to a standby. The same could also be said of the storage indexes, which are not persistent (they reside in volatile DRAM so must be rebuilt whenever cell services are restarted).

      • oranjer says:

        Thanks for that. I’ll see if I can get an answer out of Oracle. If I learn anything from them I’ll come back here and let everyone know.

  4. flashdba says:

    “It’s fair to say that the majority of customers buy the high performance version”

    I’ve had some good feedback on this, especially from Frits Hoogland and Jason Arneil of Enkitec and e-DBA respectively, to say that these days most customers buy the high capacity disks rather than high performance. Frits made the suggestion, which I can completely believe, that with previous generations most customers bought through Oracle and had the machines sized by Oracle salespeople, who chose the smaller HP disks. In more recent times it has become common to buy through partners, who contribute advice with sizing etc and therefore tend to recommend HC disks.

    This is interesting feedback indeed, although it doesn’t change the comments from Oracle’s PM guys who say that your entire database is likely to sit in flash. Essentially, through a combination of larger flash cards and (licensable) compression techniques, Oracle is aiming to ensure you see all-flash performance no matter how much (licensable) disk capacity is on the backend.

  5. Hi,
    I don’t know if the concern about “database in memory” for Exadata is completely right.
    I didn’t see changes in the Exadata software that move data from disk to flash in a different way for X4. It is the same software, and you can use it on the V2/X3/X3 versions. Maybe we will see this change in the future, but I don’t believe so; maybe in the next step after Exadata (choose your name).
    By the way, you can change what goes into flash using IORM: for the same database you can define which category will be there. For example, the tables used by the batch category can be defined not to stay in flash, while the OLTP category uses flash. Of course that is more complex, because you need to integrate IORM, Database Resource Manager and Database Services. Sorry, but you can do this only in Exadata: define, for the same database, what stays (or not) in flash, not just by table but based on the different customers of your data.
    In a normal Exadata day you naturally see data flow from disk to flash, and the usage (MB/s and IOPS) is higher from flash than from disk. I hope/believe that the change from 15k to 10k RPM has more details behind it than just “database in memory”.
    If you think about OLTP database usage, the flash compression will be good: more data in the same space (slide 21 of the presentation). And considering that for OLTP your data comes from flash more than from disk, you can use 10k disks; your OLTP will not see the difference either. For DW guys who use Exadata with HC 7.2k disks, the change to 10k will be good.
