Database Consolidation Part 2 – Shared Infrastructure Design Choices

Part one was all about the business drivers and technical challenges faced when building a database consolidation platform. Database consolidation is all about sharing infrastructure, so part two is about the design choices that are available…

An important architectural decision when consolidating databases is that of where the shared infrastructure should diverge. If we assume that your customers are applications which require a database service, at what point should each application be segregated from the others? Obviously you want to use the same underlying hardware, but what about the OS? What about the storage, do you want to segregate the data into different volumes on different LUNs? Maybe you want to share right at the top and just have different application schemas in one big container database?

Let’s have a look at the three main choices available:

Multi-Tenancy databases
Shared Platform databases
Virtualisation

A multi-tenancy database is a database which contains many different applications, each with their own schema. In many ways this model makes a lot of sense, since it allows for the highest level of resource sharing and an almost-zero deployment time for new schemas. And after all, Oracle is designed to have multiple users and schemas; the database resource manager allows for a level of QoS (quality of service) to be maintained whilst features such as Virtual Private Database can be used to enhance the security levels. Oracle allows for services to be defined which can then be controlled and relocated on a clustered database. Why not opt for this method? In fact, some customers do – although the vast majority don’t. The reasons for avoiding this method are covered in part one, under the heading “Technical Challenges”. A single big database is a big single point of failure. You don’t want to hit an ORA-600 and see the whole thing come crashing down if it’s a container for your entire application estate! Say someone accidentally truncates a table and wants the whole database rolled back so they can retrieve their data, how can you work that situation out? Maintenance becomes a nightmare – can you really have all of your applications on the exact same release and patchset of Oracle? What about testing… Say one of your applications requires a patch for the optimizer, how do you go about testing every other application to ensure they are not affected? And security… it only takes one mistaken privilege to be granted and everything is exposed… do you really trust this model?

A shared platform database model provides segregation at the database level, so that a cluster of hardware (for example a six-node cluster running Oracle Grid Infrastructure) then runs different databases. This allows for a wide-ranging variety of database versions and patchsets to be run on the same platform, which is far more practical and makes the security issues far easier to cope with. Of course, it’s not without its challenges either. Firstly, there are still components that cannot be upgraded without affecting large groups (or all) of the customers: the operating systems, the Grid Infrastructure software, firmware for various components etc. Then there are the additional resource requirements for running multiple databases: extra RAM to cope with all of the SGAs and PGAs, extra CPU capacity to cope with all the additional processes from each instance, extra storage for all of those temporary and undo tablespaces, the online and archive redo logs, the SYSTEM and SYSAUX tablespaces. Maintenance requirements also increase, because although you can upgrade or patch each database independently you now have many more databases to upgrade / patch. This means administrative time increases dramatically – although you can combat this with the use of enterprise management tools such as Oracle Enterprise Manager.

An environment which uses virtualisation is perhaps the strongest design model. Virtualisation products have matured significantly in recent years to the point that they are now being used not just in non-database production environments but for databases as well. Traditionally this has been a difficult subject for DBAs due to Oracle’s support policy for databases running on VMware. This has softened considerably in recent years but Oracle still reserves the right to withdraw support for an issue unless it “can be demonstrated to not be as a result of running on VMware”. Of course, Oracle has its own virtualisation product Oracle VM (which I have to say I actually really like) where support is not an issue, but I suspect that it has a far smaller share of the market than VMware (although you wouldn’t know it from the aggressive marketing…). The great thing about virtualisation is that you have inherent security based on the segregation of each virtual machine. Maintenance becomes a lot easier because even OS upgrades can take place without affecting other users, whilst VMs can be migrated from one physical stack to another in order to perform non-disruptive hardware maintenance. Deployment and provisioning become easier as virtualisation products like VMware and OVM are designed with these requirements in mind; the use of templates and the cloning of existing images are both great options. Similarly, expansion both at the VM level and across the whole platform is a lot easier. On the other hand, licensing (particularly of Oracle products) isn’t always clear (but then when is it?). The main challenge though is capacity, because now you not only have to consider all of those database SGAs and PGAs but also the operating systems and their various requirements, from root filesystems to swap files. Again I will talk about this in the second post on this topic.
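To put some rough numbers on that capacity point, here is a minimal sketch comparing the memory footprint of the shared platform and virtualised models. Everything in it is an assumption for illustration: the database names, the SGA and PGA sizes, the flat per-OS overhead figure, and the simplification that the shared platform is a single host. It also ignores hypervisor overhead and any memory overcommit features.

```python
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    sga_gb: float
    pga_gb: float


def shared_platform_memory_gb(databases, os_overhead_gb):
    """One OS image hosts every instance, so the OS overhead is paid once."""
    return os_overhead_gb + sum(db.sga_gb + db.pga_gb for db in databases)


def virtualised_memory_gb(databases, os_overhead_gb):
    """Each database sits in its own VM, so every VM pays the OS overhead."""
    return sum(os_overhead_gb + db.sga_gb + db.pga_gb for db in databases)


# Hypothetical estate: sizes are made up for the example.
databases = [
    Database("crm", sga_gb=16, pga_gb=4),
    Database("billing", sga_gb=32, pga_gb=8),
    Database("reporting", sga_gb=8, pga_gb=2),
]

shared = shared_platform_memory_gb(databases, os_overhead_gb=4)
virtual = virtualised_memory_gb(databases, os_overhead_gb=4)
print(f"shared platform: {shared} GB, virtualised: {virtual} GB")
```

The gap between the two figures is simply the OS overhead multiplied by the number of extra operating system images, which is why it only starts to hurt once the estate runs to hundreds of databases.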

Finally… there is a fourth model, which I haven’t mentioned here because it almost certainly won’t apply to the majority of people reading this. The fourth model is schema-level multi-tenancy, as used by the likes of Software-as-a-Service companies, whereby a single application is shared by multiple customers each of which only see their slice of the data. This is really an application-based consolidation solution, where each user or set of users only has visibility of their data despite it being stored in the same tables as that of other users. The application uses unique keys and referential integrity to look up only the correct data for each user, leading to the security ramification that your data is only as secure as the developer code written to extract it for you. I once worked on one of these systems and discovered a SQL injection issue that allowed me to view not only my data but that of anyone whose userID I could guess. Of course there are products such as Oracle’s Virtual Private Database that can be used to provide additional levels of protection.
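To illustrate why the data is only as secure as the code that extracts it, here is a small sketch using Python's built-in sqlite3 module. The table, the column names and both functions are hypothetical and are not taken from the system I worked on; the point is just the difference between pasting user input into the SQL text and passing it as a bind parameter.

```python
import sqlite3

# Hypothetical shared table: every tenant's rows live side by side.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (tenant_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "widget"), (1, "gadget"), (2, "secret prototype")],
)


def orders_unsafe(conn, tenant_id, item_filter):
    # Vulnerable: the user-supplied filter becomes part of the SQL text,
    # so it can rewrite the WHERE clause that enforces the tenant boundary.
    sql = (
        "SELECT item FROM orders WHERE tenant_id = %d AND item LIKE '%s'"
        % (tenant_id, item_filter)
    )
    return [row[0] for row in conn.execute(sql)]


def orders_safe(conn, tenant_id, item_filter):
    # Bind parameters keep the filter as data, never as SQL.
    sql = "SELECT item FROM orders WHERE tenant_id = ? AND item LIKE ?"
    return [row[0] for row in conn.execute(sql, (tenant_id, item_filter))]


# Tenant 1 supplies a filter that closes the string and adds an OR clause.
malicious = "%' OR tenant_id = 2 --"
leaked = orders_unsafe(conn, 1, malicious)
contained = orders_safe(conn, 1, malicious)
print("unsafe:", leaked)
print("safe:", contained)
```

With the unsafe version, tenant 1 gets tenant 2's row back as well as their own. With bind parameters the same input is treated as a literal pattern that matches nothing.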

The reason I mention this fourth model is that Larry Ellison attacked Salesforce.com for using a variant of this model and said that multi-tenancy “was the state-of-the-art 15 years ago”, whilst talking up the Oracle Public Cloud for using virtualisation as a security model. According to Larry, multi-tenancy “puts your data at risk by commingling it with others”. Now, I don’t know Salesforce’s database design so I don’t know how well it fits into my description above (I have some friends who work for Salesforce though so I do know that they employ great developers!)… but what I do know is Exadata. And Exadata, along with the Super Cluster, is the platform for Oracle’s “Private Cloud” offering (details of which you can read about here). Exadata, however, has no virtualisation option. You cannot run OVM on Exadata, so if you read Oracle’s Exadata Database Consolidation white paper, it’s all about building the shared platform model I talked about above. To me, that doesn’t really fit in with Larry’s words on the subject.

Scale works in both directions…

One final thought for this section. If you build a DaaS environment and get all of your automated provisioning right etc you will make it very easy for your users to build new applications and services. That’s a good thing, right? But don’t forget to spend some time thinking about how you are going to ensure that this thing doesn’t grow and grow out of control. Ideally you need some sort of cross-charging process in place (I could probably write another whole article on this at some point, it’s such a big topic) but most of all you need to have a process for decommissioning and tearing down applications and databases that have exceeded their shelf life. If you don’t have that, you will find that all of your infrastructure cost savings are very short lived…!
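As a rough idea of what the starting point for a decommissioning process might look like, here is a sketch that flags candidates for teardown from an inventory. The inventory layout, the field names (`retire_after`, `last_connection`) and the 90-day idle threshold are all assumptions made up for the example; a real platform would pull this from its own provisioning records.

```python
from datetime import date, timedelta


def decommission_candidates(inventory, today, max_idle_days=90):
    """Return names of databases past their agreed end date or idle too long."""
    cutoff = today - timedelta(days=max_idle_days)
    flagged = []
    for db in inventory:
        expired = db["retire_after"] is not None and db["retire_after"] < today
        idle = db["last_connection"] < cutoff
        if expired or idle:
            flagged.append(db["name"])
    return flagged


# Hypothetical inventory records.
inventory = [
    {"name": "poc_2011", "retire_after": date(2012, 1, 31),
     "last_connection": date(2012, 1, 10)},
    {"name": "billing", "retire_after": None,
     "last_connection": date(2012, 5, 30)},
    {"name": "old_reports", "retire_after": None,
     "last_connection": date(2011, 11, 2)},
]

candidates = decommission_candidates(inventory, today=date(2012, 6, 1))
print(candidates)
```

Flagging is the easy part, of course. The list only has value if there is an agreed process behind it for contacting the owner and actually tearing the database down.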

That’s it for part two. In part three I will be discussing the capacity requirements of a consolidation platform. And you won’t be surprised to hear that flash is going to make an appearance soon, because flash memory is the perfect fit for a consolidation environment. Don’t believe me? Wait and see…

Database Consolidation Part 1 – Business Drivers and Technical Challenges

Database consolidation has been a big trend in the industry for a while now. You can see this if you read the IT press, or if you listen to the relentless procession of people queueing up to talk about the “cloud”. I saw it in my time at Oracle, where we had an increasing number of customers come and talk to us about the pressures of running thousands of independent databases, all on their own servers, all taking up vast amounts of data centre real estate and acting like a dead weight around the neck of their IT organisations.

Of course, just rounding up all of your databases and sticking them on some big iron isn’t really going to bring you many benefits. The real benefits of a consolidation exercise come when you use it as a way of repositioning your databases as service offerings. These come under various guises: the “As A Service” models: Database-as-a-Service (DaaS), Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS); the On-Demand models (e.g. Amazon’s Relational Database Service); and the ubiquitous cloud offerings (e.g. the Oracle Cloud). (My first job working with On Demand services was in 2003, so I still can’t use the term “cloud” without it coming out sounding as if I’m being sarcastic… I don’t mean to, but why does everyone always talk as if it’s a brand new idea?!)

Anyway, I’m going to make a bold claim and say that, at least to a DBA, it’s all the same thing. To me, database consolidation is about reducing vast estates of physical database servers into smaller, more tightly-managed groups of databases that can provide predefined services. (Oh and labelling them as a “cloud”, because if that word isn’t used at least a hundred times a day in our industry the world would end…) It’s also about turning your databases into a well-defined service – and therefore turning your users into your customers, even if they are actually part of your own organisation.

No matter what you call it, you can always spot a database consolidation exercise by the business drivers and the technical challenges.

Business Drivers

Cost reduction
Increased agility
Reduced complexity
Higher service levels

The cost reduction piece seems obvious – a smaller number of servers cost less to buy and run than a larger number, right? But it’s often misunderstood just how much of a cost saving can be made. Don’t just think about the servers, think about the savings in data centre footprint, in power and cooling. Think about the reduced administration costs, particularly if you design your service properly (i.e. the agility angle). Now think about the potential reduction in license costs. And an often-overlooked area of saving is the reduction in failures and outages caused by having a tightly designed and standardised operating model (i.e. the reduced complexity angle).

Increased agility is as important as cost reduction, something which may come as a surprise to those who are used to concentrating on technical rather than business challenges. To a CIO, the ability to react more quickly, to take advantage of new opportunities as soon as they become apparent, is just as important as controlling the bottom line. In a well-implemented database-as-a-service offering, deployments of new databases / services are fully automated. Automatic provisioning has to be a default requirement in the design. Likewise the ability to automatically scale (up or down) in order to meet changing demand is a must. That scaling needs to be possible on two levels: at the individual database level to meet the developing requirements of each “customer” and at the macro level to expand (or contract) the capability of your DaaS offering depending on overall demand.

It may not always seem like it at first, but the consolidation of your databases onto a DaaS platform should result in reduced complexity. Why? Well because at the heart of any consolidation exercise must be standardisation. Every large IT organisation has a plethora of different databases running different versions on different operating systems. No matter how stringent your deployment procedures are, it’s guaranteed that if your databases are built manually then each one will have a subtle difference based on a) when it was built, b) who built it, and c) what sort of day they were having at the time. Human beings are complex creatures, they behave in unexpected ways – the only way to have true consistency is to have your database deployment automated. And then there are the systems that you may have inherited, perhaps as the result of an acquisition or departmental reorganisation. You know the ones, they are usually sat in the corner untouched and unloved, because nobody dares go near them in case they break. In a DaaS environment every database is, at least outwardly, identical. This means that as a DBA you don’t have to worry about the way you treat them – what you can do with one database you can do with any of them. It’s all about manageability.
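One way to picture what standardisation buys you is a drift check against a single standard build. This is only a sketch: the settings in `STANDARD_BUILD` and the sample databases are illustrative values I have made up, and a real check would read the configuration from the databases themselves.

```python
# Hypothetical standard build that every database on the platform should match.
STANDARD_BUILD = {"version": "11.2.0.3", "os": "OEL 5.8", "block_size": 8192}


def drift(database_config, standard=STANDARD_BUILD):
    """Return {setting: (expected, actual)} for every setting that deviates."""
    return {
        key: (expected, database_config.get(key))
        for key, expected in standard.items()
        if database_config.get(key) != expected
    }


# One automated build and one hand-built legacy system (both made up).
automated = {"version": "11.2.0.3", "os": "OEL 5.8", "block_size": 8192}
inherited = {"version": "10.2.0.4", "os": "OEL 5.8", "block_size": 8192}

print(drift(automated))
print(drift(inherited))
```

If every database comes out of the same automated deployment, the first result is what you should see everywhere: an empty dictionary.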

And the outcome of that reduced complexity must therefore be higher service levels. You can pretty much guarantee that any organisation with 1000 databases all running on similar patch levels on the same OS, using the same file layouts, with automated management scripts to deploy them or tear them down (and perhaps even to patch them) will deliver a higher uptime than an organisation with a multitude of different databases on different operating systems, each one of which has its own subtleties and nuances.

Technical Challenges

So now that we’ve covered why it’s worth doing, what are the challenges associated with actually doing it? I’ve had a lot of exposure to DaaS and consolidation environments, both at Oracle (hands on in a support role) and in my new role at Violin (in a technical presales capacity). One particular experience which serves me well is the four years I spent working on British Telecom’s DaaS environment for Surren Partabh, who is BT’s CTO of Core Technologies. When it came to DaaS, BT were light years ahead of the game – their multiple DaaS environments have been in place for years already and support many hundreds of databases. There is an interesting case study about BT DaaS here – if you are considering a consolidation exercise (and you can ignore the author’s overuse of the word “cloud”) then it’s well worth a read. As Surren says, “Our Oracle Database 11g consolidation has enabled us to reduce our server sprawl, deploy databases faster, and operate with 20% fewer DBA’s”.

So what are the challenges?

Availability
Capacity
Security
Maintenance

Availability is a challenge, not really in a technical sense (at least not any more than normal) but because of the increase in risk. When you consolidate your databases you put all of your eggs in one basket. If you have a large part of your business dependent on your DaaS platform and it takes a plunge, the pressure is truly going to be on. Having said that, my experience is that availability increases during database consolidation. HA and DR are easier to plan for and incorporate into a DaaS design than on the ad-hoc basis of a siloed database environment. Extensive backup and DR solutions cost money, which means that inevitably you end up with databases in your environment whose HA characteristics you are not always comfortable with. When you have all your eggs in the aforementioned basket it becomes impossible to argue about whether good backup solutions, HA and DR etc are worth the investment. Consequently you can achieve economies of scale by implementing a single solution across your whole environment – with the happy consequence that systems which may not have qualified for this level of service if they were independent end up getting a free ride. One thing to remember about consolidation though: test your backups, test your HA and test your DR… test it again and again. I know what it’s like to lose >50 production databases in one single calamity – and I can promise you it’s not a nice place to be.

Capacity for me is the biggest challenge of all. In fact it’s so critical to the idea of database consolidation that it is the reason I started writing this blog entry. Don’t forget that capacity isn’t just about disk space, it’s about CPU resources, it’s about memory, networking, IO requirements… essentially everything that is a finite resource. Capacity is something you have to plan for when you design and build a DaaS environment; get it wrong in one direction (too much) and you won’t achieve those cost savings that were one of the driving forces behind the whole exercise… get it wrong in the other direction (too small) and that cherished availability will be compromised, possibly affecting your entire solution. In fact, capacity planning for database consolidation is such an important topic that having started this blog entry with it in mind, I am going to give it its own entry entirely…!
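Since capacity covers every finite resource and not just disk space, a first-pass check has to look at each one separately. The sketch below sums the demand for each resource, holds back a reserve, and reports which resources are over-committed. The resource names, the figures and the 20% reserve are assumptions chosen purely for illustration.

```python
def headroom(capacity, demands, reserve=0.2):
    """Spare capacity per resource after summing demand and holding back a reserve."""
    result = {}
    for resource, total in capacity.items():
        used = sum(d.get(resource, 0) for d in demands)
        usable = total * (1 - reserve)
        result[resource] = usable - used
    return result


def overcommitted(capacity, demands, reserve=0.2):
    """Names of resources whose demand exceeds the usable capacity."""
    spare = headroom(capacity, demands, reserve)
    return sorted(name for name, value in spare.items() if value < 0)


# Hypothetical platform and per-database peak demands.
capacity = {"cpu_cores": 64, "memory_gb": 512, "iops": 100_000}
demands = [
    {"cpu_cores": 16, "memory_gb": 128, "iops": 20_000},
    {"cpu_cores": 24, "memory_gb": 192, "iops": 50_000},
    {"cpu_cores": 8, "memory_gb": 64, "iops": 25_000},
]

short = overcommitted(capacity, demands)
print(short)
```

In this made-up example the CPU and memory figures fit comfortably, yet the platform is still too small because the IO requirement does not fit. Summing peaks is also deliberately pessimistic, since not every database peaks at the same time.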

Security is a challenge which has similar characteristics to those I described for availability. By putting all of your databases on one platform you increase the risk – security therefore needs to be strictly controlled. At a very simplistic level, unauthorised acquisition of administrator privileges on a consolidated environment could lay open your entire data estate. Compliance is another potential issue: things are complicated by environments where different databases have different regulatory or legal requirements. For example, if one of the databases on a DaaS system needs to meet Payment Card Industry standards then all of the underlying architecture will be affected, potentially resulting in all of the databases needing to meet PCI standards. Of course, as with availability, this can work in your favour because if you design the system with security and compliance in mind, you may find that databases which were previously somewhat lacking in the security department are dragged kicking and screaming into a compliant state (often under the threat of being cast out of the environment if they fail to comply). The other major consideration for security is the use of virtualisation. By placing each database in its own virtual environment, an additional layer of security can be wrapped around it, effectively segregating it from its neighbours whilst still retaining the benefits of a shared infrastructure. This is a massive trend in the industry now and is something that, I believe, is inevitable for most enterprise database environments.

And finally we come to maintenance. I cannot emphasise enough how important it is to define the maintenance strategy of a DaaS / consolidation environment before you implement it. Most vendors now, whether they be software (e.g. Oracle), operating system or hardware (server, storage, network etc) are focussed on providing zero-downtime products capable of non-disruptive maintenance. But no matter how much you spend, there will inevitably be times when you need to take a planned outage. And of course, with all your internal customers now using the same shared infrastructure, that downtime is going to have quite an effect. Here is what is going to happen if you don’t plan to avoid it up front: your DaaS environment has 26 databases on it, labelled A to Z. The application owner of A is hitting a problem which, unfortunately, requires maintenance on the underlying infrastructure. This patch, firmware upgrade, whatever it may be, requires downtime which will take the service offline. You were promised by all your vendors that their products would never require downtime… but hey guess what? So you go to application owner B and you say I need to take the system down this weekend – and he says “No way, not this weekend – we have a critical application upgrade planned. Can you wait until the weekend after?”. So you tell this to application owner C and she says, “We have our upgrade the week after – we already had to delay it because of B so we cannot wait any longer”. Trust me, you will never get as far as Z! You could pull rank of course, so you go to the CTO and say, “These guys are driving me mad, can you help me out?” but the CTO says, “What, are you crazy? It’s the quarter end this month – we can’t do any of this stuff!”. Here is my advice to anyone implementing a DaaS or consolidation environment: Define maintenance windows into the service agreement, then make your internal customers sign up to these terms before they are allowed on to your platform. If they don’t agree to these maintenance cycles then they need to go and build their own system! This is also another argument for virtualisation, because – although it doesn’t completely solve the problem – adding an extra layer of abstraction down at the hypervisor level allows for everything above that to be treated independently.
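A maintenance window written into the service agreement has one big advantage: it can be calculated rather than negotiated. As a minimal sketch, assume the agreement says "the first Sunday of every month" (that rule is my invention for the example, as are the function names); the next window then falls out of a few lines of date arithmetic.

```python
from datetime import date, timedelta


def first_weekday_of_month(year, month, weekday):
    """Date of the first given weekday (Monday=0 ... Sunday=6) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset)


def next_maintenance_window(today, weekday=6):
    """Next agreed window on or after today, using the first-weekday-of-month rule."""
    candidate = first_weekday_of_month(today.year, today.month, weekday)
    if candidate < today:
        # This month's window has passed, so roll over to next month.
        if today.month == 12:
            year, month = today.year + 1, 1
        else:
            year, month = today.year, today.month + 1
        candidate = first_weekday_of_month(year, month, weekday)
    return candidate


print(next_maintenance_window(date(2012, 6, 15)))
```

Application owners B, C and the CTO all know these dates a year in advance, so the quarter end and the application upgrades get planned around the windows instead of the other way round.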

Those are the technical challenges, but what about the design choices? There are three (or four depending on your view) architectural methods of achieving a consolidation or DaaS platform. In part two of this series I will examine them and have a look at the benefits and pitfalls associated with each. If you made it this far you will be delighted to hear that I have only just started…
