Practical VM Architecture: Highly Available
Now that we understand how to manage VM sprawl, and we know we need to scale an application horizontally, let's take the next step and build a VM infrastructure that's capable of fault-tolerant operation.
Fault tolerance means that a physical machine failure (fault) will not interrupt the operation of any running services. This isn't practical if the hardware actually fails, since VMs running on it will immediately crash, but if you want to perform maintenance, or if you sense a non-instant failure looming, you can migrate VMs to other servers. A cluster can, however, automatically restart crashed VMs. Before we get too far ahead of ourselves, let's first explore the concept behind VM migration. Migrating VMs, whether manually or automatically, requires some special configuration.
There are essentially two requirements, or layers, to setting up HA VM environments. First, you must have shared storage and a clustered file system. VMFS in VMware or GFS in RedHat Linux will do the trick. Second, you need something to manage the cluster, such as VirtualCenter from VMware or the RH Cluster Suite, which will keep track of where VMs are running and perform migrations. Sounds easy enough, right?
Well, it's not easy with RedHat. As much as it pains me to say, this stuff just isn't production ready without the sysadmin becoming extremely familiar with the quirkiness. Things like documented GFS commands returning "not yet implemented," or 'clustat -v' returning "clustat version DEVEL," or Conga generating a cluster.conf that the other tools claim is invalid, in a stable release of RHEL 5.2, make me stare blankly at the terminal. This is the nature of Open Source, and the good news is that most issues will be fixed quickly. That said, you can get it working.
VMware, on the other hand, tightly controls their product, and it generally works as advertised. Seriously, this realization pains the gooey OSS center in me, but it's true.
Now, just what do we mean by "migrations" anyway? Migrating a VM from one server to another can be done one of two ways: live or not. Non-live migration involves copying a VM's disk image to another server, and then starting it up on the new hardware. This can be automated, but it results in the VM having to shut down and then boot up again. To perform a live migration with zero downtime, you must be on a shared file system so that disk image copying isn't necessary. Then the program managing the migration must copy the entire memory space of the VM over to the new server and quickly "start" the VM there. This happens without interruptions to the OS or the applications running, and without the end-users ever knowing.
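To make the memory-copy step a bit more concrete, here is a minimal sketch of an iterative "pre-copy" loop, which is one common way to approach live migration. The article doesn't say which approach any particular product uses, so treat this purely as an illustration. The function name, page counts, and rates are all hypothetical.

```python
def simulate_precopy(total_pages, dirty_rate, copy_rate,
                     stop_threshold=50, max_rounds=30):
    """Simulate pre-copy live migration (illustrative model only).

    total_pages    -- memory pages the VM owns
    dirty_rate     -- pages the running guest modifies per second
    copy_rate      -- pages the network can transfer per second
    stop_threshold -- when this few pages remain, pause the VM and finish
    Returns (rounds, pages_left_for_final_pause).
    """
    remaining = total_pages
    rounds = 0
    while remaining > stop_threshold and rounds < max_rounds:
        # While `remaining` pages are in flight, the guest keeps running
        # and dirties more pages; those must be sent again next round.
        dirtied = remaining * dirty_rate // copy_rate
        remaining = min(total_pages, dirtied)
        rounds += 1
    return rounds, remaining


if __name__ == "__main__":
    rounds, final = simulate_precopy(1_000_000, dirty_rate=1_000,
                                     copy_rate=100_000)
    print(f"{rounds} copy rounds, {final} pages copied while paused")
```

The point of the sketch is that the final pause only has to move the small remainder, which is why the switch appears instantaneous. It also shows the failure mode: if the guest dirties memory as fast as the network can copy it, the loop never converges and gives up at max_rounds.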
Clustered File Systems
You can't just use NFS to share out the disk images to all your servers. It's too slow, and locking issues would be unbearable during frequent migrations. We need a clustered file system.
Conceptually, a clustered file system is a file system that supports multiple operating systems mounting and writing to it at the same time. It's very tricky business, and getting it wrong will instantly corrupt the file system your VMs are stored on.
Fencing is required to ensure that your file system is not corrupted. Fencing means literally the fencing, or isolating, of a cluster node. Fencing is abrupt; most methods of fencing will talk to the hardware management interface of a server (via IPMI most likely) and immediately remove power from the server. A cluster will generally decide that a node needs to be fenced when it stops responding to heartbeat messages. Cluster nodes can also fence themselves if they discover an unexpected inconsistency in the file system. The details of setting up fencing can be tricky, so be sure to understand all your options for your given platform.
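The heartbeat-timeout decision described above can be sketched in a few lines. This is only a toy model, not how any real cluster manager is implemented: the node names, the 10-second timeout, and the power_off callback (standing in for whatever fence agent your platform provides) are all assumptions made for illustration.

```python
def nodes_to_fence(last_heartbeat, now, timeout=10.0):
    """Return the nodes whose last heartbeat is older than `timeout` seconds.

    last_heartbeat -- dict mapping node name to the time it was last heard
    now            -- current time, in the same units
    """
    return sorted(node for node, seen in last_heartbeat.items()
                  if now - seen > timeout)


def fence_unresponsive(last_heartbeat, now, power_off, timeout=10.0):
    """Call `power_off(node)` for every silent node; return those fenced.

    `power_off` is a hypothetical callback. In a real cluster it would
    talk to the node's management interface and cut power.
    """
    fenced = []
    for node in nodes_to_fence(last_heartbeat, now, timeout):
        power_off(node)
        fenced.append(node)
    return fenced


if __name__ == "__main__":
    heartbeats = {"node1": 100.0, "node2": 95.0, "node3": 85.0}
    fence_unresponsive(heartbeats, now=100.0,
                       power_off=lambda n: print("fencing", n))
```

Note that the decision is made by the surviving nodes, not by the node in trouble. A hung server can't be trusted to stop writing to the shared volume on its own, which is why the power gets pulled from the outside.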
You have two options these days for creating a clustered volume: iSCSI or an FC SAN. The iSCSI route is certainly cheaper, and should perform sufficiently as long as your VMs aren't doing tons of I/O. As we mentioned last week, a busy database server is not a candidate for virtualization.
After configuring the LUNs to be accessible on multiple hosts, you then create the clustered file system and make it available on all nodes. The steps vary based on some decisions you make. You can choose to use CLVM or not, and you also get to decide if GNBD is right for you. Spend some serious time reading all of the RedHat documents before even setting up a test environment. It's frequently the case that you find out after the fact you've implemented something incompatible with another critical technology you needed to use.
RedHat Cluster Suite is stable, and it mostly works. It can manage services, which for the sake of this article are Virtual Machines. You can define which services run primarily on which physical servers, and the cluster will automatically restart services if they crash or disappear. This ability to ensure that VMs can survive hardware failure, or at least be automatically restarted to minimize downtime, is the minimum required functionality. Manually migrating VMs with zero downtime is also extremely useful in a production environment. With careful configuration, RHCS does this well.
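The idea of preferred servers plus automatic restart can be modeled roughly as follows. This is a sketch of the concept only, not of how RHCS works internally; the service and host names are made up, and a real cluster manager tracks far more state than this.

```python
def place_services(preferences, alive):
    """Pick a host for each service (VM) from its ordered preference list.

    preferences -- dict mapping service name to hosts, most preferred first
    alive       -- set of hosts currently responding
    Returns a dict mapping service to chosen host, or None if no host is up.
    """
    placement = {}
    for service, hosts in preferences.items():
        # The first preferred host that is still alive wins.
        placement[service] = next((h for h in hosts if h in alive), None)
    return placement


if __name__ == "__main__":
    prefs = {
        "vm-web":  ["hostA", "hostB"],
        "vm-mail": ["hostB", "hostA"],
    }
    print(place_services(prefs, alive={"hostA", "hostB"}))
    # hostA dies: vm-web is restarted on its second choice
    print(place_services(prefs, alive={"hostB"}))
```

Re-running the placement whenever the set of live hosts changes gives you the "restart it somewhere else" behavior. The VM still crashes with the hardware, but it comes back without anyone being paged.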
VMware's VMotion is much more evolved. It can do all of that, and with VirtualCenter's help, you can configure load shuffling rules. Too much CPU load or RAM utilization on server A? Move a VM or two to server B. The reverse is possible as well: when load is light, it can power off unneeded servers.
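A load shuffling rule of this kind might look something like the sketch below. It is a simplified guess at the general idea, not VMware's actual algorithm: the 80 percent limit, the host and VM names, and the "move the smallest VM that fits" policy are all assumptions for the sake of the example.

```python
def rebalance_step(hosts, cpu_limit=80):
    """Suggest one migration to relieve the busiest host.

    hosts     -- dict mapping host to {vm name: CPU percent of that host}
    cpu_limit -- percent load above which a host counts as overloaded
    Returns (vm, source, destination), or None if nothing should move.
    """
    load = {host: sum(vms.values()) for host, vms in hosts.items()}
    src = max(load, key=load.get)   # busiest host
    dst = min(load, key=load.get)   # least busy host
    if src == dst or load[src] <= cpu_limit:
        return None
    # Try the smallest VM first, and only move it if the destination
    # would stay under the limit afterwards.
    for vm, cpu in sorted(hosts[src].items(), key=lambda item: item[1]):
        if load[dst] + cpu <= cpu_limit:
            return (vm, src, dst)
    return None


if __name__ == "__main__":
    cluster = {
        "serverA": {"vm1": 50, "vm2": 40},
        "serverB": {"vm3": 20},
    }
    print(rebalance_step(cluster))
```

Each suggested move would then be carried out as a live migration, so the VM being shuffled never notices.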
Both technologies strive to achieve the same thing, which in the end amounts to many extremely intelligent service-specific VMs that maintain overwhelming uptime levels. This is only the beginning of autonomic computing and self-healing infrastructures.
In conclusion, we must say that VMware is recommended for mission-critical applications. The RH Cluster Suite is getting there, and if you are budget constrained, use it. Just make sure you understand the limitations, and test every failure scenario you can think of before putting your RH cluster into production.