With the recent release of Microsoft's Hyper-V shaking up the hypervisor market, we decided to conduct a two-part evaluation pitting virtualization vendors against each other on performance as well as on features such as usability, management and migration.
Microsoft and VMware accepted our invitation, but the open source virtualization vendors - Citrix (Xen) and Red Hat (Linux-based hypervisor) - were unable to participate because they are undergoing product revisions. That left us with a head-to-head matchup between Microsoft's Hyper-V and VMware's market-leading ESX.
The findings here focus on hypervisor performance. A second installment coming later this month will take usability, management and migration features into account.
The question of which hypervisor is faster depends on a number of factors. For example, it depends on how virtual machine (VM) guest operating systems are allocated to the available host CPUs and memory. It also depends on numerous product-specific limitations that can restrict performance.
That said, VMware ESX was the overall winner in this virtualization performance contest, in which we were limited to running six concurrent VMs by the combination of our server's processor cores and memory capacity and the limitations of the hypervisors we tested. ESX pulled down top honors in most of our basic load testing, multi-CPU VM hosting, and disk I/O performance tests.
Microsoft's Hyper-V, however, did well in a few cases, namely when we used a special set of drivers released by Microsoft to boost performance of the only Linux platform Hyper-V officially supports: Novell's SUSE Linux Enterprise Server (SLES).
VM hypervisors are designed to represent server hardware resources to multiple guest operating systems. The physical CPUs (also called cores) are represented to guest operating systems as virtual CPUs (vCPU). But there isn't necessarily a one-core to one-vCPU relationship. The exact ratio depends upon the underlying hypervisor. In our testing, we let the hypervisor decide how to present CPU resources as vCPUs.
The operating systems "see" the server resources within the limitations imposed by the hypervisor. As an example, a four CPU-core system might be represented as a single CPU to the operating system, which will then have to live on just that CPU. In other cases, four CPUs may be virtualized as eight vCPUs, in a scenario in which quieter VMs aren't likely to frequently use peak CPU resources. Other constraints can be imposed on the VMs as well, such as those pertaining to disk size, network I/O, and even which guest gets to use the single CD/DVD inside the server.
One frustrating performance limitation imposed by both Hyper-V and ESX is that no single VM can use more than four vCPUs, no matter the type or version of that guest operating system instance or how many physical cores might actually be available. Furthermore, if you choose to run 32-bit versions of SLES 10 as a guest operating system, you will find that Microsoft only lets those guests have a single vCPU.
The limitations imposed by the hypervisor vendors on the number of available vCPUs come from two areas. First, keeping track of VM guests with very large CPU needs involves enormous memory management overhead and a large amount of inter-CPU communication (including processor cache, instruction pipelines and I/O state controls), both of which are exceedingly difficult to handle. Second, the demand for VM guest hosting has been perceived as driven by server consolidation, and servers that need consolidating are often single-CPU machines.
These limitations in hypervisor hardware resource allocations set the stage for how we could take advantage of the 16-CPU HP DL580G5 server in our test bed (see How we did it).
As previously noted, Microsoft officially supports its own operating systems and Novell's SLES 10 (editions running Service Packs 1 and 2) as guest instances. That accounts for why we tested with only Windows 2008 and SLES 10.2 VMs. Other operating systems (Red Hat Linux, Debian Linux and NetBSD) may work, but organizations seeking debugging or tech support are on their own if they use them.
While we were testing, Microsoft introduced its Hyper-V Linux Interface Connector (Hyper-V LinuxIC) kit, which is a set of drivers that help optimize CPU, memory, disk and network I/O for SLES guest instances. We did see a boost in performance with the kit in place, but only in the case of one vCPU per guest. Hyper-V LinuxIC isn't supported for SMP environments.
The cost of virtualization
No one is claiming the buzz about server virtualization is unsubstantiated. It lets you pack multiple operating system instances onto the same hardware that previously only hosted one instance. And it helps in deploying a standard operating system profile across the data center, if that is your goal.
But nothing is free. Hypervisors become the basic operating system of the servers that they virtualize, which taxes performance. Our first test measures the cost of virtualization by comparing transactional performance when an operating system is running on bare metal with the performance of that same operating system when a hypervisor serves as a buffer between the operating system and the hardware. The difference in performance amounts to a theoretical tax imposed by the hypervisor's innate management role.
In our tests, the performance hit when we moved from a native operating system instance to a virtualized one with a single vCPU allotted ranged from about 2.5% when ESX was running Windows 2008 to more than 12% when Hyper-V was running SLES. The foundational performance 'cost' of each hypervisor varied, but VMware wins this theoretical round. It's theoretical because there are few cases for running a virtual machine platform with only a single guest limited to a single CPU.
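The tax described above is simple arithmetic: the share of native throughput that disappears once a hypervisor is in the way. The Python sketch below shows one way to express it. The function name and the two scores are hypothetical, chosen only to illustrate the calculation, and are not measurements from our test bed.

```python
def virtualization_overhead_pct(native_score: float, virtual_score: float) -> float:
    """Return the percentage of native throughput lost under a hypervisor."""
    if native_score <= 0:
        raise ValueError("native_score must be positive")
    return (native_score - virtual_score) / native_score * 100.0


# Hypothetical scores, used only to show the arithmetic.
native = 20_000.0       # same OS on bare metal
virtualized = 19_500.0  # same OS as a single-vCPU guest

print(f"{virtualization_overhead_pct(native, virtualized):.1f}% overhead")
```

A negative result would mean the virtualized instance outran the native one, which is worth flagging rather than clamping to zero.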
When the number of CPUs made available to a single virtual machine guest climbed, the cost of virtualization varied more widely. When we allowed a single operating instance SMP access to four vCPUs, the lowest price paid - less than 4% - was registered when VMware ESX was supporting a SLES instance. Conversely, the highest operational price paid was a more than 15% hit taken when Hyper-V was supporting a SLES instance.
Overall, Hyper-V also loses this round, but by very little when supporting Windows VMs. It falls down more on SLES, likely because the LinuxIC kit isn't supported for SMP guests and so can't boost performance here.
Testing VMs with business application loads
The second round of performance tests compares iterative VM application performance as VMs are added to the system. We tracked performance for one, three and six VMs when supporting approved guests. We measured performance when each VM was allocated its own vCPU and when each was allowed to tap into four vCPUs. This load test would theoretically amplify performance differences.
Our test tool of choice was SPECjbb2005 - a widely used benchmark that mimics distributed transactions in a distribution warehouse-like environment. The SPECjbb2005 test uses Java application components running inside a single host or VM instance. The first component simulates a client generating threads to be processed by the second component, a business logic engine that in turn stores and fetches objects in transactions to/from a set of Java Collection objects (emulating a database engine), logging them through a set of iterative transaction cycles. SPECjbb2005 chooses its test parameters based on the number of CPUs found and the memory available in the host. The measured output is in business operations per second, or bops; the more bops per test run, the better.
We completed two sets of runs with each hypervisor: one where each VM was allocated its own vCPU and one where each VM was permitted to tap into four vCPUs.
In both cases, we ran tests with one, three and six VMs. We ran each sequence first with Windows 2008 Server as the hosted operating system and then with SUSE SLES 10.2 as the hosted operating system.
The first round used a ratio of one VM guest operating system per vCPU and limited memory access (2GB) for each operating system instance. This resource allocation is typical of what would happen during a server consolidation process, in which older single-CPU machines are consolidated into a physical-to-virtual re-hosting situation.
VMware started out ahead in this race, with Windows 2008 and SLES 10.2 virtual performance nearly as fast as native performance, and held close to that pace with three guest operating systems. Hyper-V with three VMs in place was about 1,400 bops off VMware's pace with Windows 2008 guests and 1,800 bops down from ESX's mark with SLES VMs.
At six VM guests, both hypervisors are starting to struggle to deliver performance comparable to what a native operating system running directly on the server can pull off. But Microsoft kept its performance drop a bit more in check as it appears to have mastered a more linear distribution of hypervisor resources when VMs get piled on.
In reality, consolidated instances aren't necessarily burdened as heavily as the instances we loaded with concurrent SPECjbb2005 tests. Many operating system and application instances typically have far less constant CPU utilization than SPECjbb2005 places on them, and the utilization is often more random in nature. We've stressed the VMs and the hypervisors supporting them to amplify how each hypervisor reacts under enormous loads.
In the second round of iterative VM tests we allowed each VM to have access to four vCPUs, the maximum allowed by either hypervisor under test. Each VM was still limited to 2GB of memory, as that is a common ceiling when consolidating and testing an operating system. This test scenario more readily demonstrates how VMs would be used in virtualized database applications, rendering farms, high-volume transaction systems and other applications needing strong CPU availability.
As before, we started with a single VM guest to establish a baseline, then added two more VMs for a total of three instances, then three more for a total of six VMs. With a single VM, as we noted in our cost of virtualization test, VMware pulls slightly ahead when hosting Windows 2008 guests and has an advantage of almost 1,100 bops when hosting SLES 10.2 VMs. Because Microsoft's LinuxIC kit isn't supported for SMP environments, Hyper-V's SLES performance lacks the boost the kit provided in the tests where we could allocate a single vCPU to each VM.
In the test where three VMs were each using four vCPUs, 12 vCPUs were in play. Because there were 16 physical CPU cores on the server in our test bed that could be virtualized by the hypervisors under test, there were four CPUs sitting idle. Hyper-V pulls ahead of VMware ESX in this instance with 6,500 more bops on average. Our test results suggest that Hyper-V could see those extra available hardware resources and tap into them, whereas ESX could not.
This advantage is lost, however, when we oversubscribe as we did in the final round of testing. Oversubscription is a method that allocates more vCPUs than there are physical CPU cores available, so that VM guests end up sharing the physical cores behind their allocated vCPUs. It's a process that is useful when VMs are running applications that use CPU power randomly, as it lets you stuff more VMs onto a server while hopefully (dependent on guest activities) offering performance at or above what the guests did before they were virtualized.
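The arithmetic behind idle cores and oversubscription can be sketched in a few lines of Python. The core count and the four-vCPU cap come from our test bed as described in this article. The function names are ours, invented for illustration, and the sketch says nothing about how either hypervisor actually schedules vCPUs.

```python
PHYSICAL_CORES = 16  # cores in the test server, as described in the article
VCPUS_PER_VM = 4     # per-VM maximum in the four-vCPU test round


def vcpu_demand(vm_count: int, vcpus_per_vm: int = VCPUS_PER_VM) -> int:
    """Total vCPUs requested by vm_count identical guests."""
    return vm_count * vcpus_per_vm


def describe(vm_count: int, cores: int = PHYSICAL_CORES) -> str:
    """Summarize whether a VM count leaves cores idle or oversubscribes them."""
    demand = vcpu_demand(vm_count)
    ratio = demand / cores
    if demand > cores:
        state = f"oversubscribed by {demand - cores} vCPUs"
    else:
        state = f"{cores - demand} cores idle"
    return f"{vm_count} VMs: {demand} vCPUs on {cores} cores (ratio {ratio:.2f}), {state}"


for n in (1, 3, 6):
    print(describe(n))
```

Any ratio above 1.00 means guests are competing for physical cores, which is where the scheduling behavior of the hypervisor starts to matter most.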
Six VM guests each using four vCPUs oversubscribes the 16 physical CPU cores in our test rig. Both hypervisors are starting to buckle under an extreme load as CPU power is at a premium in this stressful test. But VMware seems to deal with oversubscription better than Hyper-V as it could still pull down an average of 16,136 bops with Windows 2008 guests (compared with Hyper-V's 14,588 bops) and 17,089 bops with SLES guests (compared with Hyper-V's 11,122 bops). Microsoft also is slightly disadvantaged in oversubscription because a native instance of Windows 2008 Server (we used Enterprise Edition) needs to be active to run the Hyper-V hypervisor system - using up its own space and CPU.
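To put the six-VM averages quoted above on a common scale, the gap can be expressed as a percentage of Hyper-V's score. The following Python sketch uses only the four bops figures from this paragraph. The dictionary layout and the function name are illustrative assumptions, not part of our test harness.

```python
# Average bops with six four-vCPU guests, as quoted in the article.
results = {
    "Windows 2008": {"ESX": 16_136, "Hyper-V": 14_588},
    "SLES": {"ESX": 17_089, "Hyper-V": 11_122},
}


def esx_lead_pct(esx_bops: float, hyperv_bops: float) -> float:
    """ESX's advantage expressed as a percentage of the Hyper-V score."""
    return (esx_bops - hyperv_bops) / hyperv_bops * 100.0


for guest, scores in results.items():
    lead = esx_lead_pct(scores["ESX"], scores["Hyper-V"])
    print(f"{guest}: ESX leads by {lead:.1f}%")
```

Measured this way, the ESX lead is roughly 11% with Windows 2008 guests and roughly 54% with SLES guests, which shows how much wider the gap is on Linux.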
The disk I/O seen in a VM light
We also tracked disk throughput of hosted VMs with Intel's IOMeter (pre-compiled Windows and Linux versions). IOMeter exercises disk subsystems by spawning worker threads that read and write to the subsystem in a tester-defined routine. Results are expressed in I/Os per second, as recorded by IOMeter at the end of a test run. A higher number of I/Os per second is better.