VMware edges out Microsoft in virtualization performance test

Hyper-V's bright spot is a set of drivers that help it support Linux VMs


The cost of virtualization

No one is claiming the buzz about server virtualization is unsubstantiated. It lets you pack multiple operating system instances onto hardware that previously hosted only one. And it helps in deploying a standard operating system profile across the data center, if that is your goal.

But nothing is free. The hypervisor becomes the base operating system of each server it virtualizes, and that extra layer taxes performance. Our first test measures this cost of virtualization by comparing transactional performance when an operating system runs on bare metal with the performance of the same operating system when a hypervisor sits between it and the hardware. The difference amounts to a theoretical tax imposed by the hypervisor's inherent management role.
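The "tax" described above is the share of native throughput lost once the hypervisor is in place. A minimal Python sketch, assuming both runs report throughput in the same units; the function name and the sample figures are hypothetical, not measurements from this review:

```python
def virtualization_overhead(native: float, virtualized: float) -> float:
    """Percentage of native throughput lost when the same workload runs in a VM.

    Both arguments must use the same units (e.g. bops or transactions/sec).
    """
    if native <= 0:
        raise ValueError("native throughput must be positive")
    return (native - virtualized) / native * 100.0


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    native_bops = 20_000.0
    vm_bops = 19_500.0
    print(f"{virtualization_overhead(native_bops, vm_bops):.1f}% overhead")
```

With these made-up inputs the overhead works out to 2.5%; a negative result would mean the VM outperformed the native run.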

In our tests, the performance hit when we moved from a native operating system instance to a virtualized one with a single vCPU allotted ranged from about 2.5% when ESX was running Windows 2008 to more than 12% when Hyper-V was running SLES. The foundational performance 'cost' of each hypervisor varied, but VMware wins this theoretical round. It's theoretical because there are few reasons to run a virtualization platform that hosts only one guest limited to a single CPU.

As the number of vCPUs made available to a single virtual machine guest climbed, the cost of virtualization varied more widely. When we gave a single operating system instance SMP access to four vCPUs, the lowest price paid - less than 4% - was registered when VMware ESX was supporting a SLES instance. Conversely, the highest price paid was a hit of more than 15% when Hyper-V was supporting a SLES instance.

Overall, Hyper-V loses this round too, but by very little when supporting Windows VMs. It falls down more on SLES, likely because the LinuxIC kit isn't available to boost its results.

Testing VMs with business application loads

The second round of performance tests compares application performance as VMs are added to the system. We tracked performance for one, three and six VMs running supported guest operating systems. We measured performance when each VM was allocated a single vCPU and when each was allowed to tap into four vCPUs. This load test should, in theory, amplify performance differences.

Table of tracking performance degradation as VMs are added 

Our test tool of choice was SPECjbb2005, a widely used benchmark that emulates transactions in a wholesale-distribution warehouse environment. SPECjbb2005 runs its Java application components inside a single host or VM instance. The first component simulates clients generating threads to be processed by the second component, a business logic engine that stores and fetches objects in transactions to and from a set of Java Collection objects (emulating a database engine), logging them through a series of iterative transaction cycles. SPECjbb2005 chooses its test parameters based on the number of CPUs it finds and the memory available in the host. The result is measured in business operations per second (bops); the more bops per test run, the better.
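The client / business-logic / collection pattern described above can be illustrated with a toy Python analogue. This is only a sketch of the idea, not SPECjbb2005 itself (which is a Java benchmark with far richer transaction types); the function names, the 500-key order space and the time budget are all hypothetical:

```python
import queue
import threading
import time


def business_logic(orders: queue.Queue, store: dict, processed: list) -> None:
    """Consume order IDs from the client, emulating a database with a dict.

    `store` stands in for SPECjbb2005's Java Collection objects; `processed`
    is a one-element list used as a mutable operation counter.
    """
    while True:
        order_id = orders.get()
        if order_id is None:  # sentinel: the client has finished
            break
        store[order_id] = store.get(order_id, 0) + 1  # "write" the order
        _ = store[order_id]                           # "read" it back
        processed[0] += 1


def run_toy_benchmark(duration_s: float = 0.2) -> float:
    """Return toy operations per second for one client and one engine thread."""
    orders: queue.Queue = queue.Queue(maxsize=1000)
    store: dict = {}
    processed = [0]
    engine = threading.Thread(target=business_logic, args=(orders, store, processed))
    engine.start()
    start = time.perf_counter()
    order_id = 0
    # The "client": generate transactions until the time budget runs out.
    while time.perf_counter() - start < duration_s:
        orders.put(order_id % 500)
        order_id += 1
    orders.put(None)
    engine.join()
    elapsed = time.perf_counter() - start
    return processed[0] / elapsed


if __name__ == "__main__":
    print(f"{run_toy_benchmark():,.0f} toy ops/sec")
```

As with bops, a higher number is better, but these toy figures are not comparable to SPECjbb2005 scores.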

We completed two sets of runs with each hypervisor: one in which each VM was allocated a single vCPU and one in which each VM was permitted to tap into four vCPUs.

In both cases, we ran tests with one, three and six VMs. We ran each sequence first with Windows 2008 Server as the guest operating system and then with SUSE SLES 10.2.
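The SPECjbb2005 test grid described above can be enumerated in a few lines. A minimal Python sketch; the list names and labels are illustrative, and the grid covers only the SPECjbb2005 runs, not the IOMeter tests:

```python
from itertools import product

HYPERVISORS = ["VMware ESX", "Microsoft Hyper-V"]
GUEST_OSES = ["Windows Server 2008", "SUSE SLES 10.2"]
VCPUS_PER_VM = [1, 4]
VM_COUNTS = [1, 3, 6]


def build_run_matrix() -> list:
    """Return every (hypervisor, guest OS, vCPUs per VM, VM count) combination."""
    return list(product(HYPERVISORS, GUEST_OSES, VCPUS_PER_VM, VM_COUNTS))


if __name__ == "__main__":
    for run in build_run_matrix():
        print(run)
```

That is 2 x 2 x 2 x 3 = 24 configurations, before counting repeated runs of each.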

The first round used a ratio of one vCPU per guest operating system and limited each operating system instance to 2GB of memory. This resource allocation is typical of a server consolidation project, in which older single-CPU machines are re-hosted as VMs through a physical-to-virtual conversion.

VMware started out ahead in this race, with Windows 2008 and SLES 10.2 virtual performance nearly as fast as native performance, and it held close to that pace with three guest operating systems. With three VMs in place, Hyper-V was about 1,400 bops off VMware's pace with Windows 2008 guests and 1,800 bops behind ESX's mark with SLES VMs.

At six VM guests, both hypervisors struggle to deliver performance comparable to what a native operating system running directly on the server can pull off. But Microsoft kept its performance drop a bit more in check; it appears to distribute hypervisor resources more linearly as VMs get piled on.

In reality, consolidated instances usually aren't as heavily burdened as the instances we loaded with concurrent SPECjbb2005 tests. Many operating system and application instances have far less constant CPU utilization than SPECjbb2005 imposes, and their utilization is often more random. We deliberately stressed the VMs and the hypervisors supporting them to amplify how each hypervisor reacts under enormous loads.

Table of tracking performance degradation as VMs are added in a symmetric-multiprocessing state

In the second round of iterative VM tests, we gave each VM access to four vCPUs, the maximum allowed by either hypervisor under test. Each VM was still limited to 2GB of memory, a common ceiling when consolidating and testing an operating system. This scenario better reflects how VMs would be used for virtualized database applications, rendering farms, high-volume transaction systems and other applications needing strong CPU availability.

As before, we started with a single VM guest to establish a baseline, then added two more VMs for a total of three instances, then three more for a total of six. In the single-VM test, as we noted in our cost-of-virtualization results, VMware pulls slightly ahead when hosting Windows 2008 guests and has almost a 1,100-bops advantage when hosting SLES 10.2 VMs. Because Microsoft's LinuxIC kit isn't supported in SMP environments, Hyper-V's SLES performance lacks the boost the kit provided in the tests where we allocated a single vCPU to each VM.

In the test where three VMs were each using four vCPUs, 12 vCPUs were in play. Because the server in our test bed had 16 physical CPU cores available to the hypervisors under test, four cores were left unallocated. Hyper-V pulls ahead of VMware ESX here by an average of 6,500 bops. Our results suggest that Hyper-V could see and tap into those extra hardware resources, whereas ESX could not.

This advantage is lost, however, when we oversubscribe, as we did in the final round of testing. Oversubscription allocates more vCPUs than there are physical cores, so VMs effectively share physical CPU time with other VM guests. It's useful when VMs run applications that use CPU power sporadically, because it lets you pack more VMs onto a host while, hopefully (depending on guest activity), offering performance at or above what the guests delivered before they were virtualized.
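The arithmetic behind the idle-core and oversubscription cases can be sketched as follows. The 16-core host and the three- and six-VM counts come from our test bed; the helper name is hypothetical, and the equal-share model is a deliberate simplification, not how the ESX or Hyper-V schedulers actually allocate CPU time:

```python
def cpu_allocation(vms: int, vcpus_per_vm: int, physical_cores: int) -> dict:
    """Summarize how allocated vCPUs map onto physical cores.

    ratio > 1.0 means the host is oversubscribed; fair_share is the fraction
    of a physical core each fully busy vCPU could get under equal sharing.
    """
    allocated = vms * vcpus_per_vm
    if allocated <= 0 or physical_cores <= 0:
        raise ValueError("need at least one vCPU and one physical core")
    return {
        "allocated_vcpus": allocated,
        "idle_cores": max(physical_cores - allocated, 0),
        "ratio": allocated / physical_cores,
        "fair_share": min(1.0, physical_cores / allocated),
    }


if __name__ == "__main__":
    for vm_count in (3, 6):
        print(vm_count, cpu_allocation(vm_count, 4, 16))
```

Three four-vCPU VMs leave four cores unallocated (ratio 0.75); six push the ratio to 1.5, so each busy vCPU gets at most about two-thirds of a core under this model.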

Six VM guests each using four vCPUs (24 vCPUs in all) oversubscribe the 16 physical CPU cores in our test rig. Both hypervisors start to buckle under this extreme load, as CPU power is at a premium. But VMware seems to deal with oversubscription better than Hyper-V: it still pulled down an average of 16,136 bops with Windows 2008 guests (compared with Hyper-V's 14,588 bops) and 17,089 bops with SLES guests (compared with Hyper-V's 11,122 bops). Microsoft is also slightly disadvantaged under oversubscription because a native instance of Windows 2008 Server (we used Enterprise Edition) must be running to host the Hyper-V hypervisor, and that instance consumes resources and CPU of its own.
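To put the six-VM gaps in relative terms, a small Python helper (the name is hypothetical) applied to the averages reported above:

```python
def percent_advantage(leader: float, trailer: float) -> float:
    """How much higher, in percent, the leader's throughput is than the trailer's."""
    if trailer <= 0:
        raise ValueError("trailer throughput must be positive")
    return (leader - trailer) / trailer * 100.0


# Six VMs x four vCPUs, average bops as reported in this review.
RESULTS = {
    "Windows 2008": {"VMware ESX": 16_136, "Hyper-V": 14_588},
    "SLES 10.2": {"VMware ESX": 17_089, "Hyper-V": 11_122},
}

if __name__ == "__main__":
    for guest, scores in RESULTS.items():
        adv = percent_advantage(scores["VMware ESX"], scores["Hyper-V"])
        print(f"{guest}: ESX ahead by {adv:.1f}%")
```

ESX's lead works out to roughly 10.6% with Windows 2008 guests and roughly 53.7% with SLES guests.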

The disk I/O seen in a VM light

We also tracked disk throughput of hosted VMs with Intel's IOMeter (pre-compiled Windows and Linux versions). IOMeter exercises disk subsystems by spawning worker threads that read from and write to the subsystem according to a tester-defined routine. Results are reported as the I/Os per second recorded by IOMeter at the end of a test run; a higher number is better.
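The worker-thread approach described above, and the I/Os-per-second metric, can be sketched in Python. This is a toy, not IOMeter: the 4KB block size, worker count and function names are assumptions, and because Python's buffered writes land largely in the operating system's page cache, the absolute numbers say far less than a real disk benchmark would:

```python
import os
import tempfile
import threading
import time

BLOCK = 4096  # bytes per I/O; an assumed size, not the review's IOMeter setting


def write_worker(path: str, ops: int, results: list, idx: int) -> None:
    """Issue `ops` block-sized writes to this worker's own region of the file."""
    buf = os.urandom(BLOCK)
    with open(path, "r+b") as f:
        for i in range(ops):
            f.seek((idx * ops + i) * BLOCK)
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # push the data toward the device before stopping
    results[idx] = ops


def measure_iops(workers: int = 4, ops_per_worker: int = 200) -> float:
    """Run the workers concurrently and return total I/Os divided by elapsed time."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.truncate(workers * ops_per_worker * BLOCK)  # pre-size the test file
        results = [0] * workers
        threads = [
            threading.Thread(target=write_worker, args=(path, ops_per_worker, results, i))
            for i in range(workers)
        ]
        start = time.perf_counter()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        elapsed = time.perf_counter() - start
        return sum(results) / elapsed
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(f"{measure_iops():,.0f} toy I/Os per second")
```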

In a virtualized world, VM guest instances must contend for either internal disk or storage-area network resources. When the hardware is re-presented to guest operating systems through virtualization, the hypervisor layer between the hardware and the guest VMs uses its own disk driver to manage disk activity. Adding virtualized guests divides the hardware resources among the guest operating system and application instances. Even if native operating system drivers are good, managing the communication needs of a number of guests is a sophisticated business for a hypervisor, and any latency and inefficiency shows up as slower application performance.
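To make the multiplexing problem concrete, here is a deliberately simplified Python sketch of a hypervisor interleaving pending I/O requests from several guests onto one physical channel in round-robin order. The function name and request labels are hypothetical; neither ESX nor Hyper-V is claimed to schedule I/O this way, and real hypervisors weigh fairness, priorities and latency far more carefully:

```python
from collections import deque


def round_robin_dispatch(guest_queues: dict) -> list:
    """Interleave each guest's pending requests onto a single disk channel.

    guest_queues maps a guest name to its list of pending requests.
    Returns the (guest, request) order in which requests reach the controller.
    """
    queues = {guest: deque(reqs) for guest, reqs in guest_queues.items()}
    order = []
    while any(queues.values()):  # keep going while any guest has work queued
        for guest, q in queues.items():
            if q:
                order.append((guest, q.popleft()))
    return order


if __name__ == "__main__":
    pending = {"vm1": ["w1", "w2", "w3"], "vm2": ["r1"], "vm3": ["w4", "r2"]}
    for guest, request in round_robin_dispatch(pending):
        print(guest, request)
```

Every extra guest adds another queue to service, which is where the latency and efficiency costs described above come from.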

We ran IOMeter in each VM instance to gauge how well the hypervisor could "breathe" data to disk. We used a tougher-than-real-world mix of 70% writes and 30% reads. We favored writes because the operating system doesn't cache them heavily (so that their contents don't evaporate during power outages or hardware resets), whereas read caching can distort measurements.
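A 70/30 write/read mix like the one described above can be generated reproducibly by drawing each operation at random with a fixed seed. A minimal Python sketch; the function name and default seed are hypothetical, and IOMeter's own access-specification settings are not modeled here:

```python
import random


def access_pattern(n: int, write_fraction: float = 0.7, seed: int = 42) -> list:
    """Return a reproducible list of 'W'/'R' operations with the given write share."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    rng = random.Random(seed)  # private generator, so runs are repeatable
    return ["W" if rng.random() < write_fraction else "R" for _ in range(n)]


if __name__ == "__main__":
    ops = access_pattern(10_000)
    print(f"write share: {ops.count('W') / len(ops):.3f}")
```

Over many operations the observed write share converges on the requested 70%; fixing the seed makes every VM instance replay the same sequence.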

Table of disk I/O results with VMs accessing a single vCPU

We first measured the I/O performance of a native operating system (on both single-CPU and SMP configurations) to establish a baseline disk I/O speed as measured by IOMeter. We then ran the same tests on each of our virtualized environments with six VM guests. We wanted to know whether the hypervisor could deliver more disk channel throughput to the VM guests collectively than a single native instance could use on its own.

The good news is that our tests show both hypervisors could drive the disk channel at rates greater than a single native instance could as we added guest VM instances. This means hypervisors controlling the disk channel (an HP Smart Array in our case) do a good job of keeping that channel full as the number of VM guests increases.

Table of disk I/O results with VMs accessing four vCPUs

In the hosted SLES results where each VM accessed a single vCPU, we again saw that Hyper-V VM guest instances get a formidable boost from Microsoft's LinuxIC kit, as SLES VMs ran faster on Hyper-V than on VMware ESX. When we tested SLES without the LinuxIC kit to see whether it would be slower, we found its performance was essentially the same as VMware ESX's (within 1%). With the LinuxIC kit, the average I/O for a SLES VM on Hyper-V was 83.78 I/Os per second, about 5% faster than VMware's disk throughput with SLES.

However, Hyper-V doesn't fare as well in delivering disk I/O to Microsoft's own Windows 2008 Server guests. VMware lapped Microsoft with six Windows 2008 VMs loaded up.

When we measured disk I/O activity in an SMP environment, where each of our six VMs was allocated four vCPUs, we intentionally oversubscribed the server to see whether the hypervisors could sustain their disk channel activity under heavy disk demand from every guest. Because a hypervisor is an operating system in its own right, it must carefully apportion disk write time and switch contexts among guests cleanly and efficiently.

In these tests, both hypervisors achieved more I/O performance than a native operating system running on bare metal, but VMware ESX is the clear winner. When hosting Windows 2008 VMs, it registered 1,733.63 I/Os per second, compared with Hyper-V's 874.29 I/Os per second and native performance of 712.97 I/Os per second. It also beat Hyper-V in the hosted SLES environment, by a narrow margin of about 45 I/Os per second. Here Hyper-V no longer has the advantage of the LinuxIC kit, which doesn't support SMP configurations.
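The Windows 2008 SMP results reported above can be expressed as multiples of the native baseline. A minimal Python sketch; the constant and function names are hypothetical, and the figures are the review's reported numbers:

```python
NATIVE_WINDOWS_IOPS = 712.97  # native Windows 2008, SMP configuration
SIX_VM_WINDOWS_IOPS = {"VMware ESX": 1733.63, "Hyper-V": 874.29}


def times_native(iops: float, native: float = NATIVE_WINDOWS_IOPS) -> float:
    """Express a virtualized I/O result as a multiple of the native baseline."""
    if native <= 0:
        raise ValueError("native baseline must be positive")
    return iops / native


if __name__ == "__main__":
    for hypervisor, iops in SIX_VM_WINDOWS_IOPS.items():
        print(f"{hypervisor}: {times_native(iops):.2f}x native")
```

That puts ESX at roughly 2.43x the native figure and Hyper-V at roughly 1.23x, with ESX delivering just under twice Hyper-V's I/O rate.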


VMware's early lead in the marketplace has given it a performance lead in most of the areas we tested, although Microsoft's prowess is beginning to show in a core area: performance when consolidating single-vCPU VMs. Both vendors are likely to improve their performance numbers rapidly, as performance is a source of strong competition between them. Biting at their heels are offerings from Citrix, Sun and Red Hat, as well as open source projects that are reaching commercial potential. VM performance is certainly an area to keep an eye on.

Henderson and Allen are researchers for ExtremeLabs. They can be reached at thenderson@extremelabs.com.

NW Lab Alliance

Henderson is also a member of the Network World Lab Alliance, a cooperative of the premier reviewers in the network industry each bringing to bear years of practical experience on every review. For more Lab Alliance information, including what it takes to become a member, go to www.networkworld.com/alliance.
