Hyper-V's bright spot is a set of drivers that help it support Linux VMs
With the recent release of Microsoft's Hyper-V shaking up the hypervisor market, we decided to conduct a two-part evaluation pitting virtualization vendors against each other on performance as well as on features such as usability, management and migration.
Microsoft and VMware accepted our invitation, but the open source virtualization vendors - Citrix (Xen) and Red Hat (Linux-based hypervisor) - were unable to participate because they are undergoing product revisions. That left us with a head-to-head matchup between Microsoft's Hyper-V and VMware's market-leading ESX.
The findings here focus on hypervisor performance. A second installment coming later this month will take usability, management and migration features into account.
The question of which hypervisor is faster depends on a number of factors. For example, it depends on how virtual machine (VM) guest operating systems are allocated to the available host CPUs and memory. It also depends on numerous product-specific limitations that can restrict performance.
That said, VMware ESX was the overall winner in this virtualization performance contest, in which we were limited to running six concurrent VMs by the combination of our server's processor cores and memory capacity and the limitations of the hypervisors we tested. ESX took top honors in most of our basic load testing, multi-CPU VM hosting and disk I/O performance tests.
Microsoft's Hyper-V, however, did well in a few cases, namely when we used a special set of drivers released by Microsoft to boost performance of the only Linux platform Hyper-V officially supports: Novell's SUSE Linux Enterprise Server (SLES).
VM hypervisors are designed to represent server hardware resources to multiple guest operating systems. The physical CPUs (also called cores) are represented to guest operating systems as virtual CPUs (vCPU). But there isn't necessarily a one-core to one-vCPU relationship. The exact ratio depends upon the underlying hypervisor. In our testing, we let the hypervisor decide how to present CPU resources as vCPUs.
The operating systems "see" the server resources within the limitations imposed by the hypervisor. As an example, a four CPU-core system might be represented as a single CPU to the operating system, which will then have to live on just that CPU. In other cases, four CPUs may be virtualized as eight vCPUs, in a scenario in which quieter VMs aren't likely to frequently use peak CPU resources. Other constraints can be imposed on the VMs as well, such as those pertaining to disk size, network I/O, and even which guest gets to use the single CD/DVD inside the server.
One frustrating performance limitation imposed by both Hyper-V and ESX is a cap of four vCPUs for any single VM, no matter the type or version of the guest operating system instance or how many physical cores might actually be available. Furthermore, if you choose to run 32-bit versions of SLES 10 as a guest operating system, you will find that Microsoft lets those guests have only a single vCPU.
The limitations imposed by the hypervisor vendors on the number of available vCPUs come from two areas. First, keeping track of VM guests with very large CPU needs involves enormous memory management overhead and a large amount of inter-CPU communication (including processor cache, instruction pipelines and I/O state controls), all of which are exceedingly difficult to coordinate. Second, the demand for VM guest hosting has been perceived as a server consolidation exercise - and servers that need consolidating are often single-CPU machines.
These limitations in hypervisor hardware resource allocations set the stage for how we could take advantage of the 16-CPU HP DL580G5 server in our test bed (see How we did it).
As previously noted, Microsoft officially supports its own operating systems and Novell's SLES 10 (editions running Service Packs 1 and 2) as guest instances. That accounts for why we tested with only Windows 2008 and SLES 10.2 VMs. Other operating systems (Red Hat Linux, Debian Linux and NetBSD) may work, but organizations seeking debugging or tech support are on their own if they use them.
While we were testing, Microsoft introduced its Hyper-V Linux Interface Connector (Hyper-V LinuxIC) kit, which is a set of drivers that help optimize CPU, memory, disk and network I/O for SLES guest instances. We did see a boost in performance with the kit in place, but only in the case of one vCPU per guest. Hyper-V LinuxIC isn't supported for SMP environments.
The cost of virtualization
No one is claiming the buzz about server virtualization is unsubstantiated. It lets you pack multiple operating system instances onto the same hardware that previously only hosted one instance. And it helps in deploying a standard operating system profile across the data center, if that is your goal.
But nothing is free. Hypervisors become the basic operating system of the servers that they virtualize, which taxes performance. Our first test measures the cost of virtualization by comparing transactional performance when an operating system is running on bare metal with the performance of that same operating system when a hypervisor serves as a buffer between the operating system and the system. The difference in performance amounts to a theoretical tax imposed by the hypervisor's innate management role.
In our tests, the performance hit when we moved from a native operating system instance to a virtualized one with a single vCPU allotted ranged from about 2.5% when ESX was running Windows 2008 to more than 12% when Hyper-V was running SLES. The foundational performance 'cost' of each hypervisor varied, but VMware wins this theoretical round. It's theoretical because there are few cases for running a virtual machine platform with only a single guest limited to a single CPU.
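The virtualization "tax" described above is simple arithmetic on two benchmark scores. The following is a minimal sketch of that calculation, not the test harness we used; the function name and the sample scores are hypothetical and were chosen only to show how a 2.5% hit would be derived.

```python
def virtualization_overhead_pct(native_score: float, virtual_score: float) -> float:
    """Return the share of native throughput lost under a hypervisor, in percent."""
    if native_score <= 0:
        raise ValueError("native_score must be positive")
    return (native_score - virtual_score) / native_score * 100.0


# Hypothetical scores for illustration only; these are not measured results.
example_native = 20_000.0    # score on bare metal
example_virtual = 19_500.0   # score for the same OS as a single-vCPU guest

overhead = virtualization_overhead_pct(example_native, example_virtual)
print(f"Virtualization overhead: {overhead:.1f}%")
```

A negative result would mean the virtualized instance outran the native one, which is why the function does not clamp its output at zero.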
When the number of CPUs made available to a single virtual machine guest climbed, the cost of virtualization varied more widely. When we allowed a single operating instance SMP access to four vCPUs, the lowest price paid - less than 4% - was registered when VMware ESX was supporting a SLES instance. Conversely, the highest operational price paid was a more than 15% hit taken when Hyper-V was supporting a SLES instance.
Overall, Hyper-V also loses this round, but by very little when supporting Windows VMs. It falls down more on SLES, likely because the LinuxIC kit isn't supported in SMP configurations and so can't boost performance here.
Testing VMs with business application loads
The second round of performance tests compares VM application performance as VMs are added to the system. We tracked performance for one, three and six VMs when supporting approved guests. We measured performance when each VM was allocated its own vCPU and when each was allowed to tap into four vCPUs. This load test would theoretically amplify performance differences.
Our test tool of choice was SPECjbb2005 - a widely used benchmark that mimics distributed transactions in a distribution warehouse-like environment. The SPECjbb2005 test uses Java application components running inside a single host or VM instance. The first component simulates a client generating threads to be processed by the second component, a business logic engine that in turn stores and fetches objects in transactions to/from a set of Java Collection objects (emulating a database engine), logging them through a set of iterative transaction cycles. SPECjbb2005 chooses its test parameters based on the number of CPUs found, as well as the available memory in the host. The measured output is in business operations per second, or bops; the more bops per test run, the better.
We completed multiple runs with each hypervisor, a set where each VM was allocated its own vCPU and a set where each VM was permitted to tap into four vCPUs.
In both cases, we ran tests with one, three and six VMs. We ran each sequence first with Windows 2008 Server as the hosted operating system and then with SLES 10.2 as the hosted operating system.
The first round used a ratio of one VM guest operating system per vCPU and limited memory access (2GB) for each operating system instance. This resource allocation is typical of what would happen during a server consolidation process, in which older single-CPU machines are consolidated into a physical-to-virtual re-hosting situation.
VMware started out ahead in this race with Windows 2008 and SLES 10.2 virtual performance nearly as fast as native performance, and held close to that pace with three guest operating systems. Hyper-V with three VMs in place was about 1,400 bops off VMware's pace with Windows 2008 guests and 1,800 bops down from the ESX mark with SLES VMs.
At six VM guests, both hypervisors are starting to struggle to deliver performance comparable to what a native operating system running directly on the server can pull off. But Microsoft kept its performance drop a bit more in check as it appears to have mastered a more linear distribution of hypervisor resources when VMs get piled on.
In reality, consolidated instances aren't necessarily as heavily burdened as ours were when running concurrent SPECjbb2005 tests. Many operating system and application instances typically have far less constant CPU utilization than SPECjbb2005 places on them, and the utilization is often more random in nature. We've stressed the VMs and the hypervisors supporting them to amplify how each hypervisor reacts under enormous loads.
In the second round of iterative VM tests we allowed each VM to have access to four vCPUs, the maximum allowed by either hypervisor under test. Each VM was still limited to 2GB of memory, as that is a common ceiling when consolidating and testing an operating system. This test scenario more readily demonstrates how VMs would be used in virtualized database applications, rendering farms, high-volume transaction systems and other applications needing strong CPU availability.
As before, we started with a single VM guest to establish a baseline, then added two more VMs for a total of three instances, then three more for a total of six VMs. In the first test, as we noted in our cost of virtualization test, VMware pulls slightly ahead when hosting Windows 2008 guests and has an advantage of almost 1,100 bops when hosting SLES 10.2 VMs. Because Microsoft's LinuxIC kit isn't supported for SMP environments, Hyper-V's performance with SLES lacks the boost the kit provided in the tests where we could allocate a single vCPU to each VM.
In the test where three VMs were each using four vCPUs, 12 vCPUs were in play. Because there were 16 physical CPU cores on the server in our test bed that could be virtualized by the hypervisors under test, there were four CPUs sitting idle. Hyper-V pulls ahead of VMware ESX in this instance with on average 6,500 more bops. Our test results suggest that Hyper-V could see those extra available hardware resources and tapped into them, whereas ESX could not.
This advantage is lost, however, when we oversubscribe as we did in the final round of testing. Oversubscription is a method that allocates more vCPUs than there are physical CPU cores available, forcing VMs to share the physical cores behind their allocated vCPUs with other VM guests. It's a process that is useful when VMs are running applications that use CPU power randomly, as it lets you stuff more VMs onto a server while hopefully (dependent on guest activities) offering performance at or above what the guests delivered before they were virtualized.
Six VM guests each using four vCPUs oversubscribes the 16 physical CPU cores in our test rig. Both hypervisors are starting to buckle under an extreme load as CPU power is at a premium in this stressful test. But VMware seems to deal with oversubscription better than Hyper-V as it could still pull down an average of 16,136 bops with Windows 2008 guests (compared with Hyper-V's 14,588 bops) and 17,089 bops with SLES guests (compared with Hyper-V's 11,122 bops). Microsoft also is slightly disadvantaged in oversubscription because a native instance of Windows 2008 Server (we used Enterprise Edition) needs to be active to run the Hyper-V hypervisor system - using up its own space and CPU.
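The subscription arithmetic behind these SMP rounds can be sketched in a few lines. The core count, per-VM vCPU count and bops figures below are the ones quoted in this article; the function names are our own labels for illustration and do not correspond to any hypervisor API.

```python
PHYSICAL_CORES = 16   # cores in the test server
VCPUS_PER_VM = 4      # vCPUs allotted to each VM in the SMP rounds


def vcpu_demand(vm_count: int, vcpus_per_vm: int = VCPUS_PER_VM) -> int:
    """Total vCPUs requested by all guests."""
    return vm_count * vcpus_per_vm


def subscription_ratio(vm_count: int, cores: int = PHYSICAL_CORES) -> float:
    """vCPUs requested per physical core; above 1.0 means oversubscribed."""
    return vcpu_demand(vm_count) / cores


def lead_pct(leader_bops: float, trailer_bops: float) -> float:
    """How far the leader is ahead of the trailer, as a percentage of the trailer."""
    return (leader_bops - trailer_bops) / trailer_bops * 100.0


for vms in (1, 3, 6):
    ratio = subscription_ratio(vms)
    state = "oversubscribed" if ratio > 1 else "within capacity"
    print(f"{vms} VMs: {vcpu_demand(vms)} vCPUs on {PHYSICAL_CORES} cores, "
          f"ratio {ratio:.2f} ({state})")

# Six-VM averages quoted above (ESX vs. Hyper-V).
print(f"Windows 2008 guests: ESX ahead by {lead_pct(16136, 14588):.1f}%")
print(f"SLES guests: ESX ahead by {lead_pct(17089, 11122):.1f}%")
```

Three VMs request 12 vCPUs, a ratio of 0.75 that leaves four cores idle, while six VMs request 24 vCPUs, or 1.5 vCPUs for every physical core.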
The disk I/O seen in a VM light
We also tracked disk throughput of hosted VMs with Intel's IOMeter (pre-compiled Windows and Linux versions). IOMeter exercises disk subsystems by spawning worker threads that read and write to the subsystem in a tester-defined routine. Results are expressed in terms of I/Os per second as recorded by IOMeter at the end of a test run. A higher number of I/Os is better.
In a virtualized world, VM guest instances must contend for either internal disk or storage-area network resources. When the hardware is re-represented to guest operating systems through virtualization, the hypervisor layer between the hardware and guest VMs uses its own disk driver to manage disk activity. Adding virtualized guests divides the hardware resources among the guest VM operating system/application instances. Even though native operating system drivers might be good, managing the communication needs of a number of guests is a very sophisticated business for a hypervisor, and latency and efficiency problems will show up as application performance slowdowns.
We ran IOMeter in each VM instance to gauge how the hypervisor could "breathe" data to disk. We used a tougher-than-real-world ratio of 70% writes to 30% reads. We favored writes in our configuration because they aren't heavily cached by the operating system (so their contents don't evaporate during power outages or hardware resets), and read-based cache can distort measurements.
We measured the I/O performance of a native operating system (in both single and SMP servers) to establish a baseline of the operating system's disk I/O speed as recorded by IOMeter. We then ran the same tests on each of our virtualized environments with six VM guests. We wanted to know if the hypervisor could offer more disk channel availability to VM guests than they could use on their own as native instances.
The good news is that our tests show both hypervisors could pump up the disk channel at rates greater than a single native instance could when we added more guest VM instances. This means hypervisors controlling the disk channel (an HP Smart Array in our case) can do a good job of cramming that channel when the number of VM guests increases.
In the hosted SLES results where each VM accessed a single vCPU, we again saw that Hyper-V VM guest instances get a formidable boost from the LinuxIC kit, as SLES VMs ran faster on Hyper-V than on VMware ESX. When we tested to see if SLES without the LinuxIC kit would be slower, we found it was essentially the same (within a single percent) as VMware ESX's performance. When we ran this test on Hyper-V without the LinuxIC kit, the average I/O for a SLES VM was 83.78 I/Os per second, about 5% faster than VMware's disk throughput with SLES.
However, Hyper-V doesn't fare as well in delivering disk I/O to its own Windows 2008 Server. VMware lapped Microsoft with six Windows 2008 VMs loaded up.
When we measured disk I/O activity in an SMP environment - where each of our six VMs was allocated four vCPUs - we intentionally oversubscribed the server to see if the hypervisors could sustain their disk channel activity when given a high volume of disk demand from each guest. As a hypervisor is an operating system of its own, it must carefully reallocate disk writing time and switch contexts among guests cleanly and efficiently.
In these tests, both hypervisors achieved more I/O performance than a native operating system running on bare metal. But VMware ESX is the clear winner. When hosting Windows 2008 VMs it registered 1733.63 I/Os per second compared with Hyper-V's 874.29 I/Os per second and the native performance of 712.97 I/Os per second. It also beat out Hyper-V in the hosted SLES environment by a narrow margin of about 45 I/Os per second. Hyper-V no longer has the advantage of the LinuxIC kit, which doesn't support SMP hardware.
VMware's initial lead in the marketplace has given it a performance lead in most of the areas that we tested, although Microsoft's prowess is beginning to show in a core area - performance when consolidating single-vCPU VMs. Both vendors are likely to improve their performance numbers rapidly, as it's a source of strong competition between them. Biting at their heels are offerings from Citrix, Sun and Red Hat, as well as open source developments that are reaching commercial potential. VM performance is certainly an area to keep an eye on.
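To put the Windows 2008 SMP figures in proportion, the short sketch below expresses each result as a multiple of another. The three I/O rates are the ones quoted above; the dictionary keys and the helper name are our own labels, used only for illustration.

```python
# I/Os per second from the Windows 2008 SMP disk test quoted above.
windows_smp_iops = {
    "VMware ESX": 1733.63,
    "Hyper-V": 874.29,
    "native": 712.97,
}


def times_faster(fast: float, slow: float) -> float:
    """Return how many times the faster rate exceeds the slower one."""
    if slow <= 0:
        raise ValueError("slow rate must be positive")
    return fast / slow


esx_vs_hyperv = times_faster(windows_smp_iops["VMware ESX"], windows_smp_iops["Hyper-V"])
esx_vs_native = times_faster(windows_smp_iops["VMware ESX"], windows_smp_iops["native"])
hyperv_vs_native = times_faster(windows_smp_iops["Hyper-V"], windows_smp_iops["native"])

print(f"ESX vs. Hyper-V: {esx_vs_hyperv:.2f}x")
print(f"ESX vs. native: {esx_vs_native:.2f}x")
print(f"Hyper-V vs. native: {hyperv_vs_native:.2f}x")
```

By this measure ESX moved nearly twice as many I/Os per second as Hyper-V, while Hyper-V's edge over the native instance was far smaller.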
Henderson and Allen are researchers for ExtremeLabs. They can be reached at firstname.lastname@example.org.
Henderson is also a member of the Network World Lab Alliance, a cooperative of the premier reviewers in the network industry each bringing to bear years of practical experience on every review. For more Lab Alliance information, including what it takes to become a member, go to www.networkworld.com/alliance.