Xen-based hypervisors push performance limits

Citrix VMs are tops in transaction processing, Novell's in I/O speed

When we declared VMware's ESX virtual machine platform to be the performance winner against Microsoft's Hyper-V, readers asked, "How could you not test Xen from either Novell or Citrix?"

The short answer then was that neither vendor was ready to enter its Xen hypervisor derivative when testing was conducted last summer. In a second round of identical testing done late last fall, however, we tested Citrix XenServer 5.0, Novell's Xen 3.2 and Virtual Iron 4.4. Two other vendors -- Sun and Red Hat -- were invited to participate but declined because of timing problems.

Our testing confirmed some readers' assertions that open source Xen is a formidable challenger to the closed code VMware and Microsoft hypervisors. When we measured the performance of business transactions running atop the hypervisors, Citrix's XenServer 5.0 was the top finisher in nine out of 12 test runs.

The disk I/O battle was won by Novell's SUSE Xen, which killed all competition in every contest. That achievement boils down to the fact that, in the default installation we tested, Novell's SUSE Xen caches writes when using its default, file-backed disk configuration. This caching gives Novell unprecedented speed. But for some, caching disk writes bucks the longstanding practice of passing disk writes immediately to media in order to maintain transactional integrity. The counterargument is that should a failure occur while a disk write is still in cache (before being written to disk), the problem can be trapped and dealt with by transaction-oriented applications such as databases.

When you pull in the numbers recorded by Microsoft and VMware in the last round of testing, you can see that in terms of performance, the Brothers Xen provide new and formidable competition for both hypervisor market leader VMware ESX and its more recent competitor, Microsoft's Hyper-V.

Para vs. full virtualization

Novell SUSE Xen and Citrix XenServer (along with Hyper-V) can bring into play a process called paravirtualization that, where supported in both the host hypervisor and the virtualized guest operating system, enables a tighter bond between a guest VM and the resources of the physical server. With this bond in place, the guest operating system is supposed to be able to access the resources of the host machine more efficiently.
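For readers who want to check which mode a given Linux guest is actually running in, recent kernels expose the hypervisor and guest type through sysfs. The following is a minimal sketch under that assumption; the paths may not exist on older guest kernels such as the SLES 10 SP2 images we tested.

```python
from pathlib import Path

def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if it is absent."""
    p = Path(path)
    return p.read_text().strip() if p.exists() else None

# Both paths are exposed by reasonably recent kernels; older guests may lack them.
hypervisor = read_sysfs("/sys/hypervisor/type")        # e.g. "xen"
guest_type = read_sysfs("/sys/hypervisor/guest_type")  # e.g. "PV", "PVH" or "HVM"

if hypervisor == "xen":
    print(f"Running under Xen; guest type: {guest_type or 'unknown (older kernel)'}")
else:
    print("No Xen hypervisor detected (or sysfs information unavailable)")
```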

Virtual Iron doesn't support paravirtualization. VMware supports paravirtualization for some Linux versions through a VMI-enabled kernel, but the SLES 10 SP2 64-bit distribution we used in our test bed does not have that kernel at this juncture.

We conducted all tests with SLES VMs running on Novell SUSE Xen and Citrix XenServer hypervisors in both para- and full-virtualization modes. We took these extra steps to discern whether paravirtualization offers an advantage, and our analysis says that while paravirtualization helps some of the incremental load profiles we tested, the advantage isn't consistent overall. We report the best numbers achieved for each hypervisor.

Transaction benchmarks summary

We developed several test profiles that mimic common use cases for virtualized guest operating systems. Each product in this round was tested on an HP 580 G5 four-socket, 16-core server, using a test bed and process identical to those used to test VMware ESX and Hyper-V (see How we did it).

We used SPEC's SPECjbb2005, a Java-based business transaction benchmark, to first compare native operating system performance to basic hypervisor load profiles. We then measured performance as we added guest virtual machines to each hypervisor platform until we hit the final profile that oversubscribes system resources.

In most cases, the fastest guest-VM performance in our transactional benchmark testing was achieved by XenServer.

Across the six tests in which each hypervisor was hosting Windows 2008 Server virtual machines, the only case in which XenServer earned the silver was when we ran six Windows 2008 Server guest VMs, all of which had access to a single virtual CPU. Microsoft's Hyper-V achieved the high-water mark in that test run (measured in our first round of testing) with 14,531 bops, compared with XenServer's 14,162 bops.

We can speculate that XenServer gives more resources to a single vCPU than other hypervisors we've tested, which enhances results in situations where the vCPUs are undersubscribed, that is, where there is only a one VM to one vCPU ratio or less.

In the six tests where the hypervisors were hosting SUSE Linux virtual machines, Novell's own Xen implementation was able to best Citrix's XenServer in the test where there was one Linux VM running on a single vCPU. That win, however, came by a very slight margin of only 23 bops. VMware's ESX beat XenServer in our test where six Linux VMs had access to four vCPUs by a wider margin of 314 bops.

The performance price for virtualization

Virtualizing a guest operating system adds work for the server to handle. Because more VM guests means more sharing of server resources, performance will at some point degrade under the extra work each guest imposes on the server's finite resources. We measured the performance of both Windows 2008 Enterprise Server and Novell's SLES 10.2 natively on the server to establish a performance baseline. Those numbers came in at 18,153 bops for Windows Server 2008 and 22,240 bops for Novell SLES.
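The percent-of-native figures in the tables below are derived directly from these baselines. As a quick illustration of the arithmetic, this sketch uses the Windows baseline above and Hyper-V's single-VM result carried over from our earlier round (17,403 bops):

```python
# Percent-of-native calculation used throughout the results tables.
native_windows_bops = 18_153   # Windows Server 2008 running natively
native_sles_bops = 22_240      # Novell SLES 10.2 running natively

hyperv_one_vm_windows_bops = 17_403   # single Windows VM on Hyper-V, earlier round

pct_of_native = hyperv_one_vm_windows_bops / native_windows_bops * 100
print(f"{pct_of_native:.2f}% of native")   # prints 95.87% of native
```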

Tracking VM transactional performance

In our transactional-based performance testing, Citrix's XenServer is the clear front runner, winning nine of 12 contests. VMware's ESX (tested in a previous round) holds on to the number two spot. SPECjbb2005 business transaction emulator results are shown in basic operations per second (bops); the higher the bops result, the better. An asterisk marks the top result in each test run. Results for the three- and six-VM runs are listed as average bops per VM.

Round One: one vCPU per VM, four cores available

Configuration | Windows Server 2008 VMs | Novell SLES 10.2 VMs
Native operating system | 18,153 | 22,240
One VM on Microsoft Hyper-V | 17,403 (95.87% of native) | 19,619 (88.21% of native)
One VM on VMware ESX | 17,963 (98.95% of native) | 20,711 (93.13% of native)
One VM on Citrix XenServer | 18,431* (101.53% of native) | 20,874 (93.86% of native)
One VM on Virtual Iron | 16,861 (92.88% of native) | 14,277 (64.20% of native)
One VM on Novell SLES Xen | 17,721 (97.62% of native) | 20,897* (93.96% of native)
Three VMs on Microsoft Hyper-V | 16,363 (90.14% of native) | 18,461 (83.01% of native)
Three VMs on VMware ESX | 17,735 (97.79% of native) | 20,229 (90.96% of native)
Three VMs on Citrix XenServer | 18,257* (100.57% of native) | 20,244* (91.03% of native)
Three VMs on Virtual Iron | 16,671 (91.83% of native) | 15,542 (69.88% of native)
Three VMs on Novell SLES Xen | 14,514 (79.95% of native) | 18,489 (83.13% of native)
Six VMs on Microsoft Hyper-V | 14,531* (80.05% of native) | 15,168 (68.20% of native)
Six VMs on VMware ESX | 13,964 (76.92% of native) | 14,009 (62.99% of native)
Six VMs on Citrix XenServer | 14,162 (78.01% of native) | 15,888* (71.44% of native)
Six VMs on Virtual Iron | 14,128 (77.83% of native) | 13,350 (60.03% of native)
Six VMs on Novell SLES Xen | 12,490 (68.80% of native) | 15,153 (68.13% of native)

Round Two: four vCPUs per VM, all 16 server cores available

Configuration | Windows Server 2008 VMs | Novell SLES 10.2 VMs
Native operating system | 32,525 | 33,996
One VM on Microsoft Hyper-V | 31,037 (95.43% of native) | 28,776 (84.65% of native)
One VM on VMware ESX | 31,155 (95.79% of native) | 32,680 (96.13% of native)
One VM on Citrix XenServer | 32,040* (98.51% of native) | 33,397* (98.24% of native)
One VM on Virtual Iron | 31,382 (96.49% of native) | 32,124 (94.49% of native)
One VM on Novell SLES Xen | 29,838 (91.74% of native) | 32,461 (95.48% of native)
Three VMs on Microsoft Hyper-V | 33,674 (103.53% of native) | 30,976 (91.12% of native)
Three VMs on VMware ESX | 27,143 (83.45% of native) | 27,778 (81.71% of native)
Three VMs on Citrix XenServer | 35,128* (108.00% of native) | 35,872* (105.52% of native)
Three VMs on Virtual Iron | 33,658 (103.48% of native) | 32,353 (95.17% of native)
Three VMs on Novell SLES Xen | 31,049 (95.46% of native) | 33,956 (99.88% of native)
Six VMs on Microsoft Hyper-V | 14,588 (44.85% of native) | 11,122 (32.72% of native)
Six VMs on VMware ESX | 16,136 (49.85% of native) | 17,089* (50.72% of native)
Six VMs on Citrix XenServer | 19,438* (59.76% of native) | 16,775 (49.34% of native)
Six VMs on Virtual Iron | 15,694 (48.25% of native) | 15,413 (45.34% of native)
Six VMs on Novell SLES Xen | 15,053 (46.28% of native) | 15,958 (45.18% of native)

It's possible for a hypervisor to offer a guest even more resources than a native OS installation can use, because a hypervisor captures all the resources of a server, whereas a native installation may be limited by its kernel's ability to exploit every core of a four-core or 16-core server. I/O drivers included with hypervisors may also manage server resources more productively.

We divided our testing into two rounds: one with the server confined to one socket of four cores, and a second in which we re-installed the remaining three sockets, making 16 cores available and giving four vCPUs to each guest instance. In each test, we progressively added VM guests and compared the results with the native operating system results on the same hardware.

The results showed a clear winner. XenServer was very efficient at finding resources and offering them up to a guest VM. In the first test, where we used one VM guest with a single vCPU, XenServer offered sufficient additional resources from the remaining cores to permit Windows to perform faster than its native performance. It's a bit of a smoke-and-mirrors trick (XenServer allocates a larger common denominator of resources than the other competitors), but interesting -- and certainly faster than the competition.

Where three VMs shared the four cores with one vCPU allocated to each, XenServer repeated as the performance leader, going just a tiny bit slower but still faster than native performance. It was only when we started to oversubscribe the four cores with six VM guests that XenServer started to slow down -- and, as noted above, Hyper-V edged it out in that one test, though XenServer still beat the remaining competitors.

Where we tested Novell's SLES 10.2 Linux as a VM, Novell's SLES Xen bested all competitors (although the results were very close) where we had a single SLES 10.2 VM running on a single vCPU. But Novell's SLES Xen was overtaken by XenServer when we increased the number of SLES VMs to three and six, each with access to its own vCPU. In no case was SLES VM performance faster than native performance as it had been in the Windows 2008 Server testing.

When we gave XenServer's guest VM instances lots of vCPUs in our second test round, XenServer did well supporting Windows 2008 VMs, pulling down 98.5% of the Windows Server 2008 native numbers when each VM had access to four vCPUs. It then zoomed to an astounding 108% of native when we increased the count to three VMs, each with four vCPUs (remember, it's finding additional resources). XenServer slowed down only as we oversubscribed the system with six guests, four vCPUs each, on a 16-core server, dropping to just less than 60% of native.
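To make the term concrete, oversubscription here simply means handing guests more virtual CPUs than the server has physical cores; both rounds end at the same 1.5-to-1 ratio, as this bit of arithmetic shows:

```python
# Oversubscription ratio: total vCPUs allocated to guests vs. physical cores.
def oversubscription(vm_count, vcpus_per_vm, physical_cores):
    return (vm_count * vcpus_per_vm) / physical_cores

# Round One: six single-vCPU guests sharing one four-core socket.
print(oversubscription(6, 1, 4))    # 1.5 -- 1.5x oversubscribed
# Round Two: six four-vCPU guests sharing all 16 cores.
print(oversubscription(6, 4, 16))   # 1.5 -- 1.5x oversubscribed
```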

XenServer continued its winning streak when running Linux VMs with multiple vCPUs available to the VMs except in the toughest test, where VMware's ESX still tromps all when we over-allocate resources by chaining six SLES VM guests with four vCPUs allocated to each guest.

One performance parameter to note regarding XenServer is that there was a consistency issue in the test where we had six VMs running on one vCPU each. While the charted performance numbers show the average speed of the VMs, we kept detailed records of each VM's individual performance. With XenServer, the difference between the slowest and fastest VMs was as much as 41% across 10 runs of this test scenario. No other hypervisor's guests showed this variation in any of the test scenarios. We also found that we couldn't predict which guest VM would be fastest or slowest through these test runs.
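The spread we tracked is simply the gap between the fastest and slowest guest in a run, expressed as a percentage of the slowest. A minimal sketch of that bookkeeping follows; the per-VM figures are hypothetical placeholders, not measured results:

```python
# Spread between the fastest and slowest VM in one test run, as a percentage
# of the slowest. The per-VM bops figures below are hypothetical placeholders.
def spread_pct(per_vm_bops):
    return (max(per_vm_bops) - min(per_vm_bops)) / min(per_vm_bops) * 100

run = [11_900, 13_450, 15_200, 16_100, 14_300, 16_750]   # six guests, one run
print(f"{spread_pct(run):.0f}% between slowest and fastest VM")   # roughly 41%
```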

We asked a Citrix spokesperson to comment on the variance in this single test, and Bill Carovano, director of technical product management for XenServer, says the variations were likely caused by the cron jobs that the guests can trigger. Without tweaking, Carovano says these can occur somewhat randomly and may lead to performance variances. In internal testing, Citrix tries to suppress cron jobs to remove fluctuations in its results.

Virtual Iron's performance put it in the overall bottom slot, but it's important to note that its results didn't lag far behind the others in every case. And Virtual Iron did place second when hosting a single Windows 2008 Server guest across four vCPUs, a test that gave a single virtual machine a playground of four CPU cores, 2GB of memory and all the disk resources. That's a pretty wide-open field to run in.

I/O results favor Novell

We tested I/O performance using Intel's IOMeter to assess the number of I/Os per second that each virtual machine could deliver in both under- and over-subscribed conditions.

In the first of our two I/O test scenarios, we used six guest VMs that were assigned one vCPU each, emulating a typical non-oversubscribed server consolidation scenario. The second test made use of six virtual machines running four-vCPU SMP kernels.
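The tables that follow report both the average IOps each guest achieved and the total across all six concurrent guests; the two figures are related in the obvious way, as this small sketch shows (the per-VM numbers are hypothetical placeholders):

```python
# Roll-up used in the IOMeter tables: per-VM results are summed for the total
# row, and the average row is that total divided by the number of guests.
# The per-VM figures below are hypothetical placeholders.
per_vm_iops = [150.2, 148.9, 146.0, 143.8, 144.1, 141.3]   # six concurrent guests

total_iops = sum(per_vm_iops)
average_iops = total_iops / len(per_vm_iops)
print(f"total: {total_iops:.2f} IOps, average per VM: {average_iops:.2f} IOps")
```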

Disk I/O results with VMs accessing a single vCPU

Novell captures the flag in our IOMeter disk performance testing, mainly because it caches writes in its default configuration. Results are shown in I/O operations per second (IOps); the more IOps, the faster the hypervisor's I/O performance. Microsoft Hyper-V and VMware ESX results are carried over from previous testing.

Configuration | Windows Server 2008 VMs | Novell SLES 10.2 VMs
Native operating system running on a single CPU | 712.97 | 226.96

Average IOps per VM, six VMs each using one vCPU, concurrent IOMeter tests:
Microsoft Hyper-V | 145.71 | 109.51
VMware ESX | 288.94 | 79.64
Citrix XenServer | 159.58 | 86.43
Virtual Iron | 157.06 | 86.54
Novell SLES Xen | 1,131.45 | 416.24

Total IOps, all six VMs each using one vCPU, tests running concurrently on all VMs:
Microsoft Hyper-V | 874.29 | 657.07
VMware ESX | 1,733.63 | 477.85
Citrix XenServer | 957.48 | 518.61
Virtual Iron | 942.34 | 519.26
Novell SLES Xen | 6,788.73 | 2,497.45

Across every IOMeter test, Novell's SLES Xen blew away the competition. The results were so startling (in some cases VMs running on Novell's Xen hypervisor held a 10-fold performance advantage) that we retested Novell's SLES Xen across all scenarios. During these retests we carefully watched the disk I/O channel. Our tests use a 70% write to 30% read ratio in order to put heavy pressure on the disk channel and emulate virtualization in stressful, high-I/O environments. Servers don't typically see this ratio in many applications, but certain workloads, such as data warehousing, business analysis, database maintenance and the batch processing typical of research applications, favor writes over reads, so we test heavily.

In Novell's case we saw that the read/write transactions to disk seemed to come in large cycles, rather than the steady waves that normally typified disk activity while we were testing other hypervisors. From this evidence, we suspected the Novell system was using write caching.

When we asked Novell to comment on this situation, Santanu Bagchi, Novell's senior product manager for virtualization, confirmed our suspicions and told us that write caching is Novell's default when the virtual disk is configured as a file-backed disk as was the case in our test bed.

Write caching prevents bottlenecks when the channel is busy, but it can in some cases cause transactional integrity issues. You can also argue that in many server configurations the write cache can be battery-backed. Battery backing staves off the transactional integrity issues by temporarily housing the data to be written to disk for the life of the battery, or until the transaction is written to media and verified.
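To illustrate why cached writes post such different numbers than writes forced to media, here is a minimal sketch comparing ordinary buffered writes against writes that are flushed and fsync'd after every operation. File names and block counts are arbitrary; the point is the order-of-magnitude gap, not the absolute figures.

```python
import os
import time

BLOCK = b"x" * 4096   # one 4KB block per write
COUNT = 2000          # arbitrary number of writes

def writes_per_second(path, sync_every_write):
    """Time COUNT block writes, optionally forcing each one to the media."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(COUNT):
            f.write(BLOCK)
            if sync_every_write:
                f.flush()
                os.fsync(f.fileno())   # push the block through the OS cache to disk
    return COUNT / (time.perf_counter() - start)

print(f"cached writes: {writes_per_second('cached.bin', False):,.0f} writes/sec")
print(f"synced writes: {writes_per_second('synced.bin', True):,.0f} writes/sec")
```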

In modern data centers, servers are often highly protected with availability features that prevent power outages and other conditions that can corrupt cache and render server data into garbage. It is for these reasons we let the Novell SLES Xen scores stand, realizing that systems purists will likely object to this default installation method and its potential for systems failures.

Disk I/O results with VMs accessing multiple vCPUs

In oversubscribed conditions, IOMeter results still show that Novell's SLES Xen has a very big lead in IOps. Results are shown in I/O operations per second (IOps); the more IOps, the faster the hypervisor's I/O performance. Microsoft Hyper-V and VMware ESX results are carried over from previous testing.

Configuration | Windows Server 2008 VMs | Novell SLES 10.2 VMs
Native operating system running on four CPU cores | 1,040.38 | 322.93

Average IOps per VM, six VMs each using four vCPUs, concurrent IOMeter tests:
Microsoft Hyper-V | 166.27 | 69.95
VMware ESX | 313.72 | 77.56
Citrix XenServer | 140.02 | 72.29
Virtual Iron | 182.35 | 83.66
Novell SLES Xen | 1,689.36 | 430.55

Total IOps, all six VMs each using four vCPUs, concurrent IOMeter tests:
Microsoft Hyper-V | 874.29 | 419.67
VMware ESX | 1,882.34 | 465.36
Citrix XenServer | 840.14 | 433.73
Virtual Iron | 1,094.10 | 501.96
Novell SLES Xen | 10,136.17 | 2,583.33

Citrix XenServer pulled down numbers low enough across most tests for us to ask Citrix why that was the case. We were told to change the scheduler setting to use the NOOP scheduler, which should have been selected by default but, because of a bug in the installer, wasn't set correctly on our hardware. This change actually resulted in slightly worse numbers for Windows VMs but a significant improvement with the SLES VMs. Our reported numbers reflect the NOOP scheduler being in place.
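For reference, the NOOP elevator is the standard Linux I/O scheduler choice exposed per block device through sysfs. The sketch below shows the generic mechanism for checking and switching it; the device name is an assumption, the change requires root, and we are not claiming this is exactly how Citrix applies the setting inside XenServer.

```python
from pathlib import Path

# Inspect, and optionally switch, the Linux I/O scheduler for one block device.
# The device name (sda) is an assumption; writing the file requires root, and
# newer kernels may list "none" instead of "noop".
scheduler = Path("/sys/block/sda/queue/scheduler")

print(scheduler.read_text().strip())   # e.g. "[cfq] deadline noop" -- brackets mark the active one
# scheduler.write_text("noop")         # uncomment to switch to the noop scheduler
```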

In terms of performance, the Brothers Xen provide some stiff competition. The question is, which is more important to your VM scheme: transactional performance (XenServer is tops there) or I/O performance (Novell’s SUSE Xen screams if you can stand the caching component)? The answer could sway your decision as to which Xen hypervisor might be more suitable for your environment.

Henderson and Allen are researchers for ExtremeLabs, of Indianapolis. Contact them at kitchen-sink@extremelabs.com.
