Chapter 1: System Considerations

Prentice Hall

As an example, a customer wanted a 20:1 compression ratio for virtualization of their low-utilization machines. However, they also had networking goals to compress their network requirements at the same time. The other limiting factor was the hardware they could choose, because they were restricted to a certain set of machines with a precisely limited set of adapters. The specifications stated that the hardware could do what they wanted, so they proceeded down that path. However, what the hardware specification states is not necessarily the best practice for ESX, and this led to quite a bit of hardship as they worked through the issues with their chosen environment. They could have alleviated certain hardships early on with a better understanding of the impact of ESX on the various pieces of hardware and that hardware's impact on ESX. (Although most, if not all, of the diagrams and notes use Hewlett-Packard hardware, these are just examples; similar hardware is available from Dell, IBM, Sun, and many other vendors.)

Basic Hardware Considerations

An understanding of basic hardware aspects and their impact on ESX can greatly increase your chances of virtualization success. To begin, let's look at the components that make up modern systems.

When designing for the enterprise, one of the key considerations is the processor to use, specifically the type, cache available, and memory configurations; all these factors affect how ESX works in major ways. The wrong choices may make the system seem sluggish and will reduce the number of virtual machines (VMs) that can run, so it is best to pay close attention to the processor and system architecture when designing the virtual environment.

Before picking any hardware, always refer to the VMware Hardware Compatibility Lists (HCLs), which you can find as four volumes at http://www.vmware.com/support/pubs/vi_pubs.html:

  • ESX Server 3.x Systems Compatibility Guide

  • ESX Server 3.x I/O Compatibility Guide

  • ESX Server 3.x Storage/SAN Compatibility Guide

  • ESX Server 3.x Backup Software Compatibility Guide

Processor Considerations

Processor family is not a huge consideration in the scheme of things, but it does matter when picking multiple machines for the enterprise, because different processor architectures affect the availability of ESX features. Specifically, mismatched processor types will prevent the use of VMotion. VMotion moves a running VM from host to host by using a specialized network connection. It momentarily freezes the VM while it copies the memory and register footprint of the VM from the old host to the new one. Afterward, the VM on the old host is shut down cleanly, and the VM on the new host starts. If everything works appropriately, the VM does not notice anything but a slight hiccup that can be absorbed with no issues.

However, because VMotion copies the register and memory footprint from host to host, the processor architecture and chipset in use need to match. Without proper masking of processor features, it is not possible to VMotion from a Xeon to an AMD processor or from a single-core processor to a dual-core processor, even if it is the same family of processor that was introduced in ESX version 2.5.2. If the VM to be moved is a 64-bit VM, the processors must match exactly, because there is no method available to mask processor features. The processor architecture and chipset (or the instruction set) are therefore extremely important, and because they can change from generation to generation of machines, it is best to introduce two machines into the virtual enterprise at the same time to ensure VMotion actually works. When introducing new hardware into the mix of ESX hosts, test to confirm that VMotion will work.
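The matching rule above can be turned into a quick planning check before any hardware is ordered. The following is a minimal sketch only: the `HostCpu` record, its field names, and the host names are hypothetical, the data is typed in by hand rather than read from ESX, and a clean result does not replace an actual VMotion test between the hosts.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class HostCpu:
    """Hypothetical inventory record; the field names are illustrative only."""
    host: str
    vendor: str            # for example "Intel" or "AMD"
    family: str            # for example "Xeon" or "Opteron"
    cores_per_socket: int  # 1 for single-core, 2 for dual-core


def vmotion_mismatches(hosts):
    """Return (host_a, host_b, differing_fields) for each pair of hosts
    whose processors differ in a way the text says will block VMotion
    when processor features are not masked."""
    problems = []
    for a, b in combinations(hosts, 2):
        reasons = [
            name
            for name in ("vendor", "family", "cores_per_socket")
            if getattr(a, name) != getattr(b, name)
        ]
        if reasons:
            problems.append((a.host, b.host, reasons))
    return problems


# Assumed example inventory, not taken from any real environment.
inventory = [
    HostCpu("esx01", "Intel", "Xeon", 1),
    HostCpu("esx02", "Intel", "Xeon", 2),
    HostCpu("esx03", "AMD", "Opteron", 2),
]

for host_a, host_b, reasons in vmotion_mismatches(inventory):
    print(f"{host_a} <-> {host_b}: differs in {', '.join(reasons)}")
```

The check compares every pair of hosts because VMotion can be requested between any two of them; an empty result only says that nothing on paper rules VMotion out.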


Best Practice - Standardize on a single processor and chipset architecture. If this is not possible because of the age of existing machines, test to ensure VMotion still works, or introduce hosts in pairs to guarantee successful VMotion. Different firmware revisions can also affect VMotion functionality.

Ensure that all the processor speed or stepping parameters in a system match, too.


Note that many vendors support mismatched processor speeds or stepping in a system. ESX, however, works best when all the processors in a system have the same speed and stepping. Where the stepping for a processor is different, each vendor provides different instructions for processor placement. For example, Hewlett-Packard (HP) requires that the slowest processor be in the first processor slot and all the others in any remaining slots. To avoid any type of issue, it is a best practice that the processor speeds and stepping match within the system.

Before proceeding to the next phase, a brief comment on dual-core (DC) versus single-core (SC) processors is warranted. ESX Server does not differentiate in its licensing scheme between DC and SC processors, so the difference between them becomes a matter of cost versus performance gain. A DC processor will handle more VMs than an SC processor, but it also costs more and is supported only in the later releases of ESX. In some cases, it is possible to start with SC processors and make the first upgrade of the ESX Servers a move to DC processors, which protects the hardware investment. If performance is the issue, DC is the way to go. Nevertheless, for now, the choice is a balance of cost versus performance. Because of current shared-cache mechanisms for DC, an eight-core (four-processor) server has about the same processing power as seven physical processors; once shared cache goes away, there is a good chance the efficiency of DC will match that of a true eight-way machine.

Cache Considerations

Unlike processor architectures and chipsets, the L2 Cache does not need to match between hosts. A mismatch will not prevent VMotion from working. However, L2 Cache is likely to matter more when it comes to performance, because it controls how often main memory is accessed. The larger the L2 Cache, the better an ESX Server will run. Consider Figure 1.1, treating each VM as a complete process, and follow the access path to memory. Although ESX tries to limit memory usage as much as possible, with 40 VMs this is just not possible, so the L2 Cache plays a significant part in how VMs perform.

Figure 1.1: Memory access paths

As more VMs of the same operating system (OS) type and version are added to a host, ESX will start to share code segments between VMs. Code segments are the instructions that make up the OS within the VM, as opposed to the data segments that contain the VM's memory. Code-segment sharing between VMs does not violate any VM's security, because the code never changes; if it does change, code-segment sharing is no longer available for those VMs.

That aside, let's look at Figure 1.1 again. When a processor needs to ask the system for memory, it first goes to the L1 Cache (up to a megabyte usually) and sees whether the memory region requested is already on the processor die. This action is extremely fast, and although it differs between processors, we can assume it takes an instruction or two (measured in nanoseconds). However, if the memory region is not in the L1 Cache, the next step is to go to the L2 Cache, which is generally off the die, over an extremely fast channel (green arrow) usually running at processor speeds. This takes more time and instructions than L1 Cache access, and adds to the overall time to access memory. If the memory region you desire is not in L2 Cache, it is in main memory (yellow arrow) somewhere, and must be accessed and loaded into L2 Cache so that the processor can access it, which takes another order of magnitude of time. Usually, a cache line, which is the desired memory region plus some of the adjacent data, is copied from main memory to speed up future memory access. When we are dealing with non-uniform memory access (NUMA) architecture, as is the case with AMD processors, there is yet another step to memory access if the memory necessary is sitting on a processor board elsewhere in the system. The farther away it is, the slower the access time (red and black arrows), and this access over the CPU interconnect will add another order of magnitude to the memory access time, which in processor time can be rather slow.

Okay, but what does this mean in real times? Assuming that we are using a 3.06GHz processor, the times could be as follows:

  • L1 Cache, one cycle (~0.33ns)

  • L2 Cache, two cycles, the first one to get a cache miss from L1 Cache and another to access L2 Cache (~0.66ns), which runs at CPU speeds (green arrow)

  • Main memory is running at 333MHz, which is an order of magnitude slower than L2 Cache (~3.0ns access time) (yellow arrow)

  • Access to main memory on another processor board (NUMA) is an order of magnitude slower than accessing main memory on the same processor board (~30–45ns access time, depending on distance) (red or black arrow)

This implies that large L2 Cache sizes will benefit the system more than small ones: the larger the cache, the larger the chunks of contiguous memory the processor can reach without going out to main memory, and the better the VMs will perform. This discussion does not imply that NUMA-based architectures are inherently slower than regular-style architectures, because most NUMA-based architectures running ESX Server do not need to go out to other processor boards very often to gain access to their memory.
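The per-level figures in the list above follow from the clock rates: one cycle lasts 1/frequency seconds. The short sketch below reproduces that arithmetic. The frequencies come from the text; the variable names and the factor of 10 used for "an order of magnitude" are assumptions made for illustration, and real access latencies depend on the specific processor and memory.

```python
def cycle_time_ns(frequency_hz):
    """Duration of one clock cycle, in nanoseconds."""
    return 1e9 / frequency_hz


CPU_HZ = 3.06e9     # 3.06GHz processor, from the text
MEMORY_HZ = 333e6   # 333MHz main memory, from the text

l1_ns = 1 * cycle_time_ns(CPU_HZ)       # one processor cycle
l2_ns = 2 * cycle_time_ns(CPU_HZ)       # L1 miss, then the L2 access
main_ns = 1 * cycle_time_ns(MEMORY_HZ)  # one memory-bus cycle
remote_ns_low = 10 * main_ns            # assumed factor for "an order of magnitude"

print(f"L1 Cache:            {l1_ns:.2f} ns")
print(f"L2 Cache:            {l2_ns:.2f} ns")
print(f"Main memory:         {main_ns:.2f} ns")
print(f"Remote (NUMA), min:  {remote_ns_low:.2f} ns")
```

Doubling the unrounded cycle time gives about 0.65ns for L2 Cache; the ~0.66ns in the list comes from doubling the already rounded 0.33ns.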


Best Practice - Invest in the largest amount of L2 Cache available for your chosen architecture.


Memory Considerations

After L2 Cache comes the speed of the memory, as the preceding bulleted list suggests. Higher-speed memory is suggested, and lots of it! The quantity of memory and the number of processors govern how many VMs can run simultaneously without overcommitting this vital resource. In many cases, the highest-speed memory comes with a penalty: a lower maximum memory capacity. An example of this is the HP DL585, which can host 32GB of the highest-speed memory, yet it can host 64GB of the lower-speed memory. So, obviously, there are trade-offs in the number of VMs and how you populate memory, but generally the best practice is high speed and a high quantity.

Consider that the maximum number of vCPUs per core is eight. On a four-processor box with SC processors, that could be 32 single-vCPU VMs. If each of these VMs is given 1GB, we need 33GB of memory to run them. Why 33GB? Because the VMs need 32GB, and the extra 1GB gives the console OS (COS, the service console) and the VMkernel up to 1GB of memory in which to run. Because 33GB is a weird number for most computers these days, a box populated with 32GB would need to overcommit memory. When we start overcommitting memory in this way, the performance of ESX can degrade. In this case, it might be better to move to 64GB of memory instead. However, that same box with DC processors can, theoretically, run up to 64 VMs; if we take the VM load to its logical conclusion, we are once more overcommitting memory. Eight VMs per core is a theoretical limit, though, and it is hard to achieve. (It is not possible to run a VM with more vCPUs than there are physical cores available, but there is still a theoretical limit of eight vCPUs per core.) There are rumors that it has been done. Unfortunately, that pushes the machine to its limits and is not recommended. Recommended memory utilization differs significantly for each configuration.
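The sizing arithmetic in this section can be sketched as a small calculation. This is an illustration under the text's assumptions only: eight vCPUs per core, one vCPU and 1GB per VM, and up to 1GB for the COS and the VMkernel combined. The function and parameter names are hypothetical, and real memory overhead varies by configuration.

```python
def required_memory_gb(cores, vcpus_per_core=8, gb_per_vm=1, overhead_gb=1):
    """Return (vm_count, memory_gb) for a host running one single-vCPU VM
    per available vCPU, plus overhead for the COS and the VMkernel."""
    vm_count = cores * vcpus_per_core
    return vm_count, vm_count * gb_per_vm + overhead_gb


def is_overcommitted(required_gb, installed_gb):
    """True when the VMs plus overhead need more memory than is installed."""
    return required_gb > installed_gb


# Four single-core processors versus four dual-core processors,
# checked against the 32GB and 64GB populations from the DL585 example.
for label, cores in (("4 x SC", 4), ("4 x DC", 8)):
    vms, needed = required_memory_gb(cores)
    for installed in (32, 64):
        status = "overcommitted" if is_overcommitted(needed, installed) else "fits"
        print(f"{label}: {vms} VMs need {needed}GB; with {installed}GB installed: {status}")
```

With these assumptions, the single-core box fits only in the 64GB population, and the dual-core box is overcommitted in both, which is the conclusion the text reaches.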


Best Practice - High-speed memory and lots of it! However, be aware of the possible trade-offs involved in choosing the highest-speed memory. More VMs may necessitate the use of slightly slower memory.


What is the recommended memory configuration? This subject is covered in detail when we cover VMs, because the answer depends on the VMs to be run. The strong recommendation, however, is to put in the maximum memory the hardware will support, up to the 64GB limit set by ESX (because overcommitting memory creates too much of a performance hit and should be done only in extreme circumstances). This is a major cost-benefit decision, because redundancy needs to be considered with any implementation of ESX; it can therefore be beneficial to cut down on the per-machine memory in order to afford redundant systems.

I/O Card Considerations

The next consideration is which I/O cards are supported. Unlike other operating systems, ESX has a finite list of supported I/O cards. There are limitations on redundant array of inexpensive disks (RAID) controllers, Small Computer System Interface (SCSI) adapters for external devices including tape libraries, network interface cards (NICs), and Fibre Channel host bus adapters. Although the list changes frequently, it boils down to a few types of supported devices limited by the set of device drivers that are a part of ESX. Table 1.1 covers the devices and the associated drivers.

Table 1.1: Devices and Drivers
