Chapter 1: System Considerations

Prentice Hall


Disk Drive Space Considerations

The next item to discuss is drive space. In essence, the disk subsystem assigned to the system needs to be large enough to contain the COS and ESX, the swap file for the COS, storage space for the virtual swap file (used to overcommit memory in ESX), VM disk files, local ISO images, and backups of the Virtual Machine Disk Format (VMDK) files for disaster-recovery reasons. If Fibre Channel or iSCSI is available, you should offload the VM disk files to those systems. When booting from a SAN with ESX versions earlier than 3.0, the Fibre Channel adapter must be shared between the service console and ESX. Sharing the Fibre Channel adapter ports is not a best practice; it is offered as a matter of convenience and is not really suggested for use. (Boot from a SAN is covered fully in Chapter 3, "Installation.") Putting temporary storage (COS swap) onto expensive SAN or iSCSI storage is also not a best practice; the recommendation is to have some form of local disk space to host the OS and the COS swap files. VMotion in ESX version 3 requires that the per-VM VMkernel swap live on the remote storage device. The general recommendation is roughly 72GB in a RAID 1 or mirrored configuration for the operating system and its necessary file systems, and for local storage of ISO files and other items as necessary.
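As a rough illustration of how these space requirements add up, the following is a minimal sizing sketch. The component sizes (COS install, COS swap, ISO library, local VM backups) and the slack factor are assumptions for illustration only, not figures from this chapter; substitute your own estimates.

```python
# Rough local-disk sizing sketch for an ESX host (illustrative assumptions only).
# Component sizes below are hypothetical; substitute your own estimates.

def local_disk_estimate_gb(cos_install=8, cos_swap=2, iso_library=20,
                           local_vm_backups=30, slack_fraction=0.2):
    """Return a padded local-disk estimate in GB."""
    base = cos_install + cos_swap + iso_library + local_vm_backups
    return base * (1 + slack_fraction)

if __name__ == "__main__":
    estimate = local_disk_estimate_gb()
    print(f"Estimated local disk needed: {estimate:.0f} GB")
    # With these assumed numbers the estimate lands near the ~72GB RAID 1
    # recommendation mentioned above, leaving headroom for local VMs.
```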

For ESX versions earlier than version 3, the VMkernel swap file space should be twice the amount of memory in the machine. However, if twice the amount of memory is greater than 64GB, additional VMkernel swap files should be used, and each VMkernel swap file should live on its own VMFS. VMs could live on a VMFS created larger than 64GB, and a few VMs could then share that VMFS with the virtual swap file. However, if there will be no VMs on these VMFS partitions, the partitions could be exactly 64GB and use RAID 0 or unprotected RAID storage. The caveat in this case is that if you lose a drive in this RAID device, the ESX Server might no longer be able to overcommit memory, and those VMs currently overcommitted will fail. Use the fastest RAID level and place the virtual swap file on a VMFS on its own RAID set. It is also possible to place the VMkernel swap with the operating system on the recommended RAID 1 device. RAID 5 is really a waste for the VMkernel swap. For ESX versions earlier than version 3, RAID 1 is the best choice for the VMFS partition containing the VMkernel swap file.
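The sizing rule just described reduces to a short calculation: total VMkernel swap is twice physical memory, split into files of at most 64GB, each intended to live on its own VMFS. The following is a minimal illustrative sketch of that arithmetic, not a VMware tool.

```python
# Illustrative calculation of the pre-ESX-3 VMkernel swap sizing rule:
# total vswap = 2 x physical memory, split into files of at most 64GB,
# each file intended to live on its own VMFS.

def vmkernel_swap_plan(memory_gb, max_file_gb=64):
    total = 2 * memory_gb
    files = []
    remaining = total
    while remaining > 0:
        size = min(remaining, max_file_gb)
        files.append(size)
        remaining -= size
    return files

if __name__ == "__main__":
    for mem in (16, 32, 64):
        plan = vmkernel_swap_plan(mem)
        print(f"{mem}GB RAM -> {len(plan)} vswap file(s): {plan}")
    # 64GB RAM -> 128GB of swap split into two 64GB files, each on its own VMFS.
```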

For ESX version 3, there is no need for a single, monolithic VMkernel swap file; a swap file is now created independently for each VM.

Any VMFS that contains VMs should use a RAID 5 configuration for the best protection of data. Chapter 12, "Disaster Recovery and Backup," covers the disk configuration in much more detail as it investigates the needs of the local disk from a disaster-recovery (DR) point of view. The general DR point of view is to have enough local space to run critical VMs from the host without the need for a SAN or iSCSI device.


Best Practice for Disk - Have as much local disk as possible to hold VMkernel swap files (twice memory for low-memory systems and equal to memory for the larger-memory systems) for ESX versions earlier than version 3.

Have as much local disk as necessary to hold the OS, local ISO images, local backups of critical VMs, and perhaps some local VMs.


Basic Hardware Considerations Summary

Table 1.2 conveniently summarizes the hardware considerations discussed in this section.

Table 1.2: Best Practices for Hardware

| Item | ESX Version 3 | ESX Versions Earlier Than Version 3 | Chapter to Visit for More Information |
| --- | --- | --- | --- |
| Fibre Ports | Two 2GB | Two 2GB | Chapter 5 |
| Network Ports | Six 1GB: two for COS, two for VMs, two for VMotion | Four 1GB: one for COS, two for VMs, one for VMotion | Chapter 8 |
| Local disks | SCSI RAID, enough to keep a copy of the most important VMs | SCSI RAID, enough to keep a copy of the most important VMs and the local vSwap file | |
| iSCSI | Two 1GB network ports via VMkernel or iSCSI HBA | N/A | Chapter 8 |
| SAN | Enterprise class | Enterprise class | Chapter 5 |
| Tape | Remote | Remote | Chapter 11 |
| NFS-based NAS | Two 1GB network ports via VMkernel | Via COS | Chapter 8 |
| Memory | Up to 64GB | Up to 64GB | |
| Networks | Three or four: admin/iSCSI network, VM network, VMotion network, VMkernel network | Three: admin network, VM network, VMotion network | Chapter 8 |

Specific Hardware Considerations

Now we need to look at the hardware currently available and decide how best to use it to meet the best practices listed previously. All hardware will have some issues to consider, and applying the comments from the first section of this chapter will help show the good, bad, and ugly about the possible hardware currently used as a virtual infrastructure node. The primary goal is to help the reader understand the necessary design choices when choosing various forms of hardware for an enterprise-level ESX Server farm. Note that the VM counts mentioned are based on an average machine that does not do much network, disk, or other I/O and has average processor utilization. These counts vary greatly with the utilization of the current infrastructure; they are a measure of what each server is capable of and are not intended as maximums or minimums. A proper analysis will yield the best use of your ESX Servers and is part of the design for any virtual infrastructure.

Blade Server Systems

Because blade systems (see Figure 1.2) virtualize hardware, they are a logical choice for ESX, which further virtualizes a blade investment by running more servers on each blade. However, there are some serious design considerations when choosing blades. The majority of these considerations are in the realm of port density and availability of storage. Keep in mind our desire to have at least four NICs, two Fibre Channel ports, and local disk: Many blades do not meet these basic requirements. Take, for example, the IBM HS20. This blade has two on-board NICs and two Fibre Channel ports. Although there is plenty of Fibre Channel, there is a dearth of NICs in this configuration. That is not to say that the HS20 is not used, but the trade-off in its use is either a lack of redundancy or security and performance compromises. Other blades have similar trade-offs, too. Another example is the HP BL3 p blade. Although it has enough NIC ports, the two Fibre Channel ports share the same port on the fabric, which in essence removes Fibre redundancy from the picture. On top of that restriction, the BL3 p uses an IDE/ATA drive and not a SCSI drive, which implies that a SAN or iSCSI server is also required to run VMs. There are also no Peripheral Component Interconnect (PCI) slots in most blades, which makes it impossible to add an additional NIC, Fibre Channel, or SCSI adapter. In addition to the possible redundancy issue, there is a limitation on the amount of memory that you can put into a blade. With a blade, there is no PCI card redundancy, because all NIC and Fibre ports are part of the system or some form of dual-port mezzanine card. If more than one network will be available to the VMs, 802.1q VLAN tagging is the recommendation, because there is no way to add more NIC ports, and splitting the NIC team for the VMs would remove redundancy. Even with these trade-offs, blades make very nice, commonly used ESX Servers. It is common for two-processor blades to run between four and ten VMs, depending on the amount of memory available. On four-processor blades, where you can add quite a bit more memory, the loads can approach those of comparable nonblade systems.

Figure 1.2

Front and back of blade enclosure


Best Practice with Blades - Pick blades that offer full NIC and Fibre redundancy.


1U Server Systems

The next device of interest is the 1U server (see Figure 1.3), which in most cases offers two on-board NICs, generally no on-board Fibre Channel, perhaps two PCI slots, and perhaps two to four SCSI/SAS disks. This is perfect for adding a quad-port NIC and a dual-port Fibre Channel controller; but if you need a SCSI card for a local tape device, which is sometimes necessary but never recommended, there is no way to add one unless there is a way to get more on-board NIC or Fibre Channel ports. In addition to the need to add more hardware to these units, there is a chance that PCI card redundancy would be lost, too. Consider the HP DL360 as a possible ESX Server: a 1U device with two SCSI or SATA drives, two on-board NICs, and possibly a mezzanine Fibre Channel adapter. In this case, if we were using ESX version 2.5.x or earlier, we would need to choose only SCSI drives, and for any version we would want to add at least a quad-port NIC card to reach the six NICs that make up the best practice and to gain more redundancy for ESX version 3. In some cases, there is a SCSI port on the back of the device, so access to a disk array will increase space dramatically, yet driver deficiencies often affect its usage with tape devices.

Figure 1.3

1U server front and back

In the case of SAN redundancy, if there were no mezzanine Fibre Channel adapter, the second PCI slot would host a dual-port Fibre Channel adapter, which would fill all available slots. Even with quad-port NIC support, adding another pair of NIC ports for an additional network requires replacing the existing dual-port NIC with the new quad-port PCI card. There are, once again, a fair number of trade-offs when choosing this platform, and its low quantity of memory implies fewer VMs per server, perhaps in the range of four to ten VMs, depending on the quantity of memory and the size of the disks in the box. With slightly more capability than blades, the 1U box makes a good backup server, but it can be a workhorse when needed.


Best Practice for 1U Boxes - Pick a box that has on-board Fibre Channel adapters so that there are free slots for more network and any other necessary I/O cards. Also, choose large disk drives when possible. There should be at least two on-board network ports. Add quad-port network and dual-port Fibre Channel cards as necessary to get port density.


2U Server Systems

The next server considered is the 2U server (see Figure 1.4), similar to the HP DL380. This type of server usually has two on-board Ethernet ports, perhaps one on-board Fibre Channel port, and usually an external SCSI port for use with external drive arrays. In addition to all this, there are at least three PCI slots, up to six SCSI drives, and at least twice as much memory as a 1U machine. The extra PCI slot adds quite a bit of functionality, because it can either host an Adaptec SCSI card to support a local tape drive or library, which is sometimes necessary but never recommended, or host more network capability. At the bare minimum, at least two more NIC ports are required, and perhaps a dual-port Fibre Channel adapter if there is not already a pair of Fibre Channel ports in the server. Because this class of server can host six SCSI disks, it can be loaded with more than 1TB of space, which makes the 2U server an excellent stand-alone ESX Server. Introduce dual-core processors and this box has the power to run many VMs. The major limitations on this class of server are the possible lack of network card space and the memory constraint. Even with these limitations, it is a superb class of server and provides all the necessary components to make an excellent ESX Server.

Figure 1.4

Front and back of 2U server

Pairing a 2U server with a small tape library creates an "office in a box" that ships to a remote location and does not require a SAN or another form of remote storage, because it has plenty of local disk space and another disk array connects easily. Nevertheless, the 2U has the same characteristics as a 1U box in many cases. Are the extra memory and PCI slot very important? They can be, and depending on the type of server, there might be a need for a dual- or quad-port NIC, a dual-port host bus adapter (HBA), and a SCSI adapter for a tape library. The extra slot, extra memory, and lots of local disk make this class of server an extremely good workhorse for ESX. It is possible to run between 6 and 24 VMs on these types of servers, depending on available memory and whether dual-core processors are in use.


Best Practice for 2U Servers - Pick a server that has at least two on-board NIC ports, two on-board Fibre Channel ports, plenty of disk, and as much memory as possible. Add a quad-port network card to gain port density and, if necessary, two single-port Fibre Channel adapters to add more redundancy.


Large Server-Class Systems

The next discussion combines multiple classes of servers (see Figure 1.5): the 4-, 8-, and 16-processor machines. Independent of processor count, all these servers share many of the same hardware features. Generally, they have four SCSI drives, at least six PCI slots, two on-board NICs, RAID memory, and very large memory footprints ranging from 32GB to 128GB. RAID memory is just one technology that allows for the replacement of various components while the machine is still running, which can alleviate hardware-based downtime unless it is one of the critical components that fails. RAID memory is extremely nice to have, but it is just a fraction of the total memory in the server and does not count as available memory to the server. For example, it is possible to put a full 80GB of memory into an HP DL760, but the OS will see only 64GB of memory. The missing 16GB becomes the RAID memory pool, which comes into use only if the hardware discovers a bad memory stick. Generally, the larger machines have fewer disks than the 2U servers do, but they make up for that with an abundance of PCI buses and slots, enabling multiple Fibre Channel adapters and dual-port NICs for the highest level of redundancy. In these servers, the multiple Fibre Channel ports suggested by the general best practice would each be placed on a different PCI bus, as would the NIC cards, to get better performance and redundancy in PCI cards, SAN fabric, and networking. These types of servers can host a huge number of VMs. The minimum number of VMs is usually in the range of 20, but it can grow as high as 50 depending on processor count, utilization, and load. The sketch after Figure 1.5 makes the memory arithmetic concrete.

Figure 1.5

Back and front of large server-class machines
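To make the RAID memory arithmetic and a memory-bound VM count concrete, here is a small illustrative sketch; the per-VM memory allocation and hypervisor overhead figures are assumptions for illustration, not values from the text.

```python
# Illustrative sketch: usable memory after a RAID-memory reservation and a
# rough VM-count estimate. Per-VM allocation and overhead are hypothetical.

def usable_memory_gb(installed_gb, raid_spare_gb):
    """Memory visible to the OS after the RAID memory pool is set aside."""
    return installed_gb - raid_spare_gb

def estimated_vm_count(usable_gb, per_vm_gb=2, hypervisor_overhead_gb=1):
    """Rough VM count if memory is the limiting factor (no overcommit)."""
    return int((usable_gb - hypervisor_overhead_gb) // per_vm_gb)

if __name__ == "__main__":
    visible = usable_memory_gb(installed_gb=80, raid_spare_gb=16)  # 64GB, as in the DL760 example
    print(f"Visible memory: {visible}GB")
    print(f"Estimated VMs at 2GB each: {estimated_vm_count(visible)}")
```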

The Effects of External Storage

There are many different external storage devices, ranging from simple external drives, to disk arrays, shared disk arrays, active/passive SANs, active/active SANs, SCSI tape drives, tape libraries, and Fibre-attached tape libraries.... The list is actually endless, but we will look at the most common devices in use today and those most likely to be used in the future. We shall start with the simplest device and move on to the more complex devices. As we did with servers, this discussion points out the limitations and benefits of each technology so that all the facts are available when starting or modifying a virtual infrastructure architecture.

For local disks, it is strongly recommended that you use SCSI/SAS RAID devices; although IDE is supported for running ESX, it does not have the capability to host a VMFS, so some form of external storage will be required. ESX version 3 supports local SATA devices, but they share the same limitations as IDE. In addition, if you are running any form of shared disk cluster, such as Microsoft Cluster Server, a local VMFS is required for the boot drives, yet remote storage is required for all shared volumes using raw disk maps. If either is not available, the shared disk cluster will fail with major locking issues.


Best Practice for Local Disks - Use SCSI or SAS disks.


Outside of local disks, the external disk tray or disk array (see Figure 1.6) is a common attachment and usually does not require more hardware beyond the disk array and the proper SCSI cable. Like stand-alone servers, however, a local disk array does not enable the use of VMotion to hot migrate a VM. When VMotion is not required, though, this is a simple way to get more storage attached to a server. If the disk array is a SATA array, it is probably better to go to SCSI instead; although you can add more space with SATA, SCSI is much faster and is supported on all versions of ESX.

Figure 1.6

Front and back of an external disk array

The next type of device is the shared disk array (see Figure 1.7), which has its own controllers and can be attached to a pair of servers instead of only one. The on-board controller allows logical unit numbers (LUNs) to be carved out and presented to the appropriate server or shared among the servers. It is possible to use this type of device to share VMFS-formatted LUNs among at most four ESX hosts, because that is generally the limit on how many SCSI interfaces are available on each shared disk array. It is a very inexpensive way to create multi-machine redundancy. However, using this method limits the cluster of ESX Servers to exactly the number of SCSI ports that are available, and limits the methods for accessing raw LUNs from within VMs.

Figure 1.7

Front and back of a shared SCSI array


Best Practice for Local Storage - Use local or locally attached SCSI-based storage systems.


A SAN is one of the devices that allows VMotion to be used and generally comes in entry-level (see Figure 1.8) and enterprise-level (see Figure 1.9) styles. Each has its uses with ESX, and both allow the sharing of data between multiple ESX hosts, which is the prime ingredient for the use of VMotion. SAN information is covered in detail in Chapter 5, "Storage with ESX."

Figure 1.8

Front and back of an entry-level SAN with SATA drives

Figure 1.9

Front and back of an enterprise-level SAN

Although SATA drives are not supported for ESX versions earlier than 3.5 when directly attached to a host (unless a SCSI-to-SATA bridge adapter is in use), they are supported as part of a SAN (refer to Figure 1.8). However, they are slower than SCSI drives, so they may not be a good choice for primary VMDK storage, although they would make a good temporary backup location; the best solution is to avoid non-SCSI drives as much as possible. Although the entry-level SAN is very good for small installations, enterprise-class installations really require an enterprise-level SAN (refer to Figure 1.9). The enterprise-level SAN provides a higher degree of redundancy, storage, and flexibility for ESX than an entry-level version. Both have their place in possible architectures. For example, if you are deploying ESX to a small office with a pair of servers, it is less expensive to deploy an entry-level SAN than a full-sized enterprise-class SAN.


Best Practice for SAN Storage - Use SCSI-based SAN storage systems. For small installations, entry-level systems may be best; however, for anything else, it is best to use enterprise SAN systems for increased redundancy.


The last entry in the storage realm is NAS devices (see Figure 1.10), which present file systems using various protocols, including Network File System (NFS), Internet SCSI (iSCSI), and Common Internet File System (CIFS). Of particular interest is the iSCSI protocol, which is SCSI over Internet Protocol (IP). This protocol is not supported as a storage location for virtual machine disk files in ESX versions earlier than 3.0, but support is available in later versions. With NAS, there is no need for Fibre Channel adapters, only more NICs to support the iSCSI and NFS protocols while providing redundancy. In general, iSCSI and NAS run slightly slower than Fibre Channel when looking at the raw network speeds currently available.

Figure 1.10

NAS device


Best Practice for iSCSI - Neither NAS nor iSCSI is supported on versions earlier than ESX version 3.0; do not use these devices until an upgrade is available. Also, have enough COS NIC ports to provide redundancy and bandwidth.


Examples

Now it is time to review what customers have done in relation to the comments in the previous sections. The following six examples are from real customers, not from our imagination. The solutions proposed use the best practices previously discussed and a little imagination.

Example 1: Existing Datacenter

A customer was in the midst of a hardware-upgrade cycle and decided to pursue alternatives to purchasing quite a bit of hardware; the customer wanted to avoid buying 300+ systems at a high cost. They decided to pursue ESX Server. Furthermore, the customer conducted an exhaustive internal process to determine the need to upgrade the 300+ systems and believes all of them could be migrated to ESX because they meet or exceed the documented constraints. Their existing machine mix includes several newer machines from the last machine refresh (around 20) but is primarily made up of machines that are at least two to three generations old, running on processors no faster than 900MHz. The newer machines range from 1.4GHz to 3.06GHz 2U machines (see Figure 1.4). The customer would also like to either make use of their existing hardware somehow or purchase very few machines to make up the necessary difference, because the price of ESX to run 300+ machines approaches their complete hardware budget. A final bit of information was also provided, and it really throws a monkey wrench into a good solution: They have five datacenters, each with its own SAN infrastructure.

Following best practices, we could immediately state that we could use the 3.06GHz hosts and then determine whether there are enough of them to run everything. However, this example shows the need for something even more fundamental than hardware to run 300+ virtual machines: an appropriate analysis of the running environment to first determine whether the 300+ servers are good candidates for migration, followed by a determination of which servers are best suited to host the resulting VMs. The tool used most often to perform this analysis is the AOG Capacity Planner. This tool gathers various utilization and performance numbers for each server over a one- to two-month period. This information is then used to determine which servers make good candidates to run as VMs.
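As a simplified illustration of the kind of screening such an analysis performs, the following sketch filters servers by gathered utilization statistics; the data format and thresholds are hypothetical assumptions for illustration and are not taken from the AOG Capacity Planner.

```python
# Simplified sketch of screening physical servers as virtualization candidates
# based on gathered utilization data. Thresholds are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ServerStats:
    name: str
    avg_cpu_pct: float      # average CPU utilization over the sampling period
    peak_mem_gb: float      # peak memory used
    avg_disk_iops: float    # average disk I/O operations per second

def is_vm_candidate(s: ServerStats,
                    max_cpu_pct=40.0, max_mem_gb=4.0, max_iops=500.0) -> bool:
    """A server is a candidate if its sustained load fits comfortably in a VM."""
    return (s.avg_cpu_pct <= max_cpu_pct
            and s.peak_mem_gb <= max_mem_gb
            and s.avg_disk_iops <= max_iops)

if __name__ == "__main__":
    fleet = [
        ServerStats("web01", 12.0, 1.5, 120.0),
        ServerStats("db07", 65.0, 12.0, 2400.0),
    ]
    for s in fleet:
        verdict = "candidate" if is_vm_candidate(s) else "needs closer review"
        print(f"{s.name}: {verdict}")
```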


Best Practice - Use a capacity planner or something similar to get utilization and performance information about servers.

