OS bypass eliminates overhead

The RDMA Consortium and IETF have developed a set of standard extensions to Ethernet and TCP/IP that eliminate all three sources of CPU overhead.

The demand for higher-speed server interconnects to support clustering, storage networks and bulk data movement continues to drive Ethernet's evolution. In transitioning from 1G to 10G bit/sec data rates, Ethernet is poised to handle the most demanding data center applications in these three areas.

However, taking full advantage of this tenfold increase in performance requires the elimination of the three elements of host CPU overhead related to networking: buffer copies, transport processing and application context switches. Recognizing the host CPU overhead problem, the RDMA Consortium and IETF have developed a set of standard extensions to Ethernet and TCP/IP that eliminate all three sources of overhead. Collectively, these specifications are called iWarp.

While Remote Direct Memory Access (RDMA) and transport offload engines have made great strides in reducing overhead, roughly 40% of host CPU networking overhead is attributed to application context switches. Context switches occur when process execution moves from user space to kernel space. Of the three sources of network overhead, context switches have been discussed the least and warrant further consideration.

A change in context

Simply put, user space is where all user programs execute. Historically, applications operating in user space make system calls into the kernel for privileged operations such as I/O commands to network or storage devices.

Kernel space is where the operating system, device drivers and hardware interrupt handlers run. The kernel provides a safe interface to hardware, enforces security between processes, gives each process a fair share of resources and arbitrates access to hardware.

Transitions from user to kernel space (and the reverse) historically have been required to pass data between user programs and their clustering, storage and networking hardware resources. Each transition requires saving the user process context data and loading the kernel context data. The act of saving the user process information and loading the kernel process information is known as a context switch.

Typically, a context switch involves saving the address space and software stack information, and the register set (program counter, stack pointer, instruction register and other general processor registers) from the current process and loading the corresponding information for the new process. With this information, the CPU begins execution of the kernel process, using the restored registers and address space.
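The save-and-restore sequence described above can be illustrated with a toy model. This is only a sketch: the `Context` fields, the `ToyCPU` class and the numeric values are hypothetical names chosen for illustration, and a real processor saves far more state than this.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Context:
    """Hypothetical snapshot of the state saved on a context switch."""
    program_counter: int
    stack_pointer: int
    general_registers: tuple
    address_space_id: int


class ToyCPU:
    """Toy processor that holds one live context at a time."""

    def __init__(self, name, context):
        self.current = name      # who is running now
        self.live = context      # state currently loaded in the "hardware"
        self.saved = {}          # name -> saved Context
        self.switches = 0

    def execute(self, instructions):
        # Pretend to run code: only the program counter advances.
        self.live = replace(
            self.live,
            program_counter=self.live.program_counter + instructions,
        )

    def context_switch(self, next_name, fresh_context=None):
        # Save the outgoing context, then load the incoming one.
        incoming = self.saved.pop(next_name, fresh_context)
        if incoming is None:
            raise ValueError(f"no context available for {next_name!r}")
        self.saved[self.current] = self.live
        self.live = incoming
        self.current = next_name
        self.switches += 1


if __name__ == "__main__":
    cpu = ToyCPU("user", Context(100, 0x7000, (0, 0, 0, 0), 1))
    cpu.execute(5)
    cpu.context_switch("kernel", Context(9000, 0xF000, (0, 0, 0, 0), 0))
    cpu.execute(20)
    cpu.context_switch("user")   # user resumes where it left off
    print(cpu.current, cpu.live.program_counter, cpu.switches)
```

Every pass through `context_switch` is bookkeeping that does no application work, which is why the number of transitions matters.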

The overhead of saving and restoring context information limits application I/O performance. As mentioned earlier, the user-to-kernel transition in TCP/IP networking can account for approximately 40% of the host CPU's networking overhead.
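One rough way to observe the cost of user-to-kernel transitions is to move the same amount of data with many small system calls and then with a single large one. The sketch below assumes a system where writing to the null device is cheap, so most of the measured time is per-call overhead. The function names are made up for this example, and the timings it prints will vary by machine; it does not reproduce the 40% figure cited above.

```python
import os
import time


def write_in_chunks(fd, payload, chunk_size):
    """Write payload to fd, issuing at least one os.write call per chunk.

    Returns the number of os.write system calls made.
    """
    calls = 0
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        written = 0
        while written < len(chunk):
            written += os.write(fd, chunk[written:])
            calls += 1
    return calls


def time_writes(payload, chunk_size):
    """Return (system_calls, elapsed_seconds) for writing payload to the null device."""
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        start = time.perf_counter()
        calls = write_in_chunks(fd, payload, chunk_size)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return calls, elapsed


if __name__ == "__main__":
    data = bytes(1 << 20)  # 1 MiB of zeros
    for size in (64, 4096, len(data)):
        calls, elapsed = time_writes(data, size)
        print(f"chunk={size:>8}  system calls={calls:>6}  seconds={elapsed:.6f}")
```

The payload is identical in each run; only the number of trips into the kernel changes.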

The technique for eliminating the user-to-kernel transition and its associated context switch is known as operating system bypass. As shown in the graphic, operating system calls are avoided by updating the I/O library to take advantage of operating system bypass capabilities. This modification is transparent to applications and enables direct communication of all commands to the I/O adapter, eliminating the user-to-kernel transition. Operating system bypass is a well-proven technique that has been used for years in the highest-performance cluster interconnects.
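The idea of swapping the I/O library while leaving the application untouched can be sketched as a toy model. Everything here is an assumption made for illustration: `ToyAdapter`, `KernelPathLibrary` and `BypassLibrary` are hypothetical classes, not the iWarp interface, and a counter stands in for real user-to-kernel transitions.

```python
from collections import deque


class ToyAdapter:
    """Stands in for an I/O adapter that exposes a work queue."""

    def __init__(self):
        self.work_queue = deque()
        self.sent = []

    def process(self):
        # The adapter drains its queue without involving the host kernel.
        while self.work_queue:
            self.sent.append(self.work_queue.popleft())


class KernelPathLibrary:
    """Traditional I/O library: every send crosses into the kernel."""

    def __init__(self, adapter):
        self.adapter = adapter
        self.kernel_transitions = 0

    def send(self, data):
        self.kernel_transitions += 1          # simulated user-to-kernel transition
        self.adapter.work_queue.append(data)  # the kernel hands data to the adapter


class BypassLibrary:
    """Same send() interface, but posts commands directly to the adapter."""

    def __init__(self, adapter):
        self.adapter = adapter
        self.kernel_transitions = 0

    def send(self, data):
        self.adapter.work_queue.append(data)  # no kernel involvement


def application(io_library, messages):
    """Unmodified application code: it only knows the library's send() call."""
    for message in messages:
        io_library.send(message)


if __name__ == "__main__":
    messages = [b"alpha", b"beta", b"gamma"]
    for library_class in (KernelPathLibrary, BypassLibrary):
        adapter = ToyAdapter()
        library = library_class(adapter)
        application(library, messages)
        adapter.process()
        print(library_class.__name__, library.kernel_transitions, adapter.sent)
```

The `application` function is identical in both runs, which mirrors the transparency the article describes: only the library underneath changes.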

The iWarp specification lets an iWarp-compliant Ethernet channel adapter (ECA) transparently replace an Ethernet network interface card, because ECAs are completely compatible with today's Ethernet infrastructure: cables, switches, routers and applications. iWarp also defines a new interface that enables application software to communicate directly with an ECA, which provides additional benefits to applications demanding the highest levels of performance.

To fully eliminate host CPU overhead, an iWarp ECA must support RDMA, TCP offload and operating system bypass. Anything less will consume CPU resources as network load increases and is not a complete implementation of the iWarp specifications.

[Graphic: Operating system bypass]

Meier is product manager for NetEffect's line of iWarp Ethernet channel adapters. He can be reached at kmeier@neteffect.com.
