Emerging network technologies such as 10G Ethernet and InfiniBand make it possible to link servers at high speeds, but for servers to fully benefit, network interfaces need to be replaced with a more efficient architecture.
Traditional hardware and software architecture imposes a significant load on a server's CPU and memory because data must be copied between the kernel and application. Memory bottlenecks become more severe as connection speeds exceed the processing power and memory bandwidth of servers.
Remote Direct Memory Access (RDMA) is a network interface card (NIC) feature that lets one computer place information directly into the memory of another computer. The technology reduces latency by minimizing processing overhead and demands on memory bandwidth.
This is achieved by implementing a reliable transport protocol in hardware on the NIC and by supporting zero-copy networking with kernel bypass.
Zero-copy networking lets the NIC transfer data directly to or from application memory, eliminating the need to copy data between application memory and the kernel.
Kernel bypass lets applications issue commands to the NIC without having to execute a kernel call. The RDMA request is issued from user space to the local NIC and over the network to the remote NIC without requiring any kernel involvement. This reduces the number of context switches between kernel space and user space while handling network traffic.
When an application performs an RDMA Read or Write request, no data copying is performed. Request completions may be processed entirely in user space, by polling a user-level completion queue, or through the kernel in cases where the application wishes to sleep until a completion occurs.
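The user-space polling path can be made concrete with a small C sketch of a user-level completion queue. This is a simulation, not a real NIC interface: the names (`completion_queue`, `cq_post`, `cq_poll`) are illustrative, and the "NIC side" is just a function filling a shared ring buffer. The property it demonstrates is that the application's poll loop touches only user-space memory and never enters the kernel.

```c
#include <stddef.h>

/* Hypothetical user-level completion queue: a ring buffer shared between
 * the application and the NIC. Polling it requires no system call, which
 * is the essence of kernel bypass. */
#define CQ_DEPTH 64

struct completion {
    unsigned long work_request_id;  /* identifies the finished RDMA request */
    int status;                     /* 0 = success */
};

struct completion_queue {
    struct completion entries[CQ_DEPTH];
    size_t head;   /* next entry the application will consume */
    size_t tail;   /* next slot the (simulated) NIC will fill */
};

/* NIC side (simulated here in software): post a completion entry. */
static void cq_post(struct completion_queue *cq, unsigned long wr_id, int status)
{
    cq->entries[cq->tail % CQ_DEPTH].work_request_id = wr_id;
    cq->entries[cq->tail % CQ_DEPTH].status = status;
    cq->tail++;
}

/* Application side: poll without entering the kernel. Returns 1 if a
 * completion was consumed, 0 if the queue was empty. */
static int cq_poll(struct completion_queue *cq, struct completion *out)
{
    if (cq->head == cq->tail)
        return 0;
    *out = cq->entries[cq->head % CQ_DEPTH];
    cq->head++;
    return 1;
}
```

An application that cannot afford to spin would instead arm an interrupt and sleep in the kernel; the polling path trades CPU cycles for the lowest possible latency.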
RDMA operations let an application read from or write to the memory of a remote application. The remote virtual memory address for the operation is carried within the RDMA message. There is no need for the remote application to do anything other than register the relevant memory buffer with its local NIC. The CPUs in the remote node are not at all involved in the incoming RDMA operation, and they incur no load.
Key value
An application can protect its memory from arbitrary access by remote applications by means of a key value. The application issuing the RDMA operation must specify the correct key for the remote memory region it is trying to access; the remote application obtains that key when it registers the memory with its local NIC.
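The protection check can be sketched in C. This is a simulation of the semantics only: `memory_region` and `rdma_write` are hypothetical names, and the "NIC" here is an ordinary function that validates the key and the bounds before placing the payload directly into the registered buffer. On real hardware this validation and copy happen on the NIC, with the remote host's CPU uninvolved.

```c
#include <stdint.h>
#include <string.h>

/* Simulated memory registration: the remote application registers a buffer
 * with its NIC and receives a key for it. Names are illustrative only. */
struct memory_region {
    void    *addr;   /* start of the registered buffer */
    size_t   length; /* size of the registered buffer in bytes */
    uint32_t key;    /* must accompany any incoming RDMA to this region */
};

/* Simulated NIC-side handling of an incoming RDMA Write: check the key,
 * check the bounds, then place the data directly in application memory.
 * Returns 0 on success, -1 if access is refused. */
static int rdma_write(struct memory_region *mr, uint32_t key,
                      uint64_t remote_addr, const void *data, size_t len)
{
    uintptr_t start = (uintptr_t)mr->addr;

    if (key != mr->key)
        return -1;  /* wrong key: access refused */
    if (remote_addr < start || remote_addr + len > start + mr->length)
        return -1;  /* target falls outside the registered region */

    memcpy((void *)(uintptr_t)remote_addr, data, len);
    return 0;
}
```

Note that a request with the wrong key, or one aimed outside the registered region, is rejected before any memory is touched; this is what keeps remote access from being arbitrary.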
The issuing application must also know the remote memory address and the key for the memory region. The remote application supplies the beginning virtual address, the size and the key of the region, transmitting this information in a send operation before the issuing application can begin issuing RDMA operations against that region.
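A plausible shape for that one-time advertisement is a small fixed-layout message carrying the region's starting virtual address, its size and its key. The structure and field widths below are a hypothetical wire format chosen for illustration, not a format defined by any RDMA specification.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical advertisement the remote application sends before RDMA
 * traffic can begin: where the region starts, how big it is, and the key
 * the issuer must present. */
struct region_advert {
    uint64_t virt_addr;  /* beginning virtual address of the region */
    uint64_t size;       /* length of the region in bytes */
    uint32_t key;        /* key protecting the region */
};

#define ADVERT_WIRE_SIZE 20  /* 8 + 8 + 4 bytes, packed */

/* Pack the advertisement into a byte buffer for a send operation.
 * Returns the number of bytes written. */
static size_t advert_pack(const struct region_advert *a, unsigned char *buf)
{
    memcpy(buf,      &a->virt_addr, 8);
    memcpy(buf + 8,  &a->size,      8);
    memcpy(buf + 16, &a->key,       4);
    return ADVERT_WIRE_SIZE;
}

/* Unpack an advertisement received in a send operation. */
static void advert_unpack(const unsigned char *buf, struct region_advert *a)
{
    memcpy(&a->virt_addr, buf,      8);
    memcpy(&a->size,      buf + 8,  8);
    memcpy(&a->key,       buf + 16, 4);
}
```

Once the issuing application has unpacked this message, it has everything an RDMA Read or Write needs: the remote virtual address to target and the key to present.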
InfiniBand networks and networks implementing the Virtual Interface Architecture support RDMA. RDMA over TCP/IP using NICs with transport offload engines is under development.
Protocols that use RDMA to achieve high performance include Sockets Direct Protocol, SCSI RDMA Protocol (SRP) and Direct Access File System (DAFS). Communication libraries that use RDMA include Direct Access Provider Library (DAPL), Message Passing Interface (MPI) and Virtual Interface Provider Library (VIPL).
Clusters running distributed applications are one of the first areas where RDMA can make a strong impact. Using RDMA through DAPL or VIPL, database software running on clusters can scale better and achieve higher performance with the same number of nodes.
Clustered scientific computing applications that use MPI also see a dramatic performance improvement as a result of the low latency, low overhead and high throughput that interconnects supporting RDMA provide. Other early applications of RDMA are remote file server access via DAFS, and storage access for blade servers via SRP.
RDMA is fast becoming an essential feature of high-speed clusters and server-area networks.