When applications were exclusively hosted in the corporate data center, remote sites had much lower bandwidth and each required its own WAN optimization device. According to conventional wisdom, if we increase bandwidth, performance will improve. However, without decreasing latency, application performance will continue to suffer—no matter how much bandwidth we throw at the network.
Four main things contribute to latency:
- Propagation delay
- Serialization delay
- Queuing delay
- Processing delay
Propagation delay
This is the time it takes a signal to travel between two endpoints. Propagation delay is governed by the speed of light in fiber, roughly 5 ms per 1,000 km. The one-way propagation delay between a data center in New York and a branch in San Jose would therefore be at least 24 ms. That figure assumes a direct fiber path with no router hops; in practice, fiber rarely follows a straight line, so the propagation delay will be significantly higher. For large carriers, one-way delays average 35-45 ms.
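To make the arithmetic concrete, here is a minimal sketch. The 5 ms per 1,000 km constant comes from the figure above; the 4,800 km New York-San Jose distance is an assumed straight-line fiber run used only for illustration.

```python
# Illustrative sketch: one-way propagation delay over fiber, assuming the
# ~5 ms per 1,000 km figure quoted above. The 4,800 km New York-San Jose
# distance is an assumed value for illustration.

FIBER_DELAY_MS_PER_KM = 5.0 / 1000  # ~5 ms per 1,000 km of fiber

def propagation_delay_ms(distance_km: float) -> float:
    """One-way propagation delay in milliseconds for a direct fiber path."""
    return distance_km * FIBER_DELAY_MS_PER_KM

print(propagation_delay_ms(4800))  # ~24.0 ms, New York to San Jose
```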
Serialization delay
Serialization delay is the amount of time required to clock a frame onto the transmission medium from the network interface. It is directly tied to link speed and is also affected by the data link protocol in use on the network.
The serialization delay is calculated by dividing the packet size in bits by the link speed in bits per second; the packet size includes all Layer 2 and Layer 3 overhead. For the sake of simplicity, we’ll ignore the Layer 2 overhead in this discussion: it differs between a T1 and a 100 Mbps link, but it is small enough to be negligible here.
Table 1: The serialization delay for different packet sizes and transmission rates
It is important to note that even though the effective download rate may differ from, and will probably be lower than, the actual link speed, the serialization delay depends only on the actual link speed.
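The calculation itself is simple enough to sketch directly. The packet sizes and link rates below are example values chosen for illustration, not necessarily the ones used in Table 1, and Layer 2 overhead is ignored as discussed above.

```python
# Illustrative sketch: serialization delay = packet size (bits) / link speed (bps).
# Layer 2 overhead is ignored, as above; the packet sizes and link rates below
# are example values, not necessarily those used in Table 1.

def serialization_delay_ms(packet_bytes: int, link_mbps: float) -> float:
    """Time to clock one packet onto the wire, in milliseconds."""
    return (packet_bytes * 8) / (link_mbps * 1_000_000) * 1000

for size in (64, 512, 1500):              # packet sizes in bytes
    for rate in (1.544, 10, 100, 1000):   # T1, 10 Mbps, 100 Mbps, 1 Gbps
        print(f"{size:>4} B @ {rate:>8} Mbps: "
              f"{serialization_delay_ms(size, rate):.3f} ms")
```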
Queuing delay
This is the delay originating from packets sitting in the queues of intermediate routers between endpoints waiting to be serviced. It is the only delay component that is highly variable. Unlike propagation and serialization delays, which depend on the speed of light and the rate of a clock, queuing delays depend on the current volume of traffic. If a link or intermediate router experiences congestion, the queuing delay can increase from almost zero to several hundred milliseconds. Queuing delay swings depend on which class of traffic is being serviced and the configuration of the servicing router.
Network designers often try to avoid congestion by overprovisioning bandwidth, which requires the average and maximum load on a network to be well understood. But congestion can still arise from incidents such as DDoS attacks or link failures that shift traffic to paths that are normally less utilized.
A well-designed and implemented QoS policy along with properly marked traffic will help avoid congestion, at least for high-priority traffic, regardless of the overall load condition of the network. In extreme conditions, this may not hold true.
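To illustrate why queuing delay is so variable, the sketch below uses a textbook single-queue (M/M/1) approximation; this model is an assumption added for illustration, not something the discussion above relies on, and the 1,500-byte packet on a 10 Mbps link is likewise an assumed example. Average queuing delay stays near zero at low utilization and grows sharply as the link approaches saturation.

```python
# Illustrative sketch (a textbook M/M/1 approximation, not a model prescribed
# by the text): average queuing delay as a function of link utilization.
# The 1,500-byte packet on a 10 Mbps link is an assumed example.

SERVICE_TIME_S = (1500 * 8) / 10_000_000   # ~1.2 ms to serialize one packet

def mean_queuing_delay_ms(utilization: float) -> float:
    """Average time spent waiting in queue for an M/M/1 queue (utilization < 1)."""
    return (utilization / (1 - utilization)) * SERVICE_TIME_S * 1000

for rho in (0.10, 0.50, 0.80, 0.95, 0.99):
    print(f"utilization {rho:.0%}: ~{mean_queuing_delay_ms(rho):.1f} ms queuing delay")
```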
Processing delay
This is the time required to process packets as they transit a router. Processing includes the route lookup for forwarding, access list processing, cflowd flow data generation and recognition, and encryption. Processing delay may be slightly variable in software-based routers, but is usually static in hardware-based devices. For example, if a hardware-based forwarding engine is designed to process 4 million packets per second, the per-packet processing delay must be no more than 1 s / 4,000,000 = 250 ns for the forwarding engine to keep up.
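The per-packet budget implied by a forwarding rate follows directly from that arithmetic, as this small sketch shows:

```python
# Illustrative sketch: the per-packet time budget implied by a forwarding rate.
# A 4 Mpps engine, as in the example above, has at most 250 ns per packet.

def processing_budget_ns(packets_per_second: float) -> float:
    """Maximum per-packet processing time, in nanoseconds, to sustain the rate."""
    return 1e9 / packets_per_second

print(processing_budget_ns(4_000_000))  # 250.0 ns
```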
To gauge the overall delay between endpoints, with all of the previously discussed components taken into account, published figures from national U.S. carriers are a good reference point. Both AT&T and Sprint publish their latency data online, and both typically report the latency between New York and San Jose as ~70 ms.
The same data puts the latency between San Jose and San Diego at ~13 ms. In reality, this might differ depending on how the carrier network is laid out. For example, the connection between San Jose and San Diego might go through Phoenix, which would make the San Jose-San Diego delay larger than 13 ms.
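One way to see how the components add up is a rough back-of-the-envelope estimate like the sketch below. The route distance, hop count, per-hop queuing and processing figures are all assumptions chosen only for illustration; with these values the result lands in the 35-45 ms one-way range mentioned earlier.

```python
# Illustrative sketch: a rough one-way latency estimate that sums the four
# components discussed above. The route distance, hop count, per-hop queuing
# and processing figures below are assumptions for illustration only.

def one_way_latency_ms(distance_km: float, hops: int, packet_bytes: int,
                       slowest_link_mbps: float,
                       queuing_ms_per_hop: float = 0.5,
                       processing_us_per_hop: float = 50.0) -> float:
    propagation   = distance_km * 5.0 / 1000                          # ~5 ms per 1,000 km
    serialization = (packet_bytes * 8) / (slowest_link_mbps * 1e6) * 1000
    queuing       = hops * queuing_ms_per_hop
    processing    = hops * processing_us_per_hop / 1000
    return propagation + serialization + queuing + processing

# An assumed 6,000 km New York-San Jose fiber route with 12 hops:
print(one_way_latency_ms(6000, 12, 1500, 100))   # ~36.7 ms one way
```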
Regional performance hubs
Assessing latency data between different sites is a good way for enterprises to decide whether to deploy regional performance hubs based on user population density. Remote sites that use performance hubs would connect to regional WAN optimization devices.
For example, if a performance hub were placed at a colocation facility in San Jose, most branches in west coast cities such as Seattle, Portland, Los Angeles and Phoenix would be within a 20 ms delay. Similarly, deploying performance hubs at colocation facilities in several cities across the country would give an enterprise an optimal network with very few devices to manage. Instead of 5,000 WAN optimization devices, an enterprise could deploy a total of 20 to 50, based on experience, spread across its performance hubs, keep latency down, and achieve optimal application performance.
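As a simple way to apply that reasoning, the sketch below assigns branches to the lowest-latency hub that falls within a latency budget. Every site name and latency value here is a hypothetical placeholder, not measured carrier data.

```python
# Illustrative sketch: assigning branches to the lowest-latency performance
# hub within a latency budget. Site names and latency values are hypothetical
# placeholders, not measured carrier data.

LATENCY_MS = {                       # (branch, hub) -> one-way latency in ms
    ("Seattle", "San Jose"): 18,
    ("Portland", "San Jose"): 15,
    ("Los Angeles", "San Jose"): 8,
    ("Phoenix", "San Jose"): 17,
    ("Denver", "San Jose"): 25,
    ("Denver", "Chicago"): 23,
}

def assign_branches(latency_ms: dict, budget_ms: float = 20) -> dict:
    """Map each branch to its lowest-latency hub within the budget, if any."""
    assignment = {}
    for (branch, hub), delay in latency_ms.items():
        if delay <= budget_ms:
            current = assignment.get(branch)
            if current is None or delay < current[1]:
                assignment[branch] = (hub, delay)
    return assignment

print(assign_branches(LATENCY_MS))   # Denver exceeds the 20 ms budget and would need another hub
```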