ISP backbones stand up in grueling 30-day performance test.
It's time to lay to rest the notion that ISPs can't deliver telephone-company-level reliability. The fact is, some routed IP network backbones now meet or exceed telco-grade performance. That's the key finding of a groundbreaking study of ISP backbone network performance conducted for Network World by Internet measurement experts Andrew Corlett and Robert Mandeville, along with Network Test, a Network World Global Test Alliance partner.
In a first for public network measurement, the seven participating ISPs placed one of our measurement devices in four locations across their U.S. backbone networks. We turned on the 28 measurement devices and let them run for a month, nonstop, collectively generating an astounding 4.5 billion packets. All told, we collected 156 million discrete measurements. If a network hiccuped for even 10 microsec, we knew about it.
The participants were Cable & Wireless, Level 3 Communications, Qwest, Savvis Communications, Sprint, Verio and WilTel Communications (formerly Williams Communications). A few other major players opted not to take part, citing various reasons (see "Not in the game").
Here's what we found:
It wasn't all good news, though. Two providers - Qwest and Verio - suffered outages that added up to more than an hour each during our monthlong test. Multiple providers asked for extended maintenance periods, meaning that (at least in theory) their networks could be offline while providers work on routers and switches. And while all ISPs put up excellent numbers for average jitter, the maximum jitter for all providers was well into the hundreds of milliseconds. That's high enough to virtually guarantee lousy application performance.
This project was a massive undertaking. Including planning and implementation, it took more than a year to complete. It involved setting up 29 test sites and generating 4,558,388,076 packets during the month of August 2002. We collected 156,050,656 discrete measurements.
One caveat regarding this test is that the results are not a simple guide for picking the best provider. Big as this project was, it measured only core networks. We plan to conduct additional tests focusing on provider edge and customer premises circuits.
Still, a network's performance is only as good as its core allows. Without a good backbone, a network can't be good, no matter what.
Testing made difficult
It's an understatement to say this project was more ambitious than the average trade magazine review. For one thing, we didn't ask test participants to send their widgets to us; instead, we went to them. Each of the seven participants installed the cNode, a hardware-based traffic generator and analyzer designed by Mandeville and Corlett, in four locations around the U.S., and we also maintained a central data repository.
Getting access to the backbones of Tier-1 ISPs was another complicating factor. The backbone networks we targeted for testing collectively carry the majority of the global Internet's traffic, and access to the network infrastructure is very carefully controlled. We went through many rounds of conference calls with each provider, months of careful planning, and weeks of logistical legwork before we obtained a single measurement. To the participants' credit, each devoted senior network architects and routing engineers to support this project.
If these factors weren't enough, we encountered one more complication. On the last day of measurement, we received news that the primary investors funding CQOS Inc. had decided to cease the company's operations. It was only through a massive volunteer effort for weeks afterward that we were able to retrieve and analyze the test data.
The co-founders of the company, Corlett and Mandeville, are actively moving to make this measurement technology available to providers and end users. (For more, see www.iometrix.com.)
The myth of five nines
While the cNodes can record more than 70 measurements of network health, we focused on just three for this review: uptime, jitter and packet loss (see How we did it).
Uptime is the most important metric in assessing ISP backbone performance. After all, speedy throughput, low latency or special services won't make a bit of difference if the network isn't available.
In setting up the tests, we asked ISPs to let us know when they would perform scheduled maintenance, and we agreed to exclude performance measurements taken during these periods. By definition, a network (or parts of it) might be offline during a maintenance period. Carriers don't guarantee network availability during maintenance; that's why we kept separate tallies for periods of normal operation vs. those for scheduled and emergency maintenance.
For scheduled maintenance windows, the ISPs declared upfront, before the monthlong test, when they would run maintenance. For emergency windows, we required 48-hour advance notice; if an outage occurred with less notice than that, we logged it as part of normal operations (see graphic).
One issue with uptime is that there are many ways to define the metric. For this project, we considered uptime to be the percentage of the "normal" (nonmaintenance) window during which we could send and receive traffic.
It also might have been possible to send traffic during maintenance periods (it almost always was), but because the providers told us their networks could be down at that time we didn't count such measurements as "normal."
The cNodes offered traffic continuously. If a cNode observed packet loss, it started a timer. If no further packets were forthcoming within 10 seconds, the cNode would record the event as an outage and run an outage duration timer that would continue running until the network delivered more packets. Once the flow of packets resumed, the cNode would reset the outage duration timers to zero.
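The outage logic described above can be sketched in a few lines of Python. This is an illustration only, not the cNode's actual implementation: the function names, the representation of traffic as a sorted list of arrival timestamps, and the choice to time each outage from the last packet received before the gap are all assumptions. Only the 10-second threshold comes from the test setup.

```python
OUTAGE_THRESHOLD_S = 10.0  # from the test setup: 10 seconds with no packets


def find_outages(arrival_times, threshold=OUTAGE_THRESHOLD_S):
    """Return (start, duration) pairs for gaps in arrivals longer than threshold.

    arrival_times: sorted packet arrival timestamps, in seconds.
    Assumption: an outage runs from the last packet received before the gap
    to the first packet received after it.
    """
    outages = []
    for previous, current in zip(arrival_times, arrival_times[1:]):
        gap = current - previous
        if gap > threshold:
            outages.append((previous, gap))
    return outages


def total_outage_duration(outages):
    """Sum the durations of all recorded outages, in seconds."""
    return sum(duration for _, duration in outages)


# Made-up arrival times: two gaps exceed the threshold.
arrivals = [0.0, 0.5, 1.0, 14.0, 14.5, 15.0, 40.0]
outages = find_outages(arrivals)
print(outages)                         # [(1.0, 13.0), (15.0, 25.0)]
print(total_outage_duration(outages))  # 38.0
```

Summing durations this way across every flow and location gives a total in the style of the per-provider outage totals reported later, which is why such a total can exceed the time any single link was actually down.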
The gold standard for uptime, which service providers set and advertise, is five nines, meaning the network is available 99.999% of the time. It's a tough standard to meet: Five nines amounts to just 5 minutes of downtime per year (see graphic).
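The arithmetic behind the five-nines figure is easy to check. The short Python sketch below converts an availability percentage into allowed downtime; the function name and the use of a 365-day year are illustrative choices.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Downtime permitted in a period at a given availability percentage."""
    return period_minutes * (1 - availability_percent / 100)


for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    print(f"{label}: {allowed_downtime_minutes(pct):.2f} minutes per year")

# The same calculation for a 31-day month, expressed in seconds.
month_minutes = 31 * 24 * 60
five_nines_month_s = allowed_downtime_minutes(99.999, month_minutes) * 60
print(f"five nines over 31 days: {five_nines_month_s:.1f} seconds")
```

Five nines works out to about 5.26 minutes per year. Over a 31-day month, the same standard allows only about 27 seconds of downtime.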
At least during normal operations, four ISPs met that goal, and three of those surpassed it by achieving perfect 100% uptime. The three members of the perfect-uptime club were C&W, Level 3 and Savvis. WilTel followed just behind with 99.999% uptime.
Two more ISPs - Qwest and Sprint - achieved four-nines results during normal operations, with Sprint outperforming Qwest by a slight margin. The lowest number belonged to Verio, with 99.96% uptime.
These are remarkable results, but looks can be deceiving. The four ISPs with stellar scores got their good results, in part, by reducing the size of their normal windows.
C&W's results offer the most dramatic example of this: The 100% uptime numbers were somewhat easier to achieve because the network was in normal mode only 91% of the time. In effect, this measurement means the network was 100% up and maintenance-free - 91% of the time.
Qwest and Sprint were next in terms of the greatest amount of maintenance time declared. Both ISPs were in normal operations mode less than 95% of the time.
Happily, the distinctions between normal and maintenance windows were academic in our tests. C&W might have had a comparatively large maintenance window, but then again its network achieved perfect uptime during normal and maintenance periods.
Notably, the provider with the lowest uptime during normal operations - Verio - was also the provider with the least scheduled and emergency maintenance time.
Verio's relatively low uptime score during normal operations is reflected in the number of outages we measured within the normal window (see graphic).
Verio attributes the 357 outages we recorded - by far the most of any provider - to intermittent problems on three OC-48 circuits. Verio fixed the problems about a third of the way into our test, and we recorded no outages (and substantially better performance in other areas) after that.
Verio's troublesome OC-48 circuits led not only to frequent outages, but also to the greatest amount of downtime, by far, of any ISP. This was perhaps the biggest single differentiator in the test.
Verio's total outage duration amounts to more than six hours for all locations. This does not mean Verio's network was down for six hours during our monthlong test; rather, it is merely the sum of all the outage durations at all locations. Even so, it's still a far higher total than any other provider's.
Qwest's total outage duration was second only to Verio's, at just over an hour. However, in Qwest's case the problems are related to some (not all) flows destined for Sunnyvale, Calif. In four cases, total outage duration exceeded 890 seconds on this link. All Qwest's other measurements for total outage duration are at least one order of magnitude lower.
C&W and Savvis needn't have declared maintenance windows at all. Both providers' networks ran outage-free for the entire test period. (As noted, C&W's test window was a week shorter than that of other providers because of a configuration issue on its network.)
The issue of what is and isn't a maintenance window in part caused one major player, AT&T, to decline to take part in this evaluation. Citing the huge number of elements in its network, AT&T said its network would be in maintenance mode 30% of the time. AT&T understandably was unable to say exactly when it would run maintenance operations on each router in its vast network. But AT&T declined our request to reduce the size of its maintenance window, and it declined our offer to print its maintenance-window results along with numbers from the normal period.
Taken to its logical extreme, an unscrupulous provider could achieve excellent uptime numbers simply by declaring it would be doing scheduled maintenance 99.999% of the time and then ensuring it was up 100% of its 0.001% "normal" window.
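To see how the size of the normal window shapes the score, consider a minimal Python sketch of the uptime definition used in this test (uptime as a percentage of the normal window). The provider, the one-hour outage and the function name are hypothetical; the 91% normal window echoes the C&W figure cited earlier.

```python
MONTH_S = 31 * 24 * 3600  # a 31-day month, in seconds


def normal_uptime_percent(normal_window_s, outage_in_normal_s):
    """Uptime as a percentage of the normal (nonmaintenance) window."""
    if normal_window_s <= 0:
        raise ValueError("normal window must be positive")
    return 100.0 * (1 - outage_in_normal_s / normal_window_s)


# Hypothetical provider with a single 60-minute outage during the month.
outage_s = 3600

# Case 1: no maintenance declared, so the outage counts in full.
strict = normal_uptime_percent(MONTH_S, outage_s)

# Case 2: 9% of the month is declared as maintenance and the outage
# happens to fall inside it, so nothing counts against normal uptime.
lenient = normal_uptime_percent(0.91 * MONTH_S, 0)

print(f"no maintenance window: {strict:.3f}%")
print(f"91% normal window:     {lenient:.3f}%")
```

The same hour of downtime yields roughly 99.87% in the first case and a perfect 100% in the second, which is why maintenance tallies need to be reported alongside normal-window uptime.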
Carriers can't declare some periods off-limits because it might affect their uptime scores and then include measurements from those periods if it turns out after the fact that there weren't any outages.
A case of the jitters
Jitter, or variation in packet arrival times, can have a serious effect on application performance. This is especially true for delay-sensitive apps such as voice or video, or real-time apps such as those used in delivering stock quotes.
The cNodes measure jitter by noting differences in packet arrival times. The cNode also can measure one-way delay, but this is most effective if sender and receiver are precisely synchronized using Global Positioning System (GPS) clocks.
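As a rough illustration of jitter derived from arrival times, the Python sketch below treats jitter as the absolute change between consecutive inter-arrival gaps. That definition, the function name and the sample timestamps are assumptions made for the example; the exact calculation used by the cNodes is not described here.

```python
from statistics import mean


def interarrival_jitter(arrival_times_us):
    """Absolute change between consecutive inter-arrival gaps, in microseconds."""
    gaps = [b - a for a, b in zip(arrival_times_us, arrival_times_us[1:])]
    return [abs(later - earlier) for earlier, later in zip(gaps, gaps[1:])]


# Made-up arrival times in microseconds for packets sent about 1,000 microsec apart.
arrivals_us = [0, 1000, 2010, 3000, 4250]
jitter = interarrival_jitter(arrivals_us)

print(jitter)        # [10, 20, 260]
print(max(jitter))   # 260
print(mean(jitter))  # about 96.7
```

Note that this approach needs no clock synchronization between sender and receiver, because it relies only on the receiver's timestamps. One-way delay, by contrast, compares a send time with a receive time, which is why it benefits from GPS-synchronized clocks.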
Only Sprint supplied GPS clocks for this project, so it is the only ISP for which we present delay measurements.
The average jitter numbers are encouraging for all ISPs (see graphic). The average of the average jitter measurements for all providers was just 39 microsec, well below the level that would disrupt any application. Qwest did even better than that with an average jitter less than the 10-microsec timestamp resolution of the cNodes. In other words, average jitter on Qwest's network was so low we couldn't measure it.
Three other ISPs - Level 3, Sprint and WilTel - also were near the minimum, with average jitter of just 10 microsec on some circuits.
It was a different story for maximum jitter. We recorded maximum jitter values of more than 100 msec for all seven ISPs. Numbers like that can affect application performance. The human eye will perceive degradations in video quality with jitter as little as 10 msec.
In the worst case, Verio recorded maximum jitter of 246.16 msec (about a quarter of a second) when sending User Datagram Protocol (UDP) packets from Dallas to Chicago.
However, these measurements might not be as dire as they appear. While 200-msec jitter is certainly enough to disrupt the average voice-over-IP session, it's less clear how frequently the ISPs' networks suffered this kind of jitter.
As with any maximum measurement, all it takes is one bad jitter number to send the peak value sky-high. That peak value might not be representative of any of the other billions of packets involved.
We initially set up the cNodes to record 95th percentile jitter, meaning the value at or below which 95% of all jitter measurements fall. Because of a configuration error, we turned off recording this measurement about a third of the way into the test window. For the period we did measure, 95th percentile jitter looked very good: Typically, it was around 40 to 100 microsec for all providers - close to the average jitter and nowhere near the level that would harm any application.
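The gap between a maximum and a 95th percentile is easy to demonstrate. The Python sketch below uses the nearest-rank method, which is one common way to compute a percentile and not necessarily the one the cNodes use. The sample set is invented, except that the single outlier borrows the 246.16-msec worst-case value reported above.

```python
import math


def percentile_nearest_rank(values, pct):
    """Smallest sample with at least pct percent of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Invented jitter samples in microseconds: 99 quiet readings and one spike.
samples_us = [40] * 97 + [60, 90, 246_160]

print(percentile_nearest_rank(samples_us, 95))  # 40
print(max(samples_us))                          # 246160
```

One bad reading out of 100 sends the maximum to roughly a quarter of a second while leaving the 95th percentile untouched.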
While it's unfortunate that we couldn't present 95th percentile jitter for the whole test duration, we can say that nearly all jitter measurements appear to be close to the average values. Jitter numbers were spot-checked for all seven ISPs for the entire test period, and most maximum jitter values are close to the averages - certainly well below the threshold where application performance might suffer.
Looking at loss
Packet loss is another key measure of network health. While certain applications can tolerate loss, it's generally the case that dropped packets degrade performance in some way.
A few dropped packets can cause TCP retransmissions, and this will reduce throughput and increase latency. Drop enough packets, and TCP sessions will time out. This is especially critical considering that more than 90% of Internet traffic runs over TCP.
Most ISPs kept loss minimal - for some ISPs, packet loss amounted to less than 0.01% of all packets transmitted.
C&W led all ISPs with a perfect score on packet loss. C&W's network delivered every one of the 462 million packets we fed its routers during the normal test window. And even though we officially measured loss only during normal operations, C&W's network also was perfect during maintenance periods.
The Level 3 and Savvis networks also delivered virtually all the 676 million packets they carried. Some loss existed for both these ISPs during normal operations. However, the amounts were so small that after rounding, the results equate to 0.00% loss.
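A quick calculation shows how nonzero loss can round to 0.00%. In the Python sketch below, the count of 1,000 lost packets is purely hypothetical and the function name is illustrative; only the 676 million packet total comes from the results above.

```python
def loss_percent(sent, received):
    """Packet loss as a percentage of packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


sent = 676_000_000
lost = 1_000  # hypothetical loss count, for illustration only

pct = loss_percent(sent, sent - lost)
print(f"{pct:.6f}%")  # 0.000148%
print(f"{pct:.2f}%")  # 0.00%
```

At two decimal places, any loss below 0.005% rounds to zero. On 676 million packets, that is anything under about 33,800 dropped packets.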
The other ISPs experienced higher loss, although "higher" is a relative term. In the worst case - on three Verio circuits sending traffic to New York - packet loss ranged from 0.12% to 0.14%.
Curiously, packet loss only affected traffic headed to New York from Verio's Chicago, Dallas and Palo Alto POPs. Traffic from New York to these locations wasn't affected.
As noted, Verio attributes the problems to three OC-48 circuits at an unspecified location. Verio fixed the problems, and packet loss fell sharply after that. If we factor out the misbehaving OC-48s, Verio's average loss on all other circuits amounts to 0.01% of all traffic.
Newman is president of Network Test in Westlake Village, Calif., an independent benchmarking and network design consultancy. He can be reached at firstname.lastname@example.org. Mandeville and Corlett can be contacted at email@example.com and Andrew@iometrix.com.
Global Test Alliance
Newman is also a member of the Network World Global Test Alliance, a cooperative of the premier reviewers in the network industry, each bringing to bear years of practical experience on every review. For more Test Alliance information, including what it takes to become a member, go to www.nwfusion.com/alliance.