by Ru Wadasinghe, Nortel

Is graceful restart the best way to ensure IP reliability? No.

Jul 19, 2004

Two industry experts debate the merits of graceful restart vs. nonstop routing.

Protocol extensions are being developed to increase IP reliability. By themselves, however, they cannot meet the expectation of five-nines (99.999% availability) IP nodes that support mission-critical, real-time voice, video and data in addition to lower-priority traffic. Devices entering this converged market must provide cost-effective reliability for many deployment scenarios – especially at the network edge. For non-redundant configurations supporting consumer-grade services, or for multi-vendor environments where some products don’t support nonstop routing, graceful restart can help improve reliability. However, to provide true five-nines reliability, nonstop routing is the best solution.
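To put the five-nines target in perspective, the short Python sketch below converts an availability figure into a yearly downtime budget. The function name and the choice of a 365.25-day year are assumptions made for illustration; they do not come from any vendor specification.

```python
# Illustrative arithmetic only: how much downtime per year a given
# availability target allows. A 365.25-day year is an assumption.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Return the allowed downtime, in minutes per year, for an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, availability in [
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]:
    minutes = downtime_minutes_per_year(availability)
    print(f"{label}: about {minutes:.1f} minutes of downtime per year")
```

At five nines the budget works out to a little over five minutes a year, which is why the time a node spends recovering from a control-plane failure matters so much.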

The other side by Juniper’s Matt Kolon

With graceful restart, each node depends upon all its neighbors running compatible protocol extensions. This becomes even more problematic if the neighbors are part of different autonomous systems.
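The dependency described above can be sketched in a few lines of Python. This is a simplified model, not a protocol implementation: the `Neighbor` record, its fields and the helper functions are hypothetical names, and the real capability negotiation differs from one routing protocol to another.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Neighbor:
    """Hypothetical record describing one routing neighbor."""
    name: str
    asn: int  # autonomous system the neighbor belongs to
    supports_graceful_restart: bool


def unsupported_neighbors(neighbors: List[Neighbor]) -> List[Neighbor]:
    """Return the neighbors that do not run the graceful restart extensions."""
    return [n for n in neighbors if not n.supports_graceful_restart]


def restart_is_graceful(neighbors: List[Neighbor]) -> bool:
    """In this simplified model, a restart is graceful only if every neighbor cooperates."""
    return not unsupported_neighbors(neighbors)


# Made-up example: one neighbor in another autonomous system lacks the extensions.
neighbors = [
    Neighbor("edge-1", asn=64500, supports_graceful_restart=True),
    Neighbor("edge-2", asn=64500, supports_graceful_restart=True),
    Neighbor("peer-1", asn=64511, supports_graceful_restart=False),
]

print("graceful for all neighbors:", restart_is_graceful(neighbors))
for n in unsupported_neighbors(neighbors):
    print(f"{n.name} (AS {n.asn}) would treat the restart as an ordinary failure")
```

A single neighbor without the extensions is enough to change the outcome, and when that neighbor sits in another autonomous system the operator of the restarting node cannot simply upgrade it.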

In addition, the IETF’s graceful restart drafts are still not stable. Even once they stabilize and vendors support the same drafts, implementations might still differ, resulting in costly and time-consuming testing and provisioning.

Despite the safety mechanisms embedded in graceful restart, it remains susceptible to black holes and forwarding loops because it operates on the premise of forwarding traffic while the control plane is not operational.

Nonstop routing is a self-contained nodal capability that does not require protocol extensions and thus does not suffer from interoperability issues. It lets the node recover by itself without imposing requirements on other nodes, simplifying network engineering, testing and operations while reducing complexity and costs.

Nonstop routing provides much faster recovery and convergence because the routing protocols do not fail during control card switchovers. This minimizes both the time stale routes remain in the forwarding table and the amount of control traffic, producing a more stable network topology.

Potential scalability and mirrored-bug concerns with nonstop routing can be addressed by parsing and processing state information on the active processor and passing only the appropriate state to the backup processor. Removing lock-step redundancy in this way means that the specific combination of software bug and state is not replicated.
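One way to picture this is a toy Python model in which the active processor parses incoming control messages and hands only the resulting state to the backup, so the backup never touches the raw packet. Everything here is hypothetical: the message format, class names and methods are invented for illustration and do not describe any vendor's implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RouteUpdate:
    """Parsed state that is safe to hand to the backup processor."""
    prefix: str
    next_hop: str


def parse_update(raw: bytes) -> RouteUpdate:
    """Parse a toy 'prefix next_hop' message; raise ValueError if it is malformed."""
    fields = raw.decode("ascii").split()  # non-ASCII input raises a ValueError subclass
    if len(fields) != 2:
        raise ValueError(f"malformed update: {raw!r}")
    return RouteUpdate(prefix=fields[0], next_hop=fields[1])


class BackupProcessor:
    """Holds a copy of routing state; it never sees raw packets."""

    def __init__(self) -> None:
        self.routes: Dict[str, str] = {}

    def apply(self, update: RouteUpdate) -> None:
        self.routes[update.prefix] = update.next_hop


class ActiveProcessor:
    """Parses raw packets and forwards only the parsed state to the backup."""

    def __init__(self, backup: BackupProcessor) -> None:
        self.routes: Dict[str, str] = {}
        self.backup = backup

    def receive(self, raw: bytes) -> None:
        update = parse_update(raw)  # a bad packet fails here, on the active side only
        self.routes[update.prefix] = update.next_hop
        self.backup.apply(update)


backup = BackupProcessor()
active = ActiveProcessor(backup)
active.receive(b"10.0.0.0/8 192.0.2.1")

try:
    active.receive(b"\xff\xfe garbage")  # stands in for a packet that trips a software bug
except ValueError:
    print("active rejected a malformed update; backup state:", backup.routes)
```

In a lock-step design the backup would run the same parsing code on the same input and hit the same fault. Here the backup still holds the last good state when the active side fails.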

Even in the event of a systematic failure, nonstop routing is at least equivalent to graceful restart. If a continuous stream of corrupted control packets causes a failure, nonstop routing and graceful restart would have the same effect – except that nonstop routing would not have to wait for any timeout period to expire before traffic is rerouted around the failing router.

Graceful restart is good for some deployment scenarios but insufficient for many. Nonstop routing doesn’t require complicated interoperability that raises operations costs, nor does it risk compounding outages by reacting inappropriately to legitimate failures beyond the control plane.

Wadasinghe is senior product manager, multiservice edge, at Nortel. He can be reached at