by Matt Kolon, Juniper

Is graceful restart the best way to ensure IP reliability? Yes.

Opinion
Jul 19, 2004
3 mins
Networking

Two industry experts debate the merits of graceful restart vs. nonstop routing.

There are many facets to the problem of giving an IP network greater than 99.999% availability, including those related to network design, hardware redundancy and technology choices. But accepting that router control plane failures will occur occasionally, and providing a comprehensive and forthright way to handle these events, is a fundamental step toward five-nines availability. Graceful restart of router control protocols is the best way to accomplish this.
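To put the five-nines target in concrete terms, the short Python sketch below converts an availability figure into a yearly downtime budget. The function name and the 365-day year are illustrative assumptions, not something taken from the article.

```python
# Minutes in a 365-day year (assumption: leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability: float) -> float:
    """Return the downtime allowed per year, in minutes, for an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, fraction in [("three nines", 0.999),
                        ("four nines", 0.9999),
                        ("five nines", 0.99999)]:
    print(f"{label}: {downtime_budget_minutes(fraction):.2f} minutes/year")
```

At five nines the budget works out to a little over five minutes a year, which is why even a brief network-wide convergence event matters.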


The exact mechanisms of graceful restart are protocol-specific and are therefore defined in separate IETF documents for each of the routing and control protocols in question. The general idea of their operation, however, is similar: allow a controlled pause in the control plane of a router during which traffic forwarding continues, with no need to cause a “convergence event” that puts the network as a whole into an unstable state. This is possible because modern router designs separate the functions of routing (the control flow of destination information) and forwarding (the actual data flow of packets through the router). While it used to be true that loss of a control connection meant that forwarding had stopped, the separation of routing and forwarding functions allows adjustments in, changes to or even restarts of the router control plane during normal operation, leaving the flow of customer traffic unaffected.

Graceful restart lets a router that might need to drop its control plane connections to its routing peers for a short time advertise that capability to those peers in advance, when the session is first established. Graceful restart also provides for standard protocol action to resynchronize the control connections or, if hardware or other failures require it, to escalate the situation to a general forwarding failure.
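The behavior described above can be sketched from the point of view of a neighboring router. The Python below is a deliberately simplified model, not the rules of any particular protocol: the class, its fields and the restart window are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HelperSession:
    """One router's view of a routing peer (hypothetical, simplified model)."""
    peer_gr_capable: bool      # learned when the session was first established
    restart_time: float        # seconds to wait for the peer (assumed parameter)
    routes: dict = field(default_factory=dict)   # prefix -> next hop used for forwarding
    stale: set = field(default_factory=set)      # prefixes retained during the pause
    down_since: Optional[float] = None

    def on_session_lost(self, now: float) -> None:
        if self.peer_gr_capable:
            # Keep forwarding on the existing routes; only mark them stale.
            self.stale = set(self.routes)
            self.down_since = now
        else:
            # The peer never advertised the capability: ordinary failure.
            self.routes.clear()

    def on_tick(self, now: float) -> None:
        # Escalate to a forwarding failure if the peer does not return in time.
        if self.down_since is not None and now - self.down_since > self.restart_time:
            self.routes.clear()
            self.stale.clear()
            self.down_since = None

    def on_resync(self, fresh_routes: dict) -> None:
        # Peer is back: adopt what it re-announces and drop anything it did not.
        self.routes = dict(fresh_routes)
        self.stale.clear()
        self.down_since = None


session = HelperSession(peer_gr_capable=True, restart_time=120.0,
                        routes={"10.0.0.0/8": "peer-A"})
session.on_session_lost(now=0.0)
session.on_tick(now=60.0)
print(session.routes)   # routes are still in place while the peer restarts
```

The point of the sketch is the branch in `on_session_lost`: because the capability was exchanged up front, the neighbor can tell a planned pause from a real failure and decide whether to keep forwarding.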

Graceful restart provides several clear advantages over proprietary methods that seek to provide router availability. These include:

•  Simplicity. The mechanisms by which graceful restart operates are easily understood by anyone with a knowledge of routing protocols.

•  Vendor-neutrality. Graceful restart is defined in IETF documents that any vendor can follow and implement, and thereby be assured of interoperability.

•  Transparency. Graceful restart does not require proprietary changes to protocol operation, and control plane failures can be modeled in deterministic fashion, with a clear sense of what will or will not happen in a given failure scenario.

•  Addressing of software failures. When a software bug causes a router failure, graceful restart mechanisms let the router “start fresh” with a new instance of the routing tables and of the configuration state from which they are derived.

Graceful restart provides a comprehensive, vendor-neutral and transparent mechanism for handling control plane issues. Service providers, and the vendors that supply them, are best served by implementing it, because letting failures be handled without interrupting forwarding is a pragmatic step toward a truly nonstop control plane.

Kolon is a senior technical consultant at Juniper. He can be reached at matt@juniper.net.