Two new software features are designed to maintain edge router integrity when a route processor fails: stateful switchover and nonstop forwarding.

If a route processor fails, is there a network outage? Not necessarily. When a network device recovers from a failure without detectable disruption, the network has not failed, because as far as end users are concerned there was no outage and no downtime. Two new software features are designed to maintain edge router integrity even when a route processor does fail: stateful switchover (SSO) and nonstop forwarding (NSF).

Stateful switchover allows a hot-standby route processor to take over from the failed route processor while maintaining connectivity. SSO also ensures that network management systems can manage a device with two route processors as one system and one manageable entity.

With SSO, both the active and standby route processors maintain Layer 2 data-link connectivity information: the active route processor checkpoints to the standby the minimal data required to maintain ATM, Frame Relay and Ethernet connections. Keeping these connections up is imperative to minimize CPU utilization, reduce the amount of data lost during a switchover and quickly establish the standby processor in hot-standby state.

Additionally, any method of creating an SSO environment must scale to tens of thousands of interfaces, because routers on the Internet keep connection information on tens of thousands of other routers to which they might need to connect. To accomplish this, the goal is to maintain across the route processors only the state that is necessary and cannot be re-created. Examples of such state are physical interface state, permanent virtual circuit state and command synchronization.

In a failure, SSO switches the system to the hot-standby route processor. The failed route processor will attempt to reboot and operate as the new standby.
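
The checkpoint-and-switchover behavior described above can be illustrated with a small model. This is only a sketch under stated assumptions, not how any router implements SSO: the `RouteProcessor` and `Chassis` classes, the names RP0 and RP1, and the interface and PVC identifiers are all hypothetical. It shows the one idea the text stresses: state that cannot be re-created is mirrored to the standby, while route table entries stay local to the active processor.

```python
from dataclasses import dataclass, field


@dataclass
class RouteProcessor:
    """Toy model of one route processor (all names here are hypothetical)."""
    name: str
    role: str  # "active" or "standby"
    # State that cannot be re-created and is therefore checkpointed.
    checkpointed: dict = field(default_factory=dict)
    # State the processor can rebuild on its own; never synchronized.
    local_only: dict = field(default_factory=dict)


class Chassis:
    """Two route processors presented as a single system."""

    def __init__(self):
        self.active = RouteProcessor("RP0", "active")
        self.standby = RouteProcessor("RP1", "standby")

    def checkpoint(self, key, value):
        """Record state on the active RP and mirror it to the standby."""
        self.active.checkpointed[key] = value
        self.standby.checkpointed[key] = value

    def learn_route(self, prefix, next_hop):
        """Route entries stay local; they are rebuilt after a switchover."""
        self.active.local_only[prefix] = next_hop

    def switchover(self):
        """Promote the hot standby; the failed RP reboots as the new standby."""
        failed, promoted = self.active, self.standby
        promoted.role = "active"
        # The reboot wipes the failed RP, which is then resynchronized.
        failed.role = "standby"
        failed.local_only = {}
        failed.checkpointed = dict(promoted.checkpointed)
        self.active, self.standby = promoted, failed


chassis = Chassis()
chassis.checkpoint(("interface", "atm1/0"), "up")
chassis.checkpoint(("pvc", "atm1/0", "1/100"), "active")
chassis.checkpoint(("config", 1), "interface atm1/0")
chassis.learn_route("10.0.0.0/8", "192.0.2.1")

chassis.switchover()
print(chassis.active.name, chassis.active.role)
print(chassis.active.checkpointed[("interface", "atm1/0")])
print(chassis.active.local_only)
```

In this model the promoted processor already holds the interface, PVC and configuration state, but its route table is empty. That gap is what nonstop forwarding and the routing protocol restart extensions are meant to cover.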
The SSO handoff happens without rebooting line cards and therefore without creating a link flap, which might cause connectivity protocols to drop. Every step of the SSO process is monitored through SNMP, informing the network management team that there was a route processor failure. This is critical because customers, whose applications are never interrupted, won’t call the network operations center to report the failure. The SNMP traps tell the network management systems the cause of the failure and whether the failed route processor was able to reboot. If it was not, it needs to be replaced, which is done without taking the router out of service.

Nonstop forwarding ensures that IP packets are forwarded continuously during the switchover. It is not practical to maintain all the route table state across two route processors, because route tables can have 100,000 to 200,000 route entries. So the Internet Engineering Task Force has proposed protocol restart extensions that enable nonstop forwarding for the Border Gateway Protocol (BGP), Intermediate System to Intermediate System (IS-IS) and Open Shortest Path First (OSPF) protocols. Similar extensions will be available for the Enhanced Interior Gateway Routing Protocol (EIGRP). These extensions maintain Layer 3 relationships between the router experiencing a restart and all its peer routers without maintaining any routing state between the route processors, thus eliminating scalability issues.

When two routers form a peering relationship, they exchange capabilities. New capabilities have been added that caution peers not to remove a failed router from the database, because it could come back even before connectivity protocols time out. These new routing protocol extensions allow a restarting router to notify peers when it has returned, to request all the information it needs to rebuild its route tables and, in the case of BGP, to reestablish the TCP session between peers.

NSF and SSO preserve user sessions during a route processor failure.
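
The peer-side behavior of the restart extensions can be sketched in a few lines. This is a deliberately simplified model, not the wire protocol of BGP, IS-IS or OSPF: the `Peer` class, its method names and the 120-second timer are assumptions made for illustration. It shows a peer that, having learned at capability exchange that its neighbor can restart, keeps forwarding on the neighbor's routes while the neighbor is away and resends its own routes when the neighbor returns.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy neighbor of a restarting router (names and timer are hypothetical)."""
    restart_time: int = 120  # seconds this peer is willing to wait
    neighbor_can_restart: bool = False
    routes_from_neighbor: dict = field(default_factory=dict)
    stale: set = field(default_factory=set)
    own_routes: dict = field(default_factory=dict)

    def exchange_capabilities(self, neighbor_can_restart):
        """Capabilities are exchanged when the peering relationship forms."""
        self.neighbor_can_restart = neighbor_can_restart

    def neighbor_lost(self):
        """Keep the neighbor's routes, marked stale, if it announced restart support."""
        if self.neighbor_can_restart:
            self.stale = set(self.routes_from_neighbor)
        else:
            self.routes_from_neighbor.clear()

    def tick(self, seconds_since_loss):
        """Drop the stale routes once the restart timer expires."""
        if self.stale and seconds_since_loss >= self.restart_time:
            for prefix in self.stale:
                self.routes_from_neighbor.pop(prefix, None)
            self.stale.clear()

    def neighbor_returned(self, refreshed_routes):
        """Replace stale entries and return the routes the neighbor asked for."""
        self.routes_from_neighbor = dict(refreshed_routes)
        self.stale.clear()
        return dict(self.own_routes)


peer = Peer(own_routes={"198.51.100.0/24": "192.0.2.2"})
peer.exchange_capabilities(neighbor_can_restart=True)
peer.routes_from_neighbor["203.0.113.0/24"] = "192.0.2.1"

peer.neighbor_lost()
peer.tick(seconds_since_loss=30)
kept = "203.0.113.0/24" in peer.routes_from_neighbor  # still forwarding

rebuilt_table = peer.neighbor_returned({"203.0.113.0/24": "192.0.2.1"})
print(kept, peer.stale, rebuilt_table)
```

Note that no routing state crosses between the two route processors in this scheme; the restarting router rebuilds its table entirely from what its peers send back, which is why the approach scales to large route tables.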
Even voice-over-IP calls have survived SSO tests. SSO and NSF are just two of a wave of new features coming to networks that provide graceful recovery from different types of network failures. The result is a new level of end-to-end resiliency on networks.

Goldberg is manager of the product management Internet technologies division at Cisco. He can be reached at cgoldber@cisco.com.