Cisco routers were the source of a major outage May 15 in an NTT network in Japan, according to an investment firm bulletin.
Between 2,000 and 4,000 Cisco routers went down for about 7 hours in the NTT East network after a switchover to backup routes triggered the routers to rewrite routing tables, according to a bulletin from CIBC World Markets. The outage disconnected millions of broadband Internet users across most of eastern Japan.
Cisco says it cannot specify which router models were involved.
Outage
Anyone who has run a large network long enough knows that the source of this could be any of at least these reasons: A) a vendor software bug or hardware-related issue; B) human error, whether a fat finger or misconfigured/poorly configured routers/protocols, or managing the network in such a way that memory resources, for example, were not properly managed so that instabilities could creep in; or C) another vendor injecting malformed BGP information (or whatever else), which happened several years ago with a major provider, where again Cisco was blamed, but their only fault was not coding BGP to account for flaws in the code of another vendor. It's hard to say whether the real details of this outage will get past the few who know them intimately. Everyone will most certainly spin it to their advantage, though. Just don't assume that the title of the article is complete or accurate. DISCLAIMER: Not a Cisco employee, stockholder, etc. Just a passerby...
with regardless of the real
Regardless of the real cause of the problem, the outage raised a red flag: a major telecom operator MUST maintain a two-vendor strategy.
Another upgrade
It seems that anytime there is a problem or security flaw, the vendor's response is "upgrade." Not bad for a vendor! Upgrade = increased revenue. When will customers say enough is enough?
re: with regardless of the real
Can you elaborate on the two vendor strategy? I keep looking for an answer on this that I can agree with. Thanks.
Who is to say that the other
Who is to say that the other vendor would not have experienced the same issue? Just because you have 2 or more vendors in the network does not guarantee that both won't fail. It improves the odds, but does not eliminate the risk. So stop making statements like it is the end-all solution to problems.
It can in fact add problems where a single-vendor solution would not, i.e. the ever-bedeviling finger-pointing issue! Certainly interop is now a larger issue. I have no opinion either way; it is the blanket statement that irks....
Are you kidding me?
This is a new low for Network World. While it is widely known in the industry that Network World is an anti-establishment rag and loves to bash the big guys, this story takes that to a new low.
This story is clearly blaming the Cisco routers for the outage, when in fact it most likely will turn out to be a misconfiguration issue. Is it the hardware vendor's fault that their equipment is misconfigured? I guarantee that this will not turn out to be some flaw in the equipment; at the worst it will turn out to be a flaw in a standard protocol (for which Network World will still blame Cisco, even though the flaw will exist in all vendors' equipment).
Network World, before you publish a story, you really should make sure you know WHY the outage occurred, and not just take a cheap shot at a vendor because they happen to be a market leader.
I hope that when the truth on why this outage occurred comes out that you at least have the stones to publish it, unbiased for once.
Well, if the equipment
Well, if the equipment didn't fail, vendor support most certainly did. I can't imagine NTT working solo on what is probably a boilerplate configuration on those thousands of routers. Vendors should know what their gear can and can't do (aka best practices), and have a vested interest in selling a client the correct solution, so they don't look like fools when the house of cards tumbles. So what if a core router failover causes a full BGP table update? The gear should be ready to handle it, or should be configured only to accept a partial feed. In any case, I'm not going to give any credit to ci$co here. I've experienced firsthand shoddy, unstable IOS code, with questionable QA and revision control.
Well, if the equipment - REPLY
No vendor can make any client "do it right" unless the client wants to and knows how, even when the client pays for advanced support contracts. I've experienced firsthand shoddy, unstable network engineering practices that result later in blaming the vendor when there is a meltdown. The vendor could not MAKE them upgrade some critical routers' memory, despite trying for months. The memory in those routers 3 years ago was sufficient. Route tables grow. Memory is no longer sufficient. Nothing is done for nearly a year, except that the network becomes and remains very unpredictable. It's just TOOOOO easy to blame the big target instead of accepting personal blame sometimes. Does IOS sometimes have serious issues? Yes. Can good procedures nearly always avoid those? Certainly.
NTT -- process fanatics
Having worked in Japan with NTT, I am pretty sure they would have strong processes in place and be in lockstep with the vendor recommendations. They try everything in their major labs before they roll it out. Stuff happens, but I agree that the code / architecture should be self-healing. Of course this is a good opportunity for Cisco to sell new stuff, which is how one of their marketing folks responded to a question about this outage.
If this happened to any other vendor I am sure it would get a lot more negative news than what has happened here.
As someone who has had to
As someone who has had to suffer the ordeal of a large number of Cisco router crashes, your comments sound just like the arrogant Cisco sales people that I have had the misfortune to deal with. Cisco always tries to blame the configs; the reality is IOS is well past its sell-by date.
One of the best decisions we ever made was to dump Cisco and deploy Juniper ("amazing, they don't crash"), so now I can go home at night and get a good night's sleep, not having to worry about the network falling over. It's been amazingly good for my health.