Cisco says router bug could be result of ‘cosmic radiation’ … Seriously?

UPDATED: See Cisco's further explanation below

Stephen Sauer

A Cisco bug report addressing “partial data traffic loss” on the company’s ASR 9000 Series routers contends that a “possible trigger is cosmic radiation causing SEU soft errors.”

Cosmic radiation? While it is well established that cosmic radiation can wreak havoc on electronic devices, there is far less agreement about how likely it is to be the culprit in this case, or about whether Cisco could know one way or the other.

A reader in Reddit’s networking community asks the question: “Has anyone ever seen ‘cosmic radiation’ as a cause for software errors in a bug report before? The ‘fix’ is to reload the line card. This did resolve the issue in our case. Anybody else experience this?”

Here are a few of the replies:

Redditor: “Ex-TAC engineer here! Cosmic radiation is legit! However it's gotten a bad rep as it's not well explained and it's not the be-all and end-all of outages.

“It IS possible for bits to be flipped in memory by stray background radiation. However it's mostly impossible to detect the reason as to WHERE or WHEN this happens. … Also, cosmic radiation does not home in on a specific part of your box.... It would also hit the control plane and other parts. ECC memory tends to make this a non-issue.

“I'd call bullshit in this case. Request an EFA (Engineering Failure Analysis) to see if the hardware itself is at fault. If EFA comes back clean, then it's most likely software.”
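The Redditor's point about ECC memory is easier to see with a toy example. The sketch below uses a Hamming(7,4) code, a textbook single-error-correcting code chosen purely for illustration; it is an assumption, not a description of Cisco's or any vendor's memory design (commodity ECC memory commonly uses wider SECDED codes). The helper names `hamming74_encode`, `hamming74_decode` and `flip_bit` are hypothetical. A single flipped bit, standing in for an SEU, produces a nonzero syndrome that points at the bad position, so the original data is recovered.

```python
# Illustrative only: a toy Hamming(7,4) code, NOT Cisco's actual ECC scheme.

def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    # Codeword positions 1..7 are: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error is seen."""
    c = list(codeword)
    # Re-check each parity group (list indices are 0-based, positions 1-based).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip back the single bit the syndrome points at
    return [c[2], c[4], c[5], c[6]], syndrome


def flip_bit(codeword, position):
    """Simulate a single-event upset flipping one bit (1-indexed position)."""
    upset = list(codeword)
    upset[position - 1] ^= 1
    return upset


word = hamming74_encode([1, 0, 1, 1])
hit = flip_bit(word, 5)       # one stray bit flip
print(hamming74_decode(hit))  # ([1, 0, 1, 1], 5)
```

A toy code like this corrects only one flipped bit per word; two flips in the same word would be miscorrected.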

A second commenter believes that “cosmic radiation” is not intended to be taken seriously.

Redditor 2: “I've certainly seen the 'cosmic radiation' or 'cosmic rays' explanation for supervisor reloads and parity errors before. Fairly tongue-in-cheek ('we don't know the cause yet') but I completely understand if you don't find it amusing when dealing with a frustrating fault.”

Another commenter sees obfuscation, not radiation:

Redditor 3: “I have seen this as well. When I see that … I always just laugh because there's not much anyone can say.... Which I think is the point.”

This isn’t the first time Cisco has cited “cosmic radiation” as a troublemaker: a discussion thread on the Ars Technica forums includes entries dating back to the late 1990s. Even back then the claim was met mostly with derision.

So what does Cisco have to say? Is cosmic radiation really roughing up its routers, or is the explanation a smokescreen? I’ve reached out to the company, and a public relations representative has promised to track down an explanation.

(UPDATE: Cisco’s reply: “While we can’t speak to this particular case, Cisco has conducted extensive research, dating back to 2001, on the effects cosmic radiation can have on our service provider networking hardware, system architectures and software designs. Despite being rare, as electronics operate at faster speeds and the density of silicon chips increases, it becomes more likely that a stray bit of energy could cause problems that affect the performance of a router or switch.

“Cisco published a blog post on this topic in January 2012. In an effort to minimize the impact of radiation from ‘Single Event Upsets’ (SEUs), we sought to redesign our technology with custom silicon chips and software, and adopt protocols that utilize resiliency features.”)



Copyright © 2016 IDG Communications, Inc.
