To misquote The Hitchhiker’s Guide to the Galaxy: Space is big. Really big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's a long way down the road to the Apple Genius Bar, but that's just peanuts to space.
And out there, in the vast reaches of the cosmos, cosmic rays are continuously streaming towards Earth. These are protons and atomic nuclei theorized to come from supernova explosions and, probably, the centers of galaxies. The Earth is continuously bombarded by these alien particles which, in turn, collide with the atmosphere and generate a whole range of secondary particles including neutrons, muons, pions and alpha particles.
Cosmic rays themselves have virtually no effect on living things, but the secondary particles can, because every now and then one of them will have enough energy to damage whatever it strikes. Allow me to digress for a moment and mention the Oh-My-God particle, which Wikipedia describes as follows:
The Oh-My-God particle was an ultra-high-energy cosmic ray detected on the evening of 15 October 1991 over Dugway Proving Ground, Utah, by the University of Utah's Fly's Eye Cosmic Ray Detector. Its observation was a shock to astrophysicists (hence the name), who estimated its energy to be ... equivalent to a 142 g (5 oz) baseball travelling at about 26 m/s (94 km/h; 58 mph) ... roughly 60 times the collision energy of the Large Hadron Collider ... Since the first observation, at least fifteen similar events have been recorded, confirming the phenomenon. These ultra-high-energy cosmic ray particles are very rare ...
Now, if a secondary particle collides with a strand of DNA, the damage could result in a mutation, while if it’s something electronic, for example, a transistor in a processor chip, the result could be simply the flipping of a bit or, at worst, the destruction of some circuitry and thence a dead computer.
“Interesting,” you’re probably thinking, “but, really, does this actually happen?” The answer, my friend, is a resounding “yes!” According to Bharat Bhuva, professor of electrical engineering at Vanderbilt University, “This is a really big problem, but it is mostly invisible to the public.”
When Professor Bhuva says “invisible” he means that mostly we don’t notice the consequences. For example, if your copy of Justin Bieber's latest recording gets a bit flipped you’re unlikely to be aware anything has changed (it will still sound like whatever Bieber’s music is supposed to sound like). Even if your PC blue screens from a flipped bit or the processor dies from a trashed transistor, you’ll most likely figure that the problem was due to the crappy software you’re running or the fact that your laptop just went out of warranty. But every now and then, one of these single-event upsets, or SEUs, causes havoc and, occasionally, it’s possible to trace the problem back to cosmic radiation as the cause.
In 2003, in a local election held in the town of Schaerbeek, Belgium, one candidate got an extra 4,096 votes, which would have required the town to have more voters than citizens. Since this was obviously odd, an investigation was conducted; it concluded that a single bit in a voting machine had flipped, adding the extra votes, and that the most likely cause was an SEU. Other events thought to have been caused by SEUs include the sudden disengagement of the autopilot on a Qantas flight in 2008, which caused the aircraft to plummet 690 feet in 23 seconds, injuring many of the passengers.
A 2004 report by Cypress Semiconductor titled Soft errors' impact on system reliability looked at the consequences of radiation-induced errors and concluded:
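The Schaerbeek number is suggestive in itself: 4,096 is exactly 2 to the power of 12, which is what you would gain if bit 12 of a stored count went from 0 to 1. Here is a minimal Python sketch of that arithmetic. The function name and the starting tally of 514 are made up for illustration; they are not taken from the investigation or from the voting machine's software.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted (bit 0 is the least significant)."""
    return value ^ (1 << bit)


# Hypothetical tally, chosen only so that bit 12 starts out as 0.
true_count = 514

# Simulate a single-event upset hitting bit 12 of the stored count.
corrupted = flip_bit(true_count, 12)

print(corrupted - true_count)  # 4096
```

Flipping the same bit a second time restores the original value, which is why a single upset like this leaves no other trace in the data.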
The potential impact on typical memory applications illustrates the importance of considering soft errors. A cell phone with one 4-Mbit, low-power memory … will likely have a soft error every 28 years. A high-end router with 10 Gbits of SRAM … can experience an error every 170 hours. For a router farm that uses 100 Gbits of memory, a potential networking error interrupting its proper operation could occur every 17 hours. Finally, consider a person on an airplane over the Atlantic at 35,000 ft working on a laptop with 256 Mbytes (2 Gbits) of memory. At this altitude, [there would be] a potential error every five hours.
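The two router figures in that quote follow a simple pattern: ten times the memory, one tenth the time between errors. The sketch below reproduces that scaling under the assumption that every bit has the same constant chance of being upset per hour. That assumption, and the function name, are mine rather than the report's, and it does not carry over to the cell phone or laptop figures, which involve a different kind of memory or a different altitude.

```python
def hours_between_errors(ref_gbits: float, ref_hours: float, gbits: float) -> float:
    """Scale a reference time between soft errors to another memory size.

    Assumes the error rate per bit per hour is constant, so the expected
    time between errors is inversely proportional to the number of bits.
    """
    return ref_hours * ref_gbits / gbits


# Reference point from the quote: 10 Gbits of SRAM, one error every 170 hours.
print(hours_between_errors(10, 170, 100))  # 17.0, matching the router farm figure
```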
As the feature size of microelectronics decreases, so does the amount of energy a particle needs to cause an SEU, but there’s a tradeoff: being smaller, the features are smaller targets and therefore less likely to get hit. On the other hand, as systems become more complex they incorporate more features, so while the SEU failure rate of individual features has decreased with each successive generation, the failure rate of whole systems has increased. Bhuva was quoted in a Vanderbilt University press release on the impact of these findings:
The semiconductor manufacturers are very concerned about this problem because it is getting more serious as the size of the transistors in computer chips shrink and the power and capacity of our digital systems increase … In addition, microelectronic circuits are everywhere and our society is becoming increasingly dependent on them.
The press release continues:
The good news, Bhuva said, is that the aviation, medical equipment, IT, transportation, communications, financial and power industries are all aware of the problem and are taking steps to address it. “It is only the consumer electronics sector that has been lagging behind in addressing this problem.”
The engineer’s bottom line: “This is a major problem for industry and engineers, but it isn’t something that members of the general public need to worry much about.”
Who knew that failure is, indeed, not an option but a certainty?