Network World

Kerrie Meyler

Google’s email outage and the fallacy of 3 nines

Controlling outages through high availability and monitoring

By Kerrie Meyler on Mon, 09/07/09 - 10:59am.

Last Tuesday, September 1, Google's web-based email went offline. This follows another outage in May. The September outage was caused by engineers taking some servers offline and inadvertently overloading other servers as a result. May's outage was caused by a traffic routing error. There were previous outages earlier in May, one in mid-April, and another in March.

What does this mean?

For one thing, both Google and Hotmail promise 99.9% availability (also known as "3 nines"). As good as that sounds, 3 nines doesn't matter during that other one-tenth of one percent. 3 nines also equates to nearly 44 minutes of unscheduled downtime per month, or 8 hours and 45 minutes per year. The more nines, the more uptime you have: 4 nines allows only about 52 minutes of downtime per year, and the vaunted 5 nines allows about 5 minutes 16 seconds of unscheduled downtime in a year's time. To be considered truly available, 3 nines really doesn't cut it. (And this is only regarding unscheduled downtime, not maintenance windows!)
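The downtime figures above follow from simple arithmetic. Here is a minimal sketch of the calculation, assuming a 365-day year and an average month of one twelfth of a year; the function names are my own, for illustration only.

```python
# Allowed unscheduled downtime for an availability target given in "nines".
# Assumptions: a 365-day year; a month is treated as one twelfth of a year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_per_year(nines: int) -> float:
    """Minutes of allowed downtime per year (3 nines means 99.9% available)."""
    unavailable_fraction = 10 ** -nines  # 3 nines -> 0.001
    return MINUTES_PER_YEAR * unavailable_fraction


def downtime_per_month(nines: int) -> float:
    """Minutes of allowed downtime in an average month."""
    return downtime_per_year(nines) / 12


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_per_year(n):.2f} min/year, "
          f"{downtime_per_month(n):.2f} min/month")
```

Under these assumptions, 3 nines works out to roughly 525.6 minutes per year (about 43.8 minutes per month), and each additional nine divides the allowance by ten. A 365.25-day year shifts the results by a few seconds.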

While I don't know how Google structures its operations, there are some other areas to look at here. How does one control unscheduled downtime? Building redundancy, a.k.a. high availability, into one's operations certainly can help. If Google had backup servers it could use, or had clustered its servers, that might have prevented the last outage.
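To see why redundancy helps, here is a rough sketch of the standard formula for redundant servers. It rests on two assumptions that real systems often violate: servers fail independently of one another, and any single surviving server can carry the full load. The function name is hypothetical.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of a group of redundant servers.

    Assumes independent failures and that one working server is enough.
    The service is down only when every replica is down at the same time.
    """
    all_down = (1 - single) ** replicas
    return 1 - all_down


# Two servers at 3 nines each, under the assumptions above.
print(combined_availability(0.999, 2))
```

On paper, two 3-nines servers give about 6 nines. The September outage shows the limit of that math: taking some servers offline overloaded the rest, which is exactly the case where the second assumption does not hold.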

Another approach that can be helpful is monitoring your production environment to know what's going on. If you could see that other servers were starting to get overloaded, you could proactively bring on other servers or reassign the workload, perhaps using virtualization to do this quickly.
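As a toy illustration of that idea, the sketch below flags servers whose utilization crosses a threshold so that an operator (or an automated process) could add capacity before they fail. The server names, the readings, and the 80% threshold are all made up for the example; a real monitoring product collects these figures for you.

```python
def overloaded_servers(utilization: dict, threshold: float = 0.8) -> list:
    """Return the names of servers at or above the utilization threshold."""
    return sorted(name for name, load in utilization.items()
                  if load >= threshold)


# Hypothetical readings: fraction of capacity in use on each server.
readings = {"mail-01": 0.55, "mail-02": 0.91, "mail-03": 0.84}

for name in overloaded_servers(readings):
    print(f"{name} is at {readings[name]:.0%}; consider adding capacity")
```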

Monitoring tools and virtualization are not new technologies. Back in the days of "big iron" and mainframes, IBM had VM as an operating system and both they and third parties had monitoring tools. Today we also have virtualization technologies and monitoring tools. This may or may not get you to 6 nines of availability, but being able to proactively monitor your production environment and architect high availability can make a big difference.

About Managing Microsoft

Kerrie Meyler, MVP, MCSE, MCTS, MCT, is an independent consultant and trainer with over fifteen years of experience in IT. While at Microsoft in Field Technical Sales for four years she focused on infrastructure and management, presenting at numerous product launches. Kerrie has presented Operations Manager 2007 at TechEd 2007, MMS 2009, MMS 2011, and internal Microsoft conferences, receiving company recognition and awards including a SPAR MGS award. Kerrie worked with Microsoft Learning to develop functional specifications for the original Operations Manager Microsoft courseware, 2550: Implementing Microsoft Operations Manager 2000, and did the beta teach for that course. She also participated in development for several System Center certification exams.

Kerrie is the lead author of Microsoft Operations Manager 2005 Unleashed, System Center Operations Manager 2007 Unleashed, System Center Configuration Manager (SCCM) 2007 Unleashed, System Center Operations Manager 2007 R2 Unleashed, System Center Opalis Integration Server 6.3 Unleashed and System Center Service Manager 2010 Unleashed.

Check out an excerpt from System Center Operations Manager 2007 Unleashed, Chapter 3: Looking Inside OpsMgr.

You can also check out an excerpt from System Center Configuration Manager (SCCM) 2007 Unleashed, Chapter 3: Looking Inside ConfigMgr.

Read a sample chapter of System Center Operations Manager 2007 R2 Unleashed at Chapter 1: Introduction and What's New.

You can also read a sample chapter of System Center Opalis Integration Server 6.3 Unleashed at Chapter 1: Introducing Opalis Integration Server 6.3, and of System Center Service Manager 2010 Unleashed at Chapter 1: Service Management Basics.

System Center Service Manager 2010 Unleashed was selected as the September, 2011 book giveaway for Microsoft Subnet.

  • Enter the monthly contest.
  • Read a free chapter excerpt of System Center Service Manager 2010 Unleashed.
  • Buy the book.