Gmail response shows Google gets it

Sometimes, you just need to say you're sorry -- and that's exactly what Google did after its Gmail service experienced a three-hour outage yesterday. The outage, which took place from 1:30 a.m. to 4:30 a.m. PST, affected primarily users in Europe and Asia, who were unable to access their e-mail during prime-time working hours. Google's response? It got the service back up and running quickly, apologized, explained the problem -- and paid out on the SLAs for its Enterprise Apps users. Perhaps it's finally getting this enterprise-level crisis management thing down.

First, the apology. The Official Gmail Blog made sure all users knew Google felt their pain:

"while many of our users in the U.S. were asleep, many people couldn't access their email. Lots of people around the world who rely on Gmail were disrupted during their waking and working hours, and we’re very sorry."

Next, the explanation. It seems that routine maintenance in one European data center conflicted with software that aims to keep data geographically close to its owner, thwarting Google's best-laid contingency plans:

"This morning, there was a routine maintenance event in one of our European data centers. This typically causes no disruption because accounts are simply served out of another data center. Unexpected side effects of some new code that tries to keep data geographically close to its owner caused another data center in Europe to become overloaded, and that caused cascading problems from one data center to another. It took us about an hour to get it all back under control."

Google says it identified the problem and fixed the bug so that a similar overload won't happen in the future. Still, Enterprise Apps users had to feel good about the response. Not only could affected users still get some work done, since Gmail's offline capability is up and running, but their SLAs also kicked in, which means they now get an extra 15 days of service tacked onto their contracts. Not bad.

When it comes to cloud computing, the question is never if a service will go down, but when -- and the key to keeping customers happy is all about the response. It looks like Google finally "gets it."
