Is the cloud reliable enough for your business?

Outages at a cloud computing service could hurt your business. The question is whether your in-house systems can do better.

In April of last year, Satoshi Nakajima, founder of Washington-based Big Canvas Inc., was eagerly inviting new customers to subscribe to his company's flagship product, PhotoShare, which lets users swap Apple iPhone photos for free.

Three months later, Nakajima was extending those same subscribers heartfelt apologies.

His mea culpa wasn't in response to product glitches, poor customer service or any of the other growing pains known to plague start-ups. Rather, PhotoShare and its 50,000 subscribers (now 150,000 strong) had fallen victim to stormy weather in the cloud computing environment: a seven-hour outage on July 20 when Amazon.com Inc.'s S3 cloud service went down -- for the second time in 2008.

Nakajima pays US$900 a month for Amazon's cloud computing services. He subscribes to the vendor's Elastic Compute Cloud (EC2) for flexible computing capacity and Simple Storage Service (S3) for unlimited data storage space, which Big Canvas uses to store customers' photos. As a result of the outage, brought on by what Amazon said were poorly communicating servers, Big Canvas lost photos belonging to 50 customers. Nakajima called each to apologize personally.

"We simply told them: 'The last photos you posted are gone. I'm sorry; either resubmit them -- or forgive us,'" recalls Nakajima, formerly the lead software architect on Microsoft Windows.

Nakajima isn't the only business owner who's been forced to pick up the pieces after a cloud computing outage. In February, about 113 million Google Gmail subscribers had to wait patiently or turn to an alternate e-mail service when Google Inc.'s Web-based e-mail system went on the blink for several hours. And last month, 14% of Google Apps users faced slow service or interruptions because of a network traffic jam.

And last July 18, Apple's online cloud service, MobileMe, which synchronizes e-mail, contacts and calendar events, remained unavailable to users throughout much of the day, prompting disgruntled users to say things like "MAC.COM BLOWS!" on support forums.

Cloud Pros and Cons

Top perceived benefits of cloud computing:

1. Easy/fast to deploy

2. Pay for only what you use

3. Less in-house staff and lower costs

Top challenges of cloud computing:

1. Security

2. Performance

3. Availability

4. Hard to integrate it with in-house IT

5. Inability to customize it

What customers want from cloud computing:

1. Competitive pricing

2. Performance assurances

3. Understanding of my business and industry

4. Ability to move cloud offerings back on-premises

Source: IDC survey of 244 CIOs and business executives, September 2008

Such snafus haven't stopped an increasing number of companies from turning to cloud computing services for pay-as-you-go processing power and storage space that don't require an investment in IT infrastructure. Research firm IDC predicts that worldwide IT spending on cloud services will grow almost threefold by 2012, reaching $42 billion. But as dependency grows, so too do concerns about cloud computing's reliability and whether big-name vendors like Amazon, Google and Apple will accept responsibility for outages.

In a 2008 IDC survey of 244 CIOs and business executives, more than 63% of the respondents rated performance and availability as two of the top three challenges surrounding cloud computing services. And nearly 75% said they consider security to be a serious concern.

Small businesses like Big Canvas aren't the only ones sweating cloud computing's shortcomings. Although start-ups are often the hardest hit by outages, even the venerable New York Times, which uses S3 to store and deliver articles from its historical database, was down for the count when Amazon Web Services suffered a two-hour outage in February 2008.

"A short outage of a mission-critical application could cost millions of dollars," warns John Sloan, an analyst at Info-Tech Research Group, a market analysis firm in London, Ontario.

Who is responsible for picking up the check -- and cleaning up the mess -- is a prickly question for today's host of cloud computing providers, including Amazon, Google, Nirvanix, Akamai Technologies, XCalibre Communications and Rackspace Hosting. Part of the difficulty stems from the fact that few cloud providers offer service-level agreements (SLAs) promising 99.99% uptime or rebates for excess downtime. And companies that insist on a guarantee of four-9s performance can expect to pay a hefty price.

"In order to guarantee 99.99% service levels, a provider is likely to charge you more," says Sloan, adding that other trade-offs could include having to sign a multiyear contract with a provider.
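To put those percentages in concrete terms, the short sketch below converts an availability figure into the downtime it permits. It assumes a 30-day month and treats every minute of the month as covered by the agreement; real SLAs often exclude scheduled maintenance, so the function name and the numbers are illustrative only.

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a period at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Compare "three 9s" with "four 9s" over an assumed 30-day month.
for pct in (99.9, 99.99):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% over 30 days allows about {minutes:.1f} minutes of downtime")
```

Under these assumptions, 99.9% leaves room for roughly 43 minutes of downtime a month, while 99.99% leaves only about four. By either measure, an outage lasting several hours falls well outside the budget.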

The Fine Print

Semantics can also come into play when assigning responsibility -- and blame. Google's SLA, for example, reads, "The Google Apps SLA does not apply to... any performance issues: (i) caused by factors outside of Google's reasonable control." Whether poorly communicating servers or denial-of-service attacks qualify as outside of "Google's reasonable control" is a debate for ace legal teams -- a luxury smaller businesses simply can't afford.

When all is said and done, a cloud computing vendor's reputation and track record may be the best indicators of reliability. On the upside, many providers are getting better at keeping their customers informed of service outages. For example, in February Google unveiled its Google Apps Status Dashboard, which provides subscribers with an at-a-glance look at the current availability of applications such as Google Gmail, Video and Docs. Outages are flagged with a red "x," whereas uptime is denoted with a green check mark.

Peering Into the Cloud

Here are five questions you need answered before moving your business to the cloud:

1. How does my vendor define "good customer service"? Cutting-edge services are key, but you need to find out what a particular vendor considers "good" service and what that service includes, from refund policies to technical assistance.

2. How comfortable am I with my vendor's physical facilities? Part of the due-diligence process includes examining a vendor's facilities and paying attention to the processes around the maintenance of the building, as well as the vendor's equipment maintenance schedule and the number of people working in the building.

3. What types of service interruptions should I be prepared for? Sometimes a vendor will have to shut down a portion of its facility for renovations or equipment upgrades -- activities that can significantly interrupt your cloud computing services. Find out how often a vendor plans to conduct maintenance checks and what kind of advance notice you can expect.

4. How quickly is my vendor growing? You need to know whether your vendor is technically capable of taking on a significant number of new subscribers without it impacting your service levels.

5. What follow-up procedures does my vendor have in place? Outages happen; the important issue is how quickly and effectively your vendor can get to the root of the problem.

Some vendors are even willing to compensate certain customers for service disruptions. Google, for example, offered Google Apps Premier Edition paying customers a 15-day credit to make up for the service outage that occurred in February 2008. But not everyone receives vendors' largesse in equal measure. In the case of Big Canvas, Nakajima says, Amazon didn't charge the company for its seven hours of downtime -- but that was a discount of less than 1% on its monthly bill.
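The "less than 1%" figure can be checked with simple arithmetic. The sketch below assumes the credit was prorated by the hour against Nakajima's full US$900 monthly bill and that July has 31 days; the article does not say how Amazon actually calculated it, so the function is hypothetical.

```python
def prorated_credit(monthly_bill: float, outage_hours: float, days_in_month: int) -> float:
    """Credit owed if the bill is refunded hour for hour during an outage."""
    hours_in_month = days_in_month * 24
    return monthly_bill * outage_hours / hours_in_month


# Seven hours of downtime in July (31 days) on a US$900 bill.
credit = prorated_credit(900.0, 7, 31)
share = credit / 900.0 * 100
print(f"Credit: ${credit:.2f}, or {share:.2f}% of the monthly bill")
```

On those assumptions the credit comes to about $8.47, or 0.94% of the bill, which is consistent with Nakajima's description.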

"Vendors need to promise us 99.9% availability, and if they miss that number, then they should refund us for the whole month," he says.

That's not going to happen anytime soon, according to R 'Ray' Wang, an analyst at Forrester Research Inc. "Most existing cloud contracts don't cover the fact that it's a loss-of-revenue issue for companies," says Wang. "You'll receive credits for future service, but there's really no way to cover your losses."

Not everyone is crying foul over cloud computing providers' refund policies. Just ask Peter Sanchez. He's the founder of SmartJabber, a Los Angeles-based start-up that sells automated customer service software such as virtual chat agents to online retailers. Since its launch last April, SmartJabber has relied on Amazon S3 to store image files for chat windows, JavaScript files and Web site images. But last July, a major S3 outage prevented SmartJabber and its customers from accessing those files for more than six hours.

"Our customers' chat windows and Web site images weren't loading correctly, which ultimately makes their Web site look bad to visitors, so we had some complaints," recalls Sanchez.

Despite having received about 15 customer complaints, Sanchez is surprisingly laid-back about the ordeal, noting that he never even bothered to contact Amazon for a refund. As far as Sanchez is concerned, the occasional bout of downtime is a small price to pay for a storage service that costs just $35 a month -- a fraction of the nearly $500 a month he'd have to spend to replicate Amazon's storage capabilities with in-house servers.

"As long as an SLA is available for everybody to read, and the vendor isn't trying to hide anything, then you either have to accept the agreement or find someone that you think can provide a better level of service," says Sanchez.

Nor does he subscribe to the notion that a cloud computing contract requires forfeiting complete control of your systems. Rather, Sanchez says that in the event of an outage, SmartJabber can offload the data it stores on S3 onto its own local storage servers in a matter of minutes. "It's not the best solution, but it's something that would keep us chugging along," he says.

Nakajima has a similar emergency plan. Today, as a precautionary measure, Big Canvas' EC2 server temporarily caches users' photos before transferring them to the S3 server.
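The general idea behind that precaution can be sketched in a few lines: keep each photo on local disk until the remote store confirms it has the file, and resend anything left over once service returns. This is not Big Canvas' actual code. The class name and the upload callable are hypothetical stand-ins, and a real system would call a storage client library where this sketch calls a plain function.

```python
import tempfile
from pathlib import Path
from typing import Callable


class StagedUploader:
    """Keep each photo on local disk until the remote store confirms it."""

    def __init__(self, staging_dir: Path, upload: Callable[[str, bytes], None]):
        self.staging_dir = staging_dir
        self.upload = upload  # hypothetical callable; must raise on failure
        self.staging_dir.mkdir(parents=True, exist_ok=True)

    def submit(self, name: str, data: bytes) -> bool:
        """Stage the photo locally, then try to send it. True if uploaded."""
        (self.staging_dir / name).write_bytes(data)
        return self._push(name)

    def _push(self, name: str) -> bool:
        path = self.staging_dir / name
        try:
            self.upload(name, path.read_bytes())
        except Exception:
            return False  # remote store unavailable; keep the local copy
        path.unlink()  # delete the local copy only after a confirmed upload
        return True

    def retry_pending(self) -> int:
        """Resend everything still staged; return how many succeeded."""
        pending = sorted(p.name for p in self.staging_dir.iterdir())
        return sum(self._push(name) for name in pending)


if __name__ == "__main__":
    remote = {}                 # stand-in for the remote storage service
    status = {"down": True}     # simulate an outage

    def fake_upload(name: str, data: bytes) -> None:
        if status["down"]:
            raise ConnectionError("storage service unavailable")
        remote[name] = data

    with tempfile.TemporaryDirectory() as tmp:
        uploader = StagedUploader(Path(tmp) / "staging", fake_upload)
        print("uploaded during outage:", uploader.submit("photo1.jpg", b"..."))
        status["down"] = False
        print("resent after recovery:", uploader.retry_pending())
```

The key design choice is the order of operations: the local copy is removed only after the upload call returns without error, so an outage delays a photo rather than losing it.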

Despite some complaints, in-house IT departments would be hard-pressed to outperform the service levels currently being met by many providers, including Amazon and Google -- occasional outages and all.

Says Wang, "If you were to compare the amount of uptime that the cloud providers are delivering and what's being delivered by your own internal IT teams, you'll find out that the external ones are doing a much better job, mostly because they're under a higher level of scrutiny."

In the end, it's up to each company to decide how much risk it is willing to take on -- and whether the damages accruing from a service disruption might offset the savings and convenience promised by a cloud computing service.

"Cloud computing is reliable enough that if your business can tolerate the occasional outage, you're just starting out, and you don't have a lot to invest, you can take a chance on it," says Info-Tech Research's Sloan. "You might possibly even build a business on it."

Waxer is a freelance writer in Toronto.

This story, "Is the cloud reliable enough for your business?" was originally published by Computerworld.

Copyright © 2009 IDG Communications, Inc.