Microsoft Subnet: An independent Microsoft community

Should Azure customers worry about reliability?

Outage hints at maintenance process failure

It’s hard to stay on top of everything all the time, so it’s understandable that something like renewing a security certificate could fall through the cracks, as it did at Microsoft last week, grinding its Azure Cloud Service to a halt.

But if you provide a critical service to corporate customers, updates like renewing certificates before they expire ought to be just another part of doing business, details that get taken care of in a routine way.

BACKGROUND: Microsoft's Azure service hit by expired SSL certificate 

RELATED: Microsoft Azure overtakes Amazon's cloud in performance test 

Apparently, if there was such a routine, it somehow broke down. Microsoft says it is still sorting out what went wrong in order to prevent something similar from happening in the future.

Meanwhile, businesses using Azure Cloud Service should reevaluate how much they entrust to it. They should have done this before buying the service, but even if they did, it doesn’t hurt to review that decision in light of the outage.

Business-critical data that must be accessible all the time clearly does not belong in the Azure cloud unless it’s also available someplace else.

“All the time” is a tall order, something that even private storage could fail to achieve. The standard for most service providers – established by phone companies – is 99.999% uptime. That means downtime of just 25.9 seconds per month.
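The five-nines figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes a 30-day month, which is what yields the 25.9-second figure (a longer month gives a slightly larger allowance).

```python
def allowed_downtime_seconds(uptime_percent: float, days_in_month: int = 30) -> float:
    """Seconds of downtime per month permitted at a given uptime percentage.

    Assumes a month of `days_in_month` days (30 by default, an assumption).
    """
    seconds_in_month = days_in_month * 24 * 60 * 60
    return seconds_in_month * (100.0 - uptime_percent) / 100.0


# Five nines over a 30-day month: roughly 25.9 seconds of allowed downtime.
print(f"{allowed_downtime_seconds(99.999):.1f} seconds per month")
```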

Microsoft’s SLA for Azure Storage Service kicks in when the monthly uptime percentage drops below 99.9%, which means downtime of more than about 43.2 minutes in a 30-day month. At that point customers are eligible for a 10% service credit.

If uptime drops below 99%, which translates to more than 7.2 hours of downtime per month, customers are entitled to a 25% credit. Friday's outage was so bad that Microsoft says it will waive the requirement that customers report service failures within 5 business days. The company is automatically crediting affected customers, according to a Microsoft blog post written by Steven Martin, the general manager of Windows Azure Business & Operations.
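The two credit tiers described above amount to a simple lookup. This is a rough sketch based only on the thresholds quoted in this article, not on the SLA text itself; the function name is hypothetical, and treating each tier as starting strictly below its threshold is an assumption.

```python
def service_credit_percent(monthly_uptime_percent: float) -> int:
    """Service credit (as a percentage) for a given monthly uptime percentage.

    Thresholds are the ones quoted in the article; the strict "below"
    comparison at each boundary is an assumption.
    """
    if monthly_uptime_percent < 99.0:
        return 25
    if monthly_uptime_percent < 99.9:
        return 10
    return 0


for uptime in (99.95, 99.5, 98.3):
    print(f"{uptime}% uptime -> {service_credit_percent(uptime)}% credit")
```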

According to Microsoft’s timeline, the outage lasted from 3:44 p.m. Eastern Friday to 4 a.m. Eastern Saturday, when more than 99% of customers had service restored. That’s about 12 hours, 16 minutes of downtime, well over the 7.2 hours a month that pushes uptime below the 99% threshold for a 25% service credit.
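The elapsed time between the two quoted timestamps, and the monthly uptime it implies, can be worked out as follows. This is a sketch under stated assumptions: the calendar dates are placeholders chosen only because they fall on a Friday and a Saturday, the month is taken as 30 days, and no other downtime is counted.

```python
from datetime import datetime, timedelta

# Placeholder dates (assumption): the article gives only weekdays and times.
# Both times are Eastern, so no time zone conversion is needed.
outage_start = datetime(2000, 1, 7, 15, 44)  # a Friday, 3:44 p.m.
outage_end = datetime(2000, 1, 8, 4, 0)      # the following Saturday, 4:00 a.m.

outage = outage_end - outage_start           # 12 hours, 16 minutes
month = timedelta(days=30)                   # assumed 30-day month

# Monthly uptime if this outage were the only downtime in the month.
uptime_percent = 100.0 * (1 - outage / month)

print(f"Outage length: {outage}")
print(f"Monthly uptime: {uptime_percent:.2f}%")
```

Under those assumptions the outage alone leaves monthly uptime a little above 98%, below the 99% line for the larger credit.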

Getting a credit is great as far as it goes, but SLAs don’t prevent downtime. They just give providers an incentive to minimize it, and as this case shows, providers don’t always succeed. Azure had another outage about a year ago, for different reasons, that affected only its management services.

These two events don’t condemn Azure services, but they should encourage customers to carefully consider what types of data these services are appropriate for and what types they are not.

(Tim Greene covers Microsoft for Network World and writes the Mostly Microsoft blog. Reach him at tgreene@nww.com and follow him on Twitter https://twitter.com/#!/Tim_Greene.)

More on Microsoft:

Windows 8 guru names the top 8 trends at CES 

Windows 8 portables to get inexpensive, long-lived by Xmas 2013?

‘Christmas gift for someone you hate: Windows 8’

Rumored follow-ons for Surface tablets; reduced orders for original Surface

Microsoft buys a starring role for its Surface tablet on TV’s 'Suburgatory'

Microsoft bets the farm on Metro 

Windows Server 2012 isn't available yet, but it's running Bing

Is Google taking a run at Windows 8?

This Windows 8 tablet might actually be a PC

Demise of Cius offers lessons for Windows 8

Why aren’t Apple and Amazon dumping on Windows RT?
