Amazon struggles to restore lost data to European cloud customers

Developers vent frustration on Amazon support forum

Amazon is struggling to restore customer data lost because of an outage at a data center in Ireland, as developers grow increasingly frustrated over the inability to access applications they have built on top of Amazon's cloud service.

"We are continuing to make steady progress in delivering recovery snapshots to affected customers accounts," Amazon wrote on its status dashboard Tuesday morning. The previous day, Amazon said a software error had deleted Elastic Block Store (EBS) snapshots that were incorrectly identified as unused.

But that progress is little consolation for developers who still cannot access their Amazon Elastic Compute Cloud data and complain that Amazon has not communicated sufficiently about the problems. "We've been waiting for over 48 hours now with extremely limited information on when the issue will be resolved," one customer wrote on an Amazon developer support forum.

Another customer noted that the Amazon failure should serve as a reminder that developers must back up their own data, either by transferring files to another location or by making greater use of Amazon's Simple Storage Service (S3).

"EBS is not as reliable as we have been led to believe," the customer writes. "Hence, back the hell up of everything all the time and S3 the whole thing or better FTP out to wherever will have you."

An Amazon employee who is monitoring the forum told customers that when their volumes are recovered they "will see a 'Recovery from SNAP-XXX' snapshot appear in your console."

But the Amazon employee said, "I cannot give an ETA on this volume for completion but we are working hard to resolve this ASAP."

The recovery process Amazon described offers no guarantee that customers will get their data back.

"We are in the process of creating a copy of the affected snapshots where we've replaced the missing blocks with empty block(s)," Amazon said. "Customers can then create a volume from that copy and run a recovery tool on it (e.g. a file system recovery tool like fsck); in some cases this may restore normal volume operation. ... We apologize for any potential impact it might have on customers applications."

In addition to the European problems that began Sunday with a lightning strike in Dublin, Amazon reported a U.S. outage originating in its Virginia data center this week. The Dublin lightning strike also took a Microsoft cloud service offline.

The outages come four months after a U.S. outage that lasted several days and called into question the effectiveness of Amazon's high availability services. While Amazon offers the ability to spread applications across multiple "availability zones" to ensure uptime in case one fails, the outage in April took multiple zones offline.

In a post-mortem after the April outage, Amazon said bad execution during a planned upgrade took systems offline, and promised to upgrade its own systems and make it easier for customers to build high availability services into the applications they host in Amazon data centers.

Follow Jon Brodkin on Twitter:

