Despite restoring service for many customers, Amazon's troubles with its Elastic Compute Cloud (EC2) and Relational Database Service continue this morning, meaning that debate over the wisdom of relying on such cloud services is certain to grow louder.
The latest report from Amazon's Service Health Dashboard was posted just before 6 a.m. EDT and reads:
We continue to make progress in restoring volumes but don't yet have an estimated time of recovery for the remainder of the affected volumes. We will continue to update this status and provide a time frame when available.
(See subsequent 9:18 EDT update below.)
A message from Amazon posted about four hours earlier paints a picture of the recovery effort:
Just a short note to let you know that the team continues to be all-hands on deck trying to add capacity to the affected Availability Zone to re-mirror stuck volumes. It's taking us longer than we anticipated to add capacity to this fleet. When we have an updated ETA or meaningful new update, we will make sure to post it here. But, we can assure you that the team is working this hard and will do so as long as it takes to get this resolved.
Yesterday the outage knocked an untold number of Web sites off the Internet, including Reddit, Foursquare, Quora and Hootsuite. This morning all but Reddit appear to be up and running normally. The Reddit homepage carries this message:
Reddit is in "emergency read-only mode" right now because Amazon is experiencing a degradation. They are working on it but we are still waiting for them to get to our volumes. There is no ETA at this time, but we are trying to work some magic and will very slowly be bringing the site back up. Please stand by.
Reddit was at full service for a time very early this morning, as evidenced by this post from the social bookmarking site headlined: "And we're back."
Now they're not.
(Update, 9:18 EDT: Latest from Amazon: "We're starting to see more meaningful progress in restoring volumes (many have been restored in the last few hours) and expect this progress to continue over the next few hours. We expect that we'll reach a point where a minority of these stuck volumes will need to be restored with a more time consuming process, using backups made to S3 yesterday (these will have longer recovery times for the affected volumes). When we get to that point, we'll let folks know. As volumes are restored, they become available to running instances, however they will not be able to be detached until we enable the API commands in the affected Availability Zone." ... Reddit remains in read-only mode, which I can only imagine has caused a Redditor head or two to explode.)
(Update 2, 11:58 EDT: I can't escape this story even while eating my lunch. Just learned that the writers at my favorite political Web site, Talking Points Memo, are the only ones doing any talking there at the moment. A message at the end of each TPM story reads: "Due to an Amazon Web Services (AWS) outage, login and comments are currently unavailable.")
(Update 3: Latest from Amazon, posted at about noontime here, reports additional progress but makes clear that at least some customers have a long slog ahead: "We continue to see progress in recovering volumes, and have heard many additional customers confirm that they're recovering. Our current estimate is that the majority of volumes will be recovered over the next 5 to 6 hours. As we mentioned in our last post, a smaller number of volumes will require a more time consuming process to recover, and we anticipate that those will take longer to recover. We will continue to keep everyone updated as we have additional information.")
(Update 4: Nothing more from Amazon as of 5:00 p.m. EDT. Please check Amazon's Service Health Dashboard for further updates.)