Network World

Rackspace aims to repair credibility in wake of power failures

Customers lost service on June 29 and July 7
By Jon Brodkin, Network World
July 10, 2009 04:02 PM ET

It's been a difficult two weeks for Rackspace and its users, with two power outages in a co-location facility interrupting service for an estimated 2,000 customers.

Rackspace, which prides itself on “fanatical support,” has been open about its failures, communicating with customers directly and through the company's official blog and Twitter account. Open communication and a commitment to fixing technical problems will both be crucial for Rackspace as it attempts to repair damaged credibility, says CEO Lanham Napier.

“Any time we have an incident like this, it does impact our credibility,” Napier said in an interview Friday with Network World. “The only way we earn it back is we have to execute at a high level for a long time.”

Power outages on June 29 and July 7 hit Rackspace's 144,000-square-foot data center in the Dallas suburb of Grapevine. Rackspace operates nine data centers worldwide for about 60,000 customers. Within the Dallas facility, some customers experienced about 40 minutes of downtime on June 29, and some suffered 15 to 20 minutes of downtime on July 7.

The facility has three “phases,” or physical areas, and both outages hit the same phase, affecting a total of about 2,000 customers, according to Rackspace. Judging by comments on a recent Network World article, reactions range from anger at Rackspace for not eliminating every point of failure to acceptance that downtime can never be completely prevented and that Rackspace did well in quickly solving the problems and communicating with customers.

“I’m sure there will be some [customers] who are upset with us,” Napier said. “Let’s face it. We let them down. It wouldn't surprise me if some customers leave. I hope most of them stay with us.”

Rackspace has said it will issue between $2.5 million and $3.5 million in service credits to customers. Depending on the service a customer has paid for, service-level agreements can range from 99.9% uptime to 100%, Napier said.

On June 29, Rackspace suffered a utility power interruption and was forced to move equipment over to generator power. The generators initially held the load and then failed, resulting in 40 minutes of downtime, Napier said.

An incident review cited failure of generators to synchronize with UPS systems, and failure of switches in the electrical infrastructure, preventing transfer of electrical load between different power sources. By July 3, the Rackspace blog reported that maintenance to the generator had “eliminated the excitation failures that caused recent customer disruptions.”

Trouble struck again on July 7 with the failure of a bus duct, a 10-foot, 300-pound piece of copper that distributes electricity. This prevented proper operation of a UPS system, taking customer servers down for about 20 minutes before Rackspace could connect them to generator power. The generators worked this time and carried the load for hours while workers replaced the bus duct, Napier said. Rackspace is still investigating the root cause of the bus duct failure, he said.

Comments (3)

NO EXCUSE
By Anonymous on July 13, 2009, 11:48 am
With redundancy, emulation, and actual testing this failure could have been avoided. Rackspace was too busy taking on clients and "not" looking at the "BIG PICTURE"....

Statements of note ....
By Spee on July 13, 2009, 12:40 pm
The article states: "An incident review cited failure of generators to synchronize with UPS systems, and failure of switches in the electrical infrastructure, preventing...

I have seen a failure like this before
By LinearBob on July 13, 2009, 7:01 pm
I have seen a system failure very much like this once before, only then it was a UHF TV station transmitter facility that experienced it. The root cause was the...
