How to avoid a data center overrun with idle servers

A third of data center servers are not productive. Here's why the problem exists, and what to do about it.


You've undoubtedly read, or at least seen, the articles about "comatose" servers: servers in data centers that don't do any work and just sit idle. A study from Stanford University professor Jonathan Koomey and Jon Taylor, a partner at the consulting firm Anthesis Group, found that up to 30% of all physical servers in data centers do nothing all day long, and no one notices.

This is not a new discovery; it has been around for several years. In 2008, McKinsey & Co. released a similar study, finding that up to 30% of servers in data centers were, as they put it, "functionally dead." The Uptime Institute issued a similar report in 2012, finding around 30% of servers to be idle and not working.

So why does this problem continue to go unaddressed? Two reasons: the IT group does not have responsibility for the electric bill and IT does a lousy job tracking ownership of the servers once deployed. It buys the servers but doesn't pay the electric bill or keep a proper inventory and that allows zombies to proliferate.

"It's a case of management and a challenge to management and an issue of lack of incentives. In a lot of cases IT people are incentivized to keep things up and running and aren't paying the power bill. They have no incentive to lower their power bill," said Taylor.

Aaron Rallo, CEO of TSO Logic, echoed the sentiment that it's a management problem. "I've built data centers and needed them for every business I've run. When you think about business priorities, I think a lot of executives view their data center as a process that they can't do anything about and let the data center slide from an expense perspective," he said.

"A CEO can tell you to the dollar how much they spend on labor, on marketing, on sales, but when it comes to the data center, which is often the largest expense, they don't have any idea," he added.

How does this happen?

If this news surprises you, it should not. Think about it. If you have ever walked through a data center, did you ever stop to wonder if the servers in the racks were all being used?

"Nobody's looking for it. It hasn't been easy to look for. There has been no easy way to find out. IT is not inclined to shut down boxes if they don't know who owns it," said Taylor. He added that thanks to less than ideal record keeping, sometimes the only way to find out what a server does is shut it down and see who screams.

Alastair Winner, vice president of technology services, compute at HP, said there are multiple ways servers get forgotten.

"The reality for most enterprises is they don’t write their apps, they have many apps for many workloads that require different servers and solutions.There is undoubtedly going to be a level of inefficiency built into the system," he said.

For example, some companies deliberately build in excess capacity and keep systems standing by to take over when load is greater than usual. In other words, they overprovision for peaks in demand.

Then there's Shadow IT, the people who go into business for themselves within a company and skirt official company policy on IT. Some departments don’t utilize official enterprise channels and buy their own machines. And then there's merger and acquisition activity, where redundant systems might be set aside but not actually shut down.

John Abbott, director of advanced technical services at Centrilogic, a cloud services provider that also does migrations, runs into this all the time, especially now that he's doing Windows Server 2003 migrations. He said about 75% of his customers have at least some servers not being used, and that the larger the data center, the more likely there are things running no one knows about.

"What we find with these discoveries is where a server was set up and the app owner changed their mind or the app never got approval or they are proof of concept servers that never do anything," he said.

In his experience, zombie servers tend to be older, most often because the people using them have moved on one way or another and the servers were never repurposed or decommissioned.

"Over time, what you find is the people with the knowledge of the environment change jobs or get let go and there is no knowledge transfer. No one knows what those servers are so they don't touch them. They assume if it's there it's running in production and they don't do anything with it," he said.

One of the biggest sources of help in flushing out zombie servers has been the Windows Server 2003 end of life this past July and subsequent migrations by firms off the aged operating system. Most firms used the end of support for Server 2003 as an opportunity to take a full and complete inventory of what they have and that is rooting out the zombies.

"The Windows Server 2003 project is straightening a lot of this out because people are analyzing their current inventory so that at the end of this they have a compete inventory," said Abbott. "Server 2003 helped expose this because otherwise no one would look at them. If they aren’t broken people aren't going to address them."

How to prevent the problem

Winner said that IT needs to start keeping much better track of what it has. "The key here, the way we solve this, is truly to have a very well thought through asset management approach that looks at the physical location of the server and understands where they sit in the operation of the environment," he said.
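As a rough illustration of the asset management approach Winner describes, the sketch below keeps one record per server with its physical location, owner and role, then flags likely zombies. The field names, the `find_zombie_candidates` helper and the 2% CPU threshold are all assumptions made for this example; they do not come from any tool or study mentioned in the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    # All fields are hypothetical; a real inventory would carry many more.
    hostname: str
    rack_location: str        # physical location, e.g. "DC1/row4/rack12"
    owner: Optional[str]      # person or team responsible; None if unknown
    role: Optional[str]       # what the server does in the environment
    avg_cpu_percent: float    # average utilization over the review period


def find_zombie_candidates(servers, cpu_threshold=2.0):
    """Return (hostname, reasons) for servers with no owner, no role, or near-idle CPU."""
    candidates = []
    for s in servers:
        reasons = []
        if not s.owner:
            reasons.append("no owner on record")
        if not s.role:
            reasons.append("no documented role")
        if s.avg_cpu_percent < cpu_threshold:  # threshold is an assumed cutoff
            reasons.append("near-idle CPU")
        if reasons:
            candidates.append((s.hostname, reasons))
    return candidates


# Made-up inventory for illustration only.
inventory = [
    Server("app01", "DC1/row1/rack3", "payments team", "order API", 41.0),
    Server("poc07", "DC1/row4/rack12", None, None, 0.4),
]
zombies = find_zombie_candidates(inventory)
```

A flagged server is only a candidate for review. As Taylor notes, IT is reluctant to shut down a box without knowing who owns it, so the list is a starting point for tracking down owners rather than a shutdown order.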

The other part of the solution is ownership. Part of the criteria for the purchase of servers or racks should be the total cost of ownership, which includes power and cooling requirements of that device. When the IT department gets the electric bill, it will become much more militant about keeping all servers running at maximum efficiency.
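To make the total-cost-of-ownership point concrete, here is a minimal sketch of the arithmetic. It assumes cooling and other facility overhead can be approximated with PUE (power usage effectiveness, the ratio of total facility energy to IT equipment energy). The function name and every input value are illustrative assumptions, not figures from the article.

```python
HOURS_PER_YEAR = 24 * 365


def total_cost_of_ownership(purchase_price, avg_power_watts, price_per_kwh, pue, years):
    """Purchase price plus the energy cost of running the server, cooling included via PUE."""
    it_kwh = avg_power_watts / 1000 * HOURS_PER_YEAR * years  # energy drawn by the server itself
    facility_kwh = it_kwh * pue                               # adds cooling and other overhead
    return purchase_price + facility_kwh * price_per_kwh


# Assumed inputs: a $5,000 server drawing 300 W on average,
# $0.10 per kWh, a PUE of 2.0, kept in service for four years.
tco = total_cost_of_ownership(5000, 300, 0.10, 2.0, 4)
energy_share = (tco - 5000) / tco  # fraction of the total that is power and cooling
```

Under these assumed numbers the total comes to roughly $7,100, with power and cooling making up close to 30% of it. That share accrues whether or not the server does any useful work.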

Abbott's view is somewhat similar, in that data center operators need to keep a decent inventory. "They need to be more stringent in their change control and documentation process so when they stand up a new server, they keep track of their assets. What they do after that is up to them. We can tell them they need to be more stringent and document what they are doing but at the end of the day it's up to the companies," he said.

Asset management needs to become a management priority and needs to be brought onto the radar of management and the CFO, said Taylor. "This is the kind of thing a CFO would love. If you went to them and said 'I can reduce your operating costs,' do you think they would listen? But you need to have the information to support that," he said.

Rallo said the data center must be viewed as part of the supply chain of a firm and with the same sharp eye as every other part of the chain. "Walmart looks at every part of their supply chain. All of its executives are focused on delivering product on time and reducing cost. If we started thinking about the data center as part of their supply chain, the change is going to come," he said.

This story, "How to avoid a data center overrun with idle servers" was originally published by ITworld.
