
Operational best practices

Jun 29, 2004 · 3 mins
Data Center

* Tips for keeping data centers running smoothly

Sometimes it’s the little things that matter most. We technologists tend to focus on “gee-whiz” new products and services that allow us to do novel and interesting things – but many times, it’s the “boring” operational practices and processes that make the difference between success and failure.

Herewith, a few data center best practices and insights acquired the hard way:

* Test, test, and test again.

One of my favorite statistics is from a Nemertes benchmark in which we asked IT executives how important they thought application performance was. Seventy percent told us performance was “critical” or “important.” Then we asked how often IT execs benchmarked their application performance. You guessed it – more than 60% said “never.” Is there a disconnect here?

If you’re wondering what to test, here’s a short list: application performance, server performance, network performance and availability, storage performance and availability – and don’t neglect regular testing of the back-up systems.
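For the application-performance item on that list, a benchmark can start very small. The following is a minimal sketch, not a full load-testing tool: it times a single operation repeatedly and summarizes the latency. The names `benchmark` and `within_budget`, the run count, and the latency budget are illustrative assumptions; in practice `operation` would wrap a real request against the system under test.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(operation: Callable[[], object], runs: int = 50) -> Dict[str, float]:
    """Time `operation` `runs` times and summarize latency in milliseconds."""
    if runs < 2:
        raise ValueError("runs must be at least 2 to compute percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # With n=100 there are 99 cut points; index 94 is the 95th percentile.
    # method="inclusive" keeps the estimate inside the observed range.
    p95 = statistics.quantiles(samples, n=100, method="inclusive")[94]
    return {
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        "p95_ms": p95,
        "max_ms": samples[-1],
    }


def within_budget(summary: Dict[str, float], p95_budget_ms: float) -> bool:
    """Return True if the 95th-percentile latency meets the (assumed) budget."""
    return summary["p95_ms"] <= p95_budget_ms


if __name__ == "__main__":
    # Stand-in workload; replace with a call to the real application.
    summary = benchmark(lambda: sum(range(100_000)))
    print(summary)
    print("within budget:", within_budget(summary, p95_budget_ms=50.0))
```

Run on a schedule and recorded over time, even a crude measurement like this turns “never” into a baseline that later results can be compared against.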

True story: An IT executive managing a large, public, highly visible Web site had done an outstanding job stress-testing the site and was confident the servers could handle the heaviest possible traffic loads. What he forgot, however, was to subject the back-up data center servers to equivalent test rigor. You can guess the rest: an outage hit, the site failed over – and went belly-up.
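One way to catch that kind of gap before an outage does is to record the peak load each site has actually been stress-tested to and compare it with the load a failover would have to carry. The sketch below assumes such records exist; the function name `untested_gaps`, the site names, and the request rates are all hypothetical.

```python
from typing import Dict, List


def untested_gaps(peak_tested_rps: Dict[str, float], required_rps: float) -> List[str]:
    """Return the sites whose highest stress-tested load falls short of the requirement."""
    return sorted(site for site, rps in peak_tested_rps.items() if rps < required_rps)


if __name__ == "__main__":
    # Hypothetical figures: requests per second each site has been stress-tested to.
    tested = {"primary": 12000.0, "backup": 3000.0}
    for site in untested_gaps(tested, required_rps=10000.0):
        print(f"{site}: not yet tested to the load a failover would require")
```

An empty result means every site, including the back-up, has been tested to the required load.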

* Periodically review the power, cooling, and other physical infrastructure and processes.

Another true story: during a recent heavy downpour, an IT executive walked into his data center – and found several inches of water under the flooring. Seems the drainage pans for the air conditioners couldn’t handle the flood, and water backed up into the data center. While you’re at it, don’t forget to review the back-up generator, and make sure the processes for refueling and backup are well understood and make sense.

Apparently, one of the main telco switching centers that failed during 9/11 went down not because of the attack itself, but because one of the techs responsible for refueling the back-up generator had been instructed to turn the generator off if the fuel fell below the halfway point. After the attack, the switching center failed over to the back-up generator, which began consuming fuel. When the tech saw that the fuel had dipped below the halfway point, he shut off the generator as instructed, only to discover it was impossible to get back to Ground Zero to turn it on again.

* Don’t forget about physical security.

One of the best practices regularly highlighted at our data center tech tours is to ensure the physical security of your servers – particularly if you’ve outsourced any portion of your data centers. Stories abound about security breaches that could have been prevented by the simple application of a lock and key.