Google research: Cranking up the heat may not harm your disk drives
Study of Google's computing infrastructure contradicts previous research on drive failures.
By Jon Brodkin, NetworkWorld.com, 02/26/2007
Temperatures exceeding 100 degrees Fahrenheit may not be damaging to disk drives, according to new research by Google engineers, which casts doubt on previous findings linking heat to elevated failure rates.
After studying five years' worth of monitoring statistics from Google’s massive data centers, researchers say they could find
no consistent pattern linking failure rates to high temperatures or high utilization levels. Temperature, they write, is often
called the most important environmental factor affecting disk drive reliability.
“This is a fairly surprising result, which could indicate that data-center or server designers have more freedom than previously
thought when setting operating temperatures for equipment that contains disk drives,” write Google engineers Eduardo Pinheiro,
Wolf-Dietrich Weber and Luiz Andre Barroso. “We can conclude that at moderate temperature ranges it is likely that there are
other effects which affect failure rates much more strongly than temperatures do.”
The Google researchers are more optimistic about the impact of heat on computer systems than a Forrester Research analyst
who, in a Webinar for IT professionals last month, said the increasingly fine features of new chips must be protected by lowering
maximum operating temperatures.
The Google research, presented this month in San Jose, Calif., at the 5th USENIX Conference on File and Storage Technologies, examined data center performance at temperatures from 15 to 45 degrees Celsius, or 59 to 113 degrees Fahrenheit.
The researchers found negative effects from high temperature only at the higher end of the temperature range (104 degrees Fahrenheit
or more), and even at those temperatures the negative effects were observed only for drives at least 3 years old.
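For readers who want to check the unit conversions quoted above, here is a minimal Python sketch of the standard Celsius-to-Fahrenheit formula. The function name `celsius_to_fahrenheit` is our own illustrative choice and does not come from the Google paper; the 40-degree input is simply the Celsius equivalent of the 104-degree Fahrenheit threshold.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit (F = C * 9/5 + 32)."""
    return celsius * 9 / 5 + 32


# Temperatures mentioned in the article: the low and high ends of the
# studied range (15 and 45 C) and the point where negative effects
# were reported to begin (40 C).
for temp_c in (15, 40, 45):
    print(f"{temp_c} C = {celsius_to_fahrenheit(temp_c):.0f} F")
```

Run as written, the loop reproduces the article's figures of 59, 104 and 113 degrees Fahrenheit.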
By contrast, a software and hardware manufacturer known as AVTECH Software says the “optimal” temperature range to maintain data center reliability is between 68 and 75 degrees Fahrenheit.
The Google engineers do report seeing a “modest increase” in failure rates at the lowest end of the temperature distribution
they studied.
The engineers did not see a consistent correlation between high utilization and high failure rates, a finding they say also
contradicts previous literature on the subject. High utilization seems to lead to problems in drives that are less than
a year old, and also in drives that are at least five years old, but not in drives in the middle of the age range,
they found. This may be because drives that cope poorly with heavy use tend to fail within their first year.
More than 90 percent of new information produced today is stored on magnetic media, mostly hard disk drives, according to an estimate
cited in the Google paper. Drive manufacturers say yearly failure rates are below 2 percent, but user studies have found rates
as high as 6 percent, the paper states.
The Google researchers did find several measures useful for predicting drive failure. The measures, known as SMART (Self-Monitoring,
Analysis and Reporting Technology) parameters, include scan errors, which are reported as drives scan the disk surface in
the background.
Comments (4)
Electronics usually fail before the hard drive
By Anonymous on February 26, 2007, 11:31 pm
In reference to your article: Cranking up the Heat May Not Harm Your Disk Drives. With 40 years of experience in electronics, I'm not surprised that the study...
Electronics usually fail before the hard drive
By Tony Upchurch on February 27, 2007, 1:04 pm
I'm curious to know more about the hardware that you used to conduct your tests. I am located in the same data center as Google and I see the servers that are rolled...
Hard Drives Only A Small Part Of Equation
By Anonymous on March 15, 2007, 10:34 pm
I agree that the hard drives are only a small part of the equation. For truly useful statistics, Google should take their tests a step further and test how higher...
After reading the
By Anonymous on April 30, 2007, 5:06 am
After reading the research link, the temperatures given make sense. However, at first glance one could confuse the Google numbers of 59-113deg to be datacenter...