Network World

It isn't smart to rely on SMART

By Chris Mellor, TechWorld, 02/21/2007

Google research has shown that the diagnostics built into disk drives predict only about half of the drive failures that occur.

Modern disk drives have a built-in self-test and diagnostic facility called Self-Monitoring, Analysis and Reporting Technology (SMART). The drive firmware monitors a range of drive parameters, such as the number of seek errors and the disk spin-up time. If these parameters degrade over time, the unit may be heading for a breakdown. With advance warning of an impending disk failure, you have a chance to move files and/or replace the unit before you lose any data.
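The threshold-and-trend idea can be sketched in Python. This is a minimal illustration, not any vendor's firmware logic: it assumes each attribute is exposed as a normalized value (higher is healthier) plus a vendor-set threshold, which is how SMART attributes are commonly reported. The attribute readings and the `min_drop` trend tolerance below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One SMART attribute as commonly exposed: normalized value plus threshold."""
    name: str
    value: int      # current normalized value (higher is healthier)
    threshold: int  # vendor-set value at or below which the attribute is "failing"


def failing_attributes(attrs):
    """Names of attributes whose normalized value has reached its threshold."""
    return [a.name for a in attrs if a.value <= a.threshold]


def is_degrading(history, min_drop=10):
    """Flag a parameter whose normalized value fell by at least min_drop
    between the oldest and newest reading (hypothetical trend rule)."""
    return len(history) >= 2 and history[0] - history[-1] >= min_drop


if __name__ == "__main__":
    # Hypothetical readings for illustration only.
    attrs = [
        SmartAttribute("Seek_Error_Rate", value=60, threshold=30),
        SmartAttribute("Spin_Up_Time", value=20, threshold=21),
    ]
    print(failing_attributes(attrs))    # ['Spin_Up_Time']
    print(is_degrading([100, 98, 85]))  # True
```

A trend rule like `is_degrading` is what would provide advance warning; a hard threshold check alone fires only once an attribute has already crossed the line.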

Google's study covered more than 100,000 consumer-grade serial and parallel ATA hard disk drives, ranging in speed from 5,400 to 7,200 rpm and in capacity from 80 to 400 GB. Observed annualized failure rates ranged from 1.7 percent for drives in their first year of operation to over 8.6 percent for drives in their third year.
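An annualized failure rate normalizes failures by how long drives were actually in service. The article does not spell out the exact formula used, so the sketch below assumes one common definition: failures divided by accumulated drive-years, expressed as a percentage. The sample figures are hypothetical, chosen only to reproduce the 1.7 percent value.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Failures per drive-year of exposure, as a percentage.

    drive_days is the sum, over all drives, of the days each was in service.
    """
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years


if __name__ == "__main__":
    # Hypothetical: 1,000 drives each observed for a full year, 17 failures.
    print(annualized_failure_rate(17, 1000 * 365))  # 1.7
```

Counting drive-days rather than drives keeps units that entered or left service mid-year from skewing the rate.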

The Google researchers found that SMART diagnostics are not as useful as they are supposed to be. They note that there is little independent research into drive life and diagnostics, stating 'Most of the available information comes from the disk manufacturers themselves. Their data are typically based on extrapolation from accelerated life test data of small populations or from returned unit databases.'

They add that 'detailed studies of very large populations [of hard drives] are the only way to collect enough failure statistics to enable meaningful conclusions. In this paper we present one such study by examining the population of hard drives under deployment within Google’s computing infrastructure.' Google has 'built an infrastructure that collects vital information about all Google’s systems every few minutes, and a repository that stores these data in time-series format (essentially forever) for further analysis.'

The researchers mined this data for correlations between drive sensor readings, SMART parameters and failure events. Their findings were:

-- Very little correlation between failure rates and either raised temperature or activity levels.

-- Some SMART parameters (scan errors, reallocation counts, offline reallocation counts and probational counts) have a large impact on failure probability; others do not. Yet over 56 percent of all failed drives had no count in any of these four strong SMART signals.
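The 56 percent figure is the share of failed drives that never tripped any of the four strong signals. A minimal Python sketch of that calculation, assuming each failed drive's SMART history has already been reduced to a dictionary of counts; the field names and records are hypothetical, not data from the study.

```python
# Hypothetical field names for the four strong SMART signals named in the study.
STRONG_SIGNALS = (
    "scan_errors",
    "reallocation_count",
    "offline_reallocation_count",
    "probational_count",
)


def has_strong_signal(counts: dict) -> bool:
    """True if any of the four strong signals has a nonzero count."""
    return any(counts.get(name, 0) > 0 for name in STRONG_SIGNALS)


def share_without_signal(failed_drives: list) -> float:
    """Fraction of failed drives that showed none of the strong signals."""
    if not failed_drives:
        return 0.0
    silent = sum(1 for counts in failed_drives if not has_strong_signal(counts))
    return silent / len(failed_drives)


if __name__ == "__main__":
    # Hypothetical failed-drive records for illustration only.
    failed = [
        {"scan_errors": 3},
        {"reallocation_count": 0, "probational_count": 0},
        {},
        {"offline_reallocation_count": 12},
    ]
    print(share_without_signal(failed))  # 0.5
```

A monitor that used `has_strong_signal` as its only warning rule would, by the study's numbers, give no warning for the majority of drives that actually failed.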
