Network World

Troubleshoot to repair, or predict and prevent?

By Steve Henning, Network World, 06/10/2008
This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.

It sounds simple. Instead of spending hours or days troubleshooting an application slowdown or system outage, why not just avoid it to begin with?

Until recently, the only way for IT organizations to resolve problems was to sift through alerts, log files and trouble tickets and burn the midnight oil on conference calls. Today, powerful analytics and automation capabilities built into system management tools can help organizations identify and resolve issues before they become problems.

Interconnected business services have made management far more difficult. Collecting more data isn’t the answer because:

* Monitoring static thresholds triggers a flood of alerts, most of which do not represent actual problems.
* Problems are identified by groups of abnormal behaviors, not a solitary metric.
* With tens of thousands of devices and millions of metrics, the manual correlation effort required to identify problems becomes impractical.

This deterministic approach of fixed thresholds and manual correlation is not only ineffective; it also cannot scale to accommodate increasing complexity. Highly complex service infrastructures demand a new, probabilistic approach.
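To make the contrast concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's algorithm: the function names, the z-score test and the "at least two abnormal metrics" rule are all hypothetical choices. It shows a fixed limit firing on every busy sample, and an alternative that raises an alert only when several metrics deviate from their own history at the same time.

```python
from statistics import mean, stdev


def static_alerts(samples, limit):
    """Return the index of every sample above a fixed limit (one alert each)."""
    return [i for i, value in enumerate(samples) if value > limit]


def zscore(history, value):
    """Number of standard deviations `value` sits from the mean of `history`."""
    sigma = stdev(history)
    return 0.0 if sigma == 0 else (value - mean(history)) / sigma


def correlated_alert(histories, current, z_limit=3.0, min_abnormal=2):
    """Alert only when at least `min_abnormal` metrics are abnormal together.

    `histories` maps a metric name to its past values; `current` maps the
    same names to the latest reading. Both thresholds are assumptions.
    """
    abnormal = [
        name
        for name, value in current.items()
        if abs(zscore(histories[name], value)) > z_limit
    ]
    return len(abnormal) >= min_abnormal, abnormal
```

With a fixed limit of 80, a busy period such as `[62, 71, 85, 88, 83, 64]` produces three separate alerts. The grouped check stays quiet when only one metric is unusual and fires when, for example, CPU and latency are both far outside their history.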

Intelligent system-management solutions now employ sophisticated correlation algorithms to sample subsets of metric data and deliver accurate information about potential system behavior. In addition, new learning technologies continuously refine alert thresholds — providing dynamic thresholds that recognize and accommodate the normal ebbs and flows of business. A probabilistic approach allows organizations to solve problems faster and with far less manual effort.
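One simple way a threshold can be refined continuously is an exponentially weighted moving average of a metric's mean and variance. The sketch below assumes that technique for illustration only; the article does not say which learning method any product uses, and the class name, `alpha` and `k` are hypothetical.

```python
import math


class AdaptiveThreshold:
    """Upper alert limit that follows a metric's recent mean and spread."""

    def __init__(self, alpha=0.1, k=3.0):
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.k = k          # how many standard deviations count as abnormal
        self.mean = None
        self.var = 0.0

    def update(self, value):
        """Fold one new sample into the running mean and variance."""
        if self.mean is None:
            self.mean = float(value)
            return
        diff = value - self.mean
        incr = self.alpha * diff
        self.mean += incr
        # Standard incremental update for an exponentially weighted variance.
        self.var = (1 - self.alpha) * (self.var + diff * incr)

    def upper(self):
        """Current upper limit: mean plus k standard deviations."""
        return self.mean + self.k * math.sqrt(self.var)

    def is_violation(self, value):
        """True when `value` exceeds the current limit; False before any data."""
        return self.mean is not None and value > self.upper()
```

A caller would check `is_violation(value)` first and then call `update(value)`, so the limit drifts with gradual changes in load while a sudden jump still stands out.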

Intelligent management solutions integrate with existing monitoring infrastructures, automatically collecting and analyzing metrics from across all tiers of an application — such as Web server, application server and database tiers.

The first job for the intelligent management solution is to learn the normal behavior of the application. It should be possible to build a behavior model for each resource in your infrastructure by continuously collecting data and applying dynamic thresholding algorithms to it. Real-time measurements of each metric can then be compared with the expected range of values to determine when a metric should trigger a threshold violation.
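A behavior model of this kind can be sketched in a few lines of Python. This is a simplified illustration, assuming that "normal" depends on the hour of day and that the expected range is the mean plus or minus `k` standard deviations; the class and method names are hypothetical and a real product would likely use a richer model.

```python
from collections import defaultdict
from statistics import mean, stdev


class BaselineModel:
    """Expected range for one metric of one resource, learned per hour of day."""

    def __init__(self, k=3.0):
        self.k = k  # width of the expected range in standard deviations (assumed)
        self._samples = defaultdict(list)  # hour of day -> observed values

    def observe(self, hour, value):
        """Record a measurement; the model keeps learning as data arrives."""
        self._samples[hour].append(value)

    def expected_range(self, hour):
        """Return (low, high) for this hour, or None if there is too little data."""
        values = self._samples.get(hour, [])
        if len(values) < 2:
            return None
        mu, sigma = mean(values), stdev(values)
        return mu - self.k * sigma, mu + self.k * sigma

    def is_violation(self, hour, value):
        """Compare a real-time measurement with the expected range for its hour."""
        bounds = self.expected_range(hour)
        if bounds is None:
            return False  # no baseline yet, so do not alert
        low, high = bounds
        return not (low <= value <= high)
```

Because the range is learned per hour, a reading that is routine during the morning peak can still be flagged if it appears at 3 a.m., which a single static threshold cannot express.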
