Reducing maximum time to resolution: The key to decreased downtime and increased savings

By Tim Nichols, VP, global marketing, Endace, special to Network World
November 16, 2012 04:06 PM ET

Network World - This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.

Network downtime is an inescapable fact for networks of all sizes, and all the prevention and detection tools in the world won't, on their own, allow analysts to quickly solve a problem once it occurs. What's needed are tools that will pinpoint the root cause of the problem and determine the appropriate steps to solve it. This isn't to say that prevention and detection tools aren't necessary. Obviously they are an important part of overall network health, but they should be a part of the solution, not THE solution.

An emerging sector of network technology addresses response and root cause through full-packet capture, which allows analysts to drill down to the epicenter of the incident and drastically shorten the amount of time required to solve a network's most difficult problems. By shrinking maximum-time-to-resolution (MAX-TTR) -- which requires a shift of focus to response and root cause -- organizations can unleash savings that will show a dramatic difference on the bottom line.

MTTR vs. MAX-TTR

The ultimate goal of dedicating resources to response and root cause is the reduction of time-to-resolution (TTR), which is the amount of time it takes to correct a network anomaly. Doing so requires 100% packet capture, which offers clear, historical network visibility. If analysts can quickly retrace each step of the problem, guesswork is all but eliminated and they can expedite efforts to repair.

Organizations, however, tend to make a common mistake when attempting to reduce TTR: They focus on mean-time-to-resolution (MTTR). While knowing the average amount of time required to repair network anomalies can be useful, it doesn't tell the whole story. Cutting MTTR from four hours to three hours and 50 minutes is irrelevant, for the most part. The area where full-packet capture offers the most "bang for the buck" can be found in reducing MAX-TTR. That's where a real impact can be made relatively easily.

For most organizations, the majority of incidents are clustered around the four-hour mark, but a smaller number of events can take days or weeks to fix. While not as frequent, they cause the most network downtime and cost the most to repair. The technology now exists to rapidly zoom in on the point where a particular issue was reported or raised an alarm and to identify exactly what happened, so organizations can drastically reduce the length of the network's most frustrating fixes.

If an organization can drop its MAX-TTR from 24 hours to four hours, it will not only reduce the mean TTR but also shrink the amount of resources required to deal with the problem. Less downtime equals greater savings and better network uptime.
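To make the arithmetic concrete, here is a minimal Python sketch comparing the two approaches. The incident durations are hypothetical figures chosen only to mirror the pattern described above (most incidents near four hours, a couple of long outliers), and the `ttr_summary` helper is an illustrative name, not part of any product or library.

```python
from statistics import mean


def ttr_summary(hours):
    """Summarize a list of incident durations (in hours)."""
    return {
        "mean": mean(hours),
        "max": max(hours),
        "total": sum(hours),
    }


# Hypothetical incident log, in hours: most incidents sit near the
# four-hour mark, with two long-running outliers.
baseline = [3.5, 4.0, 4.0, 4.5, 4.0, 24.0, 72.0]

# Scenario A: shave 10 minutes off every incident (the MTTR-style tweak).
trimmed = [h - 10 / 60 for h in baseline]

# Scenario B: resolve no incident slower than four hours (the MAX-TTR cut).
capped = [min(h, 4.0) for h in baseline]

for label, hours in [("baseline", baseline), ("trimmed", trimmed), ("capped", capped)]:
    s = ttr_summary(hours)
    print(f"{label:9s} mean={s['mean']:.2f}h  max={s['max']:.2f}h  total={s['total']:.2f}h")
```

With these made-up numbers, trimming 10 minutes from all seven incidents recovers 70 minutes of downtime in total, while capping the worst cases at four hours recovers 88.5 hours. Real incident data will differ, but the shape of the comparison is the point: the long tail dominates total downtime.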

The old model is broken

There's nothing fundamentally new about full-packet capture. For as long as networks have been around, operational teams have been "sniffing" packets for diagnostic and troubleshooting purposes. In the past, network recording was reactive: a team responded to a problem by deploying a recording device -- typically a laptop attached to a SPAN port on a router or switch -- to get a trace file.
