Watching the WAN
Service-level monitoring tools help IT executives better manage their connections and work with carriers to fix problems more quickly.
A growing number of service-level monitoring platforms combine versatile probes and agents with a database and reporting infrastructure, enabling users to gather data from a variety of sources and "normalize" it so it can be analyzed, sliced, diced and presented in reports. Such products include Concord's Network Health, DeskTalk's Trend, InfoVista's VistaViews, Lucent's VitalNet and NextPoint's S3.
First Union is now implementing InfoVista's VistaViews. The bank chose the product because of its customization features - "You can poll routers once an hour or once a minute" - and the broad range of sources it monitors, Spears says. In addition to SNMP and RMON MIBs, VistaViews can gather data from proprietary sources such as Cisco's Service Assurance agent, any management tool that generates flat files, or any Open Database Connectivity-compliant database.
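
To make the idea of "normalizing" data concrete, here is a minimal Python sketch of how readings from two different kinds of sources might be converted into one common record before reporting. It is an illustration only, not how VistaViews or any product named above works: the `Sample` record, the function names, the flat-file layout and the link speed are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Sample:
    """One normalized measurement, whatever its original source (hypothetical schema)."""
    source: str        # label for where the reading came from
    device: str
    metric: str
    value: float       # always percent utilization in this sketch
    timestamp: datetime


def from_snmp_counter(device, octets_delta, interval_s, link_bps, when):
    """Turn an octet-counter delta from one polling interval into percent utilization."""
    utilization = (octets_delta * 8) / (interval_s * link_bps) * 100.0
    return Sample("snmp", device, "utilization_pct", utilization, when)


def from_flat_file_line(line):
    """Parse an assumed 'epoch_seconds,device,utilization_fraction' flat-file row."""
    epoch, device, fraction = line.strip().split(",")
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return Sample("flatfile", device, "utilization_pct", float(fraction) * 100.0, when)


when = datetime(2000, 1, 3, 9, 0, tzinfo=timezone.utc)
samples = [
    # 1,440,000 octets in 60 seconds on an assumed 256 kbit/s link
    from_snmp_counter("router-a", 1_440_000, 60, 256_000, when),
    # The same kind of reading, exported by some other tool as a fraction
    from_flat_file_line("946890000,router-b,0.5"),
]

for s in samples:
    print(f"{s.timestamp:%Y-%m-%d %H:%M} {s.device} {s.metric}={s.value:.1f} ({s.source})")
```

Once every reading shares the same unit, metric name and timestamp format, it can be stored in one table and reported on without regard to which probe produced it.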
Able agents
The next step for the bank is to deploy end-to-end service-level management. There are now several desktop agents on the market that can monitor end-to-end performance and latency for specific protocols and applications such as Microsoft's Exchange and SAP R/3. First Union is evaluating First Sense's Enterprise and Lucent's VitalAgent desktop agent.
There are two types of agents to choose from: passive and active. Passive agents are installed on the client and monitor whatever traffic the user generates. For example, Lucent's VitalAgent sits on desktops and monitors specific types of transactions, such as HTTP requests or SQL database queries. Server software collects the data and determines the source of the problem. First Sense's Enterprise is a similar type of passive agent.
An active agent measures response time by simulating application transactions, generally at regular intervals. For example, Response Networks' ResponseAgents query servers, measure response time, and then perform pings and other basic tests to pinpoint sources of problems. The tests are initiated by middleware entities called Domain Controllers. Users view the collected data on the Response Service Explorer console. All three products are components of the Response Center Suite, which costs $50,000 and up.
Both active and passive agents have potential drawbacks. Because passive agents must wait for the user to generate specific traffic, they don't work when users aren't at their desktops. For example, it would be difficult to use the agents over the weekend to test whether you adequately fixed a network problem that surfaced Friday afternoon. Active agents, on the other hand, depend on accurate information about the precise applications a computer runs in order to function effectively. This forces IT to perform lots of upfront discovery work and then check back regularly to see what's changed.
Some companies are waiting for the agent technology to mature before plunging in.
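
The active-agent approach described above, simulating a transaction at regular intervals and timing it, can be sketched in a few lines of Python. This is a generic illustration, not the way ResponseAgents or any other product mentioned here is built. The function names are invented, and `fake_query` and `failing_query` are stand-ins for whatever real application request an agent would issue.

```python
import time


def measure_response_time(transaction):
    """Run one simulated transaction and return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        transaction()
        ok = True
    except Exception:
        ok = False
    return ok, time.perf_counter() - start


def run_active_probe(transaction, runs, interval_s):
    """Repeat the measurement at a fixed interval, whether or not a user is present."""
    results = []
    for i in range(runs):
        results.append(measure_response_time(transaction))
        if i < runs - 1:
            time.sleep(interval_s)
    return results


def fake_query():
    """Stand-in for a real application request (hypothetical; just sleeps about 10 ms)."""
    time.sleep(0.01)


def failing_query():
    """Stand-in for a request that never reaches the server."""
    raise ConnectionError("server unreachable")


results = run_active_probe(fake_query, runs=3, interval_s=0.05)
for ok, elapsed in results:
    print(f"ok={ok} response_time={elapsed * 1000:.1f} ms")

failed_ok, _ = measure_response_time(failing_query)
print(f"failing transaction ok={failed_ok}")
```

Because the probe generates its own traffic, it keeps producing measurements over a weekend or overnight. The trade-off is that someone has to tell it which transactions are worth simulating.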
"We've chosen to wait it out and probably jump over the current technology," says Bob Uhl, director of network technologies for Ernst & Young in New York. Once the professional services firm has finished moving most of its desktops to browser-based software, it may be possible to set up client-based response-time reporting through applets, he adds. As an application service provider (ASP), Equant has a strong interest in monitoring customer service levels all the way to the desktop - but doesn't expect to accomplish this in a hurry. "The service-level management market is very fragmented. I don't think this will be a one-tool decision," says Anita Folk, a spokeswoman for the Atlanta company. There are also logistical challenges associated with implementing software on all those desktops. Privacy is a concern when a company needs to install software on the desktops of partners or customers. And if you're an ASP, there are some serious scalability issues. "We deal with multiple customers. Just how many desktops are we talking about putting agents on?" Folk asks. "And do we ask users to standardize their applications so we can monitor them?" In addition to potential difficulties with the agents, businesses - particularly ASPs such as Equant - are wondering how easy it will be to gather and correlate the data the agents deliver. "We'll need some kind of server engine," Folk says. Make that a very scalable engine.Collective coordination
Many companies want a WAN management platform that not only collects response-time data from client agents, but also correlates it with network performance and availability data generated by RMON and SNMP probes, DSU/CSU agents, and other service-level monitoring tools.
There have been some promising developments. Concord, for example, recently acquired Empire, which sells active monitoring agents, and First Sense, which sells the passive agent Enterprise. Concord has promised to integrate the tools into its Network Health suite, although it hasn't yet announced a time frame.
Lucent's VitalSuite 7.0 provides a single infrastructure for collecting and reporting on data from VitalAgent, the desktop client agent, and VitalNet, the SNMP-based WAN and LAN monitoring tool. Visual Networks is working to integrate Visual UpTime with two products the vendor recently acquired: Avesta's Trinity, which correlates service-level alerts and other events to determine the source; and Inverse's IP Insight, a client-based agent that monitors latency primarily over access lines.
Meanwhile, a working group within the Internet Engineering Task Force is developing an Application Performance Measurement MIB. The MIB will provide standardized definitions for key information associated with measuring end-to-end application performance over a network, says Steve Waldbusser, chief strategist at Lucent's VitalSoft division in Sunnyvale, Calif. Network managers will be able to gather data from different vendors' agents, then merge it with other SNMP-based data into reports. The standard is scheduled to become stable enough for vendor implementation in about a year, Waldbusser says.
As service-level monitoring tools become more powerful and widely used, the question arises as to whether the information they provide will pit corporate network managers against their carrier counterparts. Will customers use such tools to try to catch carriers breaching SLAs? Not necessarily.
While companies are definitely using service-level monitoring tools to check if carriers are meeting SLA metrics, several people emphasized that they see little advantage in treating their carriers as adversaries. "We could run Visual UpTime reports and see if we come up with the same numbers our carriers did, but it's a cumbersome process," Reynolds Metals' Shashaty says. Besides, she points out that the penalties a carrier pays for breaching the SLA don't even come close to matching the business cost of downtime. "Anyway, we don't want our money back, we want quality of service," she says.
A more fruitful way to use such tools, some IT executives suggest, is working collaboratively with carriers to deliver better service. Shashaty notes, "We use our tools to help carriers meet high levels of availability and fast restoration, rather than waste a lot of time proving they don't."
Chuck Williams, First Union's senior vendor relationship manager, shares the same goal. "I envision us going to monthly review meetings and providing metrics, accurate information we could use to determine whether they're fulfilling their SLAs and to identify the source of a breakdown quicker," he says. "We just want to take some of the management responsibility so we can give useful information back to the carrier and be true partners."
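
For readers who want to see what "providing metrics" at a monthly review might involve, the Python sketch below computes availability and average restoration time from a list of outages and compares the result with a target. Every figure in it is hypothetical: the 99.9 percent target, the 30-day period and the two outages are invented for illustration and do not come from any carrier contract or from the companies quoted above.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    circuit: str      # label for the affected connection (hypothetical)
    minutes: float    # time from failure to restoration


def availability_pct(outages, period_minutes):
    """Percentage of the reporting period during which the circuit was up."""
    downtime = sum(o.minutes for o in outages)
    return 100.0 * (period_minutes - downtime) / period_minutes


def mean_time_to_restore(outages):
    """Average outage length in minutes; 0.0 when there were no outages."""
    if not outages:
        return 0.0
    return sum(o.minutes for o in outages) / len(outages)


PERIOD_MINUTES = 30 * 24 * 60        # a 30-day month is 43,200 minutes
SLA_AVAILABILITY_PCT = 99.9          # assumed target, not taken from any real SLA

outages = [Outage("site-1", 30.0), Outage("site-1", 24.0)]

measured = availability_pct(outages, PERIOD_MINUTES)
print(f"measured availability: {measured:.3f}%")
print(f"mean time to restore:  {mean_time_to_restore(outages):.1f} minutes")
print("meets target" if measured >= SLA_AVAILABILITY_PCT else "below target")
```

In this made-up month, 54 minutes of downtime puts availability at 99.875 percent, just under the assumed target. Figures like these, together with the outage list, give both sides a starting point for the conversation.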
Horwitt is a freelance writer and consultant in Waban, Mass. She can be reached at ehorwitt@world.std.com.

Reynolds Metals decided it needed a service-level monitoring tool shortly after it migrated from dedicated leased lines to frame relay, and users and applications began contending for bandwidth.
In the reports AT&T and MCI WorldCom were giving Reynolds Metals, for example, "the information was averaged over 15-minute intervals and was very after-the-fact: There was no way to see what was happening at any given time," Shashaty says. Moreover, standard carrier reports don't break down bandwidth usage by protocol. "We needed [to put] our own eyes into the network," she says.
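
A small worked example shows why a 15-minute average can hide what is happening at any given moment. The Python sketch below uses invented numbers: a link that runs at 20 percent utilization for 14 minutes and is saturated for one. The figures are illustrative only and are not Reynolds Metals data.

```python
# Fifteen one-minute utilization readings, in percent (hypothetical values):
# the link is lightly loaded except for one saturated minute.
one_minute_utilization = [20.0] * 9 + [100.0] + [20.0] * 5


def summarize(samples):
    """Return (average, peak) for a list of utilization readings."""
    return sum(samples) / len(samples), max(samples)


avg, peak = summarize(one_minute_utilization)
print(f"15-minute average: {avg:.1f}%")
print(f"peak minute:       {peak:.1f}%")
```

The averaged report would show the link at roughly 25 percent, which looks healthy, while users on the network during the saturated minute would have seen something very different.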