
Special Advertising Section

Informed Management Through High-Quality Data Sources

White Paper by Russ Currie

NetScout Systems, Inc.

Contents

  • Informed Management Through High-Quality Data Sources
  • What are the Functions of Infrastructure Performance Management?
  • Understanding Data Sources
  • What Data Sources are Available?
  • Matching Data Sources to Infrastructure Performance Management Functions
  • Conclusion

Today, more than ever, business is conducted in the electronic medium over company networks and intranets as well as Web sites and the Internet.  This is an environment of private, outsourced, and Internet-based infrastructures mixed with a complex assortment of networked devices, topologies and applications. The importance of managing the IT infrastructure cannot be overemphasized. 

Infrastructure Performance Management (IPM) is a way to measure and report on the infrastructure’s ability to perform and meet its service-level objectives. IPM manages the three key components of the infrastructure: the application environment, the computing environment, and the network environment. It optimizes performance to meet end user demands for availability and reliability while operating more cost-effectively.

While many solutions may aid in this effort, not all can provide the right information to the right person with enough speed and efficiency. This document addresses four fundamental aspects of infrastructure performance management and the data sources that empower them.

What are the Functions of Infrastructure Performance Management?

While managing the infrastructure’s performance involves many activities, this paper focuses on the four most important functions:

  • Real-time Troubleshooting
  • Capacity Planning
  • Service Level Management
  • Usage-Based Billing

Each of these functions has a distinct goal, generates measurable tasks and is often performed by discrete individuals or groups. The following is a brief definition and explanation.

Real-time Troubleshooting

Real-time troubleshooting is the problem/repair function of network management. Always urgent, this activity is usually driven by calls to the Help Desk or by alarms sent automatically to an umbrella management system (UMS) such as HP OpenView or Micromuse Netcool. When a portion of the network fails, or application performance degrades, the resulting crisis drives network and system managers to catch failures immediately and repair the problem as quickly as possible. If the infrastructure is well managed, IT can also investigate insidious performance degradations in time to prevent a greater impact on the business.

Capacity Planning

Capacity planning involves reporting on and forecasting the infrastructure resources that are required to keep the business running at peak efficiency, both short-term and long-term. Although it is most often associated with bandwidth, capacity planning can also be used for network hardware (e.g. router capacity) and application servers.

Planning ahead to ensure that each part of the infrastructure has the right amount of resources at all times is essential to the success of the business. Demand for increased bandwidth and additional network equipment and/or computing devices often grows as new applications are added to networks and extra features are loaded on existing applications. Because equipment is often expensive and requires long lead times to acquire and deploy, IT managers must plan effectively for future capacity to maintain reliable operations while staying within the capital budget.

Service Level Management

Service level management is the function of measuring and reporting on services (applications or content) delivered via the network. Whether network application services are outsourced or delivered in-house, measuring and reporting on service levels is important to managing the infrastructure.

Service levels address the expectations of users and the businesses that rely on the infrastructure. In most cases, service level expectations are not articulated, documented, or measured. To be effective, IT must define the service levels it will commit to delivering and measure them based on actual network usage. Specifically, service levels must address the natural perspective of the user community, which includes both the availability and responsiveness of applications.

Usage-Based Billing

Usage-based billing is the function of tracking and (potentially) charging for the use of infrastructure resources. While the concept of charging for connect time is well accepted, the idea of charging for the use of a specific service is relatively new but enhances IT’s ability to influence user behavior. For example, when users are charged for the volume of traffic that they generate on the network, systems managers can account for the costs of the infrastructure equipment required to support it. In an ideal scenario, the rate that users are charged varies based on the rules and values that the business applies to a specific service.

While this function may seem to apply primarily to service providers, enterprises can gain great value from a usage-based billing solution. In most cases the cost of deploying and operating the infrastructure is hidden from much of the business. By tracking and reporting on the costs associated with delivering a level of service on the infrastructure, IT can justify resources and promote "good network behavior."

Understanding Data Sources

Infrastructure management relies heavily on the Simple Network Management Protocol (SNMP), which enables the umbrella management system to communicate with any network equipment that supports SNMP. These devices run agents that write information to a Management Information Base (MIB).

The network management system polls the equipment with SNMP and retrieves data from the MIB. This information is used to perform the network management functions discussed earlier. The MIBs store a variable amount of information.

Nearly all devices that support SNMP store information in the MIB II format, which provides the framework for storing data about the devices as well as varying levels of performance information. At its most elemental, MIB II provides basic system configuration information (the type of equipment, uptime, network interfaces and rudimentary traffic data) and alerts the umbrella management system with messages, called traps, when predefined conditions occur.  Private extensions allow equipment and software manufacturers to add their own unique management information.  These too are often limited to configuration information (the number and types of interfaces, versions of software, etc.), however, and do not reflect the performance of the device.
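As an illustration of how the rudimentary traffic data in MIB II becomes a performance figure, the following minimal sketch derives link utilization from two successive polls of an interface octet counter. It assumes the counter is the 32-bit MIB II ifInOctets object and that link speed is reported in bits per second (as with ifSpeed); the function names and the sample readings are hypothetical, and the SNMP polling itself is not shown.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps back to zero at this value


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Octets counted between two polls, allowing for a single counter wrap."""
    return (current - previous) % modulus


def utilization_percent(prev_octets, curr_octets, interval_seconds, if_speed_bps):
    """Average inbound utilization of a link over one polling interval."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_seconds * if_speed_bps)


# Hypothetical readings: two polls 300 seconds apart on a 1.544 Mbit/s circuit.
print(round(utilization_percent(1_000_000, 30_000_000, 300, 1_544_000), 1))
```

The modulo arithmetic handles one counter wrap per interval, so the polling interval must be short enough that a busy link cannot wrap the counter twice between polls.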

An extension to MIB II is RMON (Remote Network Monitoring), which was specifically designed to manage networks. This MIB contains detailed information on the flow of application traffic across the network and the status of the actual network segments. An RMON data source continually monitors the network, and probes are the most effective of these devices.

What Data Sources are Available?

Three broad types of data sources manage infrastructure performance. They provide:

  • Information from infrastructure equipment,
  • Information from desktops and application servers, and
  • Information from dedicated data sources (network probes).

Each of these data sources is unique and provides a distinct set of information that supports infrastructure performance management functions.

Information from Infrastructure Equipment

Because most network equipment deployed in the last several years supports SNMP, it is an excellent source of information. As mentioned earlier, MIB II provides the basic status of the network equipment. For example, an umbrella management system retrieves the type of equipment, its configuration, the number of packets moved and some basic error statistics. The network equipment also sends a trap to the network management system when a catastrophic problem occurs.

Also available are the private extensions implemented by most vendors. Because MIB II is a standard and the information it stores is generic and device-centric, many vendors have added these extensions to improve the manageability of their equipment. For example, one vendor’s extension may report the version of software running on the equipment, or the performance of unique features that do not fit the standard definitions of MIB II. While these extensions may offer great value, the umbrella management system must be configured to recognize them.

In practice, network equipment generates only the most basic information because its primary function is to move data, not to monitor it. Added functionality draws processing power and thus may impact the equipment’s performance. The effect of standard MIB II is often factored into the equipment’s performance because most vendors understand the need for manageability.

Information from Application Servers and Desktops

Many desktops and most application servers support SNMP. While the basic MIB II functionality is present, the real value comes from private extensions and special implementations of agents. These gather basic performance data on the servers and desktops but the most interesting information pertains to the applications that are running on the equipment.

Application Servers

Standard SNMP agents in application servers give insight into the basic operations of the server, items such as CPU, disk and memory usage. Private extensions provide greater insight into hardware or applications based on the specifications of the manufacturer.  An additional source of data is an agent that has been written specifically for the server or application. Usually developed by and available from a third party, these agents provide detailed insight into the performance of the application server. As most applications have some degree of management data available (usually through an application-specific console), the application server agent provides access to this often-extensive source of performance data.

PRO: The benefit of these agents is that they provide a great deal of insight into the performance of both the application and the server.

CON: The drawbacks are that few infrastructure performance management applications take advantage of this data and the agent must be current with the application being monitored.

Desktop Agents

Three types of agents measure application performance at the desktop. They are:

1. Embedded application "hooks"
2. Passive monitors, and
3. Active agents

1. Embedded Application Hooks

In a few rare cases, applications have been written or re-written to measure the application’s performance at the desktop. These "hooks" watch the communications between the desktop and the application for attributes that have been defined by the application developer. The most common, although not widely used, implementation of this type of monitoring is ARM (Application Response Measurement).

PRO: The benefit of this type of solution is that its measurements are specifically tailored to the particular application.

CONs: Several drawbacks exist:

  • The application must often be re-written to support ARM because not many applications have built these hooks in.
  • ARM provides only the structure for measuring application performance, not the way in which this information is stored. Thus, the network management application must be aware of the specifics before it can generate reports.
  • As the hooks measure performance by observing user activities, they can only measure the application when users are running it.
  • Complete coverage requires distribution and collection from all the desktops being measured.

2. Passive Monitors

Like application "hooks," passive monitors are software that run on the desktop and observe the actions of the application’s users. In most cases, these solutions attempt to watch both the activities of the application on the desktop and the network activity associated with the application. Because passive monitors are independent of the application, the application does not have to be re-written but the monitor does interpret the application functions. For example, the monitor may watch the Windows environment and interpret changes as application transactions, but this may or may not accurately reflect the application’s design. In fact, it represents only the desktop’s interpretation of how the application is functioning.

PRO: The benefits of the passive monitor are similar to those of application "hooks." One key differentiator, however, is that the application does not have to be re-written.

CONs: The drawbacks are also similar to those of application "hooks." 

  • Wide coverage requires wide distribution and collecting data from many agents. This makes application "hooks" and passive monitors generally impractical in large environments.

For example, a business with 2000 users would require 2000 instances of agent technology to cover 100% of the network and application. While a sample group can provide a statistically accurate representation of performance, the complexity of most infrastructures, combined with the unpredictable nature of users, limits the value of a test group.

3. Active Agents

Active agents that mimic a user’s activity provide a unique mechanism for measuring network and application performance. By mimicking the user’s activity, the active agent measures a defined set of application transactions (tasks) at regular intervals. This level of control ensures that the performance measure is well defined and exercises the application components that are critical to the success of the business.

Ideally, the active agent should exercise the actual application, application server and business processes that are critical to the business. Some solutions, however, measure the network and application in a less-than-comprehensive manner. For example, some provide no more than a basic network call to the application server (PING). Others simulate both the transaction and the application by having an agent at the application server respond to the desktop agent in a fashion similar to how the application would respond. The only viable solution is one that actually measures the business application itself.
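The core of an active agent, running one defined transaction and recording whether it succeeded and how long it took, can be sketched in a few lines. This is only an illustration under stated assumptions: measure and place_order are hypothetical names, and place_order is a stand-in for a real business transaction such as logging in and submitting an order against the actual application.

```python
import time


def measure(transaction, clock=time.perf_counter):
    """Run one transaction and return (succeeded, elapsed_seconds)."""
    start = clock()
    try:
        transaction()
        succeeded = True
    except Exception:
        # Any failure counts as "service not available" for this sample.
        succeeded = False
    return succeeded, clock() - start


def place_order():
    """Hypothetical stand-in for a real transaction against the application."""
    sum(range(1000))


ok, seconds = measure(place_order)
print(ok, f"{seconds:.6f}s")
```

A scheduler would call measure at the chosen interval (the paper mentions 15 minutes as typical) and store each sample for later reporting. The clock is passed in as a parameter so that the timing logic can be checked without waiting on a real transaction.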

PROs: An active agent that mimics user activity has many benefits.

  • It provides extensive control in measuring application performance.
  • It measures the applications that are critical to the business regardless of the activities of users. This allows IT to identify potential problems proactively.
  • It is unburdened by the variable usage patterns exhibited by users. Thus, critical business applications are measured and problems identified even if the users have yet to access this function.
  • It measures the application in terms of both the front- and back-office systems that support the business.
  • In terms of deployment, the active agent also minimizes the impact on IT staff. Because the agent simulates the user’s access to the application, agents do not have to be deployed on every desktop. A single instance of an active agent can literally mirror the performance of many simultaneous users.

CONs: The active agent cannot identify an individual user’s experience with a given application, and it does not manage application performance in real time.

  • As the active agent is the "user" whose experience is being measured, it cannot identify the activities of a specific user of the application. Rather, the active agent represents the experience of a group of users colocated with the active agent, such as at remote sites, branches, or different floors of an office building.
  • As it would be impractical to run these tests every second, problem identification is limited to the time period in which the active agent runs (usually 15-minute intervals).

Information from Dedicated Data Sources

Network Probes

Probes are the only data sources that are designed for and dedicated to infrastructure management. As mentioned earlier, probes are based on industry standards for managing network technologies and identifying application traffic. In addition to these standards, some vendors extend the capabilities of probes to measure the response time of applications. Two proposed standards, Application Response Time MIB (ART MIB) and Application Performance Measurement (APM), measure response time by observing the application traffic on the network.

Placed on critical segments of the network, probes provide the scalability to manage large networks and large amounts of data. Originally designed for managing Ethernet networks, probes now support a wide range of network technologies from Gigabit LANs to ATM WANs.

In our example of a 2000-user network, we can assume that these users are spread throughout several facilities that are connected via a WAN. The application servers and Internet business are centralized at a single location, or data center. In our fictional network, there are at least three critical network segments where user traffic comes together to share the network: at the WAN connection for each of the remote sites, at the WAN connection(s) to the data center, and within the data center itself.

While it is certainly possible to put agents on each of the desktops in the network, it is impractical. It is also unnecessary. A probe placed on each of the WAN links at the remote facilities provides visibility into all of the network application traffic. Additionally, the probe provides the ability to manage the WAN link itself. With it, IT receives network application performance data, and manages a costly and critical component of the network: the WAN links.

Probes can also be used to perform troubleshooting duties. Because the probe listens to all traffic on the network segment, it captures traffic data for detailed analysis. Application traffic is decoded to find problems in the way that applications are using the network. Also, most probe management applications can monitor traffic as it passes by, which allows the network manager to view the status of the network segment almost instantaneously.

PROs: The benefits of the probe are leveraged through its design.

      • Because the probe is dedicated to network management, it provides crucial information on the performance of networked applications.
      • These network appliances are designed to handle very large amounts of data and can literally monitor hundreds of thousands of connections.
      • Placed on critical network links, probes provide real-time access to performance data. They monitor applications traversing these links for basic information such as number of users and traffic mix. At the same time, they continually monitor the quality of the network segment.
      • Additionally, probes with ART MIB or APM provide real-time application response time monitoring.

CONs: The drawbacks of probes are also related to their design.

      • As they work directly on the network, probes must be connected to the specific network type for which they were designed.
      • Additionally, a probe can only monitor traffic that traverses the network link on which it is placed. Thus, the network designer must determine the appropriate locations in which to install probes.

Matching Data Sources to Infrastructure Performance Management Functions

As we have seen, several options exist for gathering information on network and application performance. Each data source possesses unique characteristics that alone cannot provide a complete solution. Thus, it is important to match these data sources with the value that they add to infrastructure performance management functions.

Real-time Network Management

Real-time network management is primarily a problem/repair operation. When a problem interrupts service, the network must be restored to normal operation immediately. While repair is often driven by trouble tickets or device failure alarms, the ideal scenario is alerting IT to service degradation before a catastrophic failure occurs. To achieve this, IT must put in place a proactive monitoring solution. Probes provide the ideal mechanism for monitoring the network and applications in real time.

  • As the probes constantly listen to the network and applications, they can be configured to watch for potential problems. For example, probes can watch for increases in the amount of traffic on a network link. When traffic reaches a level that may impact users, the probe immediately sends an alarm to the umbrella management system so the network or application manager can take action.
  • When using probes that support application response time monitoring, the network and application managers can also track and alert on the responsiveness of applications. Ideally, the probe should be able to recognize how applications are used on the network. For example, when monitoring a Web-based application, the probe should track the responsiveness of URLs. This allows IT to manage the applications proactively in terms that are important to the user.
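The traffic-level alarm described in the first point can be sketched as a threshold check with hysteresis. The sketch is loosely modeled on the rising and falling thresholds of the RMON alarm group, simplified so that a rising alarm is raised once and re-armed only after traffic falls back below a lower level; the function name, threshold values, and utilization samples are hypothetical.

```python
def threshold_events(samples, rising, falling):
    """Return (sample_index, kind) events for a monitored value.

    A "rising" event fires the first time a sample reaches the rising
    threshold; it cannot fire again until a sample has dropped to the
    falling threshold, which produces a "falling" event and re-arms it.
    """
    events = []
    armed = True  # ready to report the next rising crossing
    for index, value in enumerate(samples):
        if armed and value >= rising:
            events.append((index, "rising"))
            armed = False
        elif not armed and value <= falling:
            events.append((index, "falling"))
            armed = True
    return events


# Hypothetical link utilization samples, in percent.
utilization = [40, 55, 82, 85, 79, 90, 58, 83]
print(threshold_events(utilization, rising=80, falling=60))
```

Using two thresholds instead of one keeps a link that hovers around 80% from flooding the umbrella management system with repeated alarms.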

Probes provide the only viable solution for real-time network and application performance management.

Capacity Planning

Capacity planning depends on the ability to know how infrastructure resources are being used today, and projecting when they will be exhausted. Obtaining capital for new equipment purchases or ordering additional bandwidth can be a complex process with very long lead times. Attempting to plan resource requirements through educated guesswork opens the door to failure or performance degradation.

A capacity planning solution must be able to monitor the existing environment and forecast trends based upon information from robust data sources. Visibility into raw utilization numbers is good, but it is ideal to understand who is using the network and how.

  • Network equipment provides valuable information on raw performance. For example, a router can tell you the utilization of a connected WAN circuit. However, the router cannot tell you what applications are using that circuit.
  • Probes provide insight into the applications on the network. Through a probe, the capacity planner can understand if the traffic that is driving the need for more resources is truly business critical.

The combined information from probes and network equipment facilitates informed decision-making. Growth of the infrastructure can be restricted to what is essential for the success of the business, resulting in less waste when buying costly equipment and bandwidth. Additionally, forecasting when resources will be required ensures that resources will not sit idle, or be bought at a premium to fix an emergency.

Combining the raw information from network equipment with the detailed usage information from probes makes the ideal data source for a capacity planning solution.
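Projecting when a resource will be exhausted can be illustrated with a simple trend line. The sketch below fits a least-squares line to historical utilization and estimates how many periods remain before a chosen ceiling is reached. A straight-line trend is an assumption made for illustration, not a method prescribed by this paper, and the monthly figures and the 80% ceiling are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(threshold, xs, ys):
    """Periods after the last observation until the trend reaches threshold.

    Returns None when the trend is flat or falling, because the
    threshold is then never reached by the fitted line.
    """
    slope, intercept = linear_fit(xs, ys)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope - xs[-1]


# Hypothetical average monthly utilization (percent) of a WAN circuit.
months = [0, 1, 2, 3, 4, 5]
utilization = [40, 44, 48, 52, 56, 60]
print(periods_until(80, months, utilization))
```

With long lead times for bandwidth and equipment, the useful output is whether the remaining time is shorter than the procurement cycle.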

Service Level Management

Service level management is perhaps the most discussed, yet least implemented of the infrastructure performance management solutions. Because service level management reflects directly on the ability of the network and applications to support the business, an SLM solution must measure infrastructure performance in the terms of the business. It has two parts:

  • Availability: Determines whether an application or network service is available for the user.
  • Response Time: Measures the performance of the service.

To effectively measure the business transaction, the SLM solution must reflect the user’s actual activity. While a passive monitor can measure the performance of a user once he has accessed the service, it cannot ensure that the service is available for him. Just knowing that the service is available, however, is insufficient if the performance is unacceptable.

Thus, a solution must address both of these issues, and Synthetic Transactions™ provide this capability. By measuring the performance of the network service continually, active agents provide both an availability metric and a performance measurement. Because they can be deployed anywhere in the network and mimic an actual user, or multiple users, active agents provide the necessary measurement of business processes.

Additionally, as the active agent measures performance regardless of whether a user is present, potential problems can often be discovered before they impact the user community.
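The two parts of a service level, availability and response time, can both be derived from the same series of active-agent samples. The following sketch assumes each sample is a (succeeded, response_seconds) pair; the function name, the two-second response target, and the sample values are hypothetical.

```python
def service_level(samples, target_seconds):
    """Summarize active-agent samples against a response-time target.

    Returns (availability_percent, mean_response_seconds, compliance_percent).
    Compliance counts only samples that both succeeded and met the target,
    because an available but slow service still fails the user.
    """
    total = len(samples)
    good = [seconds for ok, seconds in samples if ok]
    availability = 100.0 * len(good) / total
    mean_response = sum(good) / len(good) if good else None
    within_target = sum(1 for seconds in good if seconds <= target_seconds)
    compliance = 100.0 * within_target / total
    return availability, mean_response, compliance


# Hypothetical samples: three successful transactions and one failure.
samples = [(True, 0.8), (True, 1.2), (False, 30.0), (True, 2.5)]
print(service_level(samples, target_seconds=2.0))
```

Reporting compliance separately from availability reflects the point made above: knowing that a service is reachable is insufficient if its performance is unacceptable.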

Active agents that mimic a user’s activity provide the foundation for an effective Service Level Management solution.

Usage-Based Billing

Usage-based billing raises organizational awareness of the expense of running the infrastructure and provides the means for IT to recover the costs by educating users and creating incentives for change. Currently, IT levies connect-time costs and/or flat fees on users to recover or account for these costs. A usage-based billing solution extends this approach and makes it more effective by addressing the actual use of the infrastructure.

As the name implies, usage-based billing mandates accounting for the actual use of infrastructure resources. While this information may be available from some of the application servers in the infrastructure, it may be difficult or impossible for IT to collect this information. The application environment is as large as the desktop environment and usually more complex, involving multiple tiers of servers and applications. Additionally, a growing amount of business is done outside of IT’s control via the Internet.

  • To be effective, a usage-based billing solution must monitor many thousands of conversations as well as traffic that may exceed the boundaries of the infrastructure. The best data source for tracking this information is a probe.
  • Probes can monitor literally hundreds of thousands of conversations and attribute them to a specific application. The usage-based billing application can thus attribute infrastructure use to specific users and the applications they are using.
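Attributing use to users and applications amounts to aggregating conversation records and applying a rate per service. The sketch below assumes each record is a (user, application, bytes) tuple of the kind a probe's reporting application might export; the record layout, the application names, and the per-megabyte rates are all hypothetical.

```python
from collections import defaultdict

# Hypothetical business rules: charge per megabyte, varying by application.
RATES_PER_MB = {"erp": 0.05, "web": 0.01}
DEFAULT_RATE_PER_MB = 0.02  # applied to applications without a specific rate


def bill(conversations):
    """Total charge per user from (user, application, bytes) records."""
    charges = defaultdict(float)
    for user, application, byte_count in conversations:
        rate = RATES_PER_MB.get(application, DEFAULT_RATE_PER_MB)
        charges[user] += rate * byte_count / 1_000_000
    return dict(charges)


records = [
    ("alice", "erp", 200_000_000),
    ("alice", "web", 50_000_000),
    ("bob", "video", 100_000_000),
]
for user, amount in sorted(bill(records).items()):
    print(f"{user}: {amount:.2f}")
```

Varying the rate by application is what lets the business encourage some uses of the network and discourage others, rather than simply charging for volume.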

Probes provide the full visibility into the infrastructure that is required for an effective usage-based billing solution.

Conclusion

Data sources matched to infrastructure performance management function

                            DATA SOURCE
FUNCTION                    Infrastructure Equipment   Probes   Active Agents
Real-Time                   -                          X        -
Capacity Planning           X                          X        -
Service Level Management    -                          -        X
Usage-Based Billing         -                          X        -


The growth, complexity and importance of corporate infrastructure show no signs of slowing in the near future. Although significant money has been spent on expansion in the past few years, infrastructure management continues to be an afterthought. For the success of the business, it is imperative that IT management implements infrastructure performance management to ensure that businesses and infrastructures alike perform to expectations.

Crucial to these deployments are the data sources that inform them. IT’s primary concern must be the quality of the data that is used to manage the infrastructure. Because management decisions based on incomplete or inaccurate data can cost time, money and opportunity, an infrastructure performance management solution should allow IT to control the infrastructure effectively and provide unmatched visibility into how the infrastructure is used.

About NetScout Systems

NetScout Systems, Inc. (NASDAQ-NTCT) is the leading provider of infrastructure performance management solutions for large enterprises, e-businesses and service providers worldwide. Our products help organizations increase the return on their infrastructure investments by optimizing not only the performance of their networks but also the networks’ ability to deliver applications and content to end-users. 

The nGenius system collects data from our proprietary active agents, award-winning probes, and network devices. The accuracy, timeliness, and robustness of the information produced provide end-to-end network visibility, both in real time and historically, for better network and application control. This comprehensive approach enables critical business applications such as e-commerce, supply chain management, ERP, and CRM to run smoothly and reliably.

NetScout’s achievements as one of the industry’s most successful network solutions companies have led to numerous distinctions, including being ranked among Forbes 200 Best Small Companies in America, Business Week’s Top 100 Hot Growth Companies, and Red Herring’s IPO Top 100 Technology Stocks.

NetScout Systems, headquartered in Westford, Massachusetts, has over 300 employees, and offices located in North America, Europe and Asia. Further information on the company is available on the World Wide Web at www.netscout.com.




  Copyright, 1995-2001 Network World, Inc. All rights reserved.