Special Advertising Section
Informed Management Through High-Quality Data Sources

White Paper by Russ Currie, NetScout Systems, Inc.
Today, more than ever, business is conducted in the electronic medium over company networks and intranets as well as Web sites and the Internet. This is an environment of private, outsourced, and Internet-based infrastructures mixed with a complex assortment of networked devices, topologies and applications. The importance of managing the IT infrastructure cannot be overemphasized.

Infrastructure Performance Management (IPM) is a way to measure and report on the infrastructure’s ability to perform and meet its service-level objectives. IPM manages the three key components of the infrastructure: the application environment, the computing environment, and the network environment. It optimizes performance to meet end user demands for availability and reliability while operating more cost-effectively. While many solutions may aid in this effort, not all can provide the right information to the right person with enough speed and efficiency. This document addresses four fundamental aspects of infrastructure performance management and the data sources that empower them.

What are the Functions of Infrastructure Performance Management?

While managing the infrastructure’s performance involves many activities, this paper focuses on the four most important functions:

• Real-time Troubleshooting
• Capacity Planning
• Service Level Management
• Usage-Based Billing

Each of these functions has a distinct goal, generates measurable tasks and is often performed by discrete individuals or groups. A brief definition and explanation of each follows.

Real-time Troubleshooting

Real-time troubleshooting is the problem/repair function of network management. Always urgent, this activity is usually driven by calls to the Help Desk or by alarms sent automatically to an Umbrella Management Station (UMS) such as HP OpenView or MicroMuse NetCool.
When a portion of the network fails, or application performance degrades, it becomes a crisis that drives network and system managers to catch failures immediately and repair the problem as quickly as possible. If the infrastructure is well managed, IT can also investigate insidious performance degradations in time to prevent a greater impact on the business.

Capacity Planning

Capacity planning involves reporting on and forecasting the infrastructure resources that are required to keep the business running at peak efficiency, both short-term and long-term. Although it is most often associated with bandwidth, capacity planning can also be used for network hardware (e.g. router capacity) and application servers. Planning ahead to ensure that each part of the infrastructure has the right amount of resources at all times is essential to the success of the business. Demand for increased bandwidth and additional network equipment and/or computing devices often grows as new applications are added to networks and extra features are loaded on existing applications. Because equipment is often expensive and requires long lead times to acquire and deploy, IT managers must plan effectively for future capacity to maintain reliable operations while staying within the capital budget.

Service Level Management

Service level management is the function of measuring and reporting on services (applications or content) delivered via the network. Whether network application services are outsourced or delivered in-house, measuring and reporting on service levels is important to managing the infrastructure. Service levels address the expectations of users and the businesses that rely on the infrastructure. In most cases, service level expectations are not articulated, documented, or measured. To be effective, IT must define the service levels it will commit to delivering and measure them based on actual network usage.
Specifically, service levels must address the natural perspective of the user community, which includes both the availability and responsiveness of applications.

Usage-Based Billing

Usage-based billing is the function of tracking and (potentially) charging for the use of infrastructure resources. While the concept of charging for connect time is well accepted, the idea of charging for the use of a specific service is relatively new but enhances IT’s ability to influence user behavior. For example, when users are charged for the volume of traffic that they generate on the network, systems managers can account for the costs of the infrastructure equipment required to support it. In an ideal scenario, the rate that users are charged varies based on the rules and values that the business applies to a specific service. While this function may seem to apply primarily to service providers, enterprises can gain great value from a usage-based billing solution. In most cases the cost of deploying and operating the infrastructure is hidden from much of the business. By tracking and reporting on the costs associated with delivering a level of service on the infrastructure, IT can justify resources and promote "good network behavior."

Understanding Data Sources

Infrastructure management relies heavily on the Simple Network Management Protocol (SNMP), which enables the umbrella management system to communicate with any network equipment that supports SNMP. These devices run agents that write information to a Management Information Base (MIB). The network management system polls the equipment with SNMP and retrieves data from the MIB. This information is used to perform the network management functions discussed earlier. The MIBs store a variable amount of information. Nearly all devices that support SNMP store information in the MIB II format, which provides the framework for storing data about the devices as well as varying levels of performance information.
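To make the polling model concrete, the following is a minimal sketch of how a management station might turn two successive polls of a MIB II traffic counter into a utilization figure. It assumes the counter is ifInOctets (a 32-bit counter that wraps to zero) and that the link speed is known in bits per second, as MIB II reports it in ifSpeed. The sample values, the 300-second polling interval and the function names are hypothetical, and the SNMP retrieval itself is deliberately left out.

```python
# A sketch only: the counter values below are hypothetical poll results,
# not data retrieved from a real device.
COUNTER32_MODULUS = 2 ** 32  # ifInOctets is a 32-bit counter that wraps to zero


def counter_delta(previous: int, current: int) -> int:
    """Octets received between two polls, allowing for one counter wrap."""
    return (current - previous) % COUNTER32_MODULUS


def inbound_utilization(previous: int, current: int,
                        interval_seconds: float, if_speed_bps: int) -> float:
    """Fraction of link capacity used inbound over the polling interval."""
    bits_received = counter_delta(previous, current) * 8
    return bits_received / (interval_seconds * if_speed_bps)


if __name__ == "__main__":
    # Assumed scenario: a 10 Mbit/s Ethernet port polled every 300 seconds.
    utilization = inbound_utilization(previous=1_000_000, current=76_000_000,
                                      interval_seconds=300,
                                      if_speed_bps=10_000_000)
    print(f"inbound utilization: {utilization:.1%}")
```

Taking the difference modulo 2^32 handles a single counter wrap between polls; on fast links the polling interval has to be short enough that the counter cannot wrap more than once.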
At its most elemental, MIB II provides basic system configuration information (the type of equipment, uptime, network interfaces and rudimentary traffic data) and alerts the umbrella management system with messages, called traps, when predefined conditions occur. Private extensions allow equipment and software manufacturers to add their own unique management information. These, too, are often limited to configuration information (the number and types of interfaces, versions of software, etc.) and do not reflect the performance of the device.

An extension to MIB II is RMON (Remote Network MONitoring), which was specifically designed to manage networks. This MIB contains detailed information on the flow of application traffic across the network and the status of the actual network segments. An RMON data source continually monitors the network, and probes are the most effective of these devices.

What Data Sources are Available?

Three broad types of data sources support infrastructure performance management. They provide:

• Information from infrastructure equipment,
• Information from desktops and application servers, and
• Information from dedicated data sources (network probes).

Each of these data sources is unique and provides a distinct set of information that supports infrastructure performance management functions.

Information from Infrastructure Equipment

Because most network equipment deployed in the last several years supports SNMP, it is an excellent source of information. As mentioned earlier, MIB II provides the basic status of the network equipment. For example, an umbrella management system retrieves the type of equipment, its configuration, the number of packets moved and some basic error statistics. The network equipment also sends a trap to the network management system when a catastrophic problem occurs. Also available are the private extensions implemented by most vendors.
Because MIB II is a standard and the information it stores is generic and device-centric, many vendors have added these extensions to improve the manageability of their equipment. For example, one vendor’s product may report the version of software that is running on the equipment, or the performance of unique features that do not fit the standard definitions of MIB II. While these extensions may offer great value, the umbrella management system must be configured to recognize them.

In practice, network equipment generates only the most basic information because its primary function is to move data, not to monitor it. Added functionality draws processing power and thus may impact the equipment’s performance. The effect of standard MIB II is often factored into the equipment’s performance because most vendors understand the need for manageability.

Information from Application Servers and Desktops

Many desktops and most application servers support SNMP. While the basic MIB II functionality is present, the real value comes from private extensions and special implementations of agents. These gather basic performance data on the servers and desktops, but the most interesting information pertains to the applications that are running on the equipment.

Application Servers

Standard SNMP agents in application servers give insight into the basic operations of the server, such as CPU, disk and memory usage. Private extensions provide greater insight into hardware or applications based on the specifications of the manufacturer. An additional source of data is an agent that has been written specifically for the server or application. Usually developed by and available from a third party, these agents provide detailed insight into the performance of the application server.
As most applications have some degree of management data available (usually through an application-specific console), the application server agent provides access to this often-extensive source of performance data.

PRO: The benefit of these agents is that they provide a great deal of insight into the performance of both the application and the server.

CON: The drawbacks are that few infrastructure performance management applications take advantage of this data and the agent must be current with the application being monitored.

Desktop Agents

Three types of agents measure application performance at the desktop. They are:

1. Embedded application "hooks"
2. Passive monitors, and
3. Active agents

1. Embedded Application Hooks

In a few rare cases, applications have been written or re-written to measure the application’s performance at the desktop. These "hooks" watch the communications between the desktop and the application for attributes that have been defined by the application developer. The most common, although not widely used, implementation of this type of monitoring is ARM (Application Response Measurement).

PRO: The benefit of this type of solution is that its measurements are specifically tailored to the particular application.

CONs: Several drawbacks exist:

• The application must often be re-written to support ARM because not many applications have built these hooks in.
• ARM provides only the structure for measuring application performance, not the way in which this information is stored. Thus, the network management application must be aware of the specifics before it can generate reports.
• As the hooks measure performance by observing user activities, they can only measure the application when users are running it.
• Complete coverage requires distribution and collection from all the desktops being measured.

2. Passive Monitors

Like application "hooks," passive monitors are software that runs on the desktop and observes the actions of the application’s users.
In most cases, these solutions attempt to watch both the activities of the application on the desktop and the network activity associated with the application. Because passive monitors are independent of the application, the application does not have to be re-written, but the monitor must interpret the application functions. For example, the monitor may watch the Windows environment and interpret changes as application transactions, but this may or may not accurately reflect the application’s design. In fact, it represents only the desktop’s interpretation of how the application is functioning.

PRO: The benefits of the passive monitor are similar to those of application "hooks." One key differentiator, however, is that the application does not have to be re-written.

CONs: The drawbacks are also similar to those of application "hooks."
For example, a business with 2000 users would require 2000 instances of agent technology to cover 100% of the network and application. While a sample group can provide a statistically accurate representation of performance, the complexity of most infrastructures, combined with the unpredictable nature of users, limits the value of a test group.

3. Active Agents

Active agents that mimic a user’s activity provide a unique mechanism for measuring network and application performance. By mimicking the user’s activity, the active agent measures a defined set of application transactions (tasks) at regular intervals. This level of control ensures that the performance measure is well defined and exercises the application components that are critical to the success of the business. Ideally, the active agent should exercise the actual application, application server and business processes that are critical to the business. Some solutions, however, measure the network and application in a less-than-comprehensive manner. For example, some provide no more than a basic network call to the application server (PING). Others simulate both the transaction and the application by having an agent at the application server respond to the desktop agent in a fashion similar to how the application would respond. The only viable solution is one that actually measures the business application itself.

PROs: An active agent that mimics user activity has many benefits.
CONs: The drawbacks of the active agent are:
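As an illustration of the active-agent idea described in this section, the sketch below runs a defined transaction a fixed number of times, times each run, and derives an availability figure and a mean response time. It is a simplified stand-in rather than any vendor’s agent: the function names are hypothetical, and the transaction is passed in as an ordinary Python callable where a real agent would exercise the actual business application.

```python
import time
from typing import Callable, List, Optional, Tuple


def run_probe_cycle(transaction: Callable[[], None], attempts: int,
                    pause_seconds: float = 0.0) -> List[Optional[float]]:
    """Run the transaction repeatedly; record seconds taken, or None on failure."""
    results: List[Optional[float]] = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            transaction()
            results.append(time.perf_counter() - start)
        except Exception:
            results.append(None)  # any error counts as "service unavailable"
        time.sleep(pause_seconds)
    return results


def summarize(results: List[Optional[float]]) -> Tuple[float, Optional[float]]:
    """Return (availability as a fraction, mean response time of successes)."""
    successes = [r for r in results if r is not None]
    availability = len(successes) / len(results) if results else 0.0
    mean_response = sum(successes) / len(successes) if successes else None
    return availability, mean_response


if __name__ == "__main__":
    calls = {"count": 0}

    def fake_transaction() -> None:
        # Stand-in for a real business transaction; fails on every fourth call.
        calls["count"] += 1
        if calls["count"] % 4 == 0:
            raise ConnectionError("simulated outage")

    availability, mean_response = summarize(
        run_probe_cycle(fake_transaction, attempts=8))
    print(f"availability: {availability:.0%}")
    if mean_response is not None:
        print(f"mean response: {mean_response * 1000:.3f} ms")
```

Because the agent initiates the transaction itself, a failed attempt is recorded even when no user is present, which is what lets a single measurement loop report both availability and responsiveness.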
Information from Dedicated Data Sources

Network Probes

Probes are the only data sources that are designed for and dedicated to infrastructure management. As mentioned earlier, probes are based on industry standards for managing network technologies and identifying application traffic. In addition to these standards, some vendors extend the capabilities of probes to measure the response time of applications. Two proposed standards, Application Response Time MIB (ART MIB) and Application Performance Measurement (APM), measure response time by observing the application traffic on the network.

Placed on critical segments of the network, probes provide the scalability to manage large networks and large amounts of data. Originally designed for managing Ethernet networks, probes now support a wide range of network technologies from Gigabit LANs to ATM WANs.

In our example of a 2000-user network, we can assume that these users are spread throughout several facilities that are connected via a WAN. The application servers and Internet business are centralized at a single location or Data Center. In our fictional network, there are at least three critical network segments where user traffic comes together to share the network: at the WAN connection for each of the remote sites, at the WAN connection(s) to the Data Center, and within the Data Center itself.

While it is certainly possible to put agents on each of the desktops in the network, it is impractical. It is also unnecessary. A probe placed on each of the WAN links at the remote facilities provides visibility into all of the network application traffic. Additionally, the probe provides the ability to manage the WAN link itself. With it, IT receives network application performance data and manages a costly and critical component of the network: the WAN links.

Probes can also be used to perform troubleshooting duties.
Because the probe listens to all traffic on the network segment, it captures traffic data for detailed analysis. Application traffic is decoded to find problems in the way that applications are using the network. Also, most probe management applications can monitor traffic as it passes by, which allows the network manager to view the status of the network segment almost instantaneously. PROs: The benefits of the probe are leveraged through its design.
CONs: The drawbacks of probes are also related to their design.
Matching Data Sources to Infrastructure Performance Management Functions

As we have seen, several options exist for gathering information on network and application performance. Each data source possesses unique characteristics that alone cannot provide a complete solution. Thus, it is important to match these data sources with the value that they add to infrastructure performance management functions.

Real-time Network Management

Real-time network management is primarily a problem/repair operation. When a problem interrupts service, the network must be restored to normal operation immediately. While repair is often driven by trouble tickets or device failure alarms, the ideal scenario is alerting IT to service degradation before a catastrophic failure occurs. To achieve this, IT must put in place a proactive monitoring solution. Probes provide the ideal mechanism for monitoring the network and applications in real time.
Probes provide the only viable solution for real-time network and

Capacity Planning

Capacity planning depends on the ability to know how infrastructure resources are being used today, and projecting when they will be exhausted. Obtaining capital for new equipment purchases or ordering additional bandwidth can be a complex process with very long lead times. Attempting to plan resource requirements through educated guesswork opens the door to failure or performance degradation. A capacity planning solution must be able to monitor the existing environment and forecast trends based upon information with robust data sources. Visibility into raw utilization numbers is good, but it is ideal to understand who is using the network and how.
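One simple way to project when a resource will be exhausted is to fit a straight line to past utilization and see where it crosses a planning limit. The sketch below does this with an ordinary least-squares fit. The monthly figures and the 80 percent limit are hypothetical, and a linear trend is an assumption made for illustration; real traffic growth may well follow a different curve.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for values sampled at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(values: Sequence[float], limit: float) -> Optional[float]:
    """Periods after the last sample until the fitted trend reaches the limit.

    Returns None when the trend is flat or falling, and 0.0 when the fitted
    line has already reached the limit.
    """
    slope, intercept = linear_fit(values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


if __name__ == "__main__":
    # Hypothetical monthly peak utilization of a WAN link, in percent.
    monthly_peak = [40, 45, 50, 55, 60]
    remaining = periods_until(monthly_peak, limit=80)
    print(f"months until the 80% planning limit: {remaining}")
```

Comparing the projected number of months against the lead time for ordering equipment or bandwidth indicates how soon a purchase needs to be started.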
The combined information from probes and network equipment facilitates informed decision-making. Growth of the infrastructure can be restricted to what is essential for the success of the business, resulting in less waste when buying costly equipment and bandwidth. Additionally, forecasting when resources will be required ensures that resources will not sit idle, or be bought at a premium to fix an emergency. Combining the raw information from network equipment with the detailed usage information from probes makes the ideal data source for a capacity planning solution.

Service Level Management

Service level management is perhaps the most discussed, yet least implemented, of the infrastructure performance management solutions. Because service level management reflects directly on the ability of the network and applications to support the business, an SLM solution must measure infrastructure performance in the terms of the business. It has two parts:
To effectively measure the business transaction, the SLM solution must reflect the user’s actual activity. While a passive monitor can measure the performance of a user once he has accessed the service, it cannot ensure that the service is available for him. Just knowing that the service is available, however, is insufficient if the performance is unacceptable. Thus, a solution must address both of these issues, and Synthetic Transactions™ provides this capability. By measuring the performance of the network service continually, active agents provide both an availability metric and a performance measurement. Because they can be deployed anywhere in the network and mimic an actual user, or multiple users, active agents provide the necessary measurement of business processes. Additionally, as the active agent measures performance regardless of whether a user is present, potential problems can often be discovered before they impact the user community. Active agents that mimic a user’s activity provide the foundation for an effective Service Level Management solution.

Usage-Based Billing

Usage-based billing raises organizational awareness of the expense of running the infrastructure and provides the means for IT to recover the costs by educating users and creating incentives for change. Currently, IT levies connect-time costs and/or flat fees on users to recover or account for these costs. A usage-based billing solution extends this approach and makes it more effective by addressing the actual use of the infrastructure.

As the name implies, usage-based billing mandates accounting for the actual use of infrastructure resources. While this information may be available from some of the application servers in the infrastructure, it may be difficult or impossible for IT to collect this information. The application environment is as large as the desktop environment and usually more complex, involving multiple tiers of servers and applications.
Additionally, a growing amount of business is done outside of IT’s control via the Internet.
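The idea of charging at a rate that depends on the service being used can be sketched as a small rating step applied to traffic records. Everything in the example is hypothetical: the record layout (department, service, octets), the rate card, the default rate and the department names are assumptions for illustration, not figures from this paper or from any product.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# Hypothetical rate card: charge per megabyte of traffic, by service.
RATES_PER_MB = {"erp": Decimal("0.02"), "web": Decimal("0.005")}
DEFAULT_RATE_PER_MB = Decimal("0.01")  # applied to services not on the rate card
BYTES_PER_MB = 1_000_000

UsageRecord = Tuple[str, str, int]  # (department, service, octets transferred)


def bill(usage_records: Iterable[UsageRecord]) -> Dict[str, Decimal]:
    """Total the charge per department, rating each record by its service."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for department, service, octets in usage_records:
        rate = RATES_PER_MB.get(service, DEFAULT_RATE_PER_MB)
        totals[department] += rate * Decimal(octets) / BYTES_PER_MB
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical traffic totals, such as a probe might report per service.
    records = [
        ("finance", "erp", 500_000_000),
        ("finance", "web", 200_000_000),
        ("sales", "web", 1_000_000_000),
        ("sales", "video", 100_000_000),
    ]
    for department, charge in sorted(bill(records).items()):
        print(f"{department}: {charge:.2f}")
```

Decimal arithmetic is used so that monetary totals are not affected by binary floating-point rounding.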
Probes provide the full visibility into the infrastructure that is required for an effective usage-based billing solution.

Conclusion

[Table: Data sources matched to infrastructure performance management function]
The growth, complexity and importance of corporate infrastructure show no signs of slowing in the near future. Although significant money has been spent on expansion in the past few years, infrastructure management continues to be an afterthought. For the success of the business, it is imperative that IT management implement infrastructure performance management to ensure that businesses and infrastructures alike perform to expectations. Crucial to these deployments are the data sources that inform them. IT’s primary concern must be the quality of the data that is used to manage the infrastructure. Because management decisions based on incomplete or inaccurate data can cost time, money and opportunity, an infrastructure performance management solution should allow IT to control the infrastructure effectively and provide unmatched visibility into how the infrastructure is used.

About NetScout Systems

NetScout Systems, Inc. (NASDAQ-NTCT) is the leading provider of infrastructure performance management solutions for large enterprises, e-businesses and service providers worldwide. Our products help organizations increase the return on their infrastructure investments by optimizing not only the performance of their networks but also the networks’ ability to deliver applications and content to end-users. The nGenius system collects data from our proprietary active agents, award-winning probes, and network devices. The accuracy, timeliness, and robustness of the information produced provide end-to-end network visibility, both in real time and historically, for better network and application control. This comprehensive approach enables critical business applications such as e-commerce, supply chain management, ERP, and CRM to run smoothly and reliably.
NetScout’s achievements as one of the industry’s most successful network solutions companies have led to numerous distinctions, including being ranked among Forbes 200 Best Small Companies in America, Business Week’s Top 100 Hot Growth Companies, and Red Herring’s IPO Top 100 Technology Stocks. NetScout Systems, headquartered in Westford, Massachusetts, has over 300 employees, and offices located in North America, Europe and Asia. Further information on the company is available on the World Wide Web at www.netscout.com.
Copyright, 1995-2001 Network World, Inc. All rights reserved.