Performance Management (PM) comprises four fundamental ITIL-defined processes that build upon one another. Incident management is the foundation, with availability management above it, then capacity management, and service level management positioned as the most sophisticated "capstone". Here is what these building blocks include in a data networking context.
Incident Management
An incident is an unplanned service interruption or service quality reduction that affects performance. Other ITIL processes, such as event management, can provide early evidence of an emerging incident. Most events, however, are inconsequential to performance management and are usually removed from consideration automatically through policies, thresholds or correlation analysis.
Incident management comes into play when automated event processing indicates degradation or users start to complain. It generally involves network operations center and service desk staff, often using a common trouble ticket system. Incident management starts with good instrumentation followed by the processing of events such as SNMP feeds, probe reports, pings and traps. Systems that support incident management turn events into actionable alarms with good diagnostic information. The goal of incident management is to restore service operation or quality as quickly as possible, thus minimizing business disruption.
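As a rough illustration of turning events into actionable alarms, here is a minimal Python sketch. It assumes events arrive as simple records; the `Event` fields, the threshold values and the five-minute suppression window are all hypothetical and not taken from any particular monitoring product. It shows threshold filtering plus a very basic form of correlation, in which repeats from the same source and metric collapse into one alarm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # device or probe that reported the event
    metric: str       # e.g. "latency_ms" (hypothetical metric name)
    value: float      # measured value
    timestamp: float  # seconds since some reference point


# Hypothetical thresholds; real values depend on the service being monitored.
THRESHOLDS = {"latency_ms": 200.0, "packet_loss_pct": 2.0}


def events_to_alarms(events, thresholds=THRESHOLDS, window_s=300.0):
    """Drop events at or below threshold and suppress repeats.

    A repeat is an over-threshold event from the same source and metric
    that arrives less than window_s seconds after the last alarm raised
    for that pair.
    """
    alarms = []
    last_alarm_time = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        limit = thresholds.get(ev.metric)
        if limit is None or ev.value <= limit:
            continue  # inconsequential event: no policy, or within limits
        key = (ev.source, ev.metric)
        previous = last_alarm_time.get(key)
        if previous is not None and ev.timestamp - previous < window_s:
            continue  # already alarmed recently for this source and metric
        last_alarm_time[key] = ev.timestamp
        alarms.append(f"{ev.source}: {ev.metric}={ev.value} exceeds {limit}")
    return alarms
```

A production system would attach far richer diagnostic information to each alarm, but the shape is the same: many raw events in, a small number of alarms out.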
If you can implement only one performance management building block, this is the one to choose. All systems have incidents, so a well-oiled incident management system covering key parts of your infrastructure and business services is essential.
Availability Management
There are several paths to "availability awareness". You can proactively implement an availability plan during the service design phase. You can reactively monitor your incident management system to identify chronic incidents that may indicate availability issues. Or you can reactively monitor customers' complaints about availability.
Whatever path you take, availability is difficult to quantify. It is not simply 100 percent minus the percentage of downtime. That would hold only if every system component were fully operational and providing service for the entire time you thought the system was working - but life is not that simple.
Availability management involves carefully defining and rigorously measuring availability, and determining availability trends using historic data. Availability measurements must be converted into reports that shed light on infrastructure and service health. A report might show the percent of time a resource operated without any reported service outage incidents. A more rigorous approach might use synthetic agents to periodically test the system from key user locations. You can't test all the time from everywhere because you would overload the system under test - so you must make thoughtful tradeoffs.
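The two measurement approaches described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: outages are given as hypothetical `(start, end)` pairs in seconds, and synthetic test results are reduced to a list of pass/fail values. The function names are invented for this example.

```python
def availability_from_outages(period_start, period_end, outages):
    """Percent of the period with no reported outage incident.

    outages is a list of (start, end) times in seconds. Overlapping
    outages are counted once, and time outside the period is ignored.
    """
    total = period_end - period_start
    if total <= 0:
        raise ValueError("period_end must be later than period_start")
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in outages
    )
    down = 0.0
    cursor = period_start  # end of the downtime already counted
    for start, end in clipped:
        start = max(start, cursor)
        if end > start:
            down += end - start
            cursor = end
    return 100.0 * (total - down) / total


def availability_from_probes(results):
    """Percent of synthetic tests that succeeded (True means success)."""
    if not results:
        raise ValueError("at least one probe result is required")
    return 100.0 * sum(results) / len(results)
```

The two figures will often disagree for the same period, because incident records capture only reported outages while synthetic tests sample what users at a given location would actually experience. That gap is one reason availability is harder to quantify than a simple downtime subtraction suggests.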
Capacity Management
This next building block is often seen as a long-term solution to availability issues. We often hear: "If we just had enough capacity, then incident problems would go away and availability would be great." We're afraid not. Yes, capacity management is valuable, but no, it does not eliminate the need for the first two building blocks - and in fact rests upon them.
Capacity management should be performed at the business, service and component levels. Business capacity management involves working with business managers to understand changing business conditions that will impact system load. Service capacity management quantifies user and application demand to properly size the service. And component capacity management measures and tracks utilization at each device (e.g., buffers, memory, CPU, bandwidth).
Many organizations do a good job of component level capacity planning but find it difficult to up-level to the service and business levels. They measure utilization over time and examine trends to determine if utilization is "out of range" on the high or low side. Costs can be aligned with business needs by decommissioning under-utilized and supplementing over-utilized resources. Capacity management enables organizations to create "what if" scenarios to plan for anticipated business needs.
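Component-level trending of this kind might look like the following Python sketch. It assumes utilization samples are percentages collected at evenly spaced intervals, fits a straight line by ordinary least squares, and uses hypothetical 20 percent and 80 percent bounds for "out of range". Real utilization is rarely this linear, so treat the projection as a starting point for a "what if" discussion rather than a forecast.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for evenly spaced samples.

    Sample i is treated as taken at x = i (one unit per interval).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify_utilization(samples, low=20.0, high=80.0):
    """Label a resource by its average utilization (bounds are assumptions)."""
    average = sum(samples) / len(samples)
    if average > high:
        return "over-utilized"
    if average < low:
        return "under-utilized"
    return "in range"


def intervals_until(samples, limit):
    """Intervals from the latest sample until the trend line reaches limit.

    Returns None when utilization is flat or falling, and 0.0 when the
    trend line has already reached the limit.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(samples) - 1))
```

For example, with weekly samples of 50, 55, 60, 65 and 70 percent, the resource is still "in range" on average, but the trend line reaches 80 percent two weeks after the latest sample.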
Service Level Management
Service level management involves identifying, continually monitoring and reviewing services compared to agreed-upon service targets. Service level targets must be tailored to business needs. Assessing the effect of system changes on service quality and the ability to meet service level targets is key. Service level management also involves determining whether agreed-upon service levels are delivered when and where specified. Effective service level management does not stand alone. It coordinates incident management, availability management and capacity management to ensure that required service levels are achieved.
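Checking whether agreed-upon service levels are delivered "when and where specified" can be pictured as comparing measurements against targets, location by location. The Python sketch below is only an illustration: the `Target` structure, the metric names and the per-location measurement layout are hypothetical, and the inputs would in practice come from the incident, availability and capacity processes described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    metric: str             # e.g. "availability_pct" (hypothetical name)
    limit: float            # agreed-upon service level target
    higher_is_better: bool  # True for availability, False for latency


def check_service_levels(targets, measurements):
    """Return (location, metric, value, limit) for every missed target.

    measurements maps a location name to a dict of metric values.
    """
    breaches = []
    for location, values in sorted(measurements.items()):
        for target in targets:
            value = values.get(target.metric)
            if value is None:
                continue  # no data here; a real system would likely flag this
            if target.higher_is_better:
                met = value >= target.limit
            else:
                met = value <= target.limit
            if not met:
                breaches.append((location, target.metric, value, target.limit))
    return breaches
```

Reporting per location matters because a service can meet its targets on average while missing them for one group of users.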
Service level management capabilities should be comprehensive and provide sufficient system performance information to implement service level agreements (SLAs). An SLA is a complex agreement that must be carefully negotiated. More on that in future installments.