Chapter 1: Service Management Basics

Excerpt from System Center Service Manager 2010 Unleashed.

By Kerrie Meyler, Alexandre Verkinderen, Anders Bengtsson, Patrik Sundqvist, David Pultorak

Published by Sams

ISBN-10: 0-672-33436-4

ISBN-13: 978-0-672-33436-8

In This Chapter

  • Ten Reasons to Use Service Manager

  • The Problem with Today’s Systems

  • Service Management Defined

  • Evolution of the CMDB

  • Strategies for Service Management

  • Overview of Microsoft System Center

  • The Value Proposition of Service Manager 2010

System Center Service Manager 2010, a new addition to the Microsoft System Center suite, is an integrated platform for automating and adapting Information Technology service management (ITSM) best practices, such as those found in the Information Technology Infrastructure Library (ITIL) and Microsoft Operations Framework (MOF), to your organization’s requirements. Service Manager provides built-in processes for incident resolution, problem resolution, change control, and configuration management.

Service Manager is a help desk and change management tool. By using its configuration management database (CMDB) and process integration, Service Manager automatically connects knowledge and information from System Center Operations Manager (OpsMgr), System Center Configuration Manager (ConfigMgr), and Active Directory (AD) Domain Services. Service Manager provides the following capabilities to deliver integration, efficiency, and business alignment for your Information Technology (IT) services:

  • Integrating process and knowledge across the System Center suite: Through its integration capabilities with Operations Manager and Configuration Manager, Service Manager provides an integrated service management platform. This helps to reduce downtime and improve the quality of services in the data center.

  • Providing an accurate and relevant knowledge base: Knowledge base information resides in the CMDB and contains the product and user knowledge to enable IT analysts to quickly identify and resolve incidents. Users can use the Self-Service portal (SSP) to search the knowledge base for information to help find solutions to issues. An organization can create and manage its own knowledge base articles and make this information accessible to both IT analysts and end users.

  • Lowering costs and improving responsiveness: Service Manager can improve user productivity and satisfaction while reducing support costs through the SSP. It also increases confidence in meeting compliance requirements through the IT GRC (governance, risk, and compliance) Process management pack.

  • Improving business alignment: Service Manager helps your organization align to its business goals and adapt to new requirements through its configuration management, compliance, risk management, reporting, and analysis capabilities.

  • Delivering immediate value with built-in process management packs: Included with Service Manager are core process management packs for incident and problem resolution, change control, and configuration and knowledge management.

This chapter introduces System Center Service Manager 2010. Various abbreviations for the product include SCSM, SM, Service Manager, and SvcMgr; this book uses the nomenclature of Service Manager and SvcMgr. Service Manager provides user-centric support, enables data center management efficiency, and enables you to align to your organization’s business goals and adapt to ever-changing business requirements.

Ten Reasons to Use Service Manager

Why should you use Service Manager 2010 in the first place? How does this make your daily life easier? Although this book covers the features and benefits of Service Manager in detail, it definitely helps to have a general idea about why Service Manager is worth a look!

Let’s look at 10 compelling reasons why you might want to use Service Manager:

  1. Your support desk is overwhelmed with manually entering user requests (24x7).

  2. You realize help desk management would be much simpler if you had visibility and information for all your systems on a single console.

  3. You discover email is down when upper management calls the help desk. Although this mechanism is actually quite effective in getting your attention, it is somewhat stress inducing and not particularly proactive.

  4. You would be more productive if you weren’t dealing with user issues all day... and night... and during lunch and vacation.

  5. The bulk of your department’s budget pays for teams of contractors to manage user support and the help desk.

  6. You are tired of going through each of your servers looking for reports you need on your client, server, physical, and virtual environments.

  7. Your system admins are patching and updating production systems during business hours, often bringing down servers in the process.

  8. By the time you update your user documentation, everything has changed, and you have to start all over again!

  9. You can’t stay on top of adapting to your organization’s business needs when you’re not sure of your current capabilities.

  10. You don’t have the time to write down all the troubleshooting information that is in your brain, and your boss is concerned you might be hit by a truck (or want to take that vacation). This probably is not the best way to support end users.

Although somewhat tongue-in-cheek, these topics represent very real problems for many IT managers and support staff. If you are one of those individuals, you owe it to yourself to explore how you can leverage Service Manager to solve many of these common issues. These pain points are common to almost all users of Microsoft technologies to some degree, and Service Manager holds solutions for all of them.

However, perhaps the most important reason for using Service Manager is the peace of mind it can bring you, knowing that you have complete visibility and control of your IT systems. The productivity this can bring to your organization is a tremendous benefit as well.

The Problem with Today’s Systems

With increasing operational requirements unaccompanied by linear growth in IT staffing levels, organizations must continually find ways to streamline administration through tools and automation. Today’s IT systems are prone to a number of problems from the perspective of service management, including the following:

  • Configuration “shift and drift”

  • System isolation

  • Lack of historical information

  • Not enough expertise

  • Missing incidents and information

  • Lack of process consistency

  • Not meeting service level expectations

This list should not be surprising, because these problems manifest themselves in all IT shops with varying degrees of severity. In fact, Forrester Research estimates that 82% of larger shops are pursuing service management, and 67% are planning to increase Windows management. Let’s look at what the issues are.

Why Do Systems Go Down?

Let’s start with examining reasons why systems go down. Figure 1.1 illustrates reasons for system outages, based on the authors’ personal experiences and observations, and the following list describes some of these reasons:

  • Software errors: Software is responsible for somewhat less than half of the errors. These errors include software coding errors, software integration errors, data corruption, and similar problems.

  • User errors: End users and operators cause just under half of the errors. This includes incorrectly configuring systems, failing to catch warning messages that turn into errors, accidents, unplugging the power cord, and so on.

  • Miscellaneous errors: This last category is fairly small. Causes of problems here include disk crashes, power outages, viruses, natural disasters, and so on.

As Figure 1.1 demonstrates, the vast majority of failures result from software-level errors and user errors. It is surprising to note that hardware failures account for only a small percentage of problems, which is a tribute to modern systems such as redundant array of independent disks (RAID), clustering, and other mechanisms deployed to provide server and application redundancy.

Figure 1.1

Causes of System Outages. D. Scott, in a May 2002 presentation titled Operation Zero Downtime, discussed similar statistics at a Gartner Group Security Conference.

The numbers show that to reduce system downtime, you need to attack the software and user error components of the equation. That is where you will get the most “bang for the buck.”

Configuration “Shift and Drift”

Even in IT organizations with well-defined and documented change management, procedures fall short of perfection. Unplanned and unwanted changes frequently find their way into the environment, sometimes as an unintended side effect of an approved, scheduled change.

You might be familiar with an old philosophical question: If a tree falls in a forest and no one is around to hear it, does it make a sound?

Here’s the change management equivalent: If a change is made on a system and no one is around to hear it, does identifying it make a difference?

The answer to this question is absolutely “yes.” After all, every change to a system can potentially impact the functionality or security of a system, or that system’s adherence to corporate or regulatory compliance.

For example, adding a feature to a web application component may affect the application binaries by potentially overwriting files or settings replaced by a critical security patch. Or perhaps the engineer implementing the change sees a setting he thinks is misconfigured and decides to just “fix” it while already working on the system. In an e-commerce scenario, where sensitive customer data is involved, this could have potentially devastating consequences. In addition, troubleshooting something you don’t know was changed is like looking for the proverbial needle in a haystack.

At the end of the day, your management platform must bring a strong element of baseline configuration monitoring and enforcement to ensure configuration standards are implemented and maintained with the required consistency.
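The idea of baseline configuration monitoring can be illustrated with a minimal sketch. This is not how Service Manager implements it. It only shows the general technique of comparing a system's current settings against an approved baseline. The function name find_drift and the setting names are hypothetical, chosen for illustration.

```python
# Minimal sketch of configuration drift detection (illustrative only).
# All setting names and values below are hypothetical examples.

MISSING = "<absent>"  # marker for a setting that exists on only one side


def find_drift(baseline, current):
    """Return {setting: (expected, actual)} for every setting that differs."""
    drift = {}
    # Check the union of keys so that added and removed settings are caught.
    for key in sorted(set(baseline) | set(current)):
        expected = baseline.get(key, MISSING)
        actual = current.get(key, MISSING)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


approved_baseline = {
    "tls_min_version": "1.2",
    "debug_mode": "off",
    "security_patch": "installed",
}

# State observed after an engineer "fixed" one setting and enabled another.
observed_state = {
    "tls_min_version": "1.0",
    "security_patch": "installed",
    "remote_admin": "on",
}

for setting, (expected, actual) in find_drift(approved_baseline, observed_state).items():
    print(f"{setting}: expected {expected}, found {actual}")
```

Comparing the union of keys, rather than only the baseline keys, means that settings added outside the change process are reported alongside settings that were altered or removed.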

System Isolation

Microsoft Windows Server and the applications that run on it expose a wealth of information with event logs, performance counters, and application-specific logs. However, this data is isolated and typically server centric—making it difficult to determine what and where a problem really is. To get a handle on your systems, you need to take actions to prevent the situation shown in Figure 1.2, where you have multiple islands of information.

Figure 1.2

Multiple islands of information.

Here are places you might find isolated information:

  • Event logs: Events are generated by the Windows operating system, components, and applications. The logs include errors, warnings, information, and security auditing events. These event logs are stored locally on each server.

  • Performance counters: The Windows operating system and multiple applications expose detailed performance information through performance counters. The data includes processor utilization, memory utilization, network statistics, disk free space, and thousands of other pieces of information. This information can help with forecasting performance trends and identifying response issues that can affect application availability.

  • Windows Management Instrumentation (WMI): WMI provides access to an incredible amount of information, ranging from high-level status of services to detailed hardware information.

  • Expertise: Consultants, engineers, and subject matter experts have information locked up in their heads or written down on whiteboards and paper napkins. This is as much an island of information as the statistics and data stored on any computer.

Although system information is captured through event logs, performance counters, file-based logs, and experiences, it is typically lost over time. Most logs roll over, are erased to clear space, or are eventually overwritten. Even if the information is not ultimately lost or forgotten, it typically is not reviewed regularly.

Most application information is also server centric, typically stored on the server, and specific to the server where that application resides. There is no built-in, systemwide, cross-system view of critical information.

Having islands of information, where data is stranded on any given island, makes it difficult to get to needed information in a timely or effective manner. Not having that information can make managing user satisfaction a difficult endeavor.
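As a rough illustration of what a cross-system view means, the following sketch merges event records held separately per server into a single timeline. It assumes the records have already been collected as simple tuples. The server names, messages, and the merge_events function are hypothetical and do not represent any Windows or System Center API.

```python
# Minimal sketch: combine per-server event records into one timeline.
# The data layout and all sample values are assumptions for illustration.
from datetime import datetime


def merge_events(per_server_events):
    """Flatten {server: [(iso_time, level, message), ...]} into one sorted list."""
    merged = []
    for server, events in per_server_events.items():
        for iso_time, level, message in events:
            merged.append({
                "time": datetime.fromisoformat(iso_time),
                "server": server,
                "level": level,
                "message": message,
            })
    # Sort by timestamp so related events on different servers sit together.
    merged.sort(key=lambda event: event["time"])
    return merged


sample = {
    "web01": [("2010-06-01T09:15:00", "Error", "Request timed out")],
    "db01": [
        ("2010-06-01T09:14:30", "Warning", "Disk queue length high"),
        ("2010-06-01T09:16:10", "Error", "Transaction log full"),
    ],
}

for event in merge_events(sample):
    print(event["time"].isoformat(), event["server"], event["level"], event["message"])
```

Viewed in isolation, the web server error looks like a web problem. In the merged timeline, the database warning that precedes it becomes visible.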

Lack of Historical Information

Sometimes you may capture information about problems but are unable to look back in time to see whether this is an isolated instance or part of a recurring pattern. An incident can be a onetime blip or can indicate an underlying issue. Without having a historical context, it is difficult to understand the significance of any particular incident.

Here’s an example: Suppose that a consultant is brought in to review why a database application has performance problems. To prove there is an issue, the in-house IT staff points out that users are complaining about performance but the memory and CPU on the database server are only 50% utilized. By itself, this does not indicate anything. It could be that memory and the CPU are normally 65% utilized and the problem is really a network utilization problem, which in turn is reducing the load on the other resources. The problem could even be a newly implemented but poorly written application! A historical context could provide useful information.
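The value of historical context in the example above can be sketched in a few lines. The code below is a simplified illustration, not a feature of Service Manager. It assumes past utilization readings are available as a list of percentages and flags a current reading that sits far from the historical average. The function name, the sample values, and the threshold of two standard deviations are all assumptions.

```python
# Minimal sketch: judge a current reading against its own history.
# Sample readings and the threshold are hypothetical.
from statistics import mean, stdev


def classify_reading(history, current, threshold=2.0):
    """Label a reading by how many standard deviations it is from the mean."""
    average = mean(history)
    spread = stdev(history)  # requires at least two historical readings
    if spread == 0:
        # No variation in the history: any different value is unusual.
        if current == average:
            return "normal"
        return "unusually high" if current > average else "unusually low"
    deviation = (current - average) / spread
    if deviation > threshold:
        return "unusually high"
    if deviation < -threshold:
        return "unusually low"
    return "normal"


# CPU utilization (%) from earlier periods: typically around 65%.
cpu_history = [64, 66, 65, 63, 67, 65, 66, 64]

# Today's 50% looks harmless on its own but is well below the usual level.
print(classify_reading(cpu_history, 50))
print(classify_reading(cpu_history, 65))
```

A reading flagged as unusually low does not identify the cause. It only signals that the server is doing less work than normal, which fits the scenario where a network bottleneck is starving the database of requests.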
