Chapter 3: Looking Inside OpsMgr

Sams

  • Discoveries—A discovery is a workflow that discovers one or more objects of a particular type. A discovery can discover objects of multiple types at one time. As introduced previously in the "Service Modeling" section of this chapter, there are both object discovery and relationship discovery rules.

  • Rules—A rule is a generic workflow that can do many different things. As an example, it could collect a data item, alert on a specific condition, or run a scheduled task at some specified frequency. Rules do not set state at all; they are primarily used to collect data to present in the console or in reports and to generate alerts.

  • Tasks—A task is a workflow that is executed on demand and is usually initiated by a user of the OpsMgr console. Tasks are not loaded by OpsMgr until required. There are also agent-initiated tasks, where the agent opens up a TCP/IP connection with the server, initiating the communication. After the connection is established, it is a two-way communication channel.

  • Monitors—A monitor is a state machine and ultimately contributes to the state of some type of object that is being monitored by OpsMgr. There are three monitor types: aggregate (internal rollup), dependency (external rollup), and unit monitors. The unit monitor is the simplest monitor, one that simply detects a condition, changes its state, and propagates that state to parent monitors in the health model that roll up the status as appropriate. We cover monitors in more detail in the next section of this chapter.

  • Diagnostics—A diagnostic is an on-demand workflow that is attached to a specific monitor. The diagnostic workflow is initiated automatically either when a monitor enters a particular state or upon demand by a user when the monitor is in a particular state. Multiple diagnostics can be attached to a monitor if required. A diagnostic does not change the application state.

  • Recoveries—A recovery is an on-demand workflow that is attached to a specific monitor or a specific diagnostic. The recovery workflow is initiated automatically when a monitor enters a particular state or when a diagnostic has run, or upon demand by an operator. Multiple recoveries can be attached to a monitor if required. A recovery changes the application state in some way; hopefully it fixes any problems the monitor detected!

  • Overrides—Overrides are used to change monitoring behavior in some way. Many types of overrides are available, including overrides of specific monitoring features such as discovery, diagnostics, and recoveries. Normally the OpsMgr administrator or operator sets overrides based on his specific, local environment. However, in some cases, a management pack vendor may recommend creating overrides in particular scenarios as a best practice.
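The relationship between a monitor, its diagnostics, and its recoveries can be sketched in a few lines of Python. This is only an illustration of the idea described above, not how OpsMgr implements it: the `Monitor` class, the state names, and the `check_service` and `restart_service` helpers are all hypothetical. The point is that a diagnostic only inspects the application, while a recovery changes it.

```python
from collections import defaultdict


class Monitor:
    """Hypothetical monitor with diagnostics and recoveries attached per state."""

    def __init__(self, name):
        self.name = name
        self.state = "Healthy"
        self.diagnostics = defaultdict(list)  # state -> callables that only inspect
        self.recoveries = defaultdict(list)   # state -> callables that try to fix

    def enter_state(self, new_state, app):
        """Switch state, then run attached diagnostics followed by recoveries."""
        self.state = new_state
        log = []
        for diagnostic in self.diagnostics[new_state]:
            log.append(diagnostic(app))
        for recovery in self.recoveries[new_state]:
            log.append(recovery(app))
        return log


def check_service(app):
    # A diagnostic reads application state but does not change it.
    return f"diagnostic: service running = {app['running']}"


def restart_service(app):
    # A recovery changes application state, hopefully fixing the problem.
    app["running"] = True
    return "recovery: service restarted"


app = {"running": False}
monitor = Monitor("Example Service Monitor")
monitor.diagnostics["Critical"].append(check_service)
monitor.recoveries["Critical"].append(restart_service)
log = monitor.enter_state("Critical", app)
print(log)
```

In this sketch both workflows run automatically when the state is entered; the on-demand case would simply call the same functions when an operator asks for them.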

Monitors

It all starts with monitors in Operations Manager 2007. We have mentioned that a health model is a collection of monitors. If you were to author a management pack, you would probably start with creating unit monitors. Unit monitors would detect conditions you determine are essential to assess some aspect of the health of the application, device, or service needing to be managed.

Monitors provide the basic function of monitoring in OpsMgr. You can think of each monitor as a state machine, a self-contained machine that sets the state of a component based on conditional changes. A monitor can be in only one state at any given time, and there are a finite number of operational states.
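To make the state machine idea concrete, here is a minimal Python sketch. The `HealthState` and `Monitor` names are our own for illustration and do not come from the OpsMgr SDK; the sketch only shows that a monitor holds exactly one of a finite set of states and records a transition when that state changes.

```python
from enum import Enum


class HealthState(Enum):
    """Finite set of operational states (names are illustrative)."""
    UNINITIALIZED = 0  # monitor has not reported a state yet
    SUCCESS = 1        # green
    WARNING = 2        # yellow
    CRITICAL = 3       # red


class Monitor:
    """A tiny state machine: exactly one state at any given time."""

    def __init__(self, name):
        self.name = name
        self.state = HealthState.UNINITIALIZED
        self.history = []  # (old_state, new_state) for each real change

    def set_state(self, new_state):
        """Move to new_state; return True only if the state actually changed."""
        if new_state is self.state:
            return False
        self.history.append((self.state, new_state))
        self.state = new_state
        return True


monitor = Monitor("Example Service Monitor")
monitor.set_state(HealthState.SUCCESS)
monitor.set_state(HealthState.SUCCESS)   # same state again: nothing recorded
monitor.set_state(HealthState.CRITICAL)
print(monitor.state.name, len(monitor.history))
```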

A monitor can check for a single event or a wide range of events that represent many different problems. The goal of monitor design is to ensure that each unhealthy state of a monitor indicates a well-defined problem that has known diagnostic and recovery steps.

Using a single monitor to cover a large number of separate problems is not recommended, because it provides less value. We mentioned in the lead-in to the "Health Models" section of this chapter that adding monitors to a health model increases the richness of an object's monitoring experience. Enhancing an object's health model with many monitors adds fidelity to the health state of the object. More monitors in a health model also mean more relationship connection points for other managed objects that host, contain, depend on, or reference that object.

We pointed out the "pearl" icon used to represent a monitor in health model diagrams. An empty pearl icon represents a generic or a non-operational monitor. Figure 3.11 is a chart showing the default monitor icon images and their corresponding operational state.

Figure 3.11

These state icons are encountered in the Operations console.

A functioning monitor displays exactly one of the primary state icons: green/success, yellow/warning, or red/critical. A newly created or nonfunctional monitor will show the blank pearl icon. The gray maintenance mode "wrench" icon appears in all monitoring views inline with the object that was placed in maintenance mode. The final type of state icon you will encounter is the grayed state icon, which indicates that the managed object is out of contact. For example, this could reference a managed notebook computer that is off the network at the moment.

To be clear, there are three kinds of monitors that management pack authors can create: aggregate rollup monitors, dependency rollup monitors, and unit monitors. In the next sections we will describe each of these monitor types.

Aggregate Rollup Monitors

Let's return to the Figure 3.9 view of the layers of the SML, which permits tactical placement of interrelated monitors. On the right, notice the monitors are classified in categories, essentially four vertical columns that are connected by a rollup to the top-level entity health status. Microsoft selected these four categories during OpsMgr development as a framework to aggregate the health of any managed object.

The four standard aggregate monitors in a health model are detailed in the following list:

  • Availability Health—Examples include checking that services are running, that modules within the OpsMgr health service are loaded, and basic node up/down tracking.

  • Performance Health—Examples include thresholds for available memory, processor utilization, and network response time.

  • Security Health—Monitors related to security that are not included in the other aggregate monitors.

  • Configuration Health—Examples include confirming the Windows activation state and that IIS logging is enabled and functioning.
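As a rough illustration of how the four aggregate monitors could combine into a single entity health state, the following Python sketch applies a "worst state wins" rule. That rollup rule is an assumption we make for the example, and the `Health` enum and `worst_state` function are hypothetical names, not OpsMgr APIs.

```python
from enum import IntEnum


class Health(IntEnum):
    """Ordered so that a larger value means a more severe state."""
    SUCCESS = 1
    WARNING = 2
    CRITICAL = 3


def worst_state(states):
    """Return the most severe state; assume SUCCESS when nothing reports."""
    return max(states, default=Health.SUCCESS)


# Illustrative states for the four standard aggregate monitors.
category_health = {
    "Availability": Health.SUCCESS,
    "Performance": Health.CRITICAL,
    "Security": Health.SUCCESS,
    "Configuration": Health.WARNING,
}

entity_health = worst_state(category_health.values())
print(entity_health.name)
```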

Dependency Rollup Monitors

The second category of monitor is the dependency rollup. Such a monitor rolls up health states from targets linked to one another by either a hosting or a membership relationship. Dependency rollup monitors function similarly to aggregate rollup monitors, but are located at intermediate layers of the SML hierarchy.

In Figure 3.9, notice again the unit monitors for the IIS service located in the lower right. There are two unit monitors of the performance type at the IIS Service level that merge at the Windows Computer Role level. The merge point represents one or more dependency rollup monitor(s) targeted at the Windows Computer Role.

Earlier in the "Service Modeling" section of this chapter, we explored how objects such as disk partitions, logical disks, and physical disks have numerous relationships. Figure 3.12 shows a sample dependency rollup monitor involving disk systems created in the OpsMgr authoring space.

Figure 3.12

Creating a dependency rollup monitor when the target is a disk partition.

The monitor created in Figure 3.12 is targeted against the Windows Server 2003 Disk Partition class. OpsMgr knows that disk partitions contain logical disks, so when you create a new dependency rollup monitor targeting the Windows Server 2003 Disk Partition class, OpsMgr offers existing monitors to select from for the Windows Server 2003 Logical Disk class.

We can also expand the example of the "merged" IIS service performance unit monitors in Figure 3.9. If we were creating that dependency rollup monitor in the authoring space, we would have selected the Windows Computer Role as the target of our monitor. The Create a Dependency Monitor Wizard would provide us with a list of dependent objects to select from that includes those IIS service performance monitors.
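The disk example can be sketched in Python as follows. This is a simplified model under our own assumptions: the `ManagedObject` class, the "Free Space" monitor name, and the worst-state rollup rule are hypothetical and only illustrate how a dependency rollup monitor on one object reflects the state of a chosen monitor on the objects related to it.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Health(IntEnum):
    SUCCESS = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class ManagedObject:
    name: str
    monitors: dict = field(default_factory=dict)   # monitor name -> Health
    contained: list = field(default_factory=list)  # related objects one level down


def dependency_rollup(parent, monitor_name):
    """Worst state of monitor_name across the objects related to parent."""
    states = [
        child.monitors[monitor_name]
        for child in parent.contained
        if monitor_name in child.monitors
    ]
    return max(states, default=Health.SUCCESS)


# Hypothetical data: one partition containing two logical disks.
disk_c = ManagedObject("C:", monitors={"Free Space": Health.SUCCESS})
disk_d = ManagedObject("D:", monitors={"Free Space": Health.WARNING})
partition = ManagedObject("Disk 0 Partition 1", contained=[disk_c, disk_d])

partition_state = dependency_rollup(partition, "Free Space")
print(partition_state.name)
```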

Unit Monitors

A unit monitor allows management pack authors to define a list of states and how to detect those states. A simple unit monitor is a Basic Service Monitor. This monitor raises state changes when a Windows service stops running. More complex unit monitors run scripts, examine text logs, and perform Simple Network Management Protocol (SNMP) queries. A unit monitor is deployed, or targeted, at a class of objects when it is authored.


Target the Agent to Deploy a Monitor to All Computers - Targeting a monitor at the Agent object class deploys the monitor to all managed computers. Use the Agent target like an "All Computers" group for monitors, but also use it sparingly. It is an OpsMgr best practice to deploy the minimum set of appropriate monitors to a managed computer.


When creating monitors and envisioning operational states, Microsoft advises OpsMgr administrators and management pack authors to do so without initially regarding actual implementation of those monitors. The reasoning is that OpsMgr not only provides many monitor types by default for common scenarios, but makes it possible to build different workflows to meet any monitoring requirement. Basically, the management pack architect is encouraged to think "outside the box" and describe in plain ideas how an application's health can be assessed. After that, you can look to the many tools OpsMgr provides to instrument the application accordingly.

Figure 3.13 presents a montage screenshot that includes all possible types of unit monitors available in the authoring space of the OpsMgr console. These are the tools used to architect the instrumentation of the health model.

Figure 3.13

The complete menu of types of unit monitors that can be created.

Over 50 unit monitor types are available to place as software instrumentation in the SML framework. Remember that unit monitors roll up into the aggregate monitors (Availability, Performance, Security, and Configuration), sometimes via dependency rollup monitors. The goal of monitor design is to ensure that each unhealthy state of a monitor indicates a well-defined problem that has known diagnostic and recovery steps. Table 3.2 provides some explanation of the unit monitor types found in the menu in Figure 3.13.

TABLE 3.2 Unit Monitor Types

Monitor type                          Description
Average Threshold                     Average value over a number of samples.
Consecutive Samples over Threshold    Value that remains over or below a threshold for a consecutive number of minutes.
Delta Threshold                       Change in value.
Simple Threshold                      Single threshold.
Double Threshold                      Two thresholds (monitors whether values are between a given pair of thresholds).
Event Reset                           A clearing condition occurs and resets the state automatically.
Manual Reset                          Event based; wait for operator to clear.
Timer Reset                           Event based; automatically clear after certain time.
Basic Service Monitor                 Uses WMI to check the state of the specified Windows service. The monitor will be unhealthy when the service is not running or has not been set to start automatically.
Two State Monitor                     Monitor has two states: Healthy and Unhealthy.
Three State Monitor                   Monitor has three states: Healthy, Warning, and Unhealthy.
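The threshold-style monitor types in Table 3.2 differ mainly in how they evaluate a series of samples. The Python sketch below shows one plausible reading of four of them. The function names, the sample values, and the exact comparison rules are our assumptions for illustration, not the logic OpsMgr ships.

```python
def average_over_threshold(samples, threshold):
    """Average Threshold: compare the mean of the samples to the threshold."""
    if not samples:
        return False
    return sum(samples) / len(samples) > threshold


def consecutive_over_threshold(samples, threshold, count):
    """Consecutive Samples: true once `count` samples in a row exceed the threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= count:
            return True
    return False


def delta_over_threshold(samples, threshold):
    """Delta Threshold: compare the change between first and last sample."""
    if len(samples) < 2:
        return False
    return abs(samples[-1] - samples[0]) > threshold


def double_threshold(value, low, high):
    """Double Threshold: classify a value against a pair of thresholds."""
    if value < low:
        return "under"
    if value > high:
        return "over"
    return "between"


cpu = [70, 95, 96, 97, 60]  # hypothetical processor utilization samples
print(average_over_threshold(cpu, 80))         # mean is 83.6
print(consecutive_over_threshold(cpu, 90, 3))  # 95, 96, 97 in a row
print(delta_over_threshold(cpu, 20))           # change from 70 to 60 is 10
print(double_threshold(cpu[-1], 50, 90))
```

Note how the same sample series trips the average and consecutive checks but not the delta check; choosing the monitor type is really choosing which shape of problem you want to detect.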

To conclude this section on monitors, we're going to put it all together by overlaying the SML and the health model for a live service monitor. Figure 3.14 is a fully expanded view of the health model of the OpsMgr Health service itself running on a management server.

Beginning at the lowest level of the object description tree, we see the MonitoringHost Private Bytes Threshold unit monitor on the computer Hurricane. Five unit monitors are shown in the lowest row that roll up into the Health Service Performance monitor. These unit monitors are labeled with the abbreviations Svc Handle, Svc Priv, Mon Handle, Mon Priv, and Send Queue in Figure 3.14. The MonitoringHost Private Bytes Threshold (abbreviated Mon Priv) unit monitor is in a critical state.

We can follow the propagation of this unit monitor state up the health model. The OpsMgr Health service is an application component of Windows Local Application Health Rollup. The Health Service is in a critical state due to the critical state of the MonitoringHost Private Bytes Threshold (abbreviated Mon Priv) unit monitor. Progressing upward, the application state is rolled up along with the hardware, OS, and computer states to the performance component of the object.

The critical state is propagated to the application component of the performance monitor. Finally at the top of the health model, an aggregate monitor rolls up the performance, availability, security, and configuration monitors. The root entity, which is the server Hurricane itself, indicates the aggregated health state, which is critical.
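The propagation just described can be modeled as a small tree in Python. This is a deliberately simplified sketch: the real health model in Figure 3.14 has more layers, the states of the other four unit monitors are assumed to be healthy, and the `Node` class, `worst_path` helper, and worst-state rollup rule are hypothetical names and assumptions of ours.

```python
from enum import IntEnum


class Health(IntEnum):
    SUCCESS = 1
    WARNING = 2
    CRITICAL = 3


class Node:
    """One monitor in a simplified health model tree."""

    def __init__(self, name, state=Health.SUCCESS, children=()):
        self.name = name
        self.state = state              # used only by leaf (unit) monitors
        self.children = list(children)

    def health(self):
        """Leaf: its own state. Rollup: worst state among its children."""
        if not self.children:
            return self.state
        return max(child.health() for child in self.children)


def worst_path(node):
    """Names from this node down to the unit monitor driving its state."""
    if not node.children:
        return [node.name]
    worst_child = max(node.children, key=lambda child: child.health())
    return [node.name] + worst_path(worst_child)


unit_monitors = [
    Node("Svc Handle"),
    Node("Svc Priv"),
    Node("Mon Handle"),
    Node("MonitoringHost Private Bytes Threshold", Health.CRITICAL),
    Node("Send Queue"),
]
health_service_perf = Node("Health Service Performance", children=unit_monitors)
performance = Node("Performance", children=[health_service_perf])
root = Node(
    "Hurricane",
    children=[Node("Availability"), performance, Node("Security"), Node("Configuration")],
)

print(root.health().name)
print(" > ".join(worst_path(root)))
```

Walking the worst path from the root down to the offending unit monitor is essentially what an operator does by hand when drilling into a critical object.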

Figure 3.15 shows the Health Explorer for the computer in the state illustrated in Figure 3.14. If you noticed the critical state of the computer in the Monitoring pane of the Operations console, you would probably open the Health Explorer for the computer, which allows you to understand quickly what is wrong. By comparing the structure of the Health Explorer in Figure 3.15 with the SML and health model layers presented in Figure 3.14, you can match up the same critical health icons in the health model and the Health Explorer.

Figure 3.14

Expanded view of the health model for the OpsMgr Health Service.

Figure 3.15

Health Explorer screenshot of the health model detailed in Figure 3.14.

Workflow

It is accurate to describe Operations Manager 2007 at its core as being a giant workflow engine. In fact, monitoring in OpsMgr is based around the concept of workflows. An Operations Manager agent and server will run many workflows simultaneously in order to discover and monitor applications, devices, and services.

Module Types

Module types are the building blocks of Operations Manager workflows. Workflows are defined in management packs and then distributed to managed computers. Workflows can do many things, including collecting information and storing data in the Operations database or data warehouse, running timed scripts, creating alerts, and running on-demand tasks. Workflows are defined using modules, and modules are defined to be of a particular type known as a module type. Four different module types can be defined: data source, probe action, condition detection, and write action. Figure 3.16 illustrates these module types.
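One way to picture how the four module types fit together is as stages in a pipeline. The Python sketch below is only an analogy under our own assumptions: the function names mirror the module type names, but the ordering, the data item shape, the "ExampleHost" value, and the threshold logic are hypothetical and not taken from the OpsMgr schema.

```python
def data_source(samples):
    """Data source: produces data items for the workflow."""
    for value in samples:
        yield {"counter": "Private Bytes", "value": value}


def probe_action(item):
    """Probe action: triggered by an item, gathers extra data on demand."""
    return {**item, "host": "ExampleHost"}  # hypothetical lookup result


def condition_detection(item, threshold):
    """Condition detection: passes an item on only if it matches a condition."""
    return item if item["value"] > threshold else None


def write_action(item, sink):
    """Write action: acts on the item, here by recording an alert."""
    sink.append(f"ALERT {item['host']} {item['counter']}={item['value']}")


def run_workflow(samples, threshold):
    """Chain the four stages for every item the data source emits."""
    alerts = []
    for item in data_source(samples):
        enriched = probe_action(item)
        matched = condition_detection(enriched, threshold)
        if matched is not None:
            write_action(matched, alerts)
    return alerts


alerts = run_workflow([100, 450, 300, 900], threshold=400)
print(alerts)
```

A real workflow does not need all four stages; a simple collection rule, for instance, might pair only a data source with a write action.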

Figure 3.16

Workflow in OpsMgr is performed through four specific module types.

In the "Architectural Overview" section of this chapter, we compared the management group and management pack to macro and micro views that answer the question "How does OpsMgr do it?" In this section, we are going sub-micro! At the programmatic level, these are the terms and data flow structures used internally by the OpsMgr services:
