CPU utilization: When to start getting worried

Apr 10, 2003
A CPU that's utilized at an average of 50% is probably ideal, but higher spikes can be tolerated.

Support questions seem to run in cycles, and NetWare support questions are no different. One that seems to pop up in the NetWare support forums every spring (usually multiple times each spring) has to do with CPU utilization. This year was no exception.

Since NetWare 3, Novell has provided the MONITOR facility on the server console that will tell you, among lots of other things, the CPU Utilization percentage – a number ranging from 0 to 100. At 100, the CPU has no free cycles and cannot deal with any new input until the utilization drops. This can lead to dropped connections, hung servers and disaster. But a momentary rise to 100, as happens during large file copies or major compression/decompression activities, may be nothing more than an irritant as network activity slows down.

A server whose CPU frequently “pins the meter” at 100% needs some serious maintenance, and anyone in that situation deserves all the help we can give them. Sometimes it’s simply a misconfiguration or a faulty driver, but often there is a physical problem in either the disk channel or the network channel (even, possibly, a memory fault) that is causing the CPU to be overworked. I’ve no complaint about questions like that.

What does raise my hackles, though, is seeing a note from some brand new network manager (usually someone who was the inventory manager the week before) all atwitter because his server’s CPU utilization was now “hovering” in the 25%-30% range, rather than its usual 8%-12% range. No one was reporting any problems, mind you. There were no symptoms of anything being wrong. But this 30% CPU utilization was causing anxious moments.

A quick look at the documentation shows that the CPU utilization number is “…relative to the amount of time the kernel spends in the idle loop process.” That means something is tracking how long the CPU sits in the idle loop process. If the server has been up for 10,000 seconds and has spent 7,000 of them in the idle loop, then the utilization (the 3,000 seconds NOT in the idle loop) is 30%.
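
For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of that calculation. It is only an illustration of the idea, not how MONITOR itself is implemented; the function name and its parameters are made up for this example.

```python
def cpu_utilization_percent(total_seconds, idle_seconds):
    """Return the share of the measured time NOT spent in the idle loop, as a percentage.

    Hypothetical helper for illustration; the names are not from NetWare.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= idle_seconds <= total_seconds:
        raise ValueError("idle_seconds must be between 0 and total_seconds")
    busy_seconds = total_seconds - idle_seconds
    return 100.0 * busy_seconds / total_seconds


# The example from the text: 10,000 seconds up, 7,000 of them idle.
print(cpu_utilization_percent(10_000, 7_000))  # 30.0
```

Flip the numbers around and the same function shows the other side of the coin: that server is idle 70% of the time.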

What is the idle loop? It’s when the CPU is, in essence, sitting and twiddling its thumbs waiting for an event to occur. Thus, a CPU showing 30% utilization spends more than twice that much time (70%) doing absolutely no work.

Suppose you spent 70% of your time in the “idle loop.” Not even surfing the ‘Net, clipping your nails or sipping coffee – just sitting in your chair waiting for an event to occur? Would your boss think there was a problem? You bet she would. But would she think you were overworked? I sincerely doubt it.

A CPU that’s utilized at an average of 50% is probably ideal, but higher spikes – even to 100% – can be tolerated. Under 30%, and that server is a good candidate for consolidation with another layabout server in your network.
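
Those rules of thumb can be written down as a rough Python sketch. The 30% floor comes from the paragraph above; everything else is an assumption made for illustration, in particular the function name, the labels, and the `pinned_share` cutoff that stands in for “frequently” pinning the meter.

```python
def assess_server(samples, pinned_share=0.5):
    """Classify a server from a list of CPU utilization readings (0-100).

    pinned_share is a hypothetical cutoff: the fraction of readings at 100%
    that we choose to call "frequently pinned". It is not from the article.
    """
    if not samples:
        raise ValueError("need at least one utilization reading")
    average = sum(samples) / len(samples)
    pinned = sum(1 for s in samples if s >= 100) / len(samples)
    if pinned >= pinned_share:
        return "needs maintenance"        # the meter is pinned much of the time
    if average < 30:
        return "consolidation candidate"  # mostly sitting in the idle loop
    return "fine"                         # occasional spikes are tolerable


print(assess_server([25, 30, 28]))         # the worried new manager's server
print(assess_server([50, 60, 100, 40]))    # a busy server with one spike
print(assess_server([100, 100, 100, 90]))  # a server in real trouble
```

Note that the server “hovering” at 25%-30% lands in the consolidation bucket, not the trouble bucket.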

Maintaining the network isn’t rocket science, you know. Most of it is simple logic and rational thinking. Of course, if we all thought rationally, we might not have chosen a career in IT.