Network World - A system crash: If you're lucky, it only ruins your day. More than likely, you're in for several bad days followed by a few stressful weeks or months. After all, systems rarely fail only once. Rather, they keep crashing until you find the cause and fix the problem.
This primer will show you how to solve problems quickly. Using a tool that costs nothing, you can solve approximately 50% of Windows server and workstation crashes in a few minutes. The tool is WinDbg, the free Windows debugger.
You've probably never used the debugger, don't have it and don't want it. After all, it's a developer's tool, not an administrator's, right? Yes, but what you need to know is remarkably easy to learn, and even a rudimentary familiarity with the debugger could enhance your skills and your resume.
Still hesitant? Think about this: After rebooting a crashed machine, we've brought up the debugger, opened a memory dump file, given the debugger a single command, and learned not only that the cause was a driver, but also the driver's name, all in less than a minute. Granted, the debugger was already installed and configured, and we knew which commands to use and what to look for.
But so will you by the end of this article.
To date, Windows has been used most commonly on the x86 processor. The x86 implements a protection mechanism that lets multiple programs run simultaneously without stepping on each other's toes. This protection comes in four levels of privilege or access to system memory and hardware. Two of these levels are commonly referred to as kernel mode and user mode.
Kernel mode is the most privileged state of the x86. Both the Windows OS and drivers are considered trusted, and, therefore, run in kernel mode. This ensures unfettered access to system resources and the ability to maximize performance. Other software is assigned to user mode, the least-privileged state of the x86, restricting direct access to much of the system. Applications, such as Microsoft Word, run in user mode to guard against applications corrupting system-level software and each other.
Although kernel-mode software is protected from applications running in user mode, it is not protected from other kernel-mode software. For example, if a driver erroneously accesses a portion of memory that is being used by other software (or not specifically marked as accessible to drivers), Windows stops the entire system. This is called a bug check or a crash, and Windows displays the popularly known Blue Screen of Death (BSOD). About 95% of Windows system crashes are caused by buggy software (or buggy device drivers), almost all of which come from third-party vendors. The remaining 5% is due to malfunctioning hardware devices, which often prompt crashes by corrupting memory contents.
Another little-known fact is that most crashes are repeat crashes. Few administrators can resolve a system crash immediately, so the same crash typically happens again and again. It's common to see weeks or months pass before the answer is found. By solving a crash immediately after the first occurrence, you can prevent time-consuming and costly repeat crashes.
We'll focus on solving crashes under Windows 2000, XP and Server 2003. The process is identical for Windows servers and desktops. The debugging and interpretation process also applies, with remarkably few differences, to other operating systems, such as Linux, Unix and NetWare.
To resolve system crashes using WinDbg, you need the following:
The memory dump is a snapshot of what the system had in memory when it crashed. Few things are more cryptic than a dump file at first glance, yet it is the best place to go for information on a crash. You can try to get this data in other ways: a user or administrator may remember what the system was doing when it crashed, or that a new hardware device was installed recently, in which case you can check the related drivers or hardware. But people also forget, and the information they provide can be incomplete or inaccurate.
Windows Server 2003, 2000 and XP create three types of memory dump files:
Small or mini dump: A mini dump is a tiny 64K-byte file. One reason it's so small is that it doesn't contain any of the binary or executable files that were in memory at the time of the crash. Those .exe files are needed for full and proper crash analysis, so mini dumps are of limited value without them. However, if you are debugging on the machine that created the dump file, the debugger can find them in the System Root folders, unless they were changed by a system update (we'll provide a workaround for this later). XP and Server 2003 produce mini dumps by default, one for each crash event, as well as a full dump file. While it saves all mini dumps, the system saves only the most recent full dump. Windows 2000 can save mini dumps, but by default it is set to save only a full dump.
Kernel dump: This is equal to the amount of RAM occupied by the operating system's kernel. For an XP PC with 512M bytes of RAM, this is usually around 60M bytes, but it can vary. For most purposes, this crash dump is the most useful. It is significantly smaller than the full memory dump, yet it omits only those portions of memory that are unlikely to have been involved in the crash.
Complete or full dump: This is equal to the amount of RAM in the box. Therefore, a machine with 512M bytes of RAM creates a 512M-byte dump file (plus a little). While a full dump contains all the data and executables the memory has to offer, its sheer size can make it awkward to save or transfer to another machine for debugging. Windows 2000 produces a full dump by default.
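To make the size trade-off concrete, here is a minimal Python sketch that turns the figures above into rough estimates. The function name `estimate_dump_bytes` and the type labels are our own, not part of any Windows tool, and the kernel dump is deliberately left unestimated because its size depends on how much memory the kernel happens to be using.

```python
# Rough dump-size estimates based only on the figures quoted above.
# estimate_dump_bytes and the "mini"/"kernel"/"full" labels are hypothetical
# names chosen for this illustration.

MINI_DUMP_BYTES = 64 * 1024  # a mini dump is a 64K-byte file


def estimate_dump_bytes(dump_type, ram_bytes):
    """Return an approximate dump size in bytes, or None if it cannot be estimated."""
    if dump_type == "mini":
        return MINI_DUMP_BYTES
    if dump_type == "full":
        # A full dump is the size of RAM "plus a little", so RAM is a lower bound.
        return ram_bytes
    if dump_type == "kernel":
        # Depends on kernel memory in use; it cannot be derived from RAM alone.
        return None
    raise ValueError(f"unknown dump type: {dump_type!r}")


ram = 512 * 1024 * 1024  # the 512M-byte machine used as the example above
for kind in ("mini", "kernel", "full"):
    size = estimate_dump_bytes(kind, ram)
    print(kind, "varies" if size is None else f"at least {size} bytes")
```

Treating the full-dump figure as a lower bound is a conservative choice when checking whether a disk has room for the file.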
Because XP and 2003 are set up to save a mini dump for every crash event, there should be a mini dump file for every crash the machine has had since it was turned on. This data can be extremely valuable, giving you a rich history to inspect.
To resolve system crashes through the inspection of memory dumps, set your servers and PCs to automatically save them with these steps:
While still in the Startup and Recovery dialog box, ensure that the following options are checked in the System failure section:
In the Write debugging information section, you have the option to save only the most recent dump file or to have the system rename the existing dump file before it creates a new one. We prefer saving the dump files because previous dump files may provide additional or different information. However, space can be an issue, so set this option according to your needs.
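The same settings can also be inspected without opening the dialog box. The sketch below is an assumption-laden illustration, not part of the procedure above: it relies on our understanding that Windows stores these options in the registry under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl`, with `CrashDumpEnabled` set to 0 (none), 1 (complete), 2 (kernel) or 3 (small), and `Overwrite` controlling whether an existing dump file is replaced. The helper names are hypothetical, and the code needs a Python interpreter on the Windows machine being checked.

```python
# Sketch: summarise crash-dump settings held in the registry.
# Assumption: the value names and meanings below match the CrashControl key
# on the system being inspected; verify them before relying on the output.

DUMP_TYPES = {
    0: "none",
    1: "complete (full) dump",
    2: "kernel dump",
    3: "small (mini) dump",
}


def describe_crash_control(values):
    """Turn a dict of CrashControl registry values into a one-line summary."""
    dump = DUMP_TYPES.get(values.get("CrashDumpEnabled"), "unknown")
    overwrite = "overwrites" if values.get("Overwrite") else "keeps"
    return f"Dump type: {dump}; {overwrite} the existing dump file"


def read_crash_control():
    """Read the live settings. Windows only, so winreg is imported lazily."""
    import winreg

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    names = ("CrashDumpEnabled", "Overwrite", "DumpFile", "MinidumpDir")
    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        for name in names:
            try:
                values[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # this value is not present on the system
    return values
```

On a Windows machine, `print(describe_crash_control(read_crash_control()))` would print the summary; on any other platform only `describe_crash_control` is usable.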
The Write debugging information section also tells you where the dump file will be created. On XP and 2003 systems, mini dumps are located at %SystemRoot%\Minidump, or c:\Windows\Minidump; kernel and full dumps are located at %SystemRoot%\MEMORY.DMP or c:\Windows\MEMORY.DMP. For Windows 2000, memory dump files are located at c:\winnt\memory.dmp.
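If you would rather not browse for these files by hand, the locations above can be checked with a short script. This is a minimal sketch: `find_dump_files` is a hypothetical helper of our own, and it assumes the default locations have not been changed in the Startup and Recovery settings.

```python
import os
from pathlib import Path


def find_dump_files(system_root):
    """Return MEMORY.DMP plus any mini dumps under system_root, newest first."""
    root = Path(system_root)
    found = []

    # Kernel and full dumps are written to a single MEMORY.DMP file.
    memory_dmp = root / "MEMORY.DMP"
    if memory_dmp.is_file():
        found.append(memory_dmp)

    # Mini dumps accumulate, one per crash, in the Minidump folder.
    minidump_dir = root / "Minidump"
    if minidump_dir.is_dir():
        found.extend(
            p for p in minidump_dir.iterdir()
            if p.is_file() and p.suffix.lower() == ".dmp"
        )

    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # %SystemRoot% is c:\Windows on XP and 2003, and c:\winnt on Windows 2000.
    root = os.environ.get("SystemRoot", r"C:\Windows")
    for path in find_dump_files(root):
        print(path, path.stat().st_size, "bytes")
```

Sorting newest first puts the dump from the most recent crash at the top of the list.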
If you don't have a dump file on your machine, you can get one from another system or download one here. This kernel dump is about 20M bytes zipped and 60M bytes extracted. It was created using a testing tool that generates a system crash.
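When a dump file comes from another system or a download, a quick sanity check can confirm it really is a dump before you load it. The sketch below is based on our understanding of the dump header format rather than on anything stated above: kernel-mode dump files are assumed to begin with the bytes `PAGEDUMP` (32-bit) or `PAGEDU64` (64-bit), and user-mode minidumps with `MDMP`. The function name `dump_signature` is hypothetical.

```python
def dump_signature(path):
    """Classify a file by its first bytes. The signature values are assumptions."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header == b"PAGEDUMP":
        return "32-bit kernel-mode dump"
    if header == b"PAGEDU64":
        return "64-bit kernel-mode dump"
    if header[:4] == b"MDMP":
        return "user-mode minidump"
    return "unrecognised"
```

A result of "unrecognised" usually means the download is still zipped or was truncated in transfer; the debugger itself remains the final judge of whether a file is usable.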