by Cheryl Currid

Get reacquainted with life in the trenches

Jul 07, 2003 | 5 mins
Data Center

* Feeling your team's pain

Here’s a self-improvement project for tech managers: Make a date with disaster. Ask your technical staff to invite you to the next emergency, say, when the routers or servers go down, or when an employee with administrator privileges quits and they need to lock down the system. Or, if disaster never strikes, ask someone else in your industry for the invitation. I assure you the field trip will be worthwhile.

When you get there, don’t plan on managing; plan on working. Ask to be assigned a task so you can work shoulder to shoulder with technical people. Believe me, if the staff is smart enough to do this right, you will feel their pain. Nothing replaces the white-knuckle experience of saving a mission-critical server. But if you haven’t done it lately, I bet you’ve forgotten the pain.

I had the fortunate/unfortunate chance to follow my own advice recently. And while the details are not important, imagine walking into a server room with the main e-mail server sputtering, spewing out error messages, and finally going down. The key person in charge of the system resigned abruptly.

Then picture looking for the system documentation only to find that old and incorrect records remained. Nothing defined how everything worked. It would be necessary to unravel the spaghetti and carefully reconstruct it.

In the meantime, the e-mail service – probably the most mission-critical of all services for the company – would need to stay down for a while. How long? No one knew.

What do you, as a manager, do? Aside from blaming your poor management skills that made you unaware of the personnel and documentation problem, it’s time to sit down and focus.

In my case, I started from the error logs and worked from there. Frankly, I had little experience with the underside of Windows 2000 Advanced Server or Exchange. But I did know a lot about networking and the e-mail process. I had a wonderful and willing person working with me, and together we learned more in 48 hours than any course could teach in a semester.

Now to make this story short, I’ll bypass the error-by-error account. The good news is that everything – knock-on-wood – is up and working. The system will need several more upgrades, but these can be planned.

The lessons learned were technical and managerial. They include:

*You must inspect so people will respect. It’s an old line, but still true.

*Keep on top of major updates and service patches. In my case the primary e-mail server had many small patches applied but none of the larger, more intensive, service packs. This made it difficult to understand errors when any parameter of the system was changed. For us, applying service packs fixed a lot of problems all at one time.

*Subscribe to automatic upgrade programs. Then inspect how upgrades are maintained. In this case the MSDN subscription disks were nowhere to be found – it took 162M bytes of downloads to apply the service packs in this emergency.

*Double-check each error with documentation. I was pleasantly surprised that Microsoft’s operating system was well-documented and that the documentation was easy to access online.

*Unravel the spaghetti. Make sure the cable plant is correctly documented. Remember, the devil lies in the detail – and the cabling closet. That way, when you attempt anything new, you can be sure that neither you nor anyone else introduces new errors.

*Maintain hot-spares or spare parts. Uncanny as it seems, something else unrelated can break during the crisis. Keeping a spare router or server on hand pays back with dividends.

*Reward the volunteers. Some people will offer to help during a crisis. At this point, it is amazing how people will rise to the occasion and work on a whole different level. When it’s time to put on your manager’s cap again, it’s important to show volunteers sincere appreciation.

*After the crisis is over, meet with the staff and figure out what worked and what didn’t. This is a great time to fix things that needed fixing before the crisis.

If your date with disaster was people-induced, then look for a new way to manage. Figure out which management techniques could have been used to head it off.

For example, it’s probably time to be more in tune with attitudes. Watch people who display a sudden change in attitude. Even if you have a great relationship with someone, you never know when life or work challenges will send him or her over the brink.

Double-check the work of people when attitude changes occur. While a slip-up may be totally innocent, people can become overwhelmed by changes and it will affect their work.

Overcoming challenges caused by a technology or people disaster should offer many lessons. More importantly, working the disaster du jour is a good way for managers to get in touch with the issues of the technical staff.

Cheryl Currid is president of Currid & Company.