CCIE R/S Lab Troubleshooting: Initially Shocked, but Wasn't Too Bad

2 hours; 10-12 trouble tickets; but 30-40ish devices – Yikes!

So I'm sitting at the CCIE R/S lab Beta, and beginning the part I was sooooo looking forward to - the new 2-hour troubleshooting section. 2 minutes into that section, I was completely bummed. It seemed too much to even attempt. 30-to-40 routers and switches... 10-12 trouble tickets... 2 hours... plus a new user interface. But by the end of the process, I had reversed my opinion - I thought it was entirely reasonable. Today, I'll describe the process and draw a few conclusions about the t'shooting section.

First, a little context is in order. The day began with a brief talk w/ the proctor, and a look around the parts of the user interface you could see before starting the timer. Then it was time to start the timer and answer the open-ended questions. Cisco didn't require that the Beta candidates pass that part - they wanted the labs, process, interface, etc. tested - so next it was time to click start and crank up the t'shooting part of the lab.

(I'm assuming you read my last post, which talks about the GUI; if not, go here.)

There was no physical lab book, but there was a window you could display that listed the various t'shooting tasks, which I'll refer to as trouble tickets. Each ticket describes a problem, e.g., PC1 can't ping PC2, intervening routers should be set up to do x, y, z, make it work, that kind of thing. I can't say how many I had, but I did get permission to say 10-12 tickets is typical. Just my opinion, but that number may change over time, particularly as they build more tickets, some of which may take more or less time to solve.

The GUI actually worked pretty well for reading the tickets. The t'shooting tasks could be easily scrolled in the window that displays them. So if I looked at my list and thought about doing numbers 3, 5, or 7 next, it was quick and easy to review each briefly. Because the tickets did not seem to be dependent on each other, it seemed a good strategy to do the easiest ones first, set aside those for which I had no current knowledge of the config, and then tackle the tweeners. (Your tickets might be interdependent - they just weren't in my case.) Regardless, the GUI made it easy to navigate to the trouble ticket I wanted to tackle next.

The shock in the first few minutes was the unveiling of the topology figure. 30-to-40 or so routers/switches, all interconnected, with lots of redundancy. Yikes! On top of that, I came to the lab with the typical strategy for doing the config section, planning to get a good handle on the topology before doing any of the exercises. However, a topology 4X bigger than a typical config topology made that difficult. Plus, the information in the figure was a bit sparser than in the typical multiple-view figures for the config section. So, going in with my "understand the topology" strategy was not good, particularly in light of the time constraints.

An interesting aside: the config section continues to be performed on real gear, sitting in a rack somewhere else (I think San Jose, not sure, but it wasn't next to us in Raleigh). But the big t'shooting topology was on a "virtual environment", to use the approved description. So they don't have tons of 30-to-40-device pods waiting on you. I don't know where those instances were hosted, but the response time was better on the t'shooting labs than in the config labs.

The individual tickets were clear and fair. Plus, they were not long - after reading through them all once, re-reading one on the 2nd time through took maybe 10 seconds, and then it was time to get into the consoles. So, it took only a little time to understand each issue.

For those tickets for which I knew the topic well - those for which the list of likely potential root causes leaped to mind - I took 7-8 minutes from re-reading the ticket to finding/fixing the problem. (Yep, anal retentive Wendell timed them all.) For those where I knew the topic pretty well, I needed 12-ish minutes on average. I had no clue on 2 of them, but they were well within the scope of the lab blueprint. I think they were all solvable for a prepared candidate, but I thought that maybe... 1 fewer ticket would have been appropriate, especially with the learning curve on the GUI at that point. Still, I think if I had mastered all the topics in those tickets, 2 hours was enough to get them done.
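
To put rough numbers on that, here is a small Python sketch of the time budget. The 120 minutes and the 10-12 ticket range come from what I described above; the function name and the example mix of tickets are purely my own illustration, not the actual lab content.

```python
def minutes_per_ticket(total_minutes, ticket_count):
    """Average time available per ticket if the section is split evenly."""
    return total_minutes / ticket_count


SECTION_MINUTES = 120  # the 2-hour t'shooting section

# Even split across the typical 10-12 tickets.
budget_at_10 = minutes_per_ticket(SECTION_MINUTES, 10)
budget_at_12 = minutes_per_ticket(SECTION_MINUTES, 12)

# Hypothetical mix (an assumption for illustration only):
# 5 tickets at about 8 minutes and 5 tickets at about 12 minutes.
hypothetical_mix = [8] * 5 + [12] * 5
time_used = sum(hypothetical_mix)
time_left = SECTION_MINUTES - time_used

print(f"Even split: {budget_at_12:.0f} to {budget_at_10:.0f} minutes per ticket")
print(f"Hypothetical mix uses {time_used} minutes, leaving {time_left}")
```

An even split works out to 10 to 12 minutes per ticket, which is about what the "pretty well" tickets took me, so any ticket on a topic you don't know cold eats into the slack quickly.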

The biggest issue with the t'shooting section was how I approached the task. If I took it again, I would not use the usual config section approach of reading everything twice, making diagrams, understanding the topology, and then actually starting to use the CLI. I think I'd use a "read once, and then react" strategy, and be into the CLI for the first ticket within 5 minutes of starting. Think real life: the phone rings, a problem has occurred, you listen for a few minutes, and you start logging in. Because most of the tickets let me zero in on 3-5 devices to solve the problem, it was more like a bunch of separate labs with separate topologies, rather than the more comprehensive config section. (I was told the fact that my tickets were focused on a small area of the topology doesn't mean that all will be, so be warned.) Regardless, if I had to take it again, my strategy would be as follows:

  • Read all tickets once, enough to find the main topics, and make a checklist in your notes
  • Pick the easiest, and start picking them off (order wasn't important in what I saw)
  • For each ticket:
      • Create a mental checklist of things to check while reading the ticket
      • No style points: the solution is to change the config, so "show run" is probably needed within the first 3-4 commands
      • Personal t'shooting style plays a role
  • I'd make a list of the devices I'm working on for each ticket, in case a later ticket also uses those devices; I could've shaved a few minutes by being more aware of where I had been before.
  • If you plan to spend time later reviewing/confirming, assuming you finish faster than 2 hours, take notes on what you configured on each device, on each ticket
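
In the lab you'd keep these notes on paper or in a scratch window, but the bookkeeping in the last two bullets can be sketched in a few lines of Python. Everything here (the LabNotes class, the device names R1 and SW2, and the change descriptions) is made up for illustration and is not taken from the actual lab.

```python
from collections import defaultdict


class LabNotes:
    """Tracks which devices were touched, and what changed, per ticket."""

    def __init__(self):
        # device name -> list of (ticket number, change description)
        self._changes = defaultdict(list)

    def record(self, ticket, device, change):
        """Note a config change made on a device while working a ticket."""
        self._changes[device].append((ticket, change))

    def history(self, device):
        """Everything already changed on this device, oldest first."""
        return list(self._changes.get(device, []))

    def devices_for(self, ticket):
        """Sorted device names touched while working the given ticket."""
        return sorted(
            device
            for device, entries in self._changes.items()
            if any(t == ticket for t, _ in entries)
        )


# Hypothetical entries, for illustration only.
notes = LabNotes()
notes.record(3, "R1", "fixed OSPF area on Fa0/0")
notes.record(3, "SW2", "added VLAN 10 to trunk")
notes.record(7, "R1", "corrected ACL 101")

print(notes.history("R1"))   # both changes made on R1 so far
print(notes.devices_for(3))  # devices touched for ticket 3
```

The point of keying by device is the one above: when a later ticket lands on a device you've already been on, you can see at a glance what you changed there, and the same notes double as a review list if you finish early.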

Let me know your thoughts. Next time, I'll wrap up on CCIE for now by discussing a few prep tips for the t'shooting section. Also, here are a few related links from this spring:

T'shooting returns, October 18th

CCIE R&S Written Changes

How to Prep for CCIE R/S Troubleshooting?

Copyright © 2009 IDG Communications, Inc.