CCIE R/S Versus CCNA/CCNP Troubleshooting

Breadth Matters More as You Go Higher in the Cert Pyramid

It's a long story, but I started out planning to blog about a book I used to prep for the CCIE R/S lab troubleshooting section. However, I think there's a broader topic that grows out of what I experienced in the lab. My thoughts crystallized around this theme: the CCIE R/S Troubleshooting tasks require such a broad view of everything in the lab blueprint that they may call for a different approach than the CCNA and CCNP t'shooting questions. As a result, your preparation, and the tools you use, may need to be a little different. Today I'll describe the one book that I tested when taking the CCIE R/S lab, and discuss why I think this new section requires a slightly different plan of attack than do the CCNA and CCNP t'shooting topics. At the end, I'll offer a survey to see how you think you'd approach the t'shooting topics on lab day.

So, first, the background. If you missed the last two posts (here and here), I took the CCIE R/S lab Beta a few weeks back. I knew that I didn't have time to prep, other than maybe 2 hours in airports and on the plane. So, rather than just go and kick the tires in the lab, I decided to test a theory that had been percolating ever since I heard they were adding a specific troubleshooting section back to the lab. I've been a fan of the book "Troubleshooting IP Routing Protocols" from Cisco Press for a while, but I had only used it in small doses. My theory was that it was the perfect book from which to do final prep for the routing protocol topics in the lab's troubleshooting section. I wanted to test the book's use as a final prep tool, hoping it would help me confirm what I knew, brush up on what I'd forgotten, and learn stuff I'd never come across - and do so relatively quickly, as a tool used in the last week before the test.

To be fair to the authors, I don't think that this book is billed as a "cram" or fill-in-the-holes book for CCIE lab study, but basically, that's how I used it. I already thought it was an excellent book for explaining lots of issues that arise with routing protocols.

The short version is that the book was fantastic for my purposes. First, the Table of Contents (TOC) is detailed. It lists problems - lots of problems - including the symptom and the root cause, right there in the TOC. With that much info in the TOC, I could skim it and quickly sort the problems into three piles: those I already knew, those where I might pick up some small point, and those that were completely new (or forgotten). Then I could flip over to read more about the ones that were interesting, and solidify my skills. And I did learn some new stuff!

As a brief aside, here's one I didn't know: in an OSPF NSSA, only one ABR (the one with the highest RID) will convert the Type 7 LSA to a Type 5. The book explained that fact, along with a scenario in which the area 0 routers that were supposed to learn the external routes injected into the NSSA never learn them. I thought the scenario was pretty cool, as you can see from this PDF of one of the TOC pages.
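To make the highest-RID rule concrete, here's a minimal Python sketch. It models only the one rule mentioned above (the ABR with the highest router ID becomes the Type 7 to Type 5 translator); real routers weigh other conditions too, such as reachability and any forced translator role. The function name and the router IDs are made up for illustration.

```python
import ipaddress

def elect_nssa_translator(abr_router_ids):
    """Return the RID of the ABR that would translate Type 7 to Type 5.

    Simplifying assumption: only the highest-RID rule is modeled.
    """
    if not abr_router_ids:
        raise ValueError("no ABRs attached to the NSSA")
    # Compare RIDs as 32-bit numbers, not as strings:
    # as strings, "9.9.9.9" would sort above "10.1.1.20".
    return max(abr_router_ids, key=lambda rid: int(ipaddress.IPv4Address(rid)))

# Hypothetical ABRs attached to the same NSSA
abrs = ["10.1.1.1", "9.9.9.9", "10.1.1.20"]
print(elect_nssa_translator(abrs))  # 10.1.1.20
```

The troubleshooting angle: if the elected translator is not the ABR you expected, the Type 5 LSAs may not show up where the design intended.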

So what? It's a good book, hoorah. Next, consider what happens on CCIE lab day. Let's say you're taking the CCIE R/S lab. You start the troubleshooting section, and you spend the first 10 minutes looking at the diagrams, reading all the troubleshooting tickets. With 110 minutes left, you begin the first of 11 tickets, figuring 10 minutes each. Do you start with a show run? Start with ping/traceroute? Do several show commands? Just one or two show commands, and then dive into show run? How do you get from general idea, to your mental list of related things to check, to finding all the missing/busted config, fixing it, and verifying that the problem is solved, in 10 minutes or less?
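The time math in that scenario is simple, but it's worth seeing how fast the budget shrinks after one slow ticket. Here's a small Python sketch; the 120-minute total is just the 10 minutes of reading plus the 110 minutes left, and the 15-minute overrun is a hypothetical.

```python
def per_ticket_budget(minutes_left, tickets_left):
    """Average minutes available for each remaining ticket."""
    if tickets_left <= 0:
        raise ValueError("no tickets left")
    return minutes_left / tickets_left

SECTION_MINUTES = 120   # assumed: 10 minutes of reading + 110 minutes of work
READING_MINUTES = 10
TICKETS = 11

start = per_ticket_budget(SECTION_MINUTES - READING_MINUTES, TICKETS)
print(start)  # 10.0

# Hypothetical overrun: the first ticket takes 15 minutes instead of 10.
after_overrun = per_ticket_budget(110 - 15, TICKETS - 1)
print(after_overrun)  # 9.5
```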

Next, I'll comment a bit on that game, both for this scenario with CCIE, and for CCNA/CCNP.

First opinion: most CCNA and CCNP level troubleshooting problems stay in the technology neighborhood mentioned in the question statement. To solve those problems, the faster/better strategy may well be to do a show run or two right away. Most people seem to begin troubleshooting with a "show run" within the first 2-3 commands. This strategy works well because the likely cause, at least on the exam, is a misconfiguration related to the main topic.

For example, a question might suggest that PC1 cannot ping PC2. A quick look at a router's IP routing table would reveal a lack of routes. A quick look at the OSPF neighbors would show a missing neighbor. But a quick show run on the two routers might have revealed a mismatch in the OSPF area number between two routers that should be neighbors, and it would immediately show what needs to be reconfigured. So, starting with the show run may save a few commands, and speed up the process.
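The area-mismatch check you do by eye when comparing two show run outputs can be sketched in a few lines of Python. This is an illustration only: the two config snippets are invented, and the sketch assumes both routers enable OSPF on the shared subnet with an identical `network ... area ...` statement, which real configs won't always do.

```python
import re

# Matches IOS-style lines such as: " network 10.1.12.0 0.0.0.255 area 0"
NETWORK_RE = re.compile(
    r"^\s*network\s+(\S+)\s+(\S+)\s+area\s+(\S+)\s*$", re.MULTILINE
)

def ospf_network_areas(config_text):
    """Map each (address, wildcard) network statement to its area."""
    return {(addr, wc): area for addr, wc, area in NETWORK_RE.findall(config_text)}

def area_mismatches(config_a, config_b):
    """Return network statements present on both routers with different areas."""
    areas_a = ospf_network_areas(config_a)
    areas_b = ospf_network_areas(config_b)
    return {
        key: (areas_a[key], areas_b[key])
        for key in areas_a.keys() & areas_b.keys()
        if areas_a[key] != areas_b[key]
    }

# Hypothetical excerpts from "show run" on two routers sharing 10.1.12.0/24
R1_CONFIG = """\
router ospf 1
 network 10.1.12.0 0.0.0.255 area 0
 network 10.1.1.0 0.0.0.255 area 0
"""
R2_CONFIG = """\
router ospf 1
 network 10.1.12.0 0.0.0.255 area 1
 network 10.2.2.0 0.0.0.255 area 1
"""

print(area_mismatches(R1_CONFIG, R2_CONFIG))
```

With these two snippets, the only statement flagged is the one for the shared 10.1.12.0 subnet, where R1 says area 0 and R2 says area 1.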

That's just my opinion, of course, but I think that these exams assess troubleshooting skills in the same general conceptual neighborhood as the problem statement.

Next opinion: CCIE has always focused on interactions between the topics within the blueprint. That same problem of "PC1 can't ping PC2" might require an examination of DHCP on one router, NAT on another, a GRE tunnel on two routers, 2 IGPs with redistribution, VLAN trunking, STP, private VLANs, etc. Yes, you need to master all the intricacies of the topics in the neighborhood - e.g., that two OSPF neighbors must be in the same area on their common subnet - but also the impact of the other features on each other. For example, you may need to know that two routers in the same VLAN and the same subnet, both with OSPF enabled, may fail to form a neighborship because they each sit on isolated ports in the same private VLAN. Even on a CCNP exam, starting with an OSPF neighborship problem statement but then needing to solve it by reconfiguring private VLANs may just be beyond what Cisco expects from CCNPs, but not so for CCIE.
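The private VLAN example comes down to a small set of forwarding rules, which can be sketched in Python. This is a simplified model, assuming all ports belong to the same primary VLAN; the class, function, and port names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PvlanPort:
    name: str
    role: str                        # "promiscuous", "isolated", or "community"
    community: Optional[int] = None  # secondary VLAN ID, community ports only

def can_exchange_frames(a, b):
    """Simplified private VLAN rule set within one primary VLAN."""
    if a.role == "promiscuous" or b.role == "promiscuous":
        return True                  # promiscuous ports talk to everyone
    if a.role == "community" and b.role == "community":
        return a.community == b.community
    return False                     # isolated ports reach only promiscuous ports

# Hypothetical layout: R1 and R2 on isolated ports, the gateway on a promiscuous port
r1 = PvlanPort("R1-Fa0/0", "isolated")
r2 = PvlanPort("R2-Fa0/0", "isolated")
gw = PvlanPort("GW-Fa0/1", "promiscuous")

print(can_exchange_frames(r1, r2))  # False
print(can_exchange_frames(r1, gw))  # True
```

If R1 and R2 can't exchange frames, they never see each other's OSPF Hellos, so the neighborship never comes up even though the OSPF config on both routers looks fine.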

Third opinion: The CCIE constraints may require you to start somewhere other than "show run" to pass this part of the lab. Here's why: you basically get 10 minutes or so per ticket, and for the sake of argument, assume you have 4-6 devices. If by definition the root cause - aka the thing you have to reconfigure - could be anything, you couldn't just go look for a couple of items in the config - you might need to read at least half of the config. The problem listed in the problem statement may imply one topic area, but the root cause may be way out of the neighborhood - meaning the cause may be on a not-so-obvious device. And unlike with the configuration section of the lab, where you already know a lot about the underlying config, you're basically learning the layout of what should be working while you troubleshoot - more like you're a contractor showing up to save the day for a client, but first needing to learn the basic layout. So, you're figuring out what should be there for subnets, masks, VLANs, tunnels, routing protocol adjacencies, etc. - and then figuring out what's wrong/missing.

So, on this third opinion: if you had more time, show run may work well. With the time constraints, I'd say you'd want to use other show commands to isolate the problem first, and then start looking at the configs. I'd argue that you might be better off doing 5-10 show commands to isolate the root cause. When I took the lab, I was doing 5-10 "show" commands before doing the first "show run" on most of the trouble tickets. Honestly, even on CCNA, I tend to do 3-4 other show commands before resorting to "show run", mostly because when I take CCNA, I'm just kicking the tires.

Enough of my opinion - let's get yours. Let me ask all of you what you do, or what you expect to do. The first option in the two surveys is that you read the problem statement, guess the best device to start with, and do a "show run" there. The 2nd option is to first do a couple of other show commands to decide on which device to start your "show run" commands. The third option is like the 2nd, but using ping/traceroute to find the right device. And the last is that you go beyond just picking the best place for a "show run": you do show commands on the devices likely to have the problem, to further isolate the topic area, before looking at the configuration. Also, please comment on what's worked well for you, and what hasn't. And as always, don't mention specific issues you saw on this exam or the other. Thanks y'all...



Copyright © 2009 IDG Communications, Inc.