A couple of weeks ago I wrote a blog post about how I can fix anything with a tunnel. In that post I described how a physical box design in a campus LAN, instead of a triangle design, can lead to black hole routing because of OSPF summarization.
The other day I received the following e-mail:
----- Original Message ----
From: Dave [dave@somewhere.com]
To: mjmorris@yahoo.com
Sent: Thursday, April 3, 2008 11:00:16 PM
Subject: tunnel article
Morris, that was a very interesting article about tunnels. Could you please go into a bit of detail on how a box design can cause that black hole situation you mentioned in OSPF? Always trying to learn.
Regards
Dave
So, I thought I'd expand on this topic a little. The rule can be summarized as "don't split your OSPF Area".
Let's say you have the following setup. Two core routers (CR) serve as OSPF ABRs. Area 51 is where the access switches (AS) are located. The physical connectivity is a box design, not a triangle design, because ports are expensive (let's assume they are 10 Gig). OSPF is configured down to the access layer routers (switches) where the hosts are connected. The address range in Area 51 is 10.128.0.0/16 and Area 0 is 10.0.0.0/16. The CRs are configured to send an OSPF summary route (Type-3 LSA) of 10.128.0.0/16 into Area 0:
This works fine. Both ABRs have more specific routes to the access routers. So when packets arrive at either ABR from Area 0 following the summary 10.128.0.0/16 route, the CRs know where to send the packets to.
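The addressing behind that summary can be sanity-checked with a few lines of Python. This is only a sketch: 10.128.2.0/24 is a hypothetical second access subnet (the post names only 10.128.1.0/24), and the function name is made up for illustration.

```python
import ipaddress

AREA0_RANGE = ipaddress.ip_network("10.0.0.0/16")
AREA51_SUMMARY = ipaddress.ip_network("10.128.0.0/16")

# Access-layer subnets in Area 51. Only 10.128.1.0/24 comes from the post;
# 10.128.2.0/24 is a hypothetical second subnet added for illustration.
area51_subnets = [
    ipaddress.ip_network("10.128.1.0/24"),
    ipaddress.ip_network("10.128.2.0/24"),
]

def covered_by_summary(subnets, summary):
    """True if every subnet falls inside the summary prefix."""
    return all(s.subnet_of(summary) for s in subnets)

summary_ok = covered_by_summary(area51_subnets, AREA51_SUMMARY)
ranges_overlap = AREA51_SUMMARY.overlaps(AREA0_RANGE)

print(summary_ok)      # True: the /16 covers every Area 51 subnet
print(ranges_overlap)  # False: the Area 51 summary does not collide with Area 0
```

Note that the check says nothing about reachability. The summary covers the whole range whether or not the ABR advertising it can still reach every subnet inside it, which is where the trouble starts.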
But, now assume there's an outage of one of the uplinks:
CR01, one of the ABRs, is still sending the summary 10.128.0.0/16 route into Area 0, so packets from Area 0 destined for host 10.128.1.51 are still sent to CR01. But with the uplink down, CR01 no longer has a more specific route to 10.128.1.0/24, the user subnet on AS01 and AS02. The packets match only the Null0 discard route that CR01 installs for its own summary, and they die on CR01. Area 51 is now split and you have a black hole. If Area 0 splits traffic evenly between the two ABRs, about 50% of packets destined for the 10.128.1.0/24 subnet will be lost.
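The lookup on CR01 can be sketched as a longest-prefix match. The routing tables below are simplified assumptions, not real router output: "AS01" stands in for whatever next hop CR01 used over the failed uplink, and the /16 entry models the discard route to Null0 that goes with the summary.

```python
import ipaddress

def lookup(table, dst):
    """Return the next hop of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    # Longest prefix wins, as in a real forwarding lookup.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

SUMMARY = ipaddress.ip_network("10.128.0.0/16")
USER_SUBNET = ipaddress.ip_network("10.128.1.0/24")

# CR01 with the uplink up: the intra-area /24 plus the discard route for the summary.
cr01_before = {SUMMARY: "Null0", USER_SUBNET: "AS01"}

# CR01 with the uplink down: the /24 is gone, only the discard route remains.
cr01_after = {SUMMARY: "Null0"}

print(lookup(cr01_before, "10.128.1.51"))  # AS01
print(lookup(cr01_after, "10.128.1.51"))   # Null0
```

The second lookup is the black hole: the packet arrived at CR01 because of the summary, and the only thing left for it to match is the summary's own discard route.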
This is exactly why good network design recommends triangles, so problems like these do not arise. With another uplink from AS02 to CR01 forming a triangle, CR01 would still have a more specific route to 10.128.1.0/24, avoiding the black hole. This problem is not just a campus LAN problem. I've seen it in WANs too, with dual circuits from a remote router to two ABRs: the ABRs are not linked inside the area, and when a circuit goes down the area splits. We used to fix this all the time... with a tunnel.
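The difference between the box and the triangle comes down to whether CR01 can still reach the subnet's routers over links inside Area 51. Here is a minimal sketch of that check. The link sets are assumptions reconstructed from the text, since the diagrams are not reproduced here, and the CR01-CR02 link is left out on purpose because it lives in Area 0. Whether AS01-AS02 is an OSPF adjacency or just a Layer 2 trunk does not change the result for CR01.

```python
from collections import deque

def reachable(links, start):
    """Return the set of nodes reachable from start over undirected links (BFS)."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen

def abr_has_specific_route(links, abr, subnet_routers):
    """True if the ABR can reach a router that owns the subnet inside the area."""
    return bool(reachable(links, abr) & subnet_routers)

# Intra-area (Area 51) links only; assumed topology.
BOX = {("CR01", "AS01"), ("AS01", "AS02"), ("AS02", "CR02")}
TRIANGLE = BOX | {("CR01", "AS02")}   # the extra uplink from AS02 to CR01
FAILED = ("CR01", "AS01")             # the uplink that goes down
SUBNET_ROUTERS = {"AS01", "AS02"}     # routers holding 10.128.1.0/24

box_after = abr_has_specific_route(BOX - {FAILED}, "CR01", SUBNET_ROUTERS)
triangle_after = abr_has_specific_route(TRIANGLE - {FAILED}, "CR01", SUBNET_ROUTERS)

print(box_after)       # False: CR01 is cut off inside Area 51, the area is split
print(triangle_after)  # True: CR01 still reaches AS02 over the second uplink
```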
This rule also applies to other routing protocols where summarization is configured. It is something that many engineers miss when designing routing protocols.
Michael Morris is a communications engineering manager at a $3 billion high-tech company. His background is in enterprise WANs working with telcos, and developing large-scale routing designs. He has worked on networks at government and corporate organizations, including networks at two Fortune 10 companies. In his current role, he leads large-scale IT networking projects and develops and maintains architectural standards for data networks, storage area networks, IP Telephony, and security. Michael is a CCIE and has 11 years experience in networking and communications, including four years as a paratrooper in the U.S. Army. He has a bachelor's degree in MIS from the University at Buffalo. Recently, he was awarded the Network Professional Association® (NPA) Professional Excellence and Innovation Award for his work on network architecture, templates and enterprise MPLS design.
The opinions expressed in this Weblog are those of the writer and may not represent the opinions of Network World.
exchange routes between CR01 and CR02
Michael,
In your diagram, why not also include a summary route of 10.128.0.0/16 between CR01 and CR02? This could easily be done across your /30 (i.e. 10.0.0.0/30) links.
This way you wouldn't hit the null route, and traffic would use the neighbor router CR02, which has the specific route. When the link comes back up, CR01 would then use its more specific route.
RE: exchange routes between CR01 and CR02
That is already done by OSPF automatically since the /30 between CR01 and CR02 is in Area 0.
Plus, that would not fix this problem. There is still a summary route propagated by CR01 into Area 0 even when the uplink to the access switch is down, resulting in a black hole.
Mike
RE: exchange routes between CR01 and CR02
Why does the route go to Null? Wouldn't CR01 have multiple route entries in its routing table? One entry via itself to 10.128.1.0, and another entry via CR02, which would have a higher metric?
Prior to the link failure, CR01 had only one specific route to 10.128.1.0/24, which it learned as an intra-area route from AS01-F2. Once the link from CR01 to AS01-F2 fails, CR01 no longer has an intra-area link over which to learn this route:
-CR01 won't learn the route from CR02 because there is no intra-area link between the ABRs. If CR02 were to advertise the 10.128.1.0/24 route to CR01, this would technically be a loop (CR01 would be sending packets destined to 10.128.1.0/24 back into Area 0 across the link to CR02).
-CR01 won't learn the route from AS03-F2 because AS03-F2 doesn't have an OSPF neighbor relationship with AS04-F2. The connection between AS03-F2 and AS04-F2 is only an L2 trunk. (The only router that told AS03-F2 about 10.128.1.0/24 before the link failure was CR01.)
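The replies above can be illustrated with a simplified model of what an ABR advertises into Area 0 when an area range is configured. This is a sketch, not OSPF itself: the function name is made up, and 10.128.3.0/24 is a hypothetical subnet standing in for something CR01 can still reach (for example behind AS03-F2). The model assumes the usual behavior that component routes inside the range are suppressed, and that the range is advertised as long as at least one component is still reachable.

```python
import ipaddress

def type3_into_backbone(intra_area_prefixes, area_range):
    """Prefixes an ABR would advertise into Area 0 for one area with one range.

    Simplified: prefixes inside the range are suppressed and replaced by the
    range itself, which is advertised while at least one of them remains.
    """
    rng = ipaddress.ip_network(area_range)
    nets = [ipaddress.ip_network(p) for p in intra_area_prefixes]
    inside = [n for n in nets if n.subnet_of(rng)]
    outside = [n for n in nets if not n.subnet_of(rng)]
    adverts = [str(n) for n in outside]
    if inside:
        adverts.append(str(rng))
    return sorted(adverts)

RANGE = "10.128.0.0/16"

# Hypothetical intra-area routes after the uplink failure.
cr02_routes = ["10.128.1.0/24", "10.128.3.0/24"]   # CR02 still sees the user subnet
cr01_routes_after_failure = ["10.128.3.0/24"]      # CR01 has lost 10.128.1.0/24

cr02_adverts = type3_into_backbone(cr02_routes, RANGE)
cr01_adverts = type3_into_backbone(cr01_routes_after_failure, RANGE)

print(cr02_adverts)  # ['10.128.0.0/16']
print(cr01_adverts)  # ['10.128.0.0/16']
```

Both ABRs advertise the identical /16 into Area 0. CR02 never sends the /24 across the backbone, so CR01 has nothing more specific to learn, and Area 0 has no way to tell that CR01 has lost part of the range.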