How to use Anycast to provide high availability to a RADIUS server

Anycast for RADIUS Server High Availability

A brilliant solution for providing high availability in a small RADIUS server/ISE deployment

After months of issues, they have finally restored my access to my blog! After such a hiatus, it is my pleasure to bring you this particular post. I'm certain many of you will find it, at the very least, cool in an "I'm a network geek" kind of way. Even better, you may find it educational and put it to use in your own network.

This is a solution I have been wanting to write about for a long time now, and let's be clear—it is not mine. This entire post is owed to a long-time personal friend of mine who is also one of the most talented and gifted technologists roaming the earth today. His name is Epaminondas Peter Karelis, CCIE #8068 (Pete).

Pete designed this particular high-availability solution for a small ISE deployment that spanned two data centers, as crudely illustrated in the figure below.

Figure 1 - Architecture (Aaron T. Woland)

The two-data-center architecture and IP SLA

I have often used Anycast in my Identity Services Engine (ISE) deployments. It's a terrific tool in the security toolbox for making sure traffic goes to one place (the correct place, the closest place) and has a backup if that closer place is not available. However, this particular use of Anycast was something I had never considered before.

For those of you who may not be network heads, Anycast is a networking technique where the exact same IP address exists in multiple places within the network. In this case, the same IP address (2.2.2.2) is assigned to the Gig1 interface on every RADIUS server (ISE Policy Service Nodes, or PSNs, in our case). The router in each data center is configured with a static route to 2.2.2.2/32 with the Gig0 IP address of its local PSN as the next hop. Those static routes are redistributed into the routing protocol; in this case, EIGRP is used. Anycast relies on the routing protocol to ensure that traffic destined to the Anycast address (2.2.2.2) is sent to the closest instance of that IP address.

Now that Anycast is set up to route 2.2.2.2 to the ISE PSN, Pete used EIGRP metrics to ensure that the preferred route points at the primary data center, while the route to the secondary data center is held as the feasible successor (FS). With EIGRP, there is only a sub-second delay when the active route (known as the successor) is replaced with the backup route (known as the feasible successor).
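One way to check the successor and feasible successor relationship on any EIGRP router in the network is with the standard IOS show commands below. This is only a sketch of what to look at, not output from Pete's network, and it assumes classic IOS command syntax:

```
! List the successor and any feasible successors for the Anycast prefix
show ip eigrp topology 2.2.2.2/32
! Show which next hop is actually installed in the routing table
show ip route 2.2.2.2
```

If the path toward the secondary data center does not appear as a feasible successor in the topology entry, EIGRP will still converge, but it has to query its neighbors first instead of switching over immediately.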

How do we make the successor route drop from the routing table when the ISE node goes down? Pete configured an IP service-level agreement (IP SLA) probe on the router that checks the status of the HTTP service on the ISE PSN in its data center every five seconds. If the HTTP service stops responding on the active PSN, the tracked static route is removed, so it is no longer redistributed into EIGRP, and the feasible successor takes over, causing all the traffic for 2.2.2.2 to be sent to the PSN in the secondary data center. The figure below illustrates the IP SLA function. When this occurs, the only route left in the routing table points to the router at the secondary data center.

Figure 2 - IP SLA at Work (Aaron T. Woland)

The IP SLA causing the routing table to change

All network devices are configured to use the Anycast address (2.2.2.2) as the only RADIUS server in their configuration. The RADIUS requests will always be sent to whichever ISE node is active.
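As a rough illustration, the RADIUS configuration on a Cisco IOS switch acting as the network device might look like the sketch below. The names ISE-ANYCAST and ISE-GROUP are hypothetical, the shared secret is a placeholder, and the ports are the standard RADIUS ports (1812/1813), which the original design does not specify. Treat it as an assumption-laden example rather than the configuration Pete used.

```
! Hypothetical network access device configuration
aaa new-model
!
radius server ISE-ANYCAST
 ! The Anycast address is the only RADIUS server defined
 address ipv4 2.2.2.2 auth-port 1812 acct-port 1813
 key [Shared Secret]
!
aaa group server radius ISE-GROUP
 server name ISE-ANYCAST
!
aaa authentication dot1x default group ISE-GROUP
aaa authorization network default group ISE-GROUP
aaa accounting dot1x default start-stop group ISE-GROUP
```

Because only one server is defined, failover is handled entirely by routing; the device never has to mark a server dead and move on to the next one.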

Example 1 below shows the interface configuration on the ISE PSN. The Gig0 interface is the actual routable IP address of the PSN, while Gig1 is in a VLAN to nowhere using the Anycast IP address.

Example 1 —  ISE Interface Configuration

interface gig 0
  !Actual IP of node
  ip address 1.1.1.1 255.255.255.0
interface gig 1
  !Anycast VIP assigned to all PSN nodes on Gig1
  ip address 2.2.2.2 255.255.255.255
ip default-gateway [Real Gateway for Gig0]
!Note: no static routes needed.

Example 2 shows the IP SLA configuration on the router, which tests TCP port 80 on the PSN every five seconds and times out after 1000 msec. When that timeout occurs, the route will be removed.

Example 2 — IP SLA Configuration

ip sla 1
  !Test TCP to port 80 on the actual IP of the node.
  !"control disable" is necessary, since you are connecting
  !to a host instead of an SLA responder
  tcp-connect 1.1.1.1 80 control disable
  !Consider the SLA as down if response gt 1000 msec
  threshold 1000
  !Timeout after 1000 msec.
  timeout 1000
  !Test every 5 seconds:
  frequency 5
ip sla schedule 1 life forever start-time now
track 1 ip sla 1
ip route 2.2.2.2 255.255.255.255 1.1.1.1 track 1
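To confirm that the probe, the track object, and the static route are tied together as intended, the following standard IOS show commands can be run on the data center router. This is a suggested check, assuming the SLA and track numbers from Example 2. No output is shown here because it will vary by platform and software release.

```
! Confirm the probe is running and see its latest return code
show ip sla statistics 1
! Confirm the state of the track object that gates the static route
show track 1
! Confirm whether the tracked static route is currently installed
show ip route 2.2.2.2 255.255.255.255
```

A simple test is to stop the HTTP service on the PSN (or block TCP port 80 toward 1.1.1.1) in a maintenance window, then watch the track object go down and the static route disappear.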

Example 3 shows the route redistribution configuration where the EIGRP metrics are applied. Pete was able to use the metrics that he chose specifically because he was very familiar with his network. His warning to others attempting the same thing is to be familiar with your network, or to test thoroughly, when identifying the metrics that would work for you.

Example 3 — Route Redistribution

router eigrp [Autonomous System Number]
  redistribute static route-map STATIC-TO-EIGRP
route-map STATIC-TO-EIGRP permit 20
  match ip address prefix-list ISE_VIP
  !Set metrics: bandwidth (Kbps), delay (tens of microseconds), reliability, load, MTU
  set metric 1000000 1 255 1 1500
ip prefix-list ISE_VIP seq 5 permit 2.2.2.2/32
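Only one set of metrics is shown above. As a minimal sketch of how the router at the secondary data center might be made less preferred, the same route-map could set a slightly higher delay. The delay value of 10 is hypothetical and chosen only for illustration; it is not taken from Pete's design.

```
! Secondary data center router (hypothetical metric values)
router eigrp [Autonomous System Number]
  redistribute static route-map STATIC-TO-EIGRP
route-map STATIC-TO-EIGRP permit 20
  match ip address prefix-list ISE_VIP
  !Same bandwidth as the primary site, but a higher delay
  set metric 1000000 10 255 1 1500
ip prefix-list ISE_VIP seq 5 permit 2.2.2.2/32
```

With classic-mode EIGRP and default K values, the seed metric works out to 256 x (10^7 / bandwidth + delay), so the values in Example 3 give 256 x (10 + 1) = 2816, and the hypothetical secondary values give 256 x (10 + 10) = 5120. Be careful not to make the gap too large: a backup path only qualifies as a feasible successor if the distance reported by that neighbor is lower than the current feasible distance, which depends on your topology. This is the reason behind the warning to know your network or test thoroughly.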

Well, that's it! I hope you enjoyed this as much as I did seeing it go into production. As always, I look forward to reading your comments below.

This article is published as part of the IDG Contributor Network.