How to properly use a load balancer in Cisco's Identity Services Engine

Here are some guidelines for one of the most common problems among those running Cisco's Identity Services Engine.

Figure1 - No SNAT

Credit: Aaron Woland

So, this is my first blog post on here. Hope it goes well.

One of the most commonly asked questions of late is how to properly use a load-balancer with Cisco's Identity Services Engine. Here are some basic guidelines to use when configuring a Load Balancer for the ISE Policy Services Nodes (PSNs).

Understanding terms:

  • PSN: Policy Services Node. The PSN is the ISE persona that handles all of the RADIUS requests and makes the policy decisions. If you are using profiling, the PSN also handles the profiling for you.
  • PAN: Policy Administration Node. The PAN is the ISE persona that handles all the database synchronization/replication, and provides the administrative GUI. This node must talk to the PSN directly, without going through NAT.
  • VIP: Virtual IP Address. This is the IP Address that the Load Balancer listens on; traffic destined to the VIP is redirected to the real IP Addresses of the servers in the Server Farm.
  • Server Farm: The Grouping of servers that will be load balanced when traffic is destined to the VIP.
  • Endpoint: The actual device accessing the network.
  • NAD: Network Access Device. The Access-Layer device (switch/wireless controller) that provides and enforces network access to the endpoint.
  • SNAT: Source Network Address Translation. A load-balancer function that hides the source IP address of the NAD, which allows the load-balancer to run "out of band."
  • Server NAT: The reverse of Source NAT. This is hiding the IP Address of the actual ISE PSN when it initiates communication to the NAD for things like Change of Authorization (CoA), and replacing that IP Address with the VIP instead.

General Guidelines

When using a Load-Balancer (anyone's), you must ensure a few things.

  • Each PSN must be reachable by the PAN / MNT directly, without having to go through NAT (use a routed-mode LB, not NAT). Do not use Source NAT. This applies to the Accounting messages, not just the Authentication ones.
    • This means the Load-Balancer must be in the direct path between the clients and the ISE PSNs.
    • Some organizations have used Policy Based Routing (PBR) to accomplish the path, without physically locating the Load-Balancer between the clients and the PSNs.
  • Endpoints (clients) must be able to reach each Policy Services Node directly (not going through the VIP) for redirections, Centralized Web Authentication, Posture Assessments, Native Supplicant Provisioning, and more.
  • You may want to "hack" the certs to include the VIP FQDN in the SAN field (my next blog post should cover this trick).
  • Perform sticky (aka: persistence) based on Calling-Station-ID and Framed-IP-Address.
  • VIP gets listed as the RADIUS server of each NAD for all 802.1X related AAA.
  • Dynamic-Authorization (CoA):
    • If you use Server NAT to replace the PSN IP address with the VIP Address for Change of Authorization, then you would use the VIP address as the Dynamic-Authorization (CoA) client.
    • Otherwise, use the real IP Address of the PSN, not the VIP.
  • The Load-Balancers get listed as NADs in ISE so that their test authentications are answered, which keeps the probes alive.
  • ISE uses the Layer-3 Address to identify the NAD, not the NAS-IP-Address in the RADIUS packet. This is a big reason to avoid SNAT.
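The persistence guideline above can be sketched in a few lines of Python. This is only a toy model of what a load-balancer's sticky table does, not any vendor's implementation; the `StickyTable` class, the hashing choice, and the example addresses are all assumptions made for illustration.

```python
import hashlib
import re


def normalize_mac(calling_station_id: str) -> str:
    """Reduce a Calling-Station-ID such as '00-11-22-AA-BB-CC' to '001122aabbcc'."""
    return re.sub(r"[^0-9a-f]", "", calling_station_id.lower())


class StickyTable:
    """Toy persistence table: one endpoint always lands on the same PSN."""

    def __init__(self, psns):
        self.psns = list(psns)  # real PSN addresses in the server farm (hypothetical)
        self.by_mac = {}        # Calling-Station-ID -> PSN
        self.by_ip = {}         # Framed-IP-Address -> PSN

    def pick(self, calling_station_id, framed_ip=None):
        mac = normalize_mac(calling_station_id or "")
        psn = self.by_mac.get(mac) or self.by_ip.get(framed_ip)
        if psn is None:
            # First packet for this endpoint: choose a PSN deterministically.
            key = (mac or framed_ip or "").encode()
            digest = hashlib.sha256(key).digest()
            psn = self.psns[int.from_bytes(digest[:4], "big") % len(self.psns)]
        # Remember the choice under both keys so later packets stay sticky.
        if mac:
            self.by_mac[mac] = psn
        if framed_ip:
            self.by_ip[framed_ip] = psn
        return psn
```

The point of keeping both keys is that authentication and accounting for one endpoint keep landing on the same PSN, even if a later packet is matched on Framed-IP-Address rather than Calling-Station-ID.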

Failure Scenarios:

  • The VIP is the RADIUS Server, so if the entire VIP is down, then the NAD should fail over to the Secondary DataCenter VIP (listed as the secondary RADIUS server on the NAD).
  • Use probes on the Load-Balancers to ensure that RADIUS is responding, as well as HTTPS (at minimum).
    • LB Probes should send test RADIUS messages to each PSN periodically, to ensure that RADIUS is responding, not just look for open UDP ports.
    • LB Probe should also examine the response for HTTPS, not just look for the open port(s).
  • Use node-groups with the L2-adjacent PSNs behind the VIP.
    • If the session was in process and one of the PSNs in a node-group fails, then another member of the node-group will issue a CoA-reauth, forcing the session to begin again.
    • At this point, the LB should have failed the dead PSN due to the probes configured in the LB, so this new authentication request will reach the LB and be directed to a different PSN.
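To make the "test RADIUS messages, not just open ports" point concrete, here is a minimal sketch of such a probe using only the Python standard library. The packet layout, User-Password hiding, and Response Authenticator check follow RFC 2865; everything else (the function names, the `lb-probe` NAS-Identifier, the probe account, and treating an Access-Reject as "healthy") is an assumption of this sketch, and a real load-balancer's built-in RADIUS monitor would normally be used instead.

```python
import hashlib
import os
import socket
import struct

ACCESS_REQUEST, ACCESS_ACCEPT, ACCESS_REJECT = 1, 2, 3
USER_NAME, USER_PASSWORD, NAS_IDENTIFIER = 1, 2, 32  # RFC 2865 attribute types


def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """User-Password hiding per RFC 2865 section 5.2 (non-empty password assumed)."""
    padded = password + b"\x00" * (-len(password) % 16)
    hidden, previous = b"", authenticator
    for i in range(0, len(padded), 16):
        mask = hashlib.md5(secret + previous).digest()
        block = bytes(p ^ m for p, m in zip(padded[i:i + 16], mask))
        hidden += block
        previous = block
    return hidden


def attribute(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute as type, length (including the 2-byte header), value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def build_probe(identifier, user, password, secret, authenticator=None):
    """Return (Access-Request bytes, Request Authenticator)."""
    authenticator = authenticator or os.urandom(16)
    attrs = (attribute(USER_NAME, user)
             + attribute(USER_PASSWORD, hide_password(password, secret, authenticator))
             + attribute(NAS_IDENTIFIER, b"lb-probe"))  # hypothetical identifier
    header = struct.pack("!BBH", ACCESS_REQUEST, identifier, 20 + len(attrs))
    return header + authenticator + attrs, authenticator


def psn_is_healthy(reply, identifier, request_authenticator, secret):
    """True only if the reply is a genuine RADIUS answer to our probe."""
    if len(reply) < 20:
        return False
    code, reply_id, length = struct.unpack("!BBH", reply[:4])
    if reply_id != identifier or length != len(reply):
        return False
    expected = hashlib.md5(
        reply[:4] + request_authenticator + reply[20:] + secret).digest()
    # Assumption: an Access-Reject still proves the RADIUS service is answering.
    return reply[4:20] == expected and code in (ACCESS_ACCEPT, ACCESS_REJECT)


def probe(host, secret, user, password, port=1812, timeout=2.0):
    """Send one test Access-Request to a PSN and validate the reply."""
    packet, authenticator = build_probe(1, user, password, secret)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (host, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # timeout or unreachable: treat the PSN as down
            return False
    return psn_is_healthy(reply, 1, authenticator, secret)
```

A call such as `probe("10.1.1.11", b"shared-secret", b"lb-probe-user", b"probe-password")` (all values hypothetical) only returns `True` when the PSN actually processes the request, which is why the Load-Balancer has to be defined as a NAD in ISE with a matching shared secret.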

Why can't we use Source NAT (SNAT)?

One of the most common questions when load balancing is: "Why can't we use SNAT?" Source NAT is a fantastic thing for general Load-Balancing, but not with ISE. The reasons listed below pertain to ISE version 1.1.x, and may change with ISE 1.2+.

Reason No. 1: Network Access Device (NAD) will be wrong:

With SNAT, the source Network Access Device will show up in ISE as being the Load-Balancer, NOT the Network Access Device.

Figure 1 - No SNAT (Credit: Aaron Woland)

ISE uses sessionized network authentication. This means ISE is tracking the session along with the NAD - so the NAD & ISE stay in-sync about the state and location of the endpoint. This session also gives ISE the NAD address to send Change of Authorizations to, as well as the location of the endpoint. 

  • The source NAD is used in many different ISE Policies, especially for location data.
  • If all requests always appear to be coming from the Load-Balancer, instead of the NAD - how can we know the location of the endpoint?

Location is not nearly as big of a problem as the Change of Authorizations, which are essential to a successful deployment.  

  • ISE records the Layer-3 Address of the NAD from the Layer-3 headers. 
  • There is a RADIUS field known as NAS-IP-Address, which embeds the IP Address of the Network Device in the RADIUS Packet.
  • However, ISE does not currently use that field; therefore, the L3 IP Address of the NAD must be correct for Change of Authorization to be sent to the correct device.
  • If the NAD appears as the IP Address of the Load-Balancer, then ISE will send the Change of Authorization to the Load-Balancer - not the switch.
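The effect of SNAT on Change of Authorization can be illustrated with a small simulation. This is a toy model of the behavior described above, not ISE's actual implementation; the class names and all IP addresses are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class RadiusPacket:
    src_ip: str              # Layer-3 source address as seen by the PSN
    nas_ip_address: str      # NAS-IP-Address attribute inside the RADIUS payload
    calling_station_id: str  # identifies the endpoint


def apply_snat(packet: RadiusPacket, lb_ip: str) -> RadiusPacket:
    """SNAT rewrites only the Layer-3 source; the RADIUS payload is untouched."""
    return RadiusPacket(lb_ip, packet.nas_ip_address, packet.calling_station_id)


class SessionTable:
    """Toy session store that records the NAD the way the article describes."""

    def __init__(self):
        self.nad_for_endpoint = {}

    def on_access_request(self, packet: RadiusPacket) -> None:
        # The NAD is taken from the Layer-3 header, not from NAS-IP-Address.
        self.nad_for_endpoint[packet.calling_station_id] = packet.src_ip

    def coa_destination(self, calling_station_id: str) -> str:
        return self.nad_for_endpoint[calling_station_id]


request = RadiusPacket("10.10.1.5", "10.10.1.5", "00-11-22-AA-BB-CC")

sessions = SessionTable()
sessions.on_access_request(request)
print(sessions.coa_destination("00-11-22-AA-BB-CC"))  # 10.10.1.5 (the switch)

sessions.on_access_request(apply_snat(request, "10.1.1.1"))
print(sessions.coa_destination("00-11-22-AA-BB-CC"))  # 10.1.1.1 (the load-balancer)
```

In the second case the NAS-IP-Address attribute still carries the switch's address, but because only the Layer-3 source is consulted, the CoA would be aimed at the load-balancer.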

Reason No. 2: URL Redirection and Web Portals:

Next, ISE 1.1.x only has one interface that can be used for all functions. Yes, ISE can run RADIUS on any of its four interfaces, but the Gigabit 0/0 interface is the ONLY interface for Management Traffic. Also, the FQDN of the Policy Services Node is embedded into the certificate for ISE 1.1.x, and that FQDN is what gets used for URL Redirection for WebAuth, Device Registration, Supplicant Provisioning, and so on.

Figure 2 - Redirection (Credit: Aaron Woland)

So, when the URL Redirection occurs, the endpoints will need to talk to ISE directly (not the VIP) and reach the web portals. The Portals can ONLY exist on the Gigabit 0/0 Interface in 1.1.x. (This may change in a future version of ISE.)

Reason No. 3: Routing Tables:

Unless you add a static route to ISE for every NAD Subnet, ISE does not have the ability in 1.1.x to return traffic on a different subnet through a different Gateway, only its default Gateway. Therefore, the Load-Balancer MUST be the Default-Gateway for the ISE PSNs (or at least in the path).

Since the Load-Balancer must be the default Gateway, all Management Traffic also flows through the Load-Balancer, unless you physically locate the Policy Administration Node (PAN) and Monitoring & Troubleshooting Node (MNT) behind the Load-Balancer as well (just don't include those in the Server Farm).
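The routing argument comes down to an ordinary next-hop decision, which can be sketched as a generic longest-prefix-match lookup. This is a simplified model of how any host picks a gateway, not ISE code; the function name, subnets, and gateway addresses are hypothetical.

```python
import ipaddress


def next_hop(destination, static_routes, default_gateway):
    """Longest-prefix match over static routes, else the default gateway."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, gateway in static_routes.items():
        network = ipaddress.ip_network(prefix)
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, gateway)
    return best[1] if best else default_gateway


LB = "10.1.1.1"        # load-balancer as the PSN's default gateway (hypothetical)
OTHER = "10.1.1.254"   # some other router on the PSN's subnet (hypothetical)

# No static routes: every reply to a NAD leaves through the load-balancer.
print(next_hop("10.20.5.9", {}, LB))                         # 10.1.1.1

# A static route per NAD subnet is the only way to use a different gateway.
print(next_hop("10.20.5.9", {"10.20.0.0/16": OTHER}, LB))    # 10.1.1.254
```

With no static routes, the reply to every NAD follows the default gateway, which is why the Load-Balancer has to be that gateway or at least sit in the path.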

I hope that helps.

This article is published as part of the IDG Contributor Network.