DHCP puzzle: Why does the pool of IP addresses freeze?

A dynamic host-configuration protocol issue leaves wireless users unable to connect, and with no permanent fix, the solution is an ongoing workaround.

Dynamic host-configuration protocol (DHCP) has a lot of benefits, chief among them the time it saves by assigning IP addresses and other attributes to networked devices automatically so that IT pros don't have to do it manually.

Sometimes, though, problems arise that eat up time in a different way. This is one such case affecting Cisco Catalyst 6500 and 9600 Layer 3 chassis switches used as distribution switches for the network, with different groups of buildings linked to them.

DHCP is commonly deployed on a server to deliver IP addresses, subnet masks, default gateways, and DNS server information, but DHCP can also be deployed on switches and routers including Cisco’s, and that’s one method used in our network.

Specifically, the DHCP server capabilities of our Cisco switches are used to distribute IP addresses to devices on our wireless network as a means of segmenting wireless traffic from the wired infrastructure, which uses a separate DHCP server. An added benefit of using the DHCP support on switches, especially in smaller networks, is the cost savings realized by having the switch perform double duty rather than purchasing and managing a separate DHCP server.

The Cisco 6500 had served the network in those capacities for many years, but as our organization moved toward adopting software-defined networking (SDN), it was time to upgrade to the 6500’s successor, the Cisco 9600, which supports automation and higher port speeds.

An issue with both switches was that the DHCP address pools would freeze. Devices trying to join the network could not because they did not receive IP addresses, which led to end users filing trouble tickets stating that the wireless network was down.

The switches can deliver information about the available DHCP IP addresses using the command “show ip dhcp pool”, which returns a display that looks like this:

Router# show ip dhcp pool 1
Pool 1:
 Utilization mark (high/low)    : 85 / 15
 Subnet size (first/next)       : 24 / 24 (autogrow)
 VRF name                       : abc
 Total addresses                : 28
 Leased addresses               : 11
 Pending event                  : none
 Current index        IP address range           Leased addresses
 10.1.1.12            10.1.1.1 - 10.1.1.14       11
 10.1.1.17            10.1.1.17 - 10.1.1.30      0
 Interface Ethernet0/0 address assignment
   10.1.1.1 255.255.255.248
   10.1.1.17 255.255.255.248 secondary

Over time, our network engineers noticed that when users had trouble getting IP addresses, the readout for the “show ip dhcp pool” command showed that the current index was 0.0.0.0 but also showed that there were still addresses available in the pool. That would look something like this:

Current index    IP address range                Leased/Excluded/Total
0.0.0.0          172.30.52.97 - 172.30.53.128    0 / 7 / 30
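A check like this can be automated. The following is a minimal Python sketch, not something from our production tooling, that scans the text of a “show ip dhcp pool” readout for rows in the Leased/Excluded/Total layout shown above. The function name frozen_ranges is made up for illustration, and the sketch assumes that free addresses equal total minus leased minus excluded, so a pool that is simply full is not reported.

```python
import re

# One row of the "Leased/Excluded/Total" table, for example:
# 0.0.0.0   172.30.52.97 - 172.30.53.128   0 / 7 / 30
IP = r"\d{1,3}(?:\.\d{1,3}){3}"
ROW = re.compile(
    rf"^\s*(?P<index>{IP})\s+(?P<first>{IP})\s*-\s*(?P<last>{IP})"
    r"\s+(?P<leased>\d+)\s*/\s*(?P<excluded>\d+)\s*/\s*(?P<total>\d+)\s*$"
)


def frozen_ranges(show_output: str) -> list[dict]:
    """Return ranges whose current index is 0.0.0.0 while addresses remain.

    Assumption (not confirmed by the article): free addresses are
    total - leased - excluded.
    """
    frozen = []
    for line in show_output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header lines and unrelated output are skipped
        leased = int(m["leased"])
        excluded = int(m["excluded"])
        total = int(m["total"])
        free = total - leased - excluded
        if m["index"] == "0.0.0.0" and free > 0:
            frozen.append(
                {"range": f"{m['first']} - {m['last']}", "free": free}
            )
    return frozen


if __name__ == "__main__":
    sample = (
        "Current index  IP address range               Leased/Excluded/Total\n"
        "0.0.0.0        172.30.52.97 - 172.30.53.128   0 / 7 / 30\n"
    )
    for hit in frozen_ranges(sample):
        print(f"Pool range {hit['range']} looks frozen, {hit['free']} free")
```

Fed the readout above, the function flags the one range as frozen. How the readout text is collected from the switch (SSH session, saved log, and so on) is left open here.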

The pools had worked well when they were set up about three years ago. Then the problem began cropping up periodically with no apparent triggering event. It wouldn’t affect all the switches at once, but rather individual switches sporadically throughout the network. During this period more buildings and areas were added to the wireless network, and more devices were connecting.

The problem was common enough that whenever the help desk reached out to us about trouble connecting to the wireless network, our go-to troubleshooting steps were to verify that the Wi-Fi access points were up and to check that the DHCP pool wasn’t frozen. If it was, we’d follow the advice of the Cisco Technical Assistance Center (TAC): delete the DHCP pool and re-add it. This would reset the pool, and DHCP would begin handing out addresses again. That would correct the problem for a while.

We performed this workaround remotely, and it took only a matter of seconds because, to save time, we copied and pasted the delete and re-add commands into the switch.
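To illustrate what such a paste-ready command set might look like, here is a hedged Python sketch that builds the delete and re-add sequence for one pool. It relies on the standard IOS DHCP server commands (no ip dhcp pool, ip dhcp pool, network, default-router, dns-server). The exact commands we received from TAC are not reproduced here, and the helper name, pool name, and addresses are hypothetical placeholders.

```python
def readd_pool_commands(name, network, mask, default_router, dns_servers=()):
    """Build config lines that delete a DHCP pool and define it again.

    All argument values are supplied by the caller; nothing here is read
    from a switch. The pool options shown are an assumed minimal set.
    """
    cmds = [
        "configure terminal",
        f"no ip dhcp pool {name}",   # delete the frozen pool
        f"ip dhcp pool {name}",      # re-create it under the same name
        f" network {network} {mask}",
        f" default-router {default_router}",
    ]
    if dns_servers:
        cmds.append(" dns-server " + " ".join(dns_servers))
    cmds.append("end")
    return cmds


if __name__ == "__main__":
    # Hypothetical pool using documentation-range addresses.
    lines = readd_pool_commands(
        "WIFI-EXAMPLE", "192.0.2.0", "255.255.255.0",
        "192.0.2.1", ["192.0.2.53"],
    )
    print("\n".join(lines))
```

Keeping the pool definition in one place like this means the re-add always matches the original settings, which is the main risk when retyping a pool by hand during an outage.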

We had to remove and re-add the pools on the 6500s many times, and we were looking forward to seeing whether the replacement 9600s would resolve the issue. They did not.

The issue continued to occur, not every day, but every so often, usually reported as a network outage. Other organizations have posted similar issues that they solved using the same workaround.

Despite this bug, DHCP on the switches does serve the goal of segmenting the wireless network and, even with the periodic need to re-add the DHCP pools, is much more efficient than the alternative of assigning addresses manually.

Copyright © 2022 IDG Communications, Inc.