Troubleshooting IPv6 Networks and Systems

Techniques for finding problems in a dual-protocol environment

Whether your organization has deployed IPv6 or not, you may end up troubleshooting IPv6-related issues as other nodes on the Internet move to dual-protocol connectivity. We need to consider how the introduction of IPv6 will change the way we troubleshoot networks, now that we are operating in a dual-protocol world. This article will focus on troubleshooting dual-protocol applications running on dual-protocol servers over a dual-protocol network.

As IPv6 is added to more environments, the techniques we have been using for decades to troubleshoot TCP/IP networks will need to adapt. When we add IPv6 to the network, we will need to learn how to troubleshoot IPv6 connectivity issues. On the early IPv4 networks we used different troubleshooting techniques than we use today. IPv4 networks have now had decades to mature, and we "blame the network" far less often than we did in the 1990s. Because IPv6 deployments are newer, it is conceivable that we will again be hearing that "it's the network's fault".

Even if your organization has not yet deployed IPv6, you may have end-users, customers or business partners who have some form of IPv6 connectivity. Even though the fault may not lie in your network, you may be the organization responsible for the service, and you will have to troubleshoot connectivity from these IPv6-enabled networks. Other complexities for end-to-end connectivity could be introduced by Large Scale NAT (LSN) or NAT64/DNS64 deployments used along the path to the remote endpoint.

Now that we have two protocols that could be used at the network-layer of the OSI model we will need to test each of them. The Internet Protocol (IP) has been referred to as the "narrow waist" of the TCP/IP protocol hourglass. The hourglass metaphor comes from the fact that IP can operate on many Layer-2 protocols (Ethernet, PPP, POS, Frame Relay, Fibre Channel, IEEE 802.15.4, IEEE 1394, Token Ring, FDDI, ARCnet, etc.) and IP can also support many different types of transport protocols (TCP, UDP, SCTP, DCCP, etc.) and even more numerous applications. We will need to exhaustively troubleshoot the network and systems to make sure that we have end-to-end IPv4 and/or IPv6 connectivity.

You might have seen this comedic troubleshooting flowchart. However, in order to be an effective network engineer you must have a solid network troubleshooting methodology. When you are troubleshooting networks, a common technique is to troubleshoot based on the layers of the OSI model. Sometimes you start from the physical layer and work your way up the protocol stack (but this can be time consuming). Sometimes you may start at the application layer and work your way down (applicable when you are troubleshooting an application-specific issue). Other times you may start at the network layer, do a quick ping, and based on the result decide to move up or down the layers. In a dual-protocol network you would need to do two pings, one for IPv4 and one for IPv6.

One of the first places to start your troubleshooting process is at the endpoints of the communication. We first need to validate that the end-nodes have the correct IP addresses and are operational on their local networks. We will need to verify that each system has IPv4 and/or IPv6 addresses and correct DNS resolvers. IP addresses could be statically configured (common practice for servers in a datacenter) or dynamically configured (common practice for end-users on access networks). In IPv4 networks we would be troubleshooting DHCP. However, in an IPv6-enabled network we need to be able to troubleshoot Stateless Address Autoconfiguration (SLAAC) and investigate the ICMPv6 Router Advertisement (RA) messages coming from the local first-hop router. Based on the information contained in the RA message, an end-node could use SLAAC, stateless DHCPv6 (the router's RA provides the prefix and the DHCPv6 server provides the DNS resolver information) or stateful DHCPv6. We must also be aware that Windows XP and Mac OS X do not use DHCPv6, but they can use SLAAC and then have their DNS server configured locally. Another option would be to use Dibbler (an open-source DHCPv6 client/server/relay).
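
The way an end-node picks its addressing method from an RA can be sketched in a few lines of Python. This is only an illustration: the class and function names are hypothetical, and the sketch assumes you have already read the M and O flags from the RA header and the A flag from its Prefix Information option (for example, from a Wireshark capture).

```python
from dataclasses import dataclass


@dataclass
class RouterAdvertisement:
    """Flags of interest from a captured RA (hypothetical helper class)."""
    managed: bool     # M flag: addresses are available via DHCPv6
    other: bool       # O flag: other configuration (e.g. DNS) via DHCPv6
    autonomous: bool  # A flag in the Prefix Information option: SLAAC allowed

    @classmethod
    def from_flag_bytes(cls, ra_flags, prefix_flags):
        # M is the high bit and O the next bit of the RA flags byte;
        # A is the second-highest bit of the prefix option flags byte.
        return cls(
            managed=bool(ra_flags & 0x80),
            other=bool(ra_flags & 0x40),
            autonomous=bool(prefix_flags & 0x40),
        )


def addressing_modes(ra):
    """Return the addressing methods this RA tells a host to use."""
    modes = []
    if ra.autonomous:
        modes.append("SLAAC")
    if ra.managed:
        modes.append("stateful DHCPv6")
    elif ra.other:
        modes.append("stateless DHCPv6")
    return modes


if __name__ == "__main__":
    ra = RouterAdvertisement.from_flag_bytes(ra_flags=0x40, prefix_flags=0xC0)
    print(addressing_modes(ra))
```

If the result does not match what the host actually configured, the mismatch points at either the router's RA settings or the host's DHCPv6 client.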

Now that we have validated that the node has its IP addresses, we need to validate that the host can ping its default router and can ping beyond that first hop. Often the default router will have both an Aggregatable Global Unicast address and a Link-Local address. Our host may be configured with the Link-Local address that comes from the RA message. We can ping using Link-Local addresses as follows, depending on your operating system. When we ping a Link-Local address we must specify the interface that we would like to use to send this ICMPv6 echo request packet.

ping6 -I eth0 fe80::1
ping fe80::1%12
ping fe80::1%GigabitEthernet0/0
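
The rule that a Link-Local destination needs an interface can be checked in a script before any ping is sent. The following is a minimal sketch using Python's standard ipaddress module; the function names are hypothetical, and the zone is split off by hand so the sketch does not depend on a particular Python version's scope-ID support.

```python
import ipaddress


def split_zone(text):
    """Split 'address%zone' into an IPv6Address and a zone string (or None)."""
    addr, _, zone = text.partition("%")
    return ipaddress.IPv6Address(addr), zone or None


def ping_target_problem(text):
    """Return a warning if the target cannot be pinged as written, else None."""
    addr, zone = split_zone(text)
    if addr.is_link_local and zone is None:
        return "link-local address needs an interface (zone) identifier"
    return None


if __name__ == "__main__":
    for target in ("fe80::1", "fe80::1%eth0", "2001:db8::1"):
        print(target, "->", ping_target_problem(target) or "ok")
```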

The next layer we want to troubleshoot involves the mapping of human-recognizable fully-qualified domain names into IP addresses. We will need to validate IP connectivity to the DNS resolver and troubleshoot DNS lookups. We can use nslookup, dig, and the host command to validate the DNS queries for A and AAAA records as well as PTR records. We need to be cognizant of whether the DNS servers communicate over IPv4 only or are dual-protocol. We also need to remember that Windows XP, Windows Server 2003 and Mac OS X only perform DNS lookups over IPv4 transport. It may also be useful to use Wireshark or tcpdump to view the DNS lookup packets. We will want to see how the client sends separate A and AAAA queries and whether the responses show any of the server misbehaviors described in RFC 4074.
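
A quick scripted version of these checks might look like the sketch below. It is an approximation rather than a replacement for dig: it asks the operating system's resolver (which may also consult the local hosts file) for IPv4 and IPv6 results separately, and it builds the PTR name you would look up for a given address. The function names are hypothetical.

```python
import ipaddress
import socket


def resolve_by_family(host):
    """Ask the system resolver for IPv4 ("A") and IPv6 ("AAAA") results separately."""
    results = {}
    for label, family in (("A", socket.AF_INET), ("AAAA", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
        except socket.gaierror:
            infos = []  # no answer for this address family
        addresses = []
        for info in infos:
            address = info[4][0]
            if address not in addresses:
                addresses.append(address)
        results[label] = addresses
    return results


def ptr_name(address):
    """Return the reverse-lookup name (in-addr.arpa or ip6.arpa) for an address."""
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    print(resolve_by_family("localhost"))
    print(ptr_name("2001:db8::1"))
```

An empty "AAAA" list for a name that should have an IPv6 address is a cue to go back to dig or a packet capture and look at the actual queries.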

Most dual-protocol operating systems will perform DNS queries for IPv4 and IPv6 records and will prefer to make a connection using IPv6 if at all possible. However, older versions of Mac OS X use the first returned DNS response to make the connection. If the A record response came back first then the connection would take place over IPv4, but if the AAAA record response came back first then the connection would take place over IPv6. Furthermore, various web browsers and other applications may not make connections over IPv6 even though the node has dual-protocol capability and is on an active dual-protocol network.

The next step in our troubleshooting methodology is to ensure bi-directional end-to-end connectivity with IPv4 and IPv6. This means that we will want to perform a ping and traceroute in both directions. We need to do these tests in both directions to see if there is any asymmetry in the communications path. We must also be aware of any IPv6-in-IPv4 tunnels that could exist along the path. There could be manually-configured tunnels, dynamically-configured tunnels (ISATAP, 6to4, Teredo) or translation (NAT-PT, NAT64/DNS64) occurring along the traffic path that could affect end-to-end connectivity. Tunnels could add latency and reduce the performance of the communications. We could also use pathping (e.g. pathping -6 2001:DB8:0DD:BA11::1) or JPerf to verify end-to-end performance.
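
Several of these transition mechanisms leave a fingerprint in the IPv6 address itself, which can hint that a tunnel or translator is in the path. The sketch below is a rough classifier built on Python's ipaddress module; the function name is hypothetical, and it only recognizes the well-known formats (6to4's 2002::/16, Teredo's 2001::/32, the NAT64 well-known prefix 64:ff9b::/96 and the ISATAP interface identifier). Manually-configured tunnels and network-specific NAT64 prefixes cannot be detected this way.

```python
import ipaddress

NAT64_WELL_KNOWN = ipaddress.ip_network("64:ff9b::/96")


def transition_hint(text):
    """Return a description if the address format suggests a transition mechanism."""
    addr = ipaddress.IPv6Address(text)
    if addr.sixtofour is not None:
        return f"6to4 (embedded IPv4 {addr.sixtofour})"
    if addr.teredo is not None:
        server, client = addr.teredo
        return f"Teredo (server {server}, client {client})"
    low32 = ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)
    if addr in NAT64_WELL_KNOWN:
        return f"NAT64 well-known prefix (embedded IPv4 {low32})"
    # ISATAP interface identifiers start with 0000:5efe or 0200:5efe.
    interface_id_high = (int(addr) >> 32) & 0xFFFFFFFF
    if interface_id_high in (0x00005EFE, 0x02005EFE):
        return f"ISATAP (embedded IPv4 {low32})"
    return None


if __name__ == "__main__":
    for sample in ("2002:c000:204::1", "64:ff9b::c000:201", "2001:db8::1"):
        print(sample, "->", transition_hint(sample) or "no transition format recognized")
```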

The next phase of our troubleshooting will focus on IPv6-specific issues that we have not yet tested. IPv6-capable nodes follow a process of default address selection (RFC 3484). If there is something wrong with the prefix policy within the operating system it could cause mysterious behavior. This could affect either source address selection or destination address selection. On a Microsoft system we can use the "netsh interface ipv6 show prefixpolicies" command to view the policy table. On a BSD system we can use the ip6addrctl command and on a Solaris system we can use the ipaddrsel command to view the policy table.
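
To make the policy table less mysterious, here is a small sketch of how a lookup against it works. The table values are the defaults listed in RFC 3484; the function names are hypothetical, and the sketch covers only the longest-prefix lookup and the precedence comparison, not the full set of source and destination selection rules.

```python
import ipaddress

# Default policy table from RFC 3484: (prefix, precedence, label)
RFC3484_DEFAULT_POLICY = [
    ("::1/128", 50, 0),
    ("::/0", 40, 1),
    ("2002::/16", 30, 2),
    ("::/96", 20, 3),
    ("::ffff:0:0/96", 10, 4),
]


def policy_lookup(address, table=RFC3484_DEFAULT_POLICY):
    """Return (precedence, label) of the longest matching prefix."""
    addr = ipaddress.ip_address(address)
    if addr.version == 4:
        # IPv4 addresses are looked up in their IPv4-mapped form.
        addr = ipaddress.IPv6Address(f"::ffff:{addr}")
    best = None
    for prefix, precedence, label in table:
        network = ipaddress.ip_network(prefix)
        if addr in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, precedence, label)
    return best[1], best[2]


def prefer_order(destinations):
    """Sort candidate destinations by precedence, highest first."""
    return sorted(destinations, key=lambda a: policy_lookup(a)[0], reverse=True)


if __name__ == "__main__":
    print(prefer_order(["192.0.2.1", "2002:c000:204::1", "2001:db8::1"]))
```

With the default table, a native IPv6 destination outranks a 6to4 destination, which in turn outranks IPv4. If a host behaves differently, comparing its actual policy table with these defaults is a reasonable first check.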

Another thing we will need to test is the Neighbor Discovery Protocol (NDP). This is the IPv6 equivalent of IPv4's ARP. Because IPv6 doesn't use broadcast, the NDP ICMPv6 messages use multicast to resolve IPv6 addresses to Layer-2 addresses (MAC addresses). We can use ping to verify IPv6 connectivity to the other nodes on a LAN and then check the neighbor cache (like the IPv4 ARP cache). On a Windows host we can use the command "netsh interface ipv6 show neighbors". On a Linux system we can use the "ip neighbor show" command. On a BSD system the command is "ndp -a" and on a Solaris system the command "netstat -p -f inet6" will show you its neighbor cache. On both a Cisco router and a Juniper router, the command is "show ipv6 neighbors".
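
When you look for Neighbor Solicitation packets in a capture, it helps to know which multicast group and which Ethernet destination to filter on. The sketch below computes the solicited-node multicast address (ff02::1:ff plus the low 24 bits of the target) and the matching 33:33 multicast MAC address. The function names are hypothetical.

```python
import ipaddress

SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_multicast(address):
    """Return the solicited-node multicast group for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(address)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)


def multicast_mac(group):
    """Return the Ethernet MAC used for an IPv6 multicast group (33:33 + low 32 bits)."""
    low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{octet:02x}" for octet in octets)


if __name__ == "__main__":
    group = solicited_node_multicast("2001:db8::1:800:200e:8c6c")
    print(group, multicast_mac(group))
```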

Another problem that could be encountered on dual-protocol networks is links with a reduced Maximum Transmission Unit (MTU) size. This can happen if the IPv6 packets have encountered a tunnel and the tunnel overhead has reduced the MTU size. If the IPv6 packets are placed inside a 6in4 tunnel within IPv4 Protocol 41 packets then the MTU size will be reduced by 20 bytes (the IPv4 header size). Because IPv6 routers do not perform fragmentation, a router that cannot forward an oversized IPv6 packet must drop it and send back an ICMPv6 Packet Too Big message indicating the MTU of the next-hop link. The IPv6-capable source must then perform Path MTU Discovery (PMTUD) and send packets of the proper size, fragmenting at the source if necessary. Using ping with various packet sizes can reveal if there is an MTU size reduction along the traffic path. On a Windows host you can perform a "ping -l 1500 2001:DB8:DEAD:C0DE::1" and then verify the ICMPv6 Packet Too Big response with the embedded MTU size.
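
The arithmetic behind choosing ping sizes is simple but easy to get wrong, so here it is as a small sketch. It assumes a plain IPv6 header (40 bytes, no extension headers) and an ICMPv6 echo header (8 bytes); the function names are hypothetical.

```python
IPV6_HEADER = 40         # fixed IPv6 header, no extension headers
ICMPV6_ECHO_HEADER = 8   # ICMPv6 echo request header
IPV4_HEADER = 20         # outer header added by a 6in4 tunnel
IPV6_MIN_MTU = 1280      # every IPv6 link must support at least this MTU


def tunnel_mtu(link_mtu, overhead=IPV4_HEADER):
    """MTU available to IPv6 packets inside a tunnel with the given overhead."""
    return link_mtu - overhead


def max_ping_payload(path_mtu):
    """Largest echo payload that fits in one unfragmented IPv6 packet."""
    if path_mtu < IPV6_MIN_MTU:
        raise ValueError("IPv6 requires a link MTU of at least 1280 bytes")
    return path_mtu - IPV6_HEADER - ICMPV6_ECHO_HEADER


if __name__ == "__main__":
    for mtu in (1500, tunnel_mtu(1500), IPV6_MIN_MTU):
        print(f"path MTU {mtu}: largest unfragmented payload {max_ping_payload(mtu)}")
```

A payload of 1452 bytes fills a 1500-byte packet exactly, so on a path that crosses a 6in4 tunnel anything above 1432 bytes should trigger a Packet Too Big message or fragmentation at the source.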

Once we have verified solid end-to-end connectivity with both protocols, one of the final things to test is end-to-end application protocol communication. It is conceivable that there is a stateful firewall between the nodes that is blocking some type of traffic. In order to test this we may want to generate some synthetic traffic and validate that it makes it between the two end-nodes. We could use a utility like netcat6 to create simulated traffic between the nodes using a specific port number. We could also use telnet or SSH (BTW, my favorite SSH client is SecureCRT). We could perform an Nmap scan of the destination host from the source. We could also use an IPv6-capable web browser and browse by IPv6 or IPv4 address.
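
If none of those tools is at hand, a few lines of Python can stand in for a basic port test. This sketch attempts a TCP connection to a port over one address family at a time, so the IPv4 and IPv6 paths through a firewall can be compared. The function name and the example host are hypothetical, and a False result only says the connection did not complete within the timeout, not why.

```python
import socket


def tcp_port_open(host, port, family, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds over the given family."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # no address of this family for the host
    for af, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # refused, filtered, or unreachable: try the next address
    return False


if __name__ == "__main__":
    host, port = "www.example.com", 80  # placeholder target
    print("IPv4:", tcp_port_open(host, port, socket.AF_INET))
    print("IPv6:", tcp_port_open(host, port, socket.AF_INET6))
```

A port that answers over IPv4 but not over IPv6 (or the reverse) suggests that the firewall policy or the service's listening sockets differ between the two protocols.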

As we begin to encounter more IPv6-enabled systems we will need to refine our troubleshooting skills to compensate for this added complexity. Even though dual-stack is the preferred transition technique, it is not a panacea. No one claimed that living in a dual-protocol world would be easy. During the lengthy period of time in which systems will need to speak both the IPv4 and IPv6 languages, we will need to become bilingual and fluent in troubleshooting either protocol.



Copyright © 2011 IDG Communications, Inc.