One of my favorite tools for tracking down network problems is PingPlotter, published by Nessoft. I’ve written about PingPlotter several times (the last time in 2011, although it also had a starring role in sorting out my problems with AT&T in 2012). In fact, this utility has helped me so many times it’s ridiculous. Here’s my summary of PingPlotter from my last roundup of my favorite tools:
Nessoft's PingPlotter … tests connectivity to one or more target hosts on your local network or on the Internet and plots the results. What PingPlotter is actually doing is repeatedly running a traceroute to identify all of the intermediate routers between your machine and the targets and testing each for how long it takes to respond. The tool also measures "packet jitter" (the variation in how quickly packets are handled), VoIP Mean Opinion Score or MOS (an estimation of perceived voice quality), and standard deviation of packet transit time. PingPlotter comes in freeware, Standard ($24.95) and Pro ($199.95) versions. PingPlotter … gets a rating of 5 out of 5.
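To make the statistics in that summary a little more concrete, here is a minimal Python sketch of how packet loss, jitter, and standard deviation could be computed from a list of round-trip times. The function name latency_stats and the sample values are made up for illustration, and jitter is taken here to be the average absolute difference between consecutive replies, which is one common definition and not necessarily the exact formula PingPlotter uses. MOS is left out because it depends on a voice-quality model the summary doesn't describe.

```python
from statistics import mean, pstdev


def latency_stats(samples_ms):
    """Summarize ping samples.

    samples_ms holds round-trip times in milliseconds, with None
    standing in for a lost packet (an assumption of this sketch).
    """
    if not samples_ms:
        raise ValueError("need at least one sample")

    received = [s for s in samples_ms if s is not None]
    loss_pct = 100.0 * (len(samples_ms) - len(received)) / len(samples_ms)

    # Jitter: mean absolute difference between consecutive replies.
    # Lost packets are simply skipped, which is a simplification.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]

    return {
        "loss_pct": loss_pct,
        "jitter_ms": mean(diffs) if diffs else 0.0,
        "stdev_ms": pstdev(received) if received else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical samples: four replies and one lost packet.
    print(latency_stats([20.0, 30.0, None, 20.0, 30.0]))
```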
I recently had cause to use PingPlotter again when devices on my network that configure their network connections via DHCP served up by my AT&T U-verse NVG-510 ADSL+ gateway started to slow down.
In the NVG-510’s log (which is a circular log that is way too short) I saw multiple occurrences of DNS failures; for example, from earlier today:
2014-07-10T22:21:28-07:00 L3 dnsmasq: no responses from nameserver '126.96.36.199'
2014-07-10T22:21:28-07:00 L3 dnsmasq: nameserver '188.8.131.52' is now responding
Quite often, these errors occur in clusters and the clusters can continue for minutes at a time, so it’s not surprising that devices on my network using DHCP configurations would exhibit poor connectivity.
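Because the gateway's log is so short, it can help to copy the entries out regularly and summarize them elsewhere. The following is a rough Python sketch that parses the two dnsmasq messages quoted above and groups the failures into clusters. The function names (parse_line, cluster_failures) and the 60-second gap used to decide what counts as one cluster are my own assumptions, and the pattern only covers the two message formats shown here.

```python
import re
from datetime import datetime, timedelta

# Matches only the two dnsmasq messages quoted in this post.
LINE_RE = re.compile(
    r"^(\S+) L3 dnsmasq: "
    r"(no responses from nameserver|nameserver) "
    r"'([^']+)'( is now responding)?$"
)


def parse_line(line):
    """Return (timestamp, kind, server), or None if the line isn't recognized."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    ts_text, prefix, server, tail = m.groups()
    ts = datetime.fromisoformat(ts_text)
    if prefix.startswith("no responses") and tail is None:
        return ts, "failed", server
    if prefix == "nameserver" and tail is not None:
        return ts, "recovered", server
    return None


def cluster_failures(events, max_gap=timedelta(seconds=60)):
    """Group 'failed' events (sorted by time) into (start, end, count) clusters."""
    clusters = []
    for ts, kind, _server in events:
        if kind != "failed":
            continue
        if clusters and ts - clusters[-1][1] <= max_gap:
            start, _end, count = clusters[-1]
            clusters[-1] = (start, ts, count + 1)
        else:
            clusters.append((ts, ts, 1))
    return clusters


if __name__ == "__main__":
    log = [
        "2014-07-10T22:21:28-07:00 L3 dnsmasq: no responses from nameserver '126.96.36.199'",
        "2014-07-10T22:21:28-07:00 L3 dnsmasq: nameserver '188.8.131.52' is now responding",
    ]
    events = [e for e in map(parse_line, log) if e is not None]
    for start, end, count in cluster_failures(events):
        print(start.isoformat(), end.isoformat(), count)
```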
I’ve previously theorized that the NVG-510’s DNS forwarder (which is based on the open source dnsmasq code) has a too-short timeout on failed queries. But why would a query fail?
I started up PingPlotter and found that both of the DNS servers that are hard-coded in the gateway’s DHCP server and can't be changed (to wit, the primary and secondary DNS servers at 184.108.40.206 and 220.127.116.11 respectively which, according to whatsmydns.net, don’t exist) are still, as I observed back in 2012, flaky. The problem now appears to be that they have become even flakier.
Here’s a sample of PingPlotter’s output from yesterday for AT&T’s default DNS servers:
The time between pings is 10 seconds and the narrowest red blocks indicate 1 lost packet. As you can see, there are frequent losses of packets for 20 or 30 seconds (the double and triple length blocks) or more. These clusters of lost packets are themselves in regular clusters, as can be seen from this trace over about three and a half days:
Why the servers’ responses should get routinely better once per day is pretty much a mystery for now.
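The red blocks described above are just runs of consecutive lost pings, so their lengths can be worked out from raw results without the graph. Here is a small Python sketch of that arithmetic, assuming the ping results have been exported as a simple list of True (reply received) and False (lost) values. The function name loss_runs and the sample data are hypothetical; the 10-second interval matches the setting mentioned above.

```python
from itertools import groupby


def loss_runs(replies, interval_s=10):
    """Return (start_index, lost_count, duration_s) for each run of lost pings.

    replies is a sequence of booleans, one per ping, in time order.
    """
    runs = []
    index = 0
    for received, group in groupby(replies):
        length = len(list(group))
        if not received:
            # One lost ping at a 10 s interval counts as a 10 s block,
            # two in a row as 20 s, and so on.
            runs.append((index, length, length * interval_s))
        index += length
    return runs


if __name__ == "__main__":
    # Made-up sample: one single loss, then a triple-length block.
    sample = [True, False, True, True, False, False, False, True]
    for start, count, seconds in loss_runs(sample):
        print(f"ping #{start}: {count} lost ({seconds} s)")
```

Feeding several days of results through something like this would give the start time and length of every outage, which should make any daily pattern easier to line up against the clock.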
Curiously, those aren't the only DNS servers AT&T has: there are other primary and secondary DNS servers at 18.104.22.168 and 22.214.171.124 respectively. Tracing their behavior (the bottom two traces) as well as Google's DNS servers at 126.96.36.199 and 188.8.131.52 (the top two traces) alongside the default DNS servers (the middle two traces) shows that DNS servers can be reliably reached across the AT&T network:
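PingPlotter measures reachability with pings, which isn't quite the same thing as asking a DNS server to answer a query. As a complementary check (a different technique from the traces here), the Python sketch below sends a single hand-built DNS "A" query over UDP and times the reply. The function names build_query and probe, the test name example.com, and the 2-second timeout are all assumptions of mine, and the address in the usage section is a documentation placeholder to be replaced with whichever servers you want to compare.

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def probe(server, name="example.com", timeout_s=2.0):
    """Return the reply time in milliseconds, or None if no valid reply arrived."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        start = time.monotonic()
        try:
            sock.sendto(query, (server, 53))
            reply, _addr = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return None
        elapsed_ms = (time.monotonic() - start) * 1000.0
    # Accept the reply only if its ID matches the query's ID.
    return elapsed_ms if reply[:2] == query[:2] else None


if __name__ == "__main__":
    # Placeholder address from the documentation range; substitute real servers.
    for server in ["192.0.2.1"]:
        result = probe(server)
        print(server, "no reply" if result is None else f"{result:.1f} ms")
```

Run in a loop every 10 seconds against each server, this would give a rough DNS-level counterpart to the ping traces.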
My guess is that something is going on with AT&T's default DNS servers that isn't good, and I've got a request in to the powers that be in an attempt to get an answer. I’ll let you know when I hear something. In the meantime, have you been experiencing problems caused by AT&T’s flaky DNS servers? If you have, please leave a comment below. I share your pain.