by Robert Gezelter

Toward a resilient DNS, Part 2

Apr 10, 2003 · 3 mins

* Tips for making sure your DNS is resilient

The most rudimentary rule for the continued functioning of an Internet site is that there should always be distinct primary and secondary DNS servers supporting the domain.

Production domains should have at least two name servers, sited in different locations. The answer to the common question “Can I use multihosting to get the two name servers I require for my domain?” is a resounding “NO.” The two-server rule (which different domain registrars enforce with varying degrees of thoroughness) ensures that there are at least two discrete sources for DNS data.

I have seen organizations circumvent their domain registrar’s safety checks by using two DNS names that resolved to the same address. In that configuration, a single cable fault isolates the lone DNS server from the Internet. The result can easily be a multihour outage at WWW servers hosted by a service provider, because their DNS names were resolved by the now-unreachable DNS server. Switching to a different DNS server requires a change to the data loaded in the root name servers, which are updated relatively infrequently, typically only every several hours (not including the propagation delay between an update made at a zone’s registrar and the root servers, which depends upon the day of the week and the registrar). The loss of DNS service therefore cannot be corrected in a timely manner, and the Web site remains down until the cable fault is repaired.
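The failure mode described above, two name-server names that resolve to one address, can be caught with a simple check. The following is a minimal sketch, not a definitive tool: the function names and the host names `ns1.example.com` and `ns2.example.com` are hypothetical, and the addresses come from the ranges reserved for documentation.

```python
import socket
from collections import defaultdict


def resolve_addresses(hostname):
    """Return the set of IP addresses the local resolver reports for hostname.

    This performs a live lookup, so results depend on the resolver in use.
    """
    infos = socket.getaddrinfo(hostname, 53, proto=socket.IPPROTO_UDP)
    return {info[4][0] for info in infos}


def shared_addresses(ns_addresses):
    """Map each IP address to the name servers that share it.

    ns_addresses: dict of name-server host name -> iterable of IP strings.
    Only addresses claimed by more than one host name are returned, so an
    empty result means every listed server has its own address.
    """
    owners = defaultdict(set)
    for host, addrs in ns_addresses.items():
        for addr in addrs:
            owners[addr].add(host)
    return {addr: sorted(hosts) for addr, hosts in owners.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Hypothetical data: both names point at the same machine.
    listed = {
        "ns1.example.com": ["192.0.2.10"],
        "ns2.example.com": ["192.0.2.10"],
    }
    for addr, hosts in shared_addresses(listed).items():
        print(f"{addr} is shared by {', '.join(hosts)}")
```

In practice the dictionary would be filled by calling `resolve_addresses` on each name server registered for the domain. Distinct addresses are necessary but not sufficient, since two addresses can still sit behind the same cable.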

Production DNS servers should be geographically dispersed. A pair of workstations located next to each other and plugged into the same power strip is a fool’s dispersion; all but the most trivial incidents will render both servers unavailable. Achieving geographic diversity is neither difficult nor expensive. It does not require resorting to a DNS server provided by a separate hosting service or by an ISP (although a hosting- or ISP-provided DNS server is certainly a possible alternative). A field office or sister organization can easily provide the few cubic feet and kilobytes per hour – yes, per hour – required to domicile an alternate DNS server. The system can be managed remotely. Reciprocal arrangements between organizations (I will host your secondary on my name server if you host mine on yours) are even simpler. Providing a separate DSL circuit for the use of the alternate DNS server is much cheaper for an enterprise than losing its name-resolution services (i.e., effectively having one’s entire domain disconnected from the Internet).

If a site is a serious production site with many concurrent users, more extensive monitoring is both justified and prudent. Each link of the chain connecting customers to the site should be monitored closely enough to alert the organization to a problem in a timely manner. For DNS servers, this means regularly verifying that the name servers are online and responding properly.
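One way to verify that a name server is online and responding properly is to send it a query for the zone’s SOA record and inspect the reply header. The sketch below uses only the Python standard library and assumes an IPv4 server reachable over UDP port 53; the function names are hypothetical, and the definition of “properly” used here (an authoritative, error-free answer) is one reasonable choice, not the only one.

```python
import secrets
import socket
import struct


def build_soa_query(zone, query_id):
    """Build a DNS query packet asking for the SOA record of zone."""
    # Header: ID, flags (plain query, no recursion), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    # Label length limits (63 bytes) are not validated in this sketch.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in zone.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 6 = SOA, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 6, 1)


def check_response(packet, query_id):
    """Return True if packet is an authoritative, error-free answer to our query."""
    if len(packet) < 12:
        return False
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    is_response = bool(flags & 0x8000)    # QR bit
    authoritative = bool(flags & 0x0400)  # AA bit
    rcode = flags & 0x000F                # 0 means no error
    return (rid == query_id and is_response and authoritative
            and rcode == 0 and ancount > 0)


def probe(server_ip, zone, timeout=3.0):
    """Query server_ip for the SOA of zone; False on timeout or bad answer."""
    query_id = secrets.randbelow(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_soa_query(zone, query_id), (server_ip, 53))
            packet, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts and unreachable networks
            return False
    return check_response(packet, query_id)
```

A scheduled job could call `probe` for each registered name server and raise an alert when any of them returns `False`. Running the job from outside the organization’s own network gives a view closer to what customers see.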

Diversity of carriers, geographic locations, and routing is an important step toward ensuring that single-source errors (personnel accidents, natural or manmade disasters, or organizational errors) do not effectively terminate your domain’s DNS services and impair the overall Internet accessibility of domain members.
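Carrier and routing diversity cannot be fully verified from addresses alone, but a rough first check is whether the name servers’ addresses fall in different networks. The sketch below groups IPv4 addresses by /24 prefix. The prefix length is an assumption chosen for illustration, the function names are hypothetical, and the addresses are from the documentation ranges.

```python
import ipaddress
from collections import defaultdict


def group_by_network(addresses, prefix_len=24):
    """Group IPv4 address strings by the network of the given prefix length."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False lets us derive the enclosing network from a host address.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(net)].append(addr)
    return dict(groups)


def lacks_network_diversity(addresses, prefix_len=24):
    """Return True if every address falls within a single network."""
    return len(group_by_network(addresses, prefix_len)) < 2


if __name__ == "__main__":
    servers = ["192.0.2.10", "192.0.2.20", "198.51.100.5"]
    for net, members in group_by_network(servers).items():
        print(net, members)
```

Servers in different networks may still share a carrier or a building, so a passing result here is only a hint. A failing result, with all servers in one network, is a strong sign that a single fault could take them all offline.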

In summary, the analogy to a fabric or web is both simple and straightforward: an individual thread or a moderate number of threads in a fabric may break without compromising the ability of the fabric as a whole to perform its function. In addition, breaks in the fabric that can be detected before they become apparent to customers can be corrected without customer impact. Dispersion of functionality is far cheaper and far more resilient than attempts to harden facilities beyond the possibility of damage.