Toward a resilient DNS, Part 1

* Why Domain Name Systems should be resilient

On 9/11, many New York businesses disappeared from the Internet because their DNS services were fragile. This and similar fragility problems sparked concern about the robustness of network infrastructure. Advanced technologies have often been in the forefront, including fault-tolerant computing, failsafe systems, and nonstop operations. Discussions along these lines focus on making infrastructure robust, meaning hard to damage.

Although robustness is important, “resilience” - the ability to accept distortion under stress while continuing to support load - is perhaps a more fitting description of the most crucial aspects of planning for damage contingencies. Robustness, by contrast, implies a philosophy of resisting distortion or shearing, and then failing when the stress becomes too great.

When an event occurs, the mission is maintaining ongoing operation without apparent interruption. Continuation of operations and containment of damage are the philosophical, policy, and strategic goals, preferably with no perceptible user impairment. As I noted in Chapters 21 and 22 of the “Computer Security Handbook, 4th Edition,” the goal is to avoid disruption of operations.

When managing the response to an event, user-reported difficulties indicate incomplete or insufficient resilience. The first reports of infrastructure problems should come from internal monitoring systems, not a flurry of telephone calls from users. This is particularly true in electronic commerce applications, where the majority of users are outsiders who are justifiably more likely to defect to another provider or supplier than to report a problem and work with the organization to fix it. In some situations, the first indication of a problem may be a sudden, inexplicable drop in page views or customer transactions.
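As a rough illustration of internal monitoring that speaks up before users do, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ProbeMonitor` name, the `probe` and `alert` callables, and the threshold of three consecutive failures are hypothetical and would be tuned to the service being watched.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeMonitor:
    """Raise an internal alert after several consecutive probe failures."""
    probe: Callable[[], bool]      # hypothetical health check; True means healthy
    alert: Callable[[str], None]   # hypothetical notifier (pager, e-mail, log)
    threshold: int = 3             # assumed failure streak that triggers an alert
    failures: int = 0              # current run of consecutive failures

    def tick(self) -> None:
        """Run one probe; alert once when the failure streak reaches the threshold."""
        try:
            healthy = self.probe()
        except Exception:
            healthy = False        # a probe that crashes counts as a failure
        if healthy:
            self.failures = 0
            return
        self.failures += 1
        if self.failures == self.threshold:
            self.alert(f"probe failed {self.failures} times in a row")


if __name__ == "__main__":
    # Simulated probe results: one success followed by a sustained outage.
    results = iter([True, False, False, False, False])
    alerts: List[str] = []
    monitor = ProbeMonitor(probe=lambda: next(results), alert=alerts.append)
    for _ in range(5):
        monitor.tick()
    print(alerts)
```

Alerting only when the streak first reaches the threshold keeps a single bad probe from paging anyone, and keeps a long outage from producing a flood of duplicate alerts.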

The Internet DNS is responsible for providing the translation between Internet names and the IP addresses associated with those names. If a name cannot be translated into an IP address, the site cannot be accessed without knowing the exact IP address.
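The translation step can be sketched in a few lines of Python using the standard library's `socket.getaddrinfo`, which asks the system resolver for the addresses behind a name. The `resolve` helper and its `lookup` parameter are hypothetical names introduced here so the lookup can be swapped out. An empty result corresponds to the case described above, where the site is unreachable by name.

```python
import socket
from typing import Callable, List


def resolve(name: str, lookup: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated addresses for a name, or [] if it cannot be translated."""
    try:
        infos = lookup(name, None)
    except socket.gaierror:
        # The name could not be translated into an address.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" is used only because it normally resolves without network access.
    print(resolve("localhost"))
```

Note that this goes through whatever resolver and cache the local system is configured to use, so it shows what one client sees rather than what the authoritative name servers are publishing.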

In the case of DNS, the most publicized serious concerns revolve around the root name servers, which are admittedly a government and large-scale carrier concern - that is, outside the scope and authority of virtually all Internet users. Less well publicized, however, are issues at the enterprise level. Specifically, the organization and provisioning of the name servers for an enterprise’s domains are well within the control of the individual enterprise, yet are often neglected.

One of the most common misconceptions is that your organization’s DNS resolution is the responsibility of your ISP. However, although almost every ISP provides DNS services for its customers, the degree of flexibility, resilience, and transparency varies greatly. Some ISPs will act as authoritative secondary name servers, downloading the actual DNS zones from a user-maintained DNS server; some will not. Some ISPs will provide inverse DNS services on the same basis, under RFCs 1034 and 2317, with the master data being provided by the user; some will not. Some ISPs have DNS servers at multiple sites directly connected to different backbone providers to provide resilience; some do not. And finally, the degree to which these issues are visible to the customer varies, as do the consequences for an ISP failing to provide contractually required (or for that matter, advertised) degrees of resilience. The old parachute packer’s joke applies: “The parachute has a money back guarantee; if it fails, bring it back.”

In the end, the resilience of an organization’s domains devolves to the steps that the organization is willing to undertake to ensure that its domain data remain available to the Internet. This assurance takes several forms:

* Multiple levels of (at least semi-independent) DNS servers.

* Monitoring to ensure that DNS results are available to the world.

* Geographic diversity of DNS servers.

* Routing diversity of DNS servers.

* Carrier diversity of DNS servers.

* Sufficient TTL (Time to Live) to ensure adequate reaction time in the event of a problem.
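The checklist above can be turned into a simple audit of a name server inventory. The Python sketch below is illustrative only: the `NameServer` fields, the `audit` function, and the rule that the TTL should be at least as long as the expected reaction time are assumptions made for this example, not an established tool or standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NameServer:
    host: str      # hypothetical server name
    site: str      # physical location of the server
    carrier: str   # upstream provider serving that location
    route: str     # label for the network path or address block


def audit(servers: List[NameServer], ttl_seconds: int, reaction_seconds: int) -> List[str]:
    """Return warnings for an inventory of a domain's name servers."""
    warnings = []
    if len(servers) < 2:
        warnings.append("fewer than two name servers")
    # Geographic, carrier, and routing diversity each need at least two distinct values.
    for attr in ("site", "carrier", "route"):
        values = {getattr(server, attr) for server in servers}
        if len(values) < 2:
            warnings.append(f"no {attr} diversity")
    # Assumed rule: cached answers should outlive the time needed to react to a problem.
    if ttl_seconds < reaction_seconds:
        warnings.append("TTL shorter than the expected reaction time")
    return warnings


if __name__ == "__main__":
    inventory = [
        NameServer("ns1.example", "site-a", "carrier-a", "net-1"),
        NameServer("ns2.example", "site-a", "carrier-a", "net-2"),
    ]
    for warning in audit(inventory, ttl_seconds=3600, reaction_seconds=7200):
        print(warning)
```

An audit of this kind only checks what the inventory claims. It complements, rather than replaces, monitoring that confirms the servers actually answer queries from outside the organization.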

* * *

Next time: Practical advice on keeping your DNS services running.

Robert Gezelter, CDP, Software Consultant, guest lecturer and technical facilitator, has more than 25 years of international consulting experience in private and public sectors. Gezelter is a regular guest speaker at technical conferences worldwide such as HPETS (formerly DECUS).

Among his published work are articles appearing in Network World, Open Systems Today, Digital Systems Journal, Digital News, and Hardcopy. He is also a contributor to the Computer Security Handbook, 4th Edition, Wiley, 2002.
