I recently spoke with an IT executive at Google.com, one of the Web's most popular destinations for information searches.
Jim Reese, chief operations engineer at Google, offered some valuable insight into how to make a high-traffic Web site run reliably and efficiently.
At Google, which has about 10,000 servers, network managers use a homegrown suite of tools that automates updates and routine server maintenance. In other words, IT staff don't have to take on the seemingly impossible task of managing every machine by hand. Automation tools not only save time (and, of course, labor costs) but also reduce human error.
Reese says Web site managers should find ways to let computers do what they do best: repetitive tasks. To drive the lesson home, he points out that a single command issued by a network manager at Google's network operations center can push an update to about 10,000 servers in four data centers.
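The kind of fan-out Reese describes can be sketched in Python, one of the languages he mentions. This is a minimal sketch under stated assumptions, not Google's tooling: `push_update`, `ssh_apply`, the host names, and the `/usr/local/bin/apply-update` script are all hypothetical, and the SSH path assumes key-based access to every host. A thread pool bounds how many hosts are updated at once, and a failure on one host is recorded rather than halting the whole rollout.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List, Tuple


def push_update(
    hosts: Iterable[str],
    apply_update: Callable[[str], None],
    max_workers: int = 100,
) -> Tuple[List[str], Dict[str, str]]:
    """Run apply_update(host) for every host, at most max_workers at a time.

    Returns (succeeded, failed); failed maps each host to its error message.
    One host failing is recorded and does not stop the rest of the rollout.
    """
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(apply_update, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                future.result()  # re-raises any exception from apply_update
                succeeded.append(host)
            except Exception as exc:
                failed[host] = str(exc)
    return sorted(succeeded), failed


def ssh_apply(host: str) -> None:
    """Hypothetical per-host step: run an assumed update script over SSH."""
    subprocess.run(
        ["ssh", host, "sudo /usr/local/bin/apply-update"],
        check=True,   # a non-zero exit raises CalledProcessError
        timeout=300,  # keep one hung host from stalling its worker forever
    )


if __name__ == "__main__":
    # Hypothetical inventory: 4 data centers x 2,500 servers = 10,000 hosts.
    hosts = [f"dc{d}-srv{i:04d}" for d in range(1, 5) for i in range(2500)]

    # Simulated update so the demo runs without real servers;
    # pass ssh_apply instead to act on real hosts.
    def simulated_apply(host: str) -> None:
        if host.endswith("0007"):
            raise RuntimeError("disk full")

    ok, bad = push_update(hosts, simulated_apply, max_workers=50)
    print(f"updated {len(ok)} hosts, {len(bad)} failed")
```

Swapping `simulated_apply` for `ssh_apply` is the only change needed to act on real machines. The `max_workers` bound is a deliberate choice: it keeps the control host from opening thousands of SSH sessions at once while still finishing far faster than a serial loop.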
"We wrote these tools based on simple shell and Perl scripts and Python, all freely available," Reese says. He adds that the company opted for open-source code and homegrown applications for server management after looking at a number of off-the-shelf products. "All of them failed in scaling to the number of servers we have."
A few years ago, Google managed only a few thousand servers; its server farm has since grown fivefold. With that growth in mind, Reese says network managers should consider carefully how well the tools they buy or build will scale to meet their future needs.
Google has also chosen to run its own customized version of Red Hat Linux instead of Windows or Unix. Additionally, says Reese, the company uses generic "white box" servers instead of brand-name servers from a Tier 1 vendor.
Reese says that running many boxes the company configures itself from industry-standard parts has several advantages. Because the servers are relatively cheap, the company can run many of them and replace them easily. The sheer number of servers protects against single points of failure on the network: the site could lose hundreds of servers with little or no effect on performance.
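Why losing hundreds of servers barely matters can be illustrated with a small Python sketch of a pool of interchangeable servers. This is an illustration under assumptions, not Google's design: `ServerPool`, the server names, and the hash-based request placement are hypothetical choices. Because any healthy box can serve any request, a failed server simply drops out of rotation and total capacity shrinks only in proportion to the number of boxes lost.

```python
import zlib
from typing import List, Set


class ServerPool:
    """Interchangeable servers; any healthy one can handle any request."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.failed: Set[str] = set()

    def mark_failed(self, server: str) -> None:
        """Take a server out of rotation (e.g., after a failed health check)."""
        self.failed.add(server)

    def healthy(self) -> List[str]:
        return [s for s in self.servers if s not in self.failed]

    def capacity_fraction(self) -> float:
        """Share of the original capacity still in service."""
        return len(self.healthy()) / len(self.servers)

    def pick(self, request_key: str) -> str:
        """Choose a healthy server for a request; failed boxes are skipped."""
        live = self.healthy()
        if not live:
            raise RuntimeError("no healthy servers left")
        # crc32 is stable across runs, unlike Python's salted hash() for str.
        return live[zlib.crc32(request_key.encode()) % len(live)]


if __name__ == "__main__":
    pool = ServerPool([f"srv{i:05d}" for i in range(10_000)])
    for i in range(300):  # hypothetical: lose 300 servers at once
        pool.mark_failed(f"srv{i:05d}")
    print(f"capacity left: {pool.capacity_fraction():.0%}")
    print("request user-42 ->", pool.pick("user-42"))
```

The modulo placement here remaps most requests whenever a server drops out, so a production system would likely use a more careful scheme; the point of the sketch is the capacity arithmetic, where 300 lost servers out of 10,000 is only a 3% reduction.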