Back when Google was just a gleam

Longtime Google watchers already know this "in the beginning" stuff, but because history is always fresh to those encountering it for the first time, the Pingdom blog does a service today with its post: "Before Google became Google: The original setup at Stanford University."

My own reaction? It's hard to believe it's been only 12 years.

From the Pingdom post:

The original Google platform (Backrub) at Stanford University was written in Java and Python and ran on the following hardware:

  • Sun Ultra II with dual 200 MHz processors and 256MB of RAM. This was the main machine for the original Backrub system.
  • 2 x 300 MHz Dual Pentium II Servers (donated by Intel) with 512MB of RAM and 9 x 9GB hard drives between the two. The main search ran on these.
  • F50 IBM RS/6000 (donated by IBM) with 4 processors, 512MB of RAM and 8 x 9GB hard drives.
  • Two additional boxes included 3 x 9GB hard drives and 6 x 4GB hard drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
  • IBM disk expansion box with another 8 x 9GB hard drives (donated by IBM).
  • Homemade disk box which contained 10 x 9GB SCSI hard drives.

Once you take a seat in the not-so-way-back machine, it can be difficult to disembark. Check out an early version of "Backrub" after the name was changed to Google and you'll find some interesting statistics.

Current Status of Google:

Web Page Statistics

  • Number of Web Pages Fetched: 24 million
  • Number of Urls Seen: 76.5 million
  • Number of Email Addresses: 1.7 million
  • Number of 404's: 1.6 million

Storage Statistics

  • Total Size of Fetched Pages: 147.8 GB
  • Compressed Repository: 53.5 GB
  • Short Inverted Index: 4.1 GB
  • Full Inverted Index: 37.2 GB
  • Lexicon: 293 MB

There's a featured section called "Known problems," number one of which was: "We have only crawled U.S. looking domains so as not to congest international links. This makes the search engine somewhat incomplete."

How quaint.

Stanford students Sergey Brin and Lawrence Page helpfully explain it all in a paper entitled: "The Anatomy of a Large-Scale Hypertextual Web Search Engine."

In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at

To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date.

Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
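The "additional information present in hypertext" the abstract refers to is chiefly link structure, which Brin and Page formalized as PageRank in the same paper. As a rough illustration only (not their implementation; the tiny graph, function name, and iteration count here are invented for the example), the idea can be sketched as a power iteration with the paper's damping factor of 0.85:

```python
# Illustrative PageRank power iteration over a toy link graph.
# The damping factor d = 0.85 follows the Brin/Page paper; everything
# else (graph, names, iteration count) is made up for this sketch.

def pagerank(links, d=0.85, iters=50):
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}        # start uniform
    for _ in range(iters):
        new = {u: (1.0 - d) / n for u in nodes}
        for u, outs in links.items():
            if outs:
                share = rank[u] / len(outs)    # split rank over out-links
                for v in outs:
                    new[v] += d * share
            else:
                # Dangling page: spread its rank uniformly.
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Here page "c" ends up ranked above "b" because it is linked from both "a" and "b", which is exactly the signal traditional keyword-only engines of the era ignored.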

Want more?

There's a 2005 Wired interview that gets into the origin of the name Backrub, which will fascinate all of the bibliometrics fans in the audience.

Google's site provides a milestones page.

And, of course, there's a "History of Google" page on Wikipedia.



Copyright © 2009 IDG Communications, Inc.