The Internet Archive Wayback Machine, an indispensable chronicler of the Web for going on two decades now, late last week announced a major milestone.
From an Internet Archive blog post:
The Wayback Machine, a digital archive of the World Wide Web, has reached a landmark with 400 billion webpages indexed. This makes it possible to surf the web as it looked anytime from late 1996 up until a few hours ago.
The post lists a number of historical highlights, including:
- 2001 - The Wayback Machine is launched. Woo hoo.
- 2006 - Archive-It is launched, allowing libraries that subscribe to the service to create curated collections of valuable web content.
- March 25, 2009 - The Internet Archive and Sun Microsystems launch a new datacenter that stores the whole web archive and serves the Wayback Machine. This 3 Petabyte data center handled 500 requests per second from its home in a shipping container.
- October 26, 2012 - Internet Archive makes 80 terabytes of archived web crawl data from 2011 available for researchers, to explore how others might be able to interact with or learn from this content.
- October 2013 - New features for the Wayback Machine are launched, including the ability to see newly crawled content an hour after we get it, a "Save Page" feature so that anyone can archive a page on demand, and an effort to fix broken links on the web starting with WordPress.com and Wikipedia.org.
Not included in the timeline was mention of a fire on Nov. 6 of last year that did more than $600,000 in damage to digitization equipment at the Internet Archive's scanning center in San Francisco.
The Wayback Machine has proven useful to me on a number of occasions, most memorably in assembling this collection of online news site images from Sept. 11, 2001, forever known as 9/11.
Four hundred billion is a lot of pages. In fact, the archive now serves up about 100 billion more pages than McDonald's has served hamburgers.