An already-working open source database project could let other web companies join Google for bragging rights as the owners of a thousand-node database cluster.
Event search firm Zvents is releasing a massively parallel database server, based on a published Google design, as an open source project. The new software, Hypertable, is designed to scale to 1000 nodes, all commodity PCs, said Doug Judd, principal search architect for Zvents, in a LinuxWorld.com podcast.
Moving the project from in-house to open source is a way for a relatively small company to get the infrastructure software it needs, Judd says. "We aren't in the database business. This is the kind of infrastructure that should be in open source. This is not company proprietary stuff," he says.
The current Hypertable version is a 0.9 alpha release, and has been tested on about 10 nodes so far, Judd says. But Yahoo developers have expressed interest in "kicking the tires" and testing on more nodes. Yahoo developers are already involved in another way: Hypertable stores its data on a distributed filesystem, and the database developers are currently using the Apache Software Foundation's Hadoop. Yahoo supports Hadoop both with infrastructure and by employing lead Hadoop developer Doug Cutting and his team.
The Google database design on which Hypertable is based, Bigtable, attracted a lot of developer buzz and a "Best Paper" award from the USENIX Association for "Bigtable: A Distributed Storage System for Structured Data," a 2006 publication from nine Google researchers including Fay Chang, Jeffrey Dean, and Sanjay Ghemawat. Google's Bigtable uses the company's in-house Google File System for storage.
The API for Hypertable is slightly different from Bigtable's, Judd says. Although it is not a full SQL database, it is more featureful than a simple key/value store such as Brad Fitzpatrick's memcached. Memcached is widely used along with a conventional SQL database in high-traffic web sites, to cache chunks of HTML and XML and save an application from having to query the main database.
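The caching pattern described above, in which an application checks the cache before touching the main database, can be sketched roughly as follows. This is only an illustration: TinyCache is a hypothetical in-process stand-in for a memcached client, and render_event_html, get_event_html, and the key format are invented names, not part of memcached, Hypertable, or Zvents' code.

```python
import time


class TinyCache:
    """Hypothetical in-process stand-in for a memcached-style get/set cache."""

    def __init__(self):
        self._items = {}  # key -> (value, expiry time or None)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._items[key]  # entry has expired, treat it as a miss
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._items[key] = (value, expires_at)


db_queries = 0  # counts how often the slow path runs


def render_event_html(event_id):
    """Hypothetical slow path: stands in for a SQL query plus HTML rendering."""
    global db_queries
    db_queries += 1
    return f"<div class='event'>Event {event_id}</div>"


def get_event_html(cache, event_id, ttl=60):
    """Return the cached HTML chunk, rendering and caching it on a miss."""
    key = f"event_html:{event_id}"
    html = cache.get(key)
    if html is None:
        html = render_event_html(event_id)
        cache.set(key, html, ttl)
    return html


cache = TinyCache()
first = get_event_html(cache, 42)   # miss: runs the slow path
second = get_event_html(cache, 42)  # hit: served from the cache
```

In this sketch the second request never reaches the simulated database, which is the saving the cache is meant to provide on a busy site.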
Brian Aker, director of architecture for open source database supplier MySQL AB, says that he can see a development path that would bridge the gap from the Hypertable API to a full SQL database. In an email interview, he wrote, "Someone could turn this into a backend for MySQL without a lot of effort. You would gain an SQL interface by doing this." For Hypertable as is, Aker says he can see several applications. Besides log data, Hypertable could be useful for image and object servers, and for pre-rendering responses to Representational State Transfer (REST) queries produced by web applications.
REST, explained in one of last year's hot web development books, RESTful Web Services, is a design philosophy for web applications that exposes a web application as a large tree of URLs. Since a client could potentially request or post data to one of many URLs, each responsible for a small piece of information, Hypertable could be a useful way to scale the REST server to handle more traffic.
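To illustrate the idea of serving pre-rendered REST responses from a table, here is a minimal sketch in which each URL path is a row key and the stored value is the finished response body. SortedTable is a hypothetical in-memory stand-in, not Hypertable's API; it assumes only that rows are kept in sorted key order, as the Bigtable paper describes, so that everything beneath one URL sits in a contiguous range. The paths and response bodies are invented for the example.

```python
import bisect


class SortedTable:
    """Hypothetical in-memory stand-in for a table kept sorted by row key."""

    def __init__(self):
        self._keys = []    # row keys in sorted order
        self._values = {}  # row key -> stored value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs for every row key starting with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]


# Pre-render a response for each small resource and store it under its URL.
table = SortedTable()
table.put("/events/42", '{"id": 42, "name": "Jazz night"}')
table.put("/events/42/venue", '{"name": "Blue Room"}')
table.put("/events/7", '{"id": 7, "name": "Book fair"}')
table.put("/venues/3", '{"id": 3, "name": "Town hall"}')


def handle_get(path):
    """Serve a GET request with a single key lookup, no rendering needed."""
    body = table.get(path)
    return (200, body) if body is not None else (404, "")


status, body = handle_get("/events/7")
missing_status, _ = handle_get("/events/999")
under_event_42 = [key for key, _ in table.scan_prefix("/events/42")]
```

Because a request becomes a single key lookup, the work of answering it could in principle be spread across many nodes by key range, which is the scaling property the article attributes to this design.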
As a new software project, Hypertable is free to choose an all-star list of tools for development infrastructure. Judd says the project is using Git for version control, CMake as a cross-platform build tool, and Google's own Google Code for bug tracking.
This story, "Zvents releases open-source cluster database," was originally published by LinuxWorld-(US).