Donald Becker started the Beowulf Parallel Workstation Project in 1993 at NASA Goddard Space Flight Center. The project's goal was to cheaply mimic the computing power of expensive mainframes and supercomputers with clusters of commodity hardware and free operating systems. The effort at NASA was named after the hero of the Old English epic poem Beowulf, who slew mighty beasts. At NASA, those beasts were supercomputers and mainframes.
This evolved into the Beowulf Project, which became popular among researchers for linking together Linux and FreeBSD machines to take on huge computational tasks. Linux-based clusters are now frequently seen in the list of the top 5 supercomputers in the world. Becker founded the clustering software firm Scyld Computing, which was acquired by Penguin Computing, where he is now CTO. I recently spoke with Becker about how far clustering has come.
Q: How have Linux clusters evolved?
A: The original Beowulf clusters had a full [operating system] install on each machine. It was as difficult to run as a set of workstations. If you were running a slightly older version of software on your desktop, or had different versions of software across machines, it was a disaster. Often you ended up with computation results that failed with no useful error messages.
The size of the clusters has also increased. Clusters of 15 to 20 machines were the norm when we started out; now you typically have clusters of 50 to 100 machines, and some have more than 1,000.
Q: How has cluster management improved?
A: Now we're moving to scalable computing - being able to dynamically scale what you're doing ... We're leveraging the software [Scyld] introduced four years ago and refining it ... The key idea is that instead of a fixed-sized cluster ... You now have [a] machine that has [a] master, where you can control multiple servers from a single point.
Q: Will clusters totally replace supercomputers?
A: We're not aiming to replace high-end supercomputers. There are still applications that require those types of machines. But for 80% to 90% of the [computer-intensive] applications out there, you can do [the] same work on [clusters of] PC-class machines for far less money.
Q: What are the things clusters can't do?
A: Certain math problems need a single machine, such as weather simulations, or other applications where there is a lot of interactivity between points ... But that is evolving. Gravitational simulation is another example. These seemed to be things that worked well on a supercomputer and not on clusters ... In these experiments, you have lots of interactions among objects ... But this is evolving. [Researchers] have since managed an approach that is just as accurate with new algorithms [for these types of problems] that scale on clusters.
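To make the point about "lots of interactions among objects" concrete, here is a minimal Python sketch of a direct-sum gravitational calculation in two dimensions. It is an illustration only, not the method Becker's researchers used; the interview does not name the newer algorithms. The function names, the softening term `eps`, and the round-robin split across two hypothetical nodes are all assumptions made for the example.

```python
import math

def accelerations(positions, masses, targets, g=1.0, eps=1e-9):
    """Direct-sum gravitational acceleration for the bodies listed in `targets`.

    Every target body interacts with every other body, so a node that owns
    only a few targets still needs the positions of all bodies.
    `eps` is a small softening term (an assumption) that avoids division by zero.
    """
    result = {}
    for i in targets:
        xi, yi = positions[i]
        ax = ay = 0.0
        for j, (xj, yj) in enumerate(positions):
            if j == i:
                continue
            dx, dy = xj - xi, yj - yi
            r2 = dx * dx + dy * dy + eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += g * masses[j] * dx * inv_r3
            ay += g * masses[j] * dy * inv_r3
        result[i] = (ax, ay)
    return result

def split_across_nodes(n_bodies, n_nodes):
    """Hypothetical round-robin assignment of body indices to cluster nodes."""
    return [list(range(k, n_bodies, n_nodes)) for k in range(n_nodes)]

# Four equal masses on the corners of a unit square, split over two "nodes".
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
masses = [1.0, 1.0, 1.0, 1.0]

chunks = split_across_nodes(len(positions), 2)
combined = {}
for chunk in chunks:
    # Each chunk could run on a different machine, but each call still
    # receives the complete `positions` list.
    combined.update(accelerations(positions, masses, chunk))

# Number of pairwise evaluations per time step grows as N * (N - 1).
pair_evaluations = len(positions) * (len(positions) - 1)
```

The work divides cleanly by target body, but every node must see every position at every time step, and the number of pairwise evaluations grows with the square of the body count. On a cluster, that data exchange is what makes such problems harder than loosely coupled workloads.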
Q: What about in enterprise applications?
A: One challenging area for clusters is transactional databases. But we've seen a lot of investment in the past few years in products such as IBM DB2 and Oracle, allowing them to run on multiple [Linux] machines.
There is a broad range of applications in enterprises for clusters, from financial computing to managing server farms. [Enterprises] are setting up a number of machines to tackle a single problem, and [also setting up] multiple clusters to emulate independent servers, but with a single management interface.