A few days ago, we told you about Microsoft's surprising bid to join the petascale computing age. Windows HPC Server, it seems, was able to hit petaflop speeds on Japan's largest supercomputer, but the achievement was not recognized by the twice-yearly Top 500 list because Linux performed better on the same machine.
All we knew at the time was that the team behind Tsubame 2.0, the HPC cluster at the Tokyo Institute of Technology, had tested the machine's speed with both Windows and Linux, and that Linux came out ahead because its run was performed on a slightly larger number of nodes.
One reader who commented on the blog post joked that Tokyo officials "didn't have enough licenses to run [Windows] on that many."
But it turns out a software bug prevented the Windows HPC Server run from matching Linux's speed and ability to run across more nodes. The bug was not in Windows HPC Server itself but rather in a software package Microsoft designed to run the Top 500 benchmarking test.
Satoshi Matsuoka, a professor at the Tokyo Institute of Technology, explained it to me today at the SC10 supercomputing conference in New Orleans, saying Linux's victory "was purely by chance."
Here's what happened. To submit scores to the Top 500 supercomputers list, cluster operators have to run the Linpack Benchmark, which measures how fast a cluster can solve a very large, dense system of linear equations and pushes the machine to its limits in the process.
It's like driving a Ferrari and "hitting the gas flat out for four hours," Matsuoka said.
Because Tsubame uses both Intel CPUs and Nvidia graphics processing units, the Tsubame team needed to run a custom implementation of the High-Performance Linpack Benchmark to take full advantage of the system's heterogeneity. The Tokyo computer scientists wrote the code for the Linux run themselves; for the Windows run, they used Linpack code written by Microsoft employees.
While a full Linpack run takes a few hours, Tsubame's creators actually spent more than a week preparing and conducting the tests. The strategy is to start with small tests and gradually ramp up, identifying and fixing the problems that slow performance as you go along.
"In actuality, it's an enormous effort," Matsuoka said. "Things break down. There's such a huge stress on the system. It's the sort of stress that this machine will never see in real production."
Ultimately, the Linux run was performed over 1,357 nodes, achieving a speed of 1.192 petaflops (one petaflop is one thousand trillion floating-point operations per second). That result earned Tsubame the title of the world's fourth-fastest supercomputer.
Windows was outperforming Linux on smaller workloads, and eventually hit 1.118 petaflops across just under 1,300 nodes, according to Matsuoka. But when the team attempted a Windows run across 1,360 nodes, the Linpack software written for Windows failed because of a memory initialization bug.
Microsoft has since fixed the bug, but it was enough to derail the Windows bid to top Linux.
"There was a small bug in the Windows code that basically did not let them complete their final run," Matsuoka said. "And we ran out of time. We had to use their second best number, which turned out to be slightly lower than Linux."
Whether Windows would have beaten Linux if not for the software bug is "a mystery that's engulfed in history, because they failed at the very last moment," he says.
Matsuoka wants to know why Windows was able to outperform Linux on smaller problems. Since the hardware was the same for both runs, the difference must come down to either the operating system or the two customized Linpack software packages.
"We haven't had the time to do the side-by-side comparison," Matsuoka says. "We'll probably do that and publish a paper."
Tsubame is a remarkably energy-efficient, general-purpose supercomputer with about 2,000 users in academic and industry research circles. Because Tsubame uses a KVM hypervisor and various cloud-like provisioning tools, it can run Windows and Linux simultaneously on different nodes and offer users various types of processing configurations.
"We're very flexible," Matsuoka says. "We can switch certain subsets of nodes to Windows from Linux and vice versa." Running both operating systems at the same time is possible "because we run virtual machines on some of the nodes."
Naturally, Matsuoka's user base demands Linux more often than Windows. A little more than 80% of the machine's time is devoted to Linux, specifically Novell SUSE Linux 11, he says, and under 20% to Windows.
"Of course, we get more demand for Linux," Matsuoka says. "But we do get Windows demand too. Because we can do dynamic provisioning we will size our Linux vs. Windows according to demand and load."
"This might be the first time this has been done at this scale," he adds, referring to the Windows/Linux flexibility.
Although many in the supercomputing crowd might scoff at Windows, which runs only five of the Top 500 HPC clusters, Matsuoka says there seems to be little difference in performance between the two operating systems. It should be noted that Microsoft has helped fund the Tokyo Institute of Technology's supercomputing programs.
"I was very curious to see which one would be superior, both in terms of the [Linpack] algorithm, and the underlying operating system," Matsuoka said. "It was very surprising, because they were very similar in performance."