Sun’s T2000 lets UltraSPARC ride again

Dec 19, 2005, 7 mins
Computers and Peripherals, Data Center, Servers

Clear Choice Test evaluates Sun Fire T2000, showing the UltraSPARC chip is ready to ride again.

The Sun Fire T2000 server with CoolThreads technology attempts to squeeze enormous throughput from a 2U-high, 385-watt server. Powering the T2000 is the long-announced, first-ever eight-core UltraSPARC chip, based on Sun’s Reduced Instruction Set Computer (RISC) UltraSPARC T1 architecture. We found the T2000 unusual (in a good way), with performance in several profiles unmatched for the power and space it consumes.

How we did it

Surrounding the T1 CPU is a densely packed server box with three PCI-E and two PCI-X slots, twin redundant power supplies (which passed our single supply-failure test) and four Gigabit Ethernet ports. Capped with 32GB of memory (a maximum configuration, with two 16GB DDR sticks in two slots), the T2000 is a powerhouse.

Sun spots

We found some unusual characteristics in how the Sun Fire server operates. Some of them, such as requiring the server to be started over a serial cable, are odd and seem like throwbacks to an older era (but also a Sun legacy). Other vendor platforms start easily from bootp, netboot, PXE boot and other Ethernet-based remote boot schemes. Because the Solaris 10/UltraSPARC architecture is captive, a friendlier or faster installation method would have been welcome.

Other applications have been added to the preload configuration of Solaris, including the Java Enterprise System and precompiled versions of Apache and other open source applications. In testing, we found these largely convenient, although they lacked configuration scripts to fast-start such applications as Apache, Tomcat and the Lightweight Directory Access Protocol components needed to start Java Enterprise System. These problems relate more to Solaris than to the Sun Fire system, but the two are indelibly connected.

We initially used Sun’s Advanced Lights Out Manager (ALOM) to start the machine, which is preloaded with Solaris 10. After an initial configuration, the server boots Solaris from its media, either internal disk or a storage-area network (we used only internal). ALOM isn’t as clever as other operating systems’ loading methods or monitor applications, but it was nice to get to the Hypervisor, the core controller set. Unfortunately, it was accessible via telnet, which raised security concerns. We also had to cobble together an RJ-45-to-D-connector serial cable, but Sun says it will include one with the servers soon (our test unit was a preproduction model).

Core competence

The UltraSPARC T1 chip, the engine that runs the Sun Fire T2000, includes eight discrete cores within a single chip housing. Think of each core as an autonomous CPU, although all eight share a single floating-point unit (FPU). The single FPU is unusual, but it reduces the chip’s heat output. The seemingly missing FPUs are offset somewhat because the UltraSPARC T1 performs 64-bit integer math internally unless an application specifies floating-point math at compile time. In testing, we got good integer-math performance from the system.

Each core can maintain four autonomous threads, which act like processors within processors. Four threads on each of eight cores give the 2U server 32 virtual processors that can run tasks concurrently. But this isn’t quite like having 32 discrete CPUs in the same machine, because the threads on a core vie with each other for that core’s resources.

Stalled threads (those waiting on other things to happen, such as memory access or other dependent processes) are pushed to the side and repolled occasionally until they can restart. Once restarted, a thread rejoins the T1’s round-robin queue on its core. This method of thread control plays well to the process control in Sun’s Zone virtualization (see the Solaris 10).
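The round-robin behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions, not Sun’s hardware logic: the `Core` class, the `run` helper and the stall/wake event table are all hypothetical names invented here, and the model issues exactly one thread per cycle.

```python
from dataclasses import dataclass, field

CORES = 8             # from the article: eight cores per T1 chip
THREADS_PER_CORE = 4  # from the article: four threads per core


@dataclass
class Core:
    """Toy model of one core rotating among its hardware threads."""
    stalled: set = field(default_factory=set)  # thread ids currently waiting
    last: int = -1                             # thread id issued most recently

    def next_thread(self):
        """Return the next ready thread in round-robin order, or None if all are stalled."""
        for step in range(1, THREADS_PER_CORE + 1):
            candidate = (self.last + step) % THREADS_PER_CORE
            if candidate not in self.stalled:
                self.last = candidate
                return candidate
        return None


def run(core, cycles, events):
    """Issue one thread per cycle; events maps cycle -> (thread, "stall" or "wake")."""
    issued = []
    for cycle in range(cycles):
        if cycle in events:
            thread, action = events[cycle]
            if action == "stall":
                core.stalled.add(thread)      # set the thread aside
            else:
                core.stalled.discard(thread)  # thread rejoins the rotation
        issued.append(core.next_thread())
    return issued


# Thread 2 stalls at cycle 2 (say, on a memory access) and wakes at cycle 6.
schedule = run(Core(), 9, {2: (2, "stall"), 6: (2, "wake")})
print(CORES * THREADS_PER_CORE)  # 32 logical processors
print(schedule)                  # [0, 1, 3, 0, 1, 3, 0, 1, 2]
```

While thread 2 is stalled, the other three threads absorb its turns, so the core keeps issuing work every cycle. That is the intuition behind trading per-thread speed for aggregate throughput.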

Each core connects to the others in a full crossbar architecture, which means the cores do not contend with one another when they communicate. This is useful when multiple cores are used by the same application: the cores and threads communicate directly rather than through a single shared (monolithic) path.

Performance: One, eight or 32?

For performance testing we used a well-known (and often controversial) benchmark, LMBench3. We compared the T2000’s performance with a recently upgraded HP 585 server. We configured tests to run the server as a single unit (one copy instance tested), then as an eight-CPU machine (eight copies running), and as a fully virtualized server, with 32 processes running.

The HP 585 contains four dual-core AMD64 Opteron processors with a floating-point unit on each core, so we didn’t compare floating-point results, because they can’t be equal (see “How we did it” for other issues of unequal comparison). In integer math, for example, the results are nearly equal once adjusted for the Sun Fire T2000’s CPU clock, which is almost half as fast. On the other hand, you can’t get four dual-core Opteron processors into the same space.
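One way to read the clock adjustment mentioned above is to divide a throughput-style score by clock speed. The sketch below assumes that reading; the `per_ghz` helper is hypothetical, and the scores and clock speeds are placeholders rather than measurements or specifications from this review.

```python
def per_ghz(score, clock_ghz):
    """Normalize a throughput-style score (higher is better) by clock speed in GHz."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return score / clock_ghz


# Placeholder inputs, NOT results from the test: a chip at roughly half the
# clock speed that also posts roughly half the raw score.
t1_normalized = per_ghz(score=100.0, clock_ghz=1.0)
opteron_normalized = per_ghz(score=220.0, clock_ghz=2.2)

ratio = t1_normalized / opteron_normalized
print(f"clock-adjusted ratio: {ratio:.2f}")  # a value near 1.00 means "nearly equal"
```

For latency-style results, where lower is better, the adjustment would multiply by clock speed instead of dividing.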

In our tests, LMBench3 couldn’t execute a 32-CPU load set. Because we can’t modify the LMBench3 source code, we were limited to testing one, then eight instances of the test suite. Also, the AMD CPUs are clocked at more than double the speed of the Sun UltraSPARC T1 processor, which showed another advantage of the AMD CPU: its process-forking speed is nearly four times that of the Sun T1. Additionally, each Opteron CPU has an onboard FPU. We modified LMBench3 to remove network I/O tests so we could focus on memory movement, pipelining and processor exercise.

Without otherwise optimizing the benchmark, we found that overall, eight concurrent instances delivered about eight times the throughput of a single instance. In other words, the single T1 chip scaled almost linearly from one instance to eight, with very few slowdowns measured. The T2000’s CPU, however, is significantly slower than the eight-core configuration (four dual-core CPUs) of the HP 585 server, which ran Novell/SuSE Linux 10 in 64-bit SMP mode across the eight cores.
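The scaling claim above comes down to simple arithmetic: speedup is aggregate throughput divided by single-instance throughput, and efficiency is speedup divided by the number of instances. A minimal sketch follows; the `scaling_efficiency` helper is hypothetical and the throughput figures are placeholders, not numbers from the test.

```python
def scaling_efficiency(single_throughput, aggregate_throughput, instances):
    """Return (speedup, efficiency) for running several benchmark copies at once."""
    if single_throughput <= 0 or instances <= 0:
        raise ValueError("single_throughput and instances must be positive")
    speedup = aggregate_throughput / single_throughput
    efficiency = speedup / instances  # 1.0 would be perfectly linear scaling
    return speedup, efficiency


# Placeholder values, NOT measurements: one copy scores 1.0 unit of work per
# second, and eight concurrent copies together score 7.8.
speedup, efficiency = scaling_efficiency(1.0, 7.8, instances=8)
print(f"speedup {speedup:.1f}x, efficiency {efficiency:.1%}")
```

An efficiency close to 100% matches the “about eight times” result; a noticeably lower figure would point to contention between the instances.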

At eight cores, performance was good, and if optimized to 32 threads (logical or virtual CPUs), performance might have been spectacular, but there is little way of measuring this correctly, and few applications that can take advantage of the discrete threads. The performance marks we did measure were very good – especially given the small form factor of the T2000.

In production

In testing, we downloaded, recompiled and used quite a few open source applications on the T2000 (as a convenience of Solaris 10), despite its unusual CPU arrangement.

Most applications we’ve seen don’t have specific optimizations available for either the UltraSPARC architecture or the multicore, multithreaded arrangement used by the T2000 (or any other CPU for that matter). Code written for multiprocessor use is exceedingly rare. Optimizations that take advantage of the 32 processor threads will range from the fairly simple (for example, applications that already are optimized for UltraSPARC architectures) to the very difficult (for example, retrofitting existing or ported applications to take advantage of the processor configuration).

It also is possible to use Solaris zoning techniques to start and execute applications on specific parts of the T1 CPU inside the T2000. This lets administrators partition and aggregate applications via UltraSPARC CPU threading techniques, although this isn’t a simple process. However, this virtualization technique adds significantly to the management granularity that is possible in the Sun Fire architecture.

Sun says the T2000 is a strong performer at a low operational cost. We agree: the Sun Fire T2000 is a strong value for Sun shops and for those willing to rework code (an endeavor that ranges from simple to very expensive) to take advantage of the advanced muscles that the UltraSPARC T1 CPU provides.

Henderson is principal researcher at ExtremeLabs in Indianapolis. Laszlo Szenes of ExtremeLabs contributed to the testing.

Henderson is also a member of the Network World Lab Alliance, a cooperative of the premier reviewers in the network industry, each bringing to bear years of practical experience on every review.