
Big data analytics computing requires a 'maverick fabric' network

By Bob Fernander, CEO, Gnodal Ltd, special to Network World
September 25, 2012 04:14 PM ET

Network World - This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.

The high-performance computing (HPC) scientific/academic sector is accustomed to using commodity server and storage clusters to deliver massive processing power, but comparable large-scale cluster deployments are now found in the high-end enterprise as well.

Large Internet businesses, cloud computing suppliers, media and entertainment organizations, and high-frequency trading environments, for example, now run clusters that are on par with, and in some cases considerably larger than, the top 100 clusters used in HPC.

CASE IN POINT: High-performance computing, the latest 'it' thing in the cloud

What differentiates the two environments is the type of network used, together with the application programming models and the problem sets involved. In the scientific/academic sector, it is typical to use proprietary solutions to achieve the best performance in terms of latency and bandwidth, while sacrificing aspects of standardization that simplify support, manageability and closer integration with IT infrastructure. Within the enterprise the use of standards is paramount, and that means heavy reliance upon Ethernet. But plain old Ethernet won't cut it. What we need is a new approach, a new "maverick fabric."

Such a fabric should eliminate network congestion within a multi-switch Ethernet framework to free up available bandwidth in the fabric. It should also significantly improve performance by negotiating load-balancing flows between switches with no performance hit, and use a "fairness" algorithm that prioritizes packets in the network and ensures that broadcast data or other large-frame traffic, such as that from localized storage subsystems, does not unfairly consume bandwidth.
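To make the fairness idea concrete, here is a minimal sketch of one such scheme, deficit round robin scheduling, in Python. It is purely illustrative and not the fabric's actual algorithm; the queue names, frame sizes and quantum are assumptions chosen to show jumbo storage frames being interleaved with small latency-sensitive frames rather than starving them.

from collections import deque

# Toy frames: (label, size in bytes)
storage_q = deque(("storage", 9000) for _ in range(4))    # jumbo storage frames
compute_q = deque(("compute", 256) for _ in range(40))    # small compute messages

queues = {"storage": storage_q, "compute": compute_q}
deficits = {"storage": 0, "compute": 0}
QUANTUM = 1500    # bytes of sending credit added to each queue per round

schedule = []
while any(queues.values()):
    for name, q in queues.items():
        if not q:
            continue
        deficits[name] += QUANTUM
        # Send frames while the accumulated credit covers the head-of-queue frame
        while q and q[0][1] <= deficits[name]:
            label, size = q.popleft()
            deficits[name] -= size
            schedule.append(label)
        if not q:
            deficits[name] = 0    # an emptied queue forfeits leftover credit

# The schedule interleaves both classes; small frames are not starved by jumbo frames
print(schedule)

Running the sketch shows storage frames being released every few rounds while compute frames continue to flow between them, which is the behaviour a fairness mechanism is meant to guarantee.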

Adaptive routing and loss-less switching

A fundamental problem with legacy Ethernet architecture is congestion, a byproduct of the very nature of conventional large-scale Ethernet switch architectures and of the Ethernet standards themselves. Managing congestion within multi-tiered, standards-based networks is a key requirement to ensure high utilization of computational and storage capability. The inability to cope with typical network congestion causes the problems listed below (a toy simulation of the effect follows the list):

• Fundamental collapses in network performance, with system efficiency as low as 10%

• Networks that cannot scale in size to match application demands

• Slow and unpredictable network latency, reducing business responsiveness

• Unacceptably high cost of ownership due to bandwidth over-provisioning
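The collapse in efficiency is easy to reproduce in miniature. The following Python sketch is an illustrative model, not a measurement of any real switch: it simulates head-of-line blocking in a simple input-queued crossbar, where a frame stuck behind one destined for a busy output also stalls frames bound for idle outputs, so delivered throughput settles well below the fabric's nominal capacity.

import random

PORTS = 16               # input and output ports
FRAMES_PER_PORT = 1000   # frames queued at each input, random destinations
random.seed(1)

queues = [[random.randrange(PORTS) for _ in range(FRAMES_PER_PORT)]
          for _ in range(PORTS)]
heads = [0] * PORTS      # index of the head frame in each input queue

cycles = delivered = 0
while any(h < FRAMES_PER_PORT for h in heads):
    cycles += 1
    claimed = set()      # output ports already taken this cycle
    for i in random.sample(range(PORTS), PORTS):   # random service order
        if heads[i] >= FRAMES_PER_PORT:
            continue
        out = queues[i][heads[i]]
        # Head-of-line blocking: if the head frame's output is busy, every
        # frame behind it waits too, even those bound for idle outputs.
        if out not in claimed:
            claimed.add(out)
            heads[i] += 1
            delivered += 1

print(f"fabric efficiency: {delivered / (cycles * PORTS):.0%} of nominal capacity")

With uniform random traffic this model levels off at roughly 60% of nominal throughput; correlated or bursty traffic of the kind real applications generate pushes the figure lower still.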

But the latency of proprietary server adaptors and standard Ethernet is only one hindrance to achieving the performance necessary for wider exploitation of Ethernet in HPC environments. Legacy Ethernet switches have traditionally not lent themselves to deployment at large scale, given that:

• Underlying standards have not supported loss-less transmission; TCP assumes a best-effort network and compensates for loss through packet retransmission (a toy contrast between the two models is sketched below)
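The difference between best-effort and loss-less behaviour can be sketched with a simplified, hypothetical model in Python; the buffer size, offered load and drain rate below are arbitrary assumptions, not figures for any real switch. A best-effort link discards frames once its buffer overflows and leaves recovery to retransmission, while a loss-less link pauses the sender instead, so nothing is ever dropped.

def run_link(offered, buffer_slots, drain_rate, lossless, ticks=1000):
    """Push `offered` frames per tick through a buffer of `buffer_slots`
    frames that drains `drain_rate` frames per tick."""
    queued = delivered = dropped = paused = 0
    for _ in range(ticks):
        if lossless and queued + offered > buffer_slots:
            paused += 1                        # pause the sender; nothing is lost
        else:
            admitted = min(offered, buffer_slots - queued)
            dropped += offered - admitted      # best effort: overflow is discarded
            queued += admitted
        sent = min(queued, drain_rate)
        queued -= sent
        delivered += sent
    return {"delivered": delivered, "dropped": dropped, "paused ticks": paused}

# Offer 3 frames per tick to a link that can only drain 2 frames per tick
print("best-effort:", run_link(3, 10, 2, lossless=False))
print("loss-less:  ", run_link(3, 10, 2, lossless=True))

Both variants deliver at the drain rate, but the best-effort link discards roughly one frame per tick, work a real network would have to redo via retransmission, while the loss-less link converts the same overload into back-pressure on the sender.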
