NextIO introduces new GPU consolidation appliance

Benefits high-performance computing, simulations, modeling and analysis

NextIO last Tuesday rolled out the vCORE Express 2090 GPU appliance, which allows customers to run GPUs outside their servers for high-performance computing applications.

The new appliance uses NVIDIA Tesla M2090 GPUs and is targeted at seismic processing, biochemistry simulations, weather and climate modeling, signal processing and data analysis, among other applications. The 2090 appliance provides 2,048 computing cores (512 per GPU) for accelerated scientific computation, 2.6 teraFLOPS of performance, ECC memory and L1/L2 caches.

Originally used to accelerate 3D gaming, GPUs have come to the forefront of scientific computing, financial services' Monte Carlo simulations, oil and gas exploration and pattern recognition, among other applications. These are embarrassingly parallel workloads involving very large datasets and heavy floating-point computation; because they can be broken down into independent parallel processes, GPUs save organizations time and money.
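To illustrate what "embarrassingly parallel" means in practice, here is a minimal sketch (not NextIO or NVIDIA code) of a Monte Carlo estimate of pi: each worker processes its own batch of random samples with no communication until the final sum, which is exactly the structure that maps well onto hundreds of GPU cores. This example uses CPU processes via Python's standard library purely for illustration.

```python
# Hypothetical illustration of an embarrassingly parallel workload:
# each worker runs independently on its own slice of samples, so the
# job scales across cores with no coordination until the final reduction.
import random
from multiprocessing import Pool


def count_hits(samples: int) -> int:
    """Count random points that land inside the unit quarter-circle."""
    rng = random.Random()
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples: int = 1_000_000, workers: int = 4) -> float:
    # Split the work into independent chunks; workers never talk to
    # each other, only their partial counts are summed at the end.
    chunk = total_samples // workers
    with Pool(workers) as pool:
        hits = sum(pool.map(count_hits, [chunk] * workers))
    return 4.0 * hits / (chunk * workers)


if __name__ == "__main__":
    print(estimate_pi())  # converges toward 3.14159 as samples grow
```

The same decomposition, written against a GPU framework such as CUDA, is what lets a Tesla-class card apply hundreds of cores to one job.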

Despite all the advantages of using GPUs for technical and scientific computing, there is also a wealth of challenges, mostly centered on GPU maintenance and replacement. Workloads are typically split so that the sequential part of an application runs on the CPU and the computationally intensive part runs on the GPU. If a GPU requires replacement, every workload running on the server that contains it must be stopped, the server taken down and the GPU swapped out. The same is true when new firmware needs to be loaded or a newer GPU model needs to be added to the system: operations stop while labor-intensive, high-touch GPU maintenance takes place. This affects all the applications running on the server, not just the one using the failed GPU, delaying the organization's ability to do its work.

As the number of jobs and GPUs increases, so do the dependencies and the potential for resource contention. Jobs that don't require GPUs for computation may be assigned to servers containing GPUs, while workloads that do require GPUs wait for them to become available.

The result of these contention and maintenance factors is that system administrators tend to over-provision GPUs to ensure one is always available when a planned workload needs it. This approach is clearly problematic: TCO increases, labor intensifies and ROI decreases.

The 2090 is available in a 1U form factor and contains four Tesla M2070 or four Tesla M2090 GPUs. It has a GPU memory speed of 1.85 GHz and a memory bandwidth of 177GB/sec.

Learn more about this topic

NextIO rolls out vCORE GPU appliance

NextIO launches virtual interconnect platform
