Despite recent advancements in multi-core CPU performance and parallelism, a significant challenge remains unsolved: the scale-out of cloud applications.
Put simply, Linux application performance scales poorly as CPU core count increases. A typical Linux application can be expected to see about a 1.5X performance improvement on a 2-core CPU, but the gains quickly plateau after that, with 4-core performance improving only around 2.5X. Returns diminish further as core counts rise. With Intel announcing Xeon chips with up to 22 cores, scaling performance efficiently across cores is extremely important.
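One simple lens on this plateau is Amdahl's law, which says that the serial (non-parallelizable) fraction of a workload caps the speedup extra cores can deliver. As a back-of-the-envelope sketch (not a measurement): taking the 1.5X-on-2-cores figure above at face value implies roughly two-thirds of the work parallelizes, which predicts diminishing returns in the same ballpark as the numbers quoted:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Projected speedup when only parallel_fraction of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Back out the parallel fraction implied by a 1.5X speedup on 2 cores:
# 1.5 = 1 / ((1 - p) + p / 2)  =>  p = 2/3
p = 2.0 / 3.0
print(round(amdahl_speedup(p, 2), 2))   # 1.5
print(round(amdahl_speedup(p, 4), 2))   # 2.0
print(round(amdahl_speedup(p, 22), 2))  # 2.75
```

Even as an idealized model that ignores inter-core communication costs entirely, it lands close to the quoted figures; the operating-system overheads only make the real curve worse.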
At the heart of this rapid plateauing lies the fact that both application processes and the I/O devices they depend on require core resources. As the core count goes up, the likelihood that a process and its dependent I/O execute on the same core decreases, resulting in excessive overhead from inter-core communication, data copying and other operating-system costs.
From a hardware perspective, each processor core (including its associated L1 and L2 caches) is largely an independent computing unit, fully capable of executing compute tasks in parallel. That is not the issue. The real issue lies in how applications are architected to facilitate parallel computing and how the underlying operating system gains insight into application processes for tasks that can be executed in parallel without incurring a significant amount of shared-resource-locking and data-copying overheads. In a multi-core environment, this insight is critical in order to reduce or eliminate inter-core overheads so that the effective performance of each core can be fully realized.
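Linux offers no standard mechanism that gives the kernel this kind of application insight automatically, but the building blocks for steering work onto specific cores exist today. As a minimal, Linux-only sketch (assuming a Linux host), a process can pin itself to a single core so that it and the data it touches stay warm in that core's L1/L2 caches:

```python
import os

# Pin the calling process (pid 0 means "self") to core 0, so the
# process -- and the cache lines it touches -- stay on one core.
# os.sched_setaffinity is a Linux-only API; it raises OSError elsewhere.
os.sched_setaffinity(0, {0})

print(sorted(os.sched_getaffinity(0)))  # [0]
```

Pinning interrupt handling for a NIC to the same core (via `/proc/irq/<n>/smp_affinity`) is the device-side half of the same idea; doing both by hand for every workload is exactly the burden an application-aware framework would lift.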
What About Existing Linux Apps?
That all sounds good, but what about all of the existing Linux applications that have been written? Can those applications somehow leverage the available CPU cores as parallel resources without any application changes? Making that possible would seem like magic.
To find a solution, it pays to consider the experience of modern-day, large-scale, cloud-native web companies such as Facebook and Uber. They have gone through notable architectural shifts, driven partly by the pressing need to develop applications that perform and scale out horizontally, agnostic of hypervisors, containers or operating systems.
The result has been the rise of new, cloud-native applications that feature a clear separation of control and data-path functions and can spawn new worker processes on demand. Moreover, each worker process is often designed to run to completion. This simplifies complex inter-process and inter-thread locking, and it better aligns how applications are expected to execute in parallel with how the underlying infrastructure can adapt to application needs in real time.
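A minimal sketch of that run-to-completion worker pattern, using Python's standard multiprocessing module (the request payloads and handler here are made up for illustration):

```python
from multiprocessing import Pool

def handle_request(payload):
    # Each worker handles one request from start to finish: no shared
    # mutable state, so no inter-process or inter-thread locking.
    return sum(payload)

if __name__ == "__main__":
    requests = [[1, 2, 3], [4, 5], [6]]
    # Worker processes are spawned on demand; the OS is free to place
    # each one on an idle core without any coordination between them.
    with Pool(processes=2) as pool:
        print(pool.map(handle_request, requests))  # [6, 9, 6]
```

Because workers never block on one another, the scheduler can keep every core busy independently, which is what makes this pattern friendly to the kind of parallel framework discussed here.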
Remember how computer instructions kept growing in complexity until Reduced Instruction Set Computers (RISC) came to the rescue? It feels like a reincarnation of that simplifying mindset is taking hold in applications today. Parallel computers are not new; they have long existed. But they were purpose-built, proprietary machines with proprietary architectures and specialized message-passing protocols such as MPI, not designed for the masses. We don't want that.
At the heart of the challenge are two additional key questions that we need to ponder:
First, can an application-aware parallel computing framework be constructed that is software defined, much as the underlying servers, storage and networks are already becoming?
Second, are today’s commodity multi-core processors and I/O virtualization frameworks sufficiently parallel, from a hardware and infrastructure perspective, to make a software-defined, application-aware parallel computing framework possible?
That is a multi-billion-dollar question.
This article is published as part of the IDG Contributor Network.