Off-the-shelf graphics processing units can perform SSL acceleration as fast as high-end commercial SSL hardware at a fraction of the cost, according to researchers in Korea and the U.S.
Called SSLShader, the proxy hardware handles termination of SSL sessions to and from servers and puts up impressive numbers for transactions per second and transfers of large files, according to a paper to be delivered Wednesday at the USENIX Symposium on Networked Systems Design and Implementation in Cambridge, Mass.
Use of SSL significantly increases the security of Web sites, but it hasn't been as widely adopted as it might be because of performance problems and added costs. SSLShader could help address both, the researchers say.
GPUs far outstrip hardware supported by CPUs, providing both high throughput and low latency, the researchers from Korea Advanced Institute of Science and Technology (KAIST) and the University of Washington say.
By sorting cryptographic operations of the same type and queuing them through the same GPU kernel, SSLShader takes advantage of the GPUs' massive parallel computing capacity. GPUs are called into play only when there is a sufficiently large batch of similar requests. Unless requests hit a certain threshold, the CPUs perform better, the researchers say, so they have developed an offloading algorithm within SSLShader to determine when the GPUs should kick in.
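The batching and offloading idea described above can be sketched roughly as follows. This is an illustrative sketch only, not the researchers' code: the function name `dispatch`, the operation labels and the threshold values are all hypothetical, since the article does not give the actual thresholds SSLShader uses.

```python
from collections import defaultdict

# Hypothetical per-operation batch thresholds. The article does not state
# the real values; these numbers exist only to make the sketch runnable.
GPU_BATCH_THRESHOLD = {"rsa": 32, "sha1": 64}


def dispatch(pending):
    """Sort pending requests by operation type and choose a processor per batch.

    `pending` is a list of (op_type, payload) tuples. Returns a dict mapping
    each op_type to a (processor, batch) pair, where processor is "gpu" only
    when the batch is at least as large as the threshold for that type.
    """
    queues = defaultdict(list)
    for op_type, payload in pending:
        queues[op_type].append(payload)  # same-type operations share a queue

    plan = {}
    for op_type, batch in queues.items():
        threshold = GPU_BATCH_THRESHOLD.get(op_type)
        use_gpu = threshold is not None and len(batch) >= threshold
        plan[op_type] = ("gpu" if use_gpu else "cpu", batch)
    return plan
```

Under light load the batches stay small and everything runs on the CPU; once enough requests of one type pile up, that batch is handed to the GPU in a single pass.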
The team ran OpenSSL on server hardware fitted with two Intel Xeon X5650 CPUs, 24GB of memory and two NVIDIA GTX580 cards. The box ran the Ubuntu Linux operating system and lighttpd Web server software. The researchers compared the performance of lighttpd with OpenSSL vs. SSLShader under a variety of traffic. For processing 1024-bit RSA keys, lighttpd achieved 11,200 transactions per second vs. SSLShader's 29,000 transactions per second.
For more secure 2048-bit RSA, lighttpd with OpenSSL achieved 3,600 TPS while SSLShader reached 21,800 TPS, the researchers say.
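To put those figures side by side, the speedup for each key size is simply the SSLShader rate divided by the OpenSSL rate. The short calculation below uses only the TPS numbers reported above; the variable and function names are made up for illustration.

```python
# Transactions per second reported in the article: (OpenSSL baseline, SSLShader)
RSA_TPS = {
    1024: (11_200, 29_000),
    2048: (3_600, 21_800),
}


def speedup(baseline_tps, accelerated_tps):
    """Return how many times faster the accelerated setup is than the baseline."""
    return accelerated_tps / baseline_tps


for key_bits, (baseline, accelerated) in RSA_TPS.items():
    print(f"{key_bits}-bit RSA: {speedup(baseline, accelerated):.1f}x")
```

The gap widens with the larger key: roughly 2.6 times for 1024-bit RSA and roughly 6 times for 2048-bit RSA.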
Using GPUs for cryptographic operations when load is low can lead to high latency, the researchers found, hence the need for the offloading algorithm, which minimizes latency when load is light and maximizes throughput when load is high, they say.
The researchers compared price per operation for the components used in their experiment - Intel Xeon X5650 CPU ($996) and NVIDIA GTX580 card ($499) - vs. the same measurements for the commercial NITROX SSL accelerator board ($2,129).
For RSA key processing, the CPU achieved 19.9 operations per second for a dollar, the GPU achieved 185.3 operations per second for a dollar and the NITROX board achieved 30.5 operations per second for a dollar.
For SHA-1 processing, the CPU achieved 20.2Mbps per dollar, the GPU achieved 62.3Mbps per dollar and the NITROX board achieved 2.8Mbps per dollar, the researchers say.
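The per-dollar figures are easier to compare as ratios. The sketch below takes the numbers reported above and works out how far ahead the GPU is for each workload; the dictionary layout and the helper name `gpu_advantage` are illustrative choices, not anything from the paper.

```python
# Performance per dollar as reported in the article.
# RSA is in operations per second per dollar; SHA-1 is in Mbps per dollar.
PER_DOLLAR = {
    "rsa": {"cpu": 19.9, "gpu": 185.3, "nitrox": 30.5},
    "sha1": {"cpu": 20.2, "gpu": 62.3, "nitrox": 2.8},
}


def gpu_advantage(workload, other):
    """Return the GPU's per-dollar figure divided by that of `other`."""
    figures = PER_DOLLAR[workload]
    return figures["gpu"] / figures[other]


for workload in PER_DOLLAR:
    for other in ("cpu", "nitrox"):
        ratio = gpu_advantage(workload, other)
        print(f"{workload}: GPU vs. {other} = {ratio:.1f}x per dollar")
```

By this measure the GPU delivers about nine times the RSA throughput per dollar of the CPU and about six times that of the NITROX board.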
"SSLShader handles [29,000] SSL TPS and achieves 13Gbps bulk encryption throughput on commodity hardware," the researchers say. "We hope our work pushes SSL to a wider adoption than today."