eBay scores cost savings and a bandwidth boost with white-box switches running SONiC

Open-source software plus switches built from commodity parts transform eBay's data-center networks into a 400Gbps Layer 3 fabric.


For online auction powerhouse eBay, customer service is everything. Or, as Parantap Lahiri, vice president of network and data center engineering, puts it, “We want to make the network more like air or water, so our people don’t have to worry about network resources when creating magical services for our users.”

The demands on the eBay infrastructure are staggering: 1.8 billion active listings; 133 million active buyers. Its main landing page gets 250 million visits per day. Unlike a static storefront site like Amazon, an eBay auction can entail multiple bidders from all over the world competing against each other as the clock ticks down to the end of the auction. And the eBay platform supports direct communication between sellers and buyers, with offers and counteroffers flying back and forth.

On top of the responsibility for maintaining a network infrastructure that can support current traffic levels, eBay developers are constantly coming up with new site features that leverage data-intensive technologies like AI and machine learning.

In a bold move aimed at cutting costs, increasing bandwidth and providing network capacity for years to come, Lahiri built a 400Gbps Ethernet fabric for eBay’s on-prem data centers based on white-box switches running the open-source SONiC operating system.

(SONiC, or Software for Open Networking in the Cloud, was developed by Microsoft for its Azure cloud data centers, then released to the open-source community in 2016. It is currently part of the Linux Foundation.)

Data-center modernization was needed.

eBay is one of the original cloud companies; it conducted its first online auction in 1995. But in many ways eBay is no different from any company closing in on its 30th birthday: It has legacy on-premises data centers built on technology that needed to be updated.

Lahiri says that in 2015 he started thinking about replacing his traditional Layer 2 data-center network with a Layer 3 IP-based network and upgrading from 100Gbps Ethernet to 400Gbps. He says that the traditional Layer 2 domains weren’t stable enough or scalable enough for eBay’s growing requirements.

When he began pricing 400Gig technology, including the optics, silicon, and switch hardware, he found it was simply too expensive, so he decided to investigate the open-source option, which is firmly in line with the corporate culture.

The eBay approach to technology in general is do-it-yourself, innovative, and decidedly open source. In fact, eBay builds its own custom-designed servers. It built its own object data store called NuData. And it leans heavily on open-source tools like MongoDB and Hadoop for databases, the Apache Kafka streaming platform, and Docker and Kubernetes for developing and orchestrating containerized applications.

In 2019, Lahiri put a SONiC box into the production network and monitored it for performance and stability for more than a year before going all-in on the white box/SONiC route. As of today, Lahiri has decommissioned what he calls the “old school” footprints and has deployed a SONiC-based network fabric in production in all active data centers. From here on out, all new capacity deployments will be SONiC-based, he says.

One of the common concerns associated with this approach is whether the IT team has the expertise and bandwidth to install, monitor, maintain, and troubleshoot the network.

Lahiri says it’s not as hard as it might look. “I do have a great networking team, but it’s not huge,” he says. “The overall entrance barrier has gone down a lot. We have a working configuration and just slap it on the white box, and it works.”

He points out that there is an active and helpful SONiC community, the standards are all there, and the hyperscalers have been building these types of Layer 3 BGP networks for years. “There was nothing new that we did on the architectural part,” Lahiri says.
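To make the idea of a Layer 3 BGP fabric concrete, here is a minimal Python sketch of the kind of address and AS-number plan such a design typically starts from. It is not eBay's plan: the switch names, the 10.0.0.0/24 link block, the private AS numbers, and the `plan_fabric` helper are all assumptions chosen for illustration. Every spine-to-leaf link gets its own /31 subnet and its own eBGP session.

```python
import ipaddress
from itertools import product


def plan_fabric(num_spines, num_leaves, link_block="10.0.0.0/24",
                spine_asn=65000, first_leaf_asn=65001):
    """Return one eBGP session record per spine-leaf link.

    All names, AS numbers, and address ranges are illustrative assumptions.
    """
    # Carve the block into /31 point-to-point subnets (two addresses each).
    subnets = list(ipaddress.ip_network(link_block).subnets(new_prefix=31))
    links = list(product(range(num_spines), range(num_leaves)))
    if len(links) > len(subnets):
        raise ValueError("link_block is too small for this fabric")

    sessions = []
    for (spine, leaf), net in zip(links, subnets):
        sessions.append({
            "spine": f"spine{spine + 1}",
            "leaf": f"leaf{leaf + 1}",
            "subnet": str(net),
            "spine_ip": str(net[0]),            # first address of the /31
            "leaf_ip": str(net[1]),             # second address of the /31
            "spine_asn": spine_asn,             # spines share one AS number
            "leaf_asn": first_leaf_asn + leaf,  # each leaf gets its own
        })
    return sessions


if __name__ == "__main__":
    for session in plan_fabric(num_spines=2, num_leaves=3):
        print(session)
```

Because every link is a routed point-to-point segment, the fabric has no large Layer 2 domain to keep stable, which is the property Lahiri describes wanting.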

eBay benefits from lower maintenance and costs.

As eBay developers shifted to containerized applications, the demands on the network increased significantly, particularly when AI-based workloads were added to the mix. “We needed to provide an infrastructure wide enough so that people don’t have to worry about where they are putting their workloads, where workloads are talking to each other with no conditions,” says Lahiri.

In addition, he wanted to build a network that did not require constant attention. In the past, if an application moved or needed more bandwidth, his team was out there manually upgrading line cards.

“One of our motivating factors was to build a chunk of network fabric, and we don’t want to touch it. We don’t want a custom network for applications that move. We wanted to break away from that cycle.” With the new network, “all changes are automated, nobody is making CLI changes.”
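The article does not describe eBay's automation tooling, but the general pattern behind "nobody is making CLI changes" can be sketched in a few lines of Python: a switch's configuration is generated from declarative inputs, and a program compares it with what is currently deployed. The table and field names below are loosely modeled on SONiC's `config_db.json` layout but are simplified assumptions, not a verified schema; `render_switch_config` and `changed_tables` are hypothetical helpers.

```python
import json


def render_switch_config(hostname, local_asn, neighbors):
    """Build a config dictionary for one switch from declarative inputs.

    `neighbors` maps a peer IP to a (peer_name, peer_asn) tuple.
    Table and field names are simplified assumptions, not a verified schema.
    """
    return {
        "DEVICE_METADATA": {
            "localhost": {"hostname": hostname, "bgp_asn": str(local_asn)}
        },
        "BGP_NEIGHBOR": {
            peer_ip: {"asn": str(peer_asn), "name": peer_name}
            for peer_ip, (peer_name, peer_asn) in sorted(neighbors.items())
        },
    }


def changed_tables(current, desired):
    """List top-level tables whose contents differ between two configs."""
    return sorted(
        table for table in set(current) | set(desired)
        if current.get(table) != desired.get(table)
    )


if __name__ == "__main__":
    running = render_switch_config("leaf1", 65001,
                                   {"10.0.0.0": ("spine1", 65000)})
    desired = render_switch_config("leaf1", 65001,
                                   {"10.0.0.0": ("spine1", 65000),
                                    "10.0.0.6": ("spine2", 65000)})
    print(json.dumps(desired, indent=2, sort_keys=True))
    print("tables to update:", changed_tables(running, desired))
```

In a setup like this, adding capacity means changing the inputs and letting the tooling push the difference, rather than logging in to each switch.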

And that translates into business benefits. “Developers can come up with a new feature, and get it deployed fast,” says Lahiri. By taking the open-source/white box route, Lahiri has been able to slash operational expenses by an estimated 25%, while quadrupling bandwidth (100Gbps to 400Gbps).

Another benefit has been a reduction in the time and effort that his team used to spend trying to get networking gear from multiple commercial vendors to interoperate or tracking down the source of a bug, freeing up staffers to take on higher-level tasks.

Tips from eBay and a look ahead.

His advice to other IT leaders is to start with a small proof-of-concept. “You don’t need to have an army of people to work on the OS piece of it; you can get running with SONiC on a white box switch pretty easily.”

Beyond that, Lahiri says, “You have to know what is being asked from the protocol side, the bandwidth side, the silicon side; and you have to understand your problems really well and see what opportunities there are to innovate.”

Lahiri is already eyeing 800Gigabit Ethernet, and says, “When 800Gig becomes usable and reasonably priced, we will champion it.” In the meantime, his team is working on things like how to interconnect his 400Gig fabrics using dense mesh topologies. “The work goes on,” says Lahiri.

