
Yahoo builds ultimate private cloud

Yahoo's private cloud expands and contracts computing resources nearly instantaneously and does not rely on a public cloud for extra capacity. Here's how they built it.

By , Network World
July 19, 2011 12:27 PM ET

Network World - Imagine the kind of infrastructure needed for a website fielding 11.5 million requests per second. That was one of the challenges faced by Yahoo's Todd Papaioannou, vice president of cloud architecture.

"What's my biggest pain spot? No, it's not Google," he quipped to attendees during his keynote speech at the Cloud Leadership Forum, held last month in Santa Clara, Calif. "My biggest problem is elasticity. VM spin-up time. Virtualization isn't there yet." A VM spin-up time of 10 to 20 minutes is simply too long to absorb a spike in Yahoo's traffic when big news breaks, such as the Japan tsunami or the death of Osama bin Laden or of Michael Jackson.


That's why Yahoo has built itself the ultimate private cloud. And by private cloud, we don't mean just a cluster of virtualized servers -- we mean an infrastructure that can expand or contract as quickly as you can take a deep breath and exhale.

And failing over to a public cloud won't cut it, either. By Papaioannou's estimate, it can take 20 to 40 minutes to spin up an Amazon VM instance that relies on Elastic Block Store (EBS) storage.

True, Yahoo, based in Sunnyvale, Calif., is overshadowed by the 800-pound gorilla a short drive up the 101 in Mountain View, at least in the U.S. Yet Papaioannou points out that in other countries, such as Taiwan, Yahoo is the most popular Internet destination. As a result, the sun never sets on the page requests made of Yahoo's 400,000 servers (compare that with the 70,000 servers at Rackspace, which sells cloud capacity, he notes). Yahoo supports more than 680 million registered users and stores more than 200 petabytes of data, much of it on 42,000 Hadoop servers. It collects and processes 100 billion events per day, handles those 11.5 million requests per second, and serves 11 billion pages per month.

Yahoo considers itself to be the cloud: a personal cloud for consumers. It stores consumer data such as photos, email and other media, and it provides online services such as search, news, games and TV. Its secret sauce is its Web of Objects, or WOO, the customization engine that serves up related content as people use Yahoo's services. Yahoo describes WOO as a "semantic map of web entities." The more visitors use Yahoo, the more precisely WOO can zero in on personalized related content. If a user searches for a band, for example, WOO could show news stories, videos, lyrics and other deep content related to the band and associated with that person's online behavior.

It takes a big engine, built on top of a hyper-flexible cloud, to collect and analyze all of that big data and to keep it running when traffic spikes.

For Papaioannou, this means the private cloud isn't just a fancy marketing phrase. When a spike happens, "currently our only option is to do 'load shedding,'" says Papaioannou. In load shedding, the private cloud pauses lower-priority workloads, or moves them elsewhere, to free up servers, and then dedicates those servers to the spike. Servers running batch jobs, for instance, are doing lower-priority work.
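Papaioannou describes load shedding only at a high level. The following is a minimal sketch of the idea, assuming a hypothetical fleet model in which each server runs a single workload with a numeric priority (lower means less important, so batch jobs sit at the bottom). The names `Server`, `shed_load` and `restore` are illustrative inventions, not Yahoo's implementation: the sketch pauses the lowest-priority work and hands those servers to the spike, then resumes the paused work afterward.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Server:
    """Hypothetical model: one server running one workload at one priority."""
    name: str
    workload: str
    priority: int  # assumption: lower number = lower priority (batch = 0)
    paused: Optional[Tuple[str, int]] = None  # displaced (workload, priority)


def shed_load(servers: List[Server], needed: int,
              spike_workload: str, spike_priority: int) -> List[Server]:
    """Reassign up to `needed` servers from lower-priority work to a spike.

    Only servers whose priority is strictly below `spike_priority` are
    candidates, and the lowest-priority ones are taken first. Servers that
    were already shed once are skipped so their original work is not lost.
    The displaced workload is paused (recorded in `paused`), not migrated.
    """
    candidates = sorted(
        (s for s in servers if s.priority < spike_priority and s.paused is None),
        key=lambda s: s.priority,  # sorted() is stable, so ties keep fleet order
    )
    reassigned = []
    for server in candidates[:needed]:
        server.paused = (server.workload, server.priority)
        server.workload = spike_workload
        server.priority = spike_priority
        reassigned.append(server)
    return reassigned


def restore(servers: List[Server]) -> None:
    """Once the spike subsides, resume whatever work each server paused."""
    for server in servers:
        if server.paused is not None:
            server.workload, server.priority = server.paused
            server.paused = None


if __name__ == "__main__":
    fleet = [
        Server("s1", "hadoop-batch", 0),
        Server("s2", "front-page", 10),
        Server("s3", "log-batch", 0),
        Server("s4", "reporting", 5),
    ]
    moved = shed_load(fleet, needed=3, spike_workload="news", spike_priority=10)
    print([s.name for s in moved])  # ['s1', 's3', 's4']
    restore(fleet)
```

The sketch only pauses displaced work. The article notes that Yahoo may also move that work to other servers, which would require a scheduler this example does not model.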
