With 5 petabytes of user data on Teradata equipment, eBay is one of the hardest-core analytic technology users around. Naturally, eBay thinks it gains a lot of competitive advantage from this, and doesn’t want to share many details. But one tidbit eBay has let slip is that in less than a year, eBay used analytics to increase server utilization by more than 1.6X.
What I mean is this: There’s a simple metric called Parallel Efficiency (PE) = (average utilization across all servers)/(maximum utilization of any single server). Parallel work is gated by the most heavily loaded server, so to the extent your parallel efficiency on a parallel cluster is under 100%, you’re essentially wasting server purchase cost, server room floor space, and some amount of power. eBay measured the parallel efficiency of its 10,000+ servers (probably a lot more than 10,000) and found a figure well under 50%. “Six months” or so later (I’m saying “less than a year” because such timelines are hard to measure precisely), parallel efficiency was up to over 80%. Going from roughly 50% to 80% is where the 1.6X figure comes from, and the resulting cost savings are obviously enormous.
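To make the metric concrete, here is a minimal Python sketch that computes parallel efficiency as average utilization divided by maximum utilization across a set of servers. The function name `parallel_efficiency` and the sample utilization figures are hypothetical illustrations, not eBay’s data or tooling; the only assumption is that each input is a per-server utilization fraction measured over the same window.

```python
from statistics import mean


def parallel_efficiency(utilizations):
    """Average per-server utilization divided by the busiest server's utilization.

    `utilizations` holds per-server utilization fractions (0.0 to 1.0) measured
    over the same time window. Hypothetical helper for illustration only.
    """
    if not utilizations:
        raise ValueError("need at least one server")
    peak = max(utilizations)
    if peak == 0:
        raise ValueError("all servers idle; efficiency is undefined")
    return mean(utilizations) / peak


# Hypothetical skewed cluster: one hot server at 100% while nine others sit at 40%.
skewed = [1.0] + [0.4] * 9
print(f"{parallel_efficiency(skewed):.0%}")  # 46%

# The same total work spread evenly across all ten servers gives perfect efficiency.
balanced = [0.46] * 10
print(f"{parallel_efficiency(balanced):.0%}")  # 100%

# Raising PE from 50% to 80% lets the same hardware do 1.6x the useful work.
print(f"{0.80 / 0.50:.1f}x")  # 1.6x
```

The skewed example shows why the metric punishes imbalance: the cluster’s capacity is effectively set by the one saturated server, so the slack on the other nine is paid for but unused.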
eBay did this in the usual way, identifying and eliminating bottlenecks. The interesting part is how they found the bottlenecks. Just as they would with any other kind of data, they banged network event data into a data warehouse, then used standard analytic tools to extract results.
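eBay hasn’t said what its schema or queries looked like, so the following is only a minimal sketch of the general idea: bucket network event records into time windows, total the traffic per server in each window, and count how often each server is the busiest one (that is, the server setting the maximum in the PE ratio). In a data warehouse this would be a GROUP BY over an event table; plain Python stands in for that here. The `(timestamp_seconds, server, bytes)` record layout, the `hottest_servers` function, the 60-second window, and the server names are all hypothetical.

```python
from collections import Counter, defaultdict


def hottest_servers(events, window_seconds=60):
    """Count, per server, the number of time windows in which it carried the most traffic.

    `events` is an iterable of (timestamp_seconds, server, bytes) tuples, a
    hypothetical stand-in for a real network event schema. Ties within a window
    go to whichever server appeared first in that window.
    """
    # window index -> server -> total bytes seen in that window
    traffic = defaultdict(lambda: defaultdict(int))
    for ts, server, nbytes in events:
        traffic[int(ts // window_seconds)][server] += nbytes

    counts = Counter()
    for per_server in traffic.values():
        # The busiest server in each window is a bottleneck candidate.
        counts[max(per_server, key=per_server.get)] += 1
    return counts


# Hypothetical event log spanning three 60-second windows.
events = [
    (5, "db-01", 900), (12, "web-07", 200), (40, "web-08", 150),
    (65, "db-01", 800), (70, "web-07", 300),
    (130, "web-07", 500), (150, "db-01", 100),
]
print(hottest_servers(events).most_common(1))  # [('db-01', 2)]
```

A server that tops the list window after window is a natural first place to look for a bottleneck, since rebalancing its load raises the average relative to the maximum.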
As I wrote recently,
Where there’s clickstream data, there’s usually also network event data – and the latter is in even higher volumes.
Now we know what the payoff from collecting all that data can be.