Last week I wrote about Red Hat CEO Jim Whitehurst's thoughts on private clouds vs. public clouds. This time, I want to share his comments on another big either-or controversy, this one regarding big data.
Basically, it boils down to this: for enterprises looking to exploit big data, is it better to capture all the data you can now and figure out what to do with it later, or to decide which specific questions you need to answer and develop a big data strategy with a clear ROI right from the beginning?
The data comes first
"As a seller of storage products," Whitehurst said, "my advice is to capture everything you can." No surprise there, but Whitehurst went on to do a pretty good job of justifying that viewpoint.
"Seriously," he said, "the best insights emerge out of having the data and figuring out what to do with it... If you have data and make it available and useful, people will figure out what to do with it... The ability of even brilliant minds to figure it all out in advance is limited."
As an example, Whitehurst related the story of Honda successfully importing small motorcycles into the United States. At the time, he said, people thought this was a brilliant strategy, but the company's original plan was to import large motorcycles, which didn't initially sell well. But people loved the small bikes that Honda engineers brought along with them to the U.S., so the company pivoted to sell those machines instead.
"Most things happen from the bottom up, as a groundswell," Whitehurst said. "That's how the value of data emerges."
What about the cost?
I actually agree with that assessment, but it's awfully hard to get most organizations to invest in a big data project when you can't articulate the payoff. The solution, Whitehurst said, is to use a modular approach: "If the cost of getting started is small, ROI is less of a concern."
To illustrate his point, Whitehurst used another example, this one from his personal experience at Delta Air Lines before joining Red Hat. Delta figured it could save 10,000 positions by using a single gate agent for each narrow-body departure -- and using information screens at each gate to answer passengers' questions about flight status, upgrades, and so on. But testing the system to see if it really enabled a single agent to handle the gate would cost a whopping $50 million! Instead, Delta used Red Hat's JBoss (see the connection?) to build a dramatically cheaper test, Whitehurst said. "It worked, so we did it."
The point, Whitehurst said, is that "if you can make things lower cost to test, you get more tests... Innovation requires trial and error. If it's expensive, you can't do that."
I can't speak to the impact of the JBoss-based tests, but I do believe that big data offers huge benefits that can't be fully understood until you have some actual data to analyze. Huge changes in sensors and storage technology mean that collecting and storing that data no longer has to be expensive, but there's no way to value the missed opportunities of collecting only the data you already know you're going to need. That's why it makes sense to capture all the data you can reasonably afford -- it's bound to come in handy down the road.