Network World - This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
In IT we love creating new hype cycles and catchphrases. And like fashion trends, we seem to have a 20-year cycle where we go back to what we've done before but slap a new name on it and insist everybody must "have" it immediately. The latest hype: big data.
From Interop to cloud conferences and even to Dilbert, we are being told that if we don't have a big data strategy -- one that, by the way, aligns with our cloud strategy -- we are behind, and our company will crash and burn.
There are three important reality checks about big data. First, it's not really new. Companies like Amazon, Microsoft and Google have been doing big data work since the '90s. In fact, companies have been mining data for decades. It may have been accessible or affordable only to a few very large companies with big wallets and big mainframe installations, but it has existed. Today, advanced data mining capabilities and algorithms are accessible to nearly everyone thanks to inexpensive computing and storage capacity as well as new tools and techniques.
In fact, many folks think big data is just a new name for business intelligence (BI). While there are similarities, big data goes beyond BI. I love how Stuart Miniman, a senior analyst at Wikibon, talks about the "bit flip" from BI to big data. Here is how I see that bit flip in action:
Second reality check: The "big" part is relative. We are absolutely dealing with a record level of digital data growth across all industries and organizations. According to IDC, we are creating more than 58 terabytes of data every second, and we expect to have some 35 zettabytes of digitally stored data by 2020. However, big data doesn't have to be massive. What matters is less the size of the data than what you need to do with it and how quickly you need to process it. A small company with 100 terabytes might have a big data problem, because it needs to extract, analyze and make decisions from multiple data sets about its product.
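To put the IDC creation rate on the same scale as the zettabyte figure, here is a minimal back-of-the-envelope sketch. It assumes decimal (SI) units, where 1 terabyte is 10^12 bytes and 1 zettabyte is 10^21 bytes, and a 365-day year; the function name is mine, purely for illustration.

```python
# Assumption: decimal (SI) units and a 365-day year.
TB = 10**12                      # bytes in one terabyte
ZB = 10**21                      # bytes in one zettabyte
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def zettabytes_per_year(terabytes_per_second: float) -> float:
    """Convert a data creation rate in TB/s to ZB/year."""
    bytes_per_year = terabytes_per_second * TB * SECONDS_PER_YEAR
    return bytes_per_year / ZB


rate = zettabytes_per_year(58)   # the IDC rate quoted above
print(f"{rate:.2f} ZB per year")  # prints "1.83 ZB per year"
```

Keep in mind that data created and data stored are different quantities, so this yearly figure is a sense of scale rather than a check on the 35-zettabyte projection.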
Third, the definition of data used in big data processes is broad. It can include both structured and unstructured data, and for some companies, the most vital big data is metadata, or the data about the data. Gartner does a good job of defining the data characteristics in big data as having volume, variety and velocity.
McKinsey defines big data as "datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze." What I would add to this is: "that requires massively parallel software (systems) running on tens, hundreds or even thousands of servers (clouds)."
Beyond coming to a common understanding and definition of big data, the next big hurdle for most companies is how to get started. As with cloud computing, big data seems to require a massive investment and implementation of multiple solutions, new IT and business processes, and a new level of business agility. Here are seven steps to big data success: