During the next eight years, the amount of digital data produced will exceed 40 zettabytes, which is the equivalent of 5,200 GB of data for every man, woman and child on Earth, according to an updated Digital Universe study released today.
To put it in perspective, 40 zettabytes is 40 trillion gigabytes -- estimated to be 57 times the number of grains of sand on all the beaches on Earth. To hit that figure, all data is expected to double every two years through 2020.
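The arithmetic behind those figures can be checked with a short sketch. It assumes decimal (SI) units and exactly four doublings over the eight-year window; the implied population and starting volume below are back-of-the-envelope values derived from the article's round numbers, not figures quoted from the study.

```python
# Assumption: decimal (SI) units, 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes.
ZB = 10**21
GB = 10**9

total_bytes = 40 * ZB
total_gb = total_bytes // GB              # 40 trillion GB

per_person_gb = 5_200                     # per-person figure quoted in the article
implied_population = total_gb / per_person_gb

# Assumption: "doubling every two years" over eight years means four doublings.
doublings = 8 / 2
implied_start_zb = 40 / 2**doublings      # volume at the start of the window

print(f"{total_gb:,} GB in total")
print(f"implied population: {implied_population / 1e9:.1f} billion")
print(f"implied starting volume: {implied_start_zb} ZB")
```

Under these assumptions the per-person figure corresponds to a population of roughly 7.7 billion, and four doublings imply a 16-fold increase over the period.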
The majority of data between now and 2020 will not be produced by humans but by machines as they talk to each other over data networks. That would include, for example, machine sensors and smart devices communicating with other devices.
So far, however, only a tiny fraction of the data being produced has been explored for its value through the use of data analytics. IDC estimates that by 2020, as much as 33% of all data will contain information that might be valuable if analyzed.
The digital universe explained
The digital universe includes everything from images and videos on mobile phones uploaded to YouTube to digital movies populating the pixels of high-definition TVs to transponders recording highway tolls. It also, naturally, includes more traditional corporate data, such as banking data swiped in an ATM, security footage at airports and major events such as the Olympic Games, as well as subatomic collisions recorded by the Large Hadron Collider at CERN.
Using business intelligence to analyze data could reveal patterns in social media use, correlations in scientific data from discrete studies, medical information intersected with sociological data, as well as faces in security footage.
"Herein is the promise of 'Big Data' or MapReduce technology -- the extraction of value from the large untapped pools of data in the digital universe," IDC said in the study.
Additionally, data that would be mined has to be "tagged" with metadata to give it context. That would include, for example, adding a date stamp to video surveillance or geolocation information to smartphone photos or video -- "basically, some data that puts context around the data we're creating," said Chuck Hollis, global marketing CTO at EMC.
"We're not only going to have to tag more of it, but we're going to have to tag it with better information over time if we want to extract data with value from it," he said.
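As a rough illustration of what tagging can look like in practice, the sketch below attaches a date stamp and optional geolocation to a piece of content. The `TaggedItem` class, the `tag_item` helper, the tag names and the sample values are all hypothetical; they are not taken from the study or from any EMC product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TaggedItem:
    """Hypothetical container: raw content plus the metadata that gives it context."""
    payload: bytes
    tags: dict = field(default_factory=dict)


def tag_item(payload: bytes,
             captured_at: datetime,
             lat: Optional[float] = None,
             lon: Optional[float] = None) -> TaggedItem:
    """Attach a date stamp and, when both coordinates are given, a geolocation."""
    tags = {"captured_at": captured_at.isoformat()}
    if lat is not None and lon is not None:
        tags["geo"] = {"lat": lat, "lon": lon}
    return TaggedItem(payload=payload, tags=tags)


# Made-up sample values, for illustration only.
stamp = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
photo = tag_item(b"<image bytes>", stamp, lat=10.0, lon=20.0)   # smartphone photo
frame = tag_item(b"<video frame>", stamp)                        # surveillance frame

print(photo.tags)
print(frame.tags)
```

Keeping the tags in a plain dictionary leaves room to add richer context later, which is the direction Hollis describes.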
That opens up a burgeoning career field for data scientists, who will be asked to extract usable information, such as consumer buying trends, from massive data stores.
Picking up speed
The Digital Universe study, which is sponsored by EMC, was first launched in 2005. For the first three years, it was refreshed on an annual basis. This latest update, however, marks an 18-month lag between study results -- and a huge change in its predictions.
For example, the last version, released in June 2011, predicted the amount of data to be produced by 2020 would be 35 zettabytes, not 40 zettabytes.
Hollis said the new IDC study reveals that for every physical or virtual server corporations have today, they can plan on having 10 times that number by the end of the decade.
"Another way to look at it is that for every terabyte of data you own today, plan on 14 times more just like it by the end of the decade," he said. "But I think most of the people I meet in IT world know this is happening."
The number of servers (virtual and physical) worldwide will grow 10-fold and the amount of information managed directly by enterprise datacenters will grow by a factor of 14, the study showed. Meanwhile, the number of IT professionals is expected to grow by less than a factor of 1.5.
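A quick calculation shows what those growth factors imply for each IT professional. This is a sketch based only on the three factors quoted above; because staff is expected to grow by less than 1.5 times, the results are lower bounds on the added load per person.

```python
# Growth factors quoted in the article.
server_growth = 10.0
data_growth = 14.0
staff_growth = 1.5    # upper bound: the study says "less than a factor of 1.5"

# Relative load per IT professional (today = 1.0).
servers_per_admin = server_growth / staff_growth
data_per_admin = data_growth / staff_growth

print(f"servers per IT professional: at least {servers_per_admin:.2f}x today's level")
print(f"data per IT professional: at least {data_per_admin:.2f}x today's level")
```

In other words, each administrator would be responsible for at least six to nine times today's load, which is the gap that automation is meant to close.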
Hollis, whose company is heavily promoting cloud and big data analytics products, said that in order to manage that data growth, companies will have to restructure to create automated service-oriented architectures (SOAs). SOAs allow business units to choose server, networking and storage capacity from online catalogs that automatically provision the capacity and then charge back for it.
"You can't do what you did five years ago and scale at that rate," Hollis said.
More efficiency needed
The Digital Universe study agreed with Hollis' assessment. IT managers must find ways to drive more efficiency in their infrastructures so that IT administrators can focus on more value-add initiatives such as "bring your own device" (BYOD) policies, Big Data analytics, customer on-boarding efficiency and security.
"One way this is likely to happen is through converged infrastructures, which integrate storage, servers, and networks," the study said.
The Digital Universe study did contradict one predominant line of thinking today: that most data in the future will be stored in the cloud.
While spending on public and private cloud computing accounts for less than 5% of total IT spending today, IDC estimates that by 2020, nearly 40% of the information in the digital universe will be "touched" by cloud computing, meaning that a byte will be stored or processed in a cloud somewhere in its journey from originator to disposal. Yet only about 15% of data will be maintained in a cloud, IDC said.
The investment in managing, containing, studying and storing the bits in the digital universe will only grow by 40% between 2012 and 2020. As a result, the investment per gigabyte during that same period will drop from $2 to 20 cents.
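The link between those two figures can be made explicit. The sketch below assumes that investment per gigabyte is simply total investment divided by total data volume; the implied volume growth is derived from the article's rounded numbers, not quoted from the study.

```python
# Figures quoted in the article.
spend_growth = 1.40          # total investment grows by 40% from 2012 to 2020
cost_per_gb_2012 = 2.00      # dollars
cost_per_gb_2020 = 0.20      # dollars

# Assumption: cost per GB = total spend / total volume, so
# spend_growth = cost_ratio * volume_growth.
cost_ratio = cost_per_gb_2020 / cost_per_gb_2012
implied_volume_growth = spend_growth / cost_ratio

print(f"cost per GB falls to {cost_ratio:.0%} of its 2012 level")
print(f"implied growth in data volume: about {implied_volume_growth:.0f}x")
```

Under that assumption, a tenfold drop in cost per gigabyte alongside a 40% rise in spending implies roughly 14 times as much data, the same order as the 14-fold growth Hollis cites.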
Entertainment and social media
A majority of the information in the digital universe is entertainment or social media. In 2012, 68% of all data created was used by consumers watching digital TV, interacting with social media or sending camera phone images and videos between devices and around the Internet. Yet enterprises have liability or responsibility for nearly 80% of the information in the digital universe.
As a result, corporations must deal with issues of copyright, privacy, and compliance even when the data zipping through their networks and server farms is created and used by consumers.
IDC's research paper estimates that about one-third of all data requires some type of protection, either to safeguard personal privacy, adhere to regulations, or prevent digital snooping or theft. However, only about 20% of data currently has these protections. The level of security varies by region, with much less protection in emerging tech markets, which include countries such as Brazil, Russia, India, Malaysia, and the United Arab Emirates.
Additionally, emerging market nations will go from creating a minority of data to creating the majority, IDC said. In 2005, for example, 48% of the digital universe came from the United States and Western Europe. Emerging markets accounted for less than 20%. However, the share of data attributable to emerging markets is now 36% and will be 62% by 2020. By then, China alone will generate 21% of the bit streams entering the digital universe.
Additionally, the study found:
The network is growing in importance. Latencies must get shorter, not longer. Data must be analyzed, security applied, and authentication verified -- all in real time and at levels yet to be seen. Network infrastructure will be a key investment over the next eight years.
The regulations governing information security must harmonize around the globe, though differences will remain. IT managers must realize that data will be requested outside geographic boundaries, and a global knowledge of information security may be the difference between approval and denial of a data request.
This story, "By 2020, there will be 5,200 GB of data for every person on Earth," was originally published by Computerworld.