Internet of Things helps fuel growth of data lakes

The need for increased business agility and accessibility is driving the market.


Data lakes, storage repositories that hold extremely large amounts of raw data in its native format until the data is needed by users, are becoming increasingly popular within enterprises.

Helping to fuel interest in data lakes are the digital transformation efforts underway at many enterprises, spurred by the emergence of the Internet of Things (IoT). The connected objects in the IoT will generate huge volumes of data.

As more products, assets, vehicles and other “things” are instrumented and their data is ingested, it’s important that IoT data sets be aggregated in a single place, where they can be easily analyzed and correlated with other relevant data sets using big data processing capabilities. Doing so is critical to generating the most leverage and insight from IoT data.

Growing market

Research firm MarketsandMarkets forecasts that the data lakes market will grow from $2.53 billion in 2016 to $8.81 billion by 2021. Major forces driving the market are the need for increased business agility and accessibility, increasing adoption of IoT, potential for in-depth insights to drive competitive advantage, and growing volume and variety of business data.
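To put the forecast in perspective, the two figures imply a compound annual growth rate of roughly 28 percent. The short Python sketch below shows the arithmetic; the function name `implied_cagr` is purely illustrative, and the calculation assumes five full years of compounding between 2016 and 2021.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# Forecast figures quoted above, in billions of US dollars.
cagr = implied_cagr(2.53, 8.81, 2021 - 2016)
print(f"Implied CAGR: {cagr:.1%}")
```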

Enterprises are deploying solutions such as Hadoop-based big data platforms and stream processing to build and maintain data lakes. These solutions make it possible for more organizations, even those with limited IT resources, to establish their own data lakes.

Among the advantages of data lakes is that they enable enterprises to retain all their data, instead of choosing what data to retain based on today’s needs. This is especially useful in the IoT, where the total volume of sensor data generated within an enterprise might be much higher than what is required for the identifiable early use cases.

Higher-fidelity sampling of IoT sensors can help an enterprise preserve the opportunity for future analytic insights that could lead to tangible benefits such as higher revenue and enhanced customer service. But simply pouring massive amounts of sensor data into a data lake doesn’t ensure that such insights can easily emerge.

Business intelligence vs. operational intelligence

When combined with well-governed data lakes, IoT data shines. The combination provides the ability to model complex systems, simulate scenarios and predict operational outcomes.

Many complex scenarios can be simulated if sufficient data is available and its context is well understood. In an IoT-enabled enterprise, the data lake isn’t simply a repository that supports more efficient traditional business intelligence (BI). It’s also the heart of a digitally transformed enterprise’s ability to increase operational intelligence (OI), where near real-time optimizations may provide significant competitive advantages.

IoT-enabled enterprises will employ their data lakes in conjunction with machine learning and artificial intelligence to create innovative predictive models. These models will inform decision making from the factory floor to the executive suite, based on real-time sensor data and visibility into relevant large data sets.

Challenges ahead

Building and maintaining data lakes that deliver on the promises of IoT comes with challenges that enterprises must address first.

Some of these involve actually creating the data lake. For example, enterprises need to gather data from disparate parts of the organization into one place. Before doing so, they must cleanse the data so it’s reliable and usable. Sufficient metadata needs to be included to help provide context for users. Companies also should ensure the security, privacy and governance of data in the data lake.
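As a rough illustration of the cleansing step, the Python sketch below filters a batch of sensor readings before they are loaded. Everything in it is an assumption made for the example: the `RawReading` record, the `cleanse` function and the valid range of -40 to 85 are hypothetical, and real cleansing rules depend on the sensors and the business questions involved.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RawReading:
    """Hypothetical shape of one reading as it arrives from a device."""
    sensor_id: str
    timestamp: str            # ISO 8601 string reported by the device
    value: Optional[float]    # None when the device sent no measurement


def cleanse(readings: Iterable[RawReading], low: float, high: float) -> List[RawReading]:
    """Drop readings that are missing, out of range, or exact repeats."""
    seen = set()
    clean = []
    for reading in readings:
        if reading.value is None:
            continue  # nothing was measured
        if not (low <= reading.value <= high):
            continue  # outside the range assumed plausible for this sensor type
        key = (reading.sensor_id, reading.timestamp)
        if key in seen:
            continue  # same sensor and timestamp already accepted
        seen.add(key)
        clean.append(reading)
    return clean


raw = [
    RawReading("temp-01", "2017-03-01T12:00:00Z", 21.5),
    RawReading("temp-01", "2017-03-01T12:00:00Z", 21.5),   # duplicate
    RawReading("temp-02", "2017-03-01T12:00:00Z", None),   # missing value
    RawReading("temp-03", "2017-03-01T12:00:00Z", 999.0),  # out of range
    RawReading("temp-04", "2017-03-01T12:00:00Z", 19.8),
]
clean = cleanse(raw, low=-40.0, high=85.0)
print([r.sensor_id for r in clean])
```

In this sample batch, only the first reading from temp-01 and the reading from temp-04 pass the checks.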

Tools can simplify some of these tasks, but enterprises must choose the right solutions to address their particular challenges. Some of these products are quite costly.

Other challenges relate to structural and organizational issues. Companies have traditionally operated in silos, where various teams maintain their own data repositories. They’ve done this for good reasons. Each department is trying to answer different questions and finds it useful to impose sufficient structure on its data to make obtaining those answers more efficient.

The possibility of asking and answering more complex questions that might involve dipping into IoT sensor data generated in distant parts of an organization is enticing. But putting all sensor data into one place doesn’t magically break down existing silos in a manner that ensures these data sets can be understood and used to generate meaningful insight.

Enabling all users to get the most value from data lakes requires careful forethought and preparation. Data stored without sufficient context limits the insight that can emerge from it.

Decisions about how much metadata to store with sensor readings require careful consideration. Although some of the relationships between a reading and its context can be reconstructed after the fact through correlation with other data sets, there’s a cost in doing so. The effort required may discourage other teams from using this data to their maximum advantage.
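The trade-off can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the record types, the `add_context` helper and the `ASSET_REGISTRY` lookup table are all hypothetical, standing in for whatever other data set a team would have to correlate against.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BareReading:
    """A reading stored with no context beyond the sensor's identifier."""
    sensor_id: str
    timestamp: str
    value: float


@dataclass(frozen=True)
class ContextualReading:
    """The same reading with its context stored alongside it."""
    sensor_id: str
    timestamp: str
    value: float
    unit: str
    asset: str
    site: str


# Hypothetical registry maintained by another team; it may be incomplete.
ASSET_REGISTRY: Dict[str, Dict[str, str]] = {
    "temp-01": {"unit": "celsius", "asset": "press-7", "site": "plant-a"},
}


def add_context(reading: BareReading,
                registry: Dict[str, Dict[str, str]]) -> Optional[ContextualReading]:
    """Rebuild context after the fact; return None if the registry has no entry."""
    info = registry.get(reading.sensor_id)
    if info is None:
        return None
    return ContextualReading(reading.sensor_id, reading.timestamp, reading.value,
                             info["unit"], info["asset"], info["site"])


bare = [
    BareReading("temp-01", "2017-03-01T12:00:00Z", 21.5),
    BareReading("temp-09", "2017-03-01T12:00:00Z", 18.2),
]
enriched = [add_context(r, ASSET_REGISTRY) for r in bare]
unresolved = [r.sensor_id for r, e in zip(bare, enriched) if e is None]
print(unresolved)
```

Here the reading from temp-09 cannot be placed in context because the registry has no entry for it, which is the kind of gap that storing metadata at ingestion time is meant to avoid.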

Maximum value

Without addressing these challenges through the proper technology solutions and organizational changes, enterprises won’t derive maximum value from their IoT initiatives or data lakes. They’re likely to end up with expensive and under-used resources. But such an outcome can be avoided via a few simple steps:

  • Include input from representatives throughout the company when building a data lake. Because users from different departments will be affected, having this input early will help ensure the data lake meets the expectations of all key stakeholders.
  • Give the lines of business a prominent seat at the table, ensuring that data admitted into the data lake is conditioned to provide maximum long-term value.
  • Make sure IT and information security functions are well represented and take on a leadership role to provide guidance for technology selection and data governance.
  • Carefully select self-service analytics and data science tools to make sure different departments can easily obtain customized insight from the data they need, without incurring costly data validation, cleansing or visualization development effort.

With the right solutions and operational/cultural changes in place to build and maintain a data lake, IoT-enabled enterprises can provide business users with unprecedented insight and value from the huge volumes of information they are gathering.

With a well-designed and deployed data lake in place, enterprises will be well on their way to successful digital transformations.

This article is published as part of the IDG Contributor Network.
