How to determine if big data security analytics will produce useful outcomes

Five “sniff tests” to determine whether proposed big data security techniques really measure up

Although vendor-written, this contributed piece does not advocate a position that is particular to the author’s employer and has been edited and approved by Network World editors.

Big data is the hot buzzword in security analytics today, but buyers are skeptical because many companies have spent years building “data lakes” only to discover it was impossible to “drain the lake” to get something useful.

Unfortunately, today’s solutions often pair expensive clusters with static business intelligence reports and “sexy” dashboards that look good but contribute little to productive security analytics. What is useful is a focus on the analytics themselves: using this very valuable data to make real-time decisions, discover critical patterns, set and adjust security policies as conditions change, and dramatically improve security.

We only need to look at companies like Google, Amazon and Netflix to realize that big data can be a successful enabler of real-time data mining on complex data sets that have high velocity, variety and volume (the “3Vs”). These companies use big data as a key part of their business, with predictive analytics that tell them what we want to buy or watch. This should be the model for truly useful security analytics.

Here are five “sniff tests” that will help you determine whether an approach being proposed will use big data techniques that will get you a useful outcome:

Sniff Test 1: Is your big data solution only about the “3Vs”? If a vendor is only addressing the velocity, variety and volume issues of big data, then your big data system may be more efficient than your SIEM (Security Information and Event Management) system, but it will end up as a big data storage trap. Your vendor needs to be talking to you about Bayesian methods, regression, classification algorithms, dimensionality issues and similar techniques as the means to make big data useful by making it predictive and truly actionable. Yes, it sounds like rocket science and it may be scary, but it is a must given the dynamic nature of security events.
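
To make “predictive” a little more concrete, here is a minimal sketch of a Bayesian update applied to a security signal. It is an illustration only, not any vendor’s method: the function name and all of the probabilities are hypothetical, and applying the update twice assumes the two signals are independent.

```python
def posterior_malicious(prior, p_signal_if_malicious, p_signal_if_benign):
    """Bayes' theorem: probability a session is malicious given one observed signal."""
    evidence = (p_signal_if_malicious * prior
                + p_signal_if_benign * (1.0 - prior))
    return p_signal_if_malicious * prior / evidence


# Hypothetical numbers: 1 in 1,000 sessions is malicious; the signal fires in
# 90% of malicious sessions and in 1% of benign ones.
prior = 0.001
after_one = posterior_malicious(prior, 0.90, 0.01)

# A second, independent signal with the same (assumed) rates raises the score further.
after_two = posterior_malicious(after_one, 0.90, 0.01)

print(f"after one signal: {after_one:.3f}, after two: {after_two:.3f}")
```

The point of the sketch is that a single signal on a rare event still leaves the probability low, while accumulated evidence moves it sharply, which is why history and volume matter.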

Sniff Test 2: What answer do you get when you ask “what do you mean by security analytics”? If you hear things like correlation, dashboards, queries and alerts, it’s old school. You need to hear about machine learning libraries, cubes, cosine similarity matrices and the like. Everything has to be based on the law of large numbers and on outlier detection, i.e., techniques that use a lot of data and a lot of history to build models automatically (and make them steadily more precise), as opposed to relying on a user who must stare at static aggregated data or manually define explicit security policies.
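
As a rough illustration of a history-driven outlier check, the sketch below learns a baseline from past observations instead of relying on a hand-written threshold. The z-score is only one simple choice among many, and the function name, the threshold of 3.0 and the sample data are all assumptions made for the example.

```python
from statistics import mean, stdev


def is_outlier(history, value, threshold=3.0):
    """Flag a value that lies more than `threshold` standard deviations from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)  # requires at least two historical observations
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical history: one user's daily count of database logins.
daily_logins = [10, 12, 11, 9, 10, 13, 11, 10]

print(is_outlier(daily_logins, 12))  # an ordinary day
print(is_outlier(daily_logins, 40))  # a day far outside the learned baseline
```

The more history the baseline is built from, the more stable the mean and standard deviation become, which is the law of large numbers at work.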

Sniff Test 3: Does your security analytics system have closed feedback loops? Analytics are not reports. Analytics help make decisions. Security analytics are not “after the fact” things; they use historical information to improve things going forward. For example, look for analytics that modify your real-time monitoring and tell you what to exclude and, importantly, what to focus on, rather than analytics that just send you alerts. When it comes to intelligent security analytics, increasing volumes of data combined with the appropriate algorithms significantly improve the analytics, the decision making and the usefulness of the system.
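
One way to picture a closed feedback loop is sketched below: analyst verdicts on past alerts are fed back to decide which event sources to stop monitoring and which to look at first. This is a simplified illustration under stated assumptions; the class, its parameters and the source names are hypothetical and do not describe any particular product.

```python
from collections import defaultdict


class AlertFeedback:
    """Tracks how often alerts from each source turned out to be real incidents."""

    def __init__(self, min_samples=5, exclude_below=0.10):
        self.confirmed = defaultdict(int)  # alerts confirmed as real incidents
        self.reviewed = defaultdict(int)   # alerts an analyst has reviewed
        self.min_samples = min_samples
        self.exclude_below = exclude_below

    def record(self, source, was_real_incident):
        self.reviewed[source] += 1
        if was_real_incident:
            self.confirmed[source] += 1

    def precision(self, source):
        # Laplace smoothing, so unseen sources start at 0.5 instead of dividing by zero.
        return (self.confirmed[source] + 1) / (self.reviewed[source] + 2)

    def should_monitor(self, source):
        # Keep monitoring until there is enough feedback to justify excluding a source.
        if self.reviewed[source] < self.min_samples:
            return True
        return self.precision(source) >= self.exclude_below

    def focus_order(self, sources):
        # Sources whose alerts were most often real come first.
        return sorted(sources, key=self.precision, reverse=True)


loop = AlertFeedback()
for _ in range(10):
    loop.record("printer_scan", False)
for verdict in (True, True, False, True, True):
    loop.record("offhours_admin_login", verdict)

sources = ["printer_scan", "never_seen", "offhours_admin_login"]
print([s for s in sources if loop.should_monitor(s)])
print(loop.focus_order(sources))
```

The history of verdicts changes what the monitoring layer does next, which is the difference between an analytic and a report.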

Sniff Test 4: Are you being led down the road to larger and larger clusters? Parts of the big data world have gone crazy, building humongous clusters that do very little (and add lots of complexity). Even if you can get the money today, that doesn’t mean you’ll get the money tomorrow, and since the goal is to aggregate data from many periods and sources, you need to ensure that the cost does not grow in step with the data. Generally, more data yields better results, but if it breaks the bank then it’s useless. You should be looking for platforms that scale efficiently. Look for systems that use a NoSQL approach, columnar data fields and an in-memory distributed parallel processing architecture. An efficient system should not require one node for every few terabytes of data; the ratio of data to nodes must be much higher.
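
The node-ratio argument is simple arithmetic, sketched below. The per-node capacities (2 TB and 50 TB) and the data volumes are hypothetical figures chosen only to show how quickly cluster size diverges; they are not benchmarks of any system.

```python
import math


def nodes_needed(data_tb, tb_per_node):
    """Number of nodes required to hold `data_tb` terabytes at a given per-node capacity."""
    return math.ceil(data_tb / tb_per_node)


# Hypothetical comparison: a platform handling 2 TB per node versus 50 TB per node.
for data_tb in (100, 500, 2000):
    low_ratio = nodes_needed(data_tb, 2)
    high_ratio = nodes_needed(data_tb, 50)
    print(f"{data_tb} TB: {low_ratio} nodes at 2 TB/node, {high_ratio} nodes at 50 TB/node")
```

Since hardware, licensing and operations costs tend to follow node count, the per-node ratio is a useful first question to ask about any proposed platform.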

Sniff Test 5: Is your data management framework flexible enough to deal with the variety of data? Big data has many layers and many options, some of which will help you and some of which can cripple you with complexity. Big data delivers a richness of information by supporting a variety of data types. It has also gone through a number of generations very quickly, so it is important to look for modern data approaches that stress simplicity, e.g., those that combine big data with JSON (JavaScript Object Notation) as a flexible data format.
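
A small sketch of why JSON helps with variety: events from different sources carry different fields, yet they can be stored and queried together without first agreeing on a fixed schema. The event shapes, field names and the `events_matching` helper below are invented for illustration.

```python
import json

# Hypothetical events from three different sources, each with its own fields.
raw_events = [
    '{"type": "login", "user": "alice", "src_ip": "10.0.0.5", "success": false}',
    '{"type": "db_query", "user": "alice", "db": "payroll", "rows_returned": 12000}',
    '{"type": "firewall", "src_ip": "10.0.0.5", "dst_port": 22, "action": "deny"}',
]
events = [json.loads(line) for line in raw_events]


def events_matching(events, **criteria):
    """Return events whose fields equal every given criterion; events missing a field never match it."""
    return [e for e in events
            if all(k in e and e[k] == v for k, v in criteria.items())]


# The same query style works across event types that share a field.
print([e["type"] for e in events_matching(events, user="alice")])
print([e["type"] for e in events_matching(events, src_ip="10.0.0.5")])
```

New event types can be added later without migrating the existing records, which is the kind of simplicity this sniff test looks for.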

Understanding and using big data is crucial to security analytics, but big data is also full of hype and indistinguishable chatter. Hopefully these five simple sniff tests can help you sift through the noise and let you select solutions that can really deliver the security analytics you need.

jSonar develops big data Analytics Warehouses.  Bennatan has been a “data security guy” for 25 years at companies such as J.P. Morgan, Merrill Lynch, Intel, IBM and AT&T Bell Labs.  He has a Ph.D. in Computer Science and has authored 11 technical books.