Big Data is a phrase we hear over and over again. Yes, it's obvious: Big Data means, well, big data, and lots of it. We all get that Facebook, Twitter and the other mega web apps generate enormous amounts of data. But beyond the mega web apps, what really is Big Data? What can we do with it, and why does everyone get so excited about it? For help with this I went to my friends at LexisNexis, makers of HPCC Systems.
When we speak about big data, the problem is not amassing a lot of data. The real trick is analyzing that data to make something of value out of it. The folks at LexisNexis have been doing this for a long time. HPCC Systems is LexisNexis's own in-house big data solution, which they open sourced about a year ago.
For purposes of this article, though, whether we are speaking about HPCC, Hadoop or any other big data solution is not important. I want to illustrate what you can do with good analysis of big data, so I am going to share a case study by HPCC Systems on a proof of concept they did for the Office of the Medicaid Inspector General (OMIG) of a large Northeastern state.
HPCC Systems was given a large list of names and addresses. Overlaying their own publicly available data, they sought to identify social clusters of Medicaid recipients living in expensive houses and driving expensive cars. Of course, it helps if you have 50 TB of public data and lots of experience building social graphs.
In any event, these are the kinds of tasks that HPCC and other big data solutions are built for. Comparing Medicaid rolls with purchases of cars and homes revealed some interesting results. Here is a map that was generated:
Not only did the analysis turn up lots of likely Medicaid fraud, but it also turned up connections that could be indicative of money laundering and mortgage fraud. This kind of result simply would not be possible without the power of a big data analysis engine like HPCC Systems.
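To make the overlay idea in this case study a little more concrete, here is a minimal Python sketch. It is not HPCC's actual method or code (HPCC Systems has its own tooling), and everything in it is an assumption made for illustration: the field names, the dollar thresholds, the made-up sample records, and the simplification that a "social cluster" is just flagged recipients sharing an address.

```python
from collections import defaultdict

# Hypothetical thresholds; the case study does not state any.
HOME_VALUE_THRESHOLD = 750_000
VEHICLE_VALUE_THRESHOLD = 60_000


def normalize(text):
    """Lowercase and collapse whitespace so records from different sources match."""
    return " ".join(text.lower().split())


def flag_recipients(recipients, property_records, vehicle_records):
    """Return recipients linked to both an expensive home and an expensive vehicle."""
    # Index public property records by normalized address.
    home_value = {normalize(p["address"]): p["value"] for p in property_records}

    # Index public vehicle records by normalized owner name, keeping the priciest.
    vehicle_value = defaultdict(int)
    for v in vehicle_records:
        key = normalize(v["owner"])
        vehicle_value[key] = max(vehicle_value[key], v["value"])

    flagged = []
    for r in recipients:
        home = home_value.get(normalize(r["address"]), 0)
        car = vehicle_value.get(normalize(r["name"]), 0)
        if home >= HOME_VALUE_THRESHOLD and car >= VEHICLE_VALUE_THRESHOLD:
            flagged.append({**r, "home_value": home, "vehicle_value": car})
    return flagged


def cluster_by_address(flagged):
    """Group flagged recipients who share an address (a very crude 'social cluster')."""
    clusters = defaultdict(list)
    for r in flagged:
        clusters[normalize(r["address"])].append(r["name"])
    # Keep only addresses shared by more than one flagged recipient.
    return {addr: names for addr, names in clusters.items() if len(names) > 1}


# Made-up sample data, purely for illustration.
recipients = [
    {"name": "Ann Example", "address": "1 Lake Rd"},
    {"name": "Bob Example", "address": "1  Lake Rd"},
    {"name": "Cy Sample", "address": "9 Elm St"},
]
property_records = [
    {"address": "1 lake rd", "value": 900_000},
    {"address": "9 Elm St", "value": 150_000},
]
vehicle_records = [
    {"owner": "ann example", "value": 85_000},
    {"owner": "Bob Example", "value": 70_000},
    {"owner": "Cy Sample", "value": 95_000},
]

flagged = flag_recipients(recipients, property_records, vehicle_records)
clusters = cluster_by_address(flagged)
print(clusters)
```

On a handful of records this is trivial. The hard part in the real case is doing the same kind of matching across terabytes of messy public records, where names and addresses rarely line up this neatly, and that is where a big data engine earns its keep.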
I had a chance to speak with Jo Prichard of LexisNexis, who showed me some other examples of big data analysis. One involved taking the total page views of Wikipedia for the year, along with public mentions of specific personalities, for example comparing public mentions of Whitney Houston with hits on her Wikipedia page. Again, the results were pretty extraordinary. Another example was prescription drug abuse. Again, overlaying public data on the initial data set yielded some eye-opening results.
This really only scratches the surface of what you can do with big data if you have the horsepower and analysis to use it. In this case it is HPCC Systems, but it could be Hadoop (though the LexisNexis folks say not as easily as you can with HPCC) or another big data solution. This kind of insight is what gets people really excited about big data beyond the Facebook-Twitter crowd.
As co-founder and Managing Partner at The CISO Group, Alan Shimel is responsible for driving the vision and mission of the company. The CISO Group offers security consulting and PCI compliance management for the payment card industry. Prior to The CISO Group, Alan was the Chief Strategy Officer at StillSecure. Shimel was the public persona of StillSecure as it grew from a startup to helping defend some of the largest and most sensitive networks in the world.
Shimel is an often-cited personality in the technology community and is a sought-after speaker at industry and government conferences and events. His commentary about the state of security, open source and life is followed closely by many industry insiders via his blog and podcast, "Ashimmy, After All These Years" (www.ashimmy.com). Alan is now also a regular contributor to The CISO Group’s security.exe blog and podcast.
Alan has helped build several successful technology companies by combining a strong business background with a deep knowledge of technology. His legal background, long experience in the field, and New York street smarts combine to form a unique personality.
Disclosure: The CISO Group sells a software-as-a-service PCI compliance application called SAQPro. The company is independent and does not represent any other vendor's products as a reseller.