On May 9th, the International Consortium of Investigative Journalists (ICIJ) will release a searchable database detailing over 200,000 entities that are part of the Panama Papers investigation.
While this will be intriguing for most of us, if you’re in a financial organization of any kind and there’s the remotest chance that you might have dealings with any of these entities, or with parties who might be fronting for or involved with them, May 9th will be (or depending on when you read this, is or has been), shall we say, “a bad day” for you.
Your challenge, once you get your hands on the ICIJ data, will be to search your organization's data resources looking for these names. The problem will be that it’s pretty much guaranteed you’re not prepared for this. What most of you will do is assume that this is a Big Data problem and attempt to aggregate all of your data resources into what will amount to a Frankenbase so you can run analytics on it. This will be neither simple nor quick.
It won’t be simple or quick because financial institutions are some of the worst offenders when it comes to having multiple data silos. Moreover, even when you’ve negotiated and/or bashed heads to overcome the politics of silo ownership, you’ll be faced with the not insignificant task of normalizing the data you’re hoping to stuff into your Frankenbase.
And even when you’ve got all of your ducks in a row, there’s a grim reality to what you’re trying to do: A 2013 survey by InfoChimps found that a remarkable 55% of Big Data projects are never completed, and 39% of these were due to … yes, you guessed it … siloed data and non-cooperation (otherwise called “politics”) … along with the 41% that were stymied by technical roadblocks.
Oh, and then there’s the cost: A 2014 Dell survey found “Budgets for big data projects are expected rise to an average of $6 million over the next two years.”
My friends over at Pneuron [Disclosure: In 2014 I wrote a short series of posts for the Pneuron blog] recently pitched me on their approach to analyzing the ICIJ data using their technology. I’ve been a big fan of Pneuron’s technology since I first wrote about the company back in 2013 and their strategy for doing the sort of data mining required for this near-Herculean task makes a lot of sense.
Simon Moss, Pneuron’s CEO, argues: “This is not a Big Data problem, it’s a diversity and distribution problem.” By diversity, Moss means that the range of formats and contexts in which the data is stored is going to be a big issue, and by distribution, he's referring to the data being virtually and geographically dispersed. These are not trivial problems.
Moss points out that the first step in addressing the problem of searching for entity names is sorting and matching; in the ICIJ data there will be names such as “Robert P. Jones” but in the various data silos in a financial institution that name might appear as “Jones, R.P.” or “Jones, Robert” or even “Bob Jones.” Moreover, data in one silo might identify his spouse as “Ethel Jones,” who may also have accounts solely in her own name in other silos. The matching problem is even trickier when it comes to company names and relationships. There’s also the issue of multiple entities partnering in an account: they all need to be connected, and their transactions scrutinized both jointly and separately.
The whole idea of moving this massive, complex, distributed data into a centralized database or databases should be not only daunting; it should inspire real horror at the scale of the task.
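To make the matching problem concrete, here is a minimal Python sketch that treats the name variants above as candidates for the same person. It is only an illustration and not Pneuron's method: the function names and the tiny `NICKNAMES` table are assumptions made for this example, and a real system would need far richer rules (or probabilistic matching) to cope with company names and relationships.

```python
import re

# Tiny illustrative nickname table (an assumption for this sketch);
# a real system would need a much larger one.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william"}

def parse_name(raw):
    """Return (surname, [given-name tokens]), all lowercase."""
    raw = raw.strip()
    if "," in raw:
        # "Jones, R.P." style: surname comes first
        surname, given = [part.strip() for part in raw.split(",", 1)]
    else:
        # "Robert P. Jones" style: surname comes last
        parts = raw.split()
        if not parts:
            return "", []
        surname, given = parts[-1], " ".join(parts[:-1])
    # Split given names on spaces and periods: "R.P." -> ["r", "p"]
    tokens = [t.lower() for t in re.split(r"[\s.]+", given) if t]
    tokens = [NICKNAMES.get(t, t) for t in tokens]
    return surname.lower(), tokens

def tokens_compatible(a, b):
    """An initial matches any name starting with it; full names must be equal."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b

def possible_match(name_a, name_b):
    """True if the two strings could plausibly refer to the same person."""
    sur_a, giv_a = parse_name(name_a)
    sur_b, giv_b = parse_name(name_b)
    if sur_a != sur_b or not giv_a or not giv_b:
        return False
    # Compare given names pairwise over the shorter of the two lists
    return all(tokens_compatible(x, y) for x, y in zip(giv_a, giv_b))
```

Calling `possible_match("Robert P. Jones", "Bob Jones")` flags the pair as a candidate, while `possible_match("Robert P. Jones", "Ethel Jones")` does not; linking Ethel to Robert as a spouse is a separate relationship problem that simple name comparison cannot solve.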
Pneuron’s solution is their eponymous technology, which involves deploying modules, called “Pneurons,” on or near each of the data resources. There are different Pneurons for various tasks: Data and Application Pneurons are used to access data sources including databases, files, spreadsheets, and web services, while Analysis Pneurons perform various types of matching (deterministic, probabilistic, Bayesian, etc. - a function that is obviously highly relevant in searching for the names of suspect entities), predictive modeling, and statistical analysis. All of these modules normalize the data before passing it up the chain to the Pneuron Cortex, which manages, routes, and coordinates the activities of the various deployed Pneurons. Finally, Output Pneurons persist, visualize, or deliver data to files, databases, or other destinations.
System design is done using Pneuron's Design Studio, which provides a graphical drag-and-drop interface that makes configuration and modification about as intuitive as it gets.
Pneuron claims they can go from installation through deployment and configuration to displaying analytics results or handing data off to an external analytics system in under one month for large-scale projects and, in simple cases, in as little as four hours. In the case of the ICIJ entity search, you won’t have a lot of time to waste. So, if on May 9th you’re going to be having, are having, or had, a “bad day”, you might want to check out Pneuron’s strategy.
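The general pattern described here (work done next to each silo, with only normalized results passed up to a coordinator) can be sketched in a few lines of Python. This is a loose illustration of the idea, not Pneuron's API: `SourceWorker`, `coordinate`, the two sample silos, and their record layouts are all hypothetical, and the matching is a plain exact comparison.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class SourceWorker:
    """Hypothetical stand-in for a module deployed next to one data silo."""
    silo: str
    fetch: Callable[[], Iterable[dict]]   # reads raw records from the silo
    normalize: Callable[[dict], str]      # maps a raw record to a comparable name

    def search(self, watchlist: set) -> list:
        # Normalize locally and return only the hits, not the whole silo.
        hits = []
        for record in self.fetch():
            name = self.normalize(record)
            if name in watchlist:
                hits.append({"silo": self.silo, "name": name, "record": record})
        return hits

def coordinate(workers, watchlist):
    """Hypothetical coordinating layer: fan the query out, gather the hits."""
    results = []
    for worker in workers:
        results.extend(worker.search(watchlist))
    return results

# Two made-up silos that hold the same kind of data in different shapes.
crm = SourceWorker(
    silo="crm",
    fetch=lambda: [{"full_name": "Robert Jones"}, {"full_name": "Ann Lee"}],
    normalize=lambda r: r["full_name"].lower(),
)
ledger = SourceWorker(
    silo="ledger",
    fetch=lambda: [{"last": "JONES", "first": "ROBERT"}],
    normalize=lambda r: f"{r['first']} {r['last']}".lower(),
)

hits = coordinate([crm, ledger], {"robert jones"})
```

The point of the design is that each silo keeps its own format and only the small set of normalized hits travels to the coordinator, so nothing has to be copied into a central database first.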