When Big Data Goes Bad

Big Data has become an obsession in business for a good reason: when collected, sanitized, and mined intelligently, it can produce incredibly useful information and insights that often could not be discovered any other way. But intelligent collection and sanitizing aren’t always applied, and thus data that look useful are often incorrect and misleading.


For example, I just got a call from some guy asking for a “Betsy Bellar.” When I said he had the wrong number, he asked if a “Jim Ulhman” was available. I asked what company he was looking for and he said “Gibbs Universal Industries,” which is the name of my corporation. I asked who he was and where he got those names from. He skipped his name but told me that he’d got the names from data.com.

I immediately did a search and found the entry for Gibbs Universal Industries Company Directory of Business Contacts on data.com, which lists 13 contacts of which only one is correct: me.

The caller then asked for the CTO and, as he was trying to sell something, I told him he was wasting his time, so we amicably ended the call.

Curiously, out of the 13 contacts listed on data.com, the only other contacts who are supposed to be in Ventura are a “Robert Boyer” as CEO and a “Paul Maznack” as Marketing and Promotions. Looking up “Robert Boyer” along with “Gibbs Universal” pointed me to ChamberOfCommerce.com, which has some of the same incorrect information and again lists “Paul Maznack” as Marketing and Promotions and a “William Holland” as Executive Director.

Still trying to figure out where all this bogus information came from, I googled “Paul Maznack” and found a site called leadferret.com, which shows him to be someone who works for International Data Group (the parent of Network World). His entry is accompanied by a list of people who also work at IDG, including my name, which is linked to an entry that claims I’m the Chief Executive Officer of IDG.

Two things immediately came to mind: First, if I’m CEO of IDG, who’s been getting my paychecks? Second, where on earth does all this crap data come from?

leadferret.com (which I can’t help but pronounce as “led ferret”) claims on its About page:

LeadFerret is breaking barriers and turning the data industry on its edge … Founded in 2011 by one of the pioneers in data, LeadFerret is like no other as it includes complete information, including email addresses. Every record in LeadFerret is complete with company, name, title, address, phone number and most importantly email address.

Complete information, eh? Sure, it's complete to some abstract definition of "complete" but it's simply not accurate and therefore it's unreliable. ChamberOfCommerce.com says in its FAQ:

Data source providers supply business information to many online business directories like ChamberofCommerce.com.

Ah-ha! Garbage in, garbage out. I’m pretty sure that data.com (which is owned by Salesforce, from whom I'd expect more business discipline) got its data from ChamberOfCommerce.com, and both of those may or may not have a relationship with leadferret.com … but whatever the connections are, the data all of them have are for the most part wrong and misleading. Moreover, the companies offering this “intelligence” must be doing so knowing full well that their methodologies and findings are deeply flawed, but they apparently don’t care.

How do I know they don’t care? Consider data.com: They had my email address but did they send me a message asking me to confirm my details? Nope. Did ChamberOfCommerce.com try to confirm my company’s details? Nope. Did leadferret.com? Nope.

What I take away from all of this is that Big Data mining, as used by these and other organizations to discover company details, apparently doesn’t work very well. The end products are all full of mistakes that render the data useless. Let the prospecting salesperson beware.

Sell your thoughts below then follow me on Twitter, App.net, and Facebook.
