How to explain to Big Data newbies why correlation doesn’t equal causation

It's easy to assume that because two data sets appear to be linked, they are.

With the explosion of interest in Big Data, everyone in every department is looking for actionable intelligence. That’s great, but there’s a downside: you have to explain to, say, your VP of sales that although sales of barbecue sauce might appear to be connected to the selling price of beef, you can’t say that's true for certain, and that it would be inadvisable to act on that conclusion without deeper analysis.

“What?!” she’ll say. “I can see with my own eyes that the curvy things go up and down together.” “Ah,” you can reply, “let me show you something …” and then you show her the Spurious Correlations web site.

This site is a treasury of examples that demonstrate, very clearly, that correlation does not prove causation. For example, the correlation between US spending on science, space, and technology and suicides by hanging, strangulation and suffocation is a remarkable 99.2%, yet no one in their right mind would say that one causes the other.

[Image: correlation example. Credit: Tyler Vigen]

Similarly, the per capita consumption of cheese in the US correlates 94.7% with the number of people who died by becoming tangled in their bedsheets, and that link is just as easily rejected as not causative despite the very high degree of correlation.
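If your VP wants to see how easily this happens, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the two series (`sauce_sales` and `beef_price`) are synthetic, generated independently of each other, and have nothing to do with the data on the Spurious Correlations site. The only thing they share is that both drift upward over time. The sketch computes the Pearson correlation of the raw series, then of their period-to-period changes, which is one simple way (not the only one) to take the shared trend out of the picture.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def differences(values):
    """Period-to-period changes, which strip out a steady trend."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical, synthetic data: two series built independently of each
# other. Each is a straight-line trend plus its own random noise.
rng = random.Random(0)  # fixed seed so the run is repeatable
periods = range(50)
sauce_sales = [10 + 0.5 * t + rng.gauss(0, 1) for t in periods]
beef_price = [100 + 3.0 * t + rng.gauss(0, 1) for t in periods]

# Correlation of the raw levels: dominated by the shared upward drift.
level_r = pearson(sauce_sales, beef_price)

# Correlation of the changes: the trend is gone, only the noise remains.
diff_r = pearson(differences(sauce_sales), differences(beef_price))

print(f"correlation of levels:  {level_r:.3f}")
print(f"correlation of changes: {diff_r:.3f}")
```

The curvy things go up together, so the first number comes out high even though neither series knows the other exists. Once the common trend is removed, what is left is the correlation between two unrelated streams of noise, which tends to be much weaker. That is not proof of anything in either direction, but it is a quick reminder that a shared trend alone can manufacture an impressive-looking correlation.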

Published by Tyler Vigen, the Spurious Correlations site currently contains 27,724 correlations, many of which are very amusing (for example, the marriage rate in New York has an 87.9% correlation with murders by blunt objects). Tyler’s mini-lecture on correlation and causation (see below) is worth putting in front of the unwashed to get ‘em up to speed.

Correlate your thoughts in the comments below, then follow me on Facebook.

Join the Network World communities on Facebook and LinkedIn to comment on topics that are top of mind.

Copyright © 2014 IDG Communications, Inc.