Statistician extraordinaire Nate Silver won fame for correctly predicting the outcome of the 2008 U.S. presidential election in 49 out of 50 states. And he followed that up in 2012 by nailing the winner in all 50 states.
How did he do in 2016? Well, let’s just say he wasn’t as wrong as most statisticians, as he gave Clinton a little more than a 70 percent chance of winning (not far from the Trump campaign’s own predictions), while others gave her odds of up to 99 percent.
So, in the wake of a continually surprising election season, what did the founder and editor-in-chief of FiveThirtyEight.com have to say to an audience of software and analytics professionals at New Relic’s FutureStack16 conference in San Francisco last week? Plenty, as it turns out. (Disclosure: My day job is editor-in-chief for New Relic, where I also wrote about Silver’s presentation.)
Silver addressed the surprise election results, citing the relatively large number of undecided voters and the fact that results from the swing states were correlated, not independent.
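Silver's point about correlation can be made concrete with a toy simulation (the numbers here are illustrative, not his actual model): if polling errors in swing states share a common component, a trailing candidate sweeps them far more often than independence would suggest.

```python
import random

# Toy sketch: a candidate with a 40 percent chance in each of five swing
# states. If polling errors are independent, a sweep is very unlikely; if a
# shared national error shifts every state together, sweeps happen far more
# often. All parameters here are hypothetical.
random.seed(42)

def sweep_probability(correlated, trials=100_000, states=5, lean=0.4):
    wins = 0
    for _ in range(trials):
        if correlated:
            # One shared polling error moves every state in the same direction.
            shift = random.gauss(0, 0.15)
            results = [random.random() < lean + shift for _ in range(states)]
        else:
            results = [random.random() < lean for _ in range(states)]
        if all(results):
            wins += 1
    return wins / trials

print(sweep_probability(correlated=False))  # roughly 0.4**5, about 1 percent
print(sweep_probability(correlated=True))   # substantially higher
```

The independent case multiplies five separate 40 percent chances; the correlated case only needs one shared error to break the right way.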
Beyond the election
But Silver focused on the strengths, weaknesses and persistent misunderstandings surrounding statistics and analytics, and he addressed the differences and similarities between the work he does and how the software industry functions. (Political polling is “moderately complex,” he said, but less so than what many software professionals tackle every day.) Most important, he offered real-world tips on how best to use “big data” to make more accurate predictions.
Silver acknowledged the “wall of infamy” surrounding recent critical events that were not properly predicted, from the housing bubble and economic collapse in 2008 to the 9/11 terrorist attacks to the unexpectedly massive damage caused by the Fukushima earthquake and tsunami.
He also noted high-profile false positives, including predictions of massive outbreaks of the bird flu and SARS, neither of which came to pass. Citing the skepticism that greeted the computer revolution of the 1980s, Silver noted that it often takes longer than expected for new technologies to prove their worth. Many people give up at that point, he said, “but the companies out in front tend to really benefit” when those technologies do finally come of age.
Common data analysis problems
To explain the difficulties involved in making real-world predictions, Silver shared three key problems that should resonate in many software shops:
Problem 1: The more data you have, the more room there is for interpretation
Sure, more data is a good thing, but it also makes the situation more complex. In the old days, Silver said, there were only three to four election polls a week; now there are dozens of polls and aggregators, and the polling process has become politically charged.
“We’re still at a very infantile place in interpreting polls,” Silver said.
Problem 2: The signal-to-noise ratio
Playing off the title of his book (The Signal and the Noise: Why So Many Predictions Fail—but Some Don't), Silver argued that as data sets get larger, they undergo an exponential increase in complexity. As anyone who attends a lot of meetings knows, for example, increasing the size of a meeting from five people to 10 people far more than doubles the possible complications and obstacles. That is, the noise grows faster than the signal.
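The meeting analogy is just combinatorics: the number of two-person interactions grows roughly with the square of the headcount, so the channels for noise multiply much faster than the group itself. A few lines make the arithmetic plain:

```python
from math import comb

# The number of possible two-person interactions in a meeting grows
# quadratically with headcount: comb(n, 2) = n * (n - 1) / 2.
for people in (5, 10, 20):
    print(people, "people ->", comb(people, 2), "possible pairs")
# 5 people -> 10 pairs; 10 people -> 45 pairs; 20 people -> 190 pairs
```

Doubling the meeting from five to 10 people more than quadruples the pairwise channels, which is Silver's point in miniature.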
The problem, Silver explained, is that larger data sets vastly increase the chance of false positives, of correlations without causation. “And betting on correlations when you’re not sure of the causation … that’s a very dangerous bet,” he said.
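The false-positive problem is easy to demonstrate: generate enough unrelated random series and some pairs will correlate strongly by pure chance. This hypothetical sketch (not from Silver's talk) shows the effect:

```python
import random

# Hypothetical illustration: among many unrelated random series, some pairs
# will correlate strongly by chance alone - false positives whose count grows
# with the number of variables examined.
random.seed(0)

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# 50 unrelated series of 20 noisy observations each.
series = [[random.gauss(0, 1) for _ in range(20)] for _ in range(50)]

strong = [
    (i, j)
    for i in range(len(series))
    for j in range(i + 1, len(series))
    if abs(corr(series[i], series[j])) > 0.5
]
print(f"{len(strong)} of {50 * 49 // 2} pairs correlate above 0.5 by chance")
```

Every one of those "strong" correlations is noise; with 1,225 pairs to examine, dozens of them typically clear the bar anyway.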
Problem 3: Feature or bug?
Too often, Silver said, the role of common sense and gut instinct in predictions is misunderstood. If you apply “common sense” at the wrong time, you risk overriding the value of the data and the model through observer bias and other factors. By applying human insight at the right moment, you can correct your model as needed while still extracting maximum value from the data.
Possible solutions for making better predictions
Silver also shared suggestions on how to make better predictions:
Suggestion 1: Think probabilistically
Predictors tend to understate uncertainty, Silver said, but predictions that convey uncertainty are usually better forecasts. It can be confusing to deal with probabilities of binary outcomes, for example, but Trump is no less president because the election was close.
He added that visual aids—such as storm track maps—can often help people perceive uncertainty more clearly. “A picture can do a lot of good,” he said.
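One way to internalize what a 70 percent forecast means is to simulate many races forecast at those odds; if the forecast is well calibrated, the underdog should still win close to three times in 10. A toy sketch (my illustration, not Silver's):

```python
import random

# Toy sketch: a well-calibrated "70 percent" forecast is not wrong when the
# 30 percent side wins. Over many such races, the underdog prevails roughly
# 3 times in 10.
random.seed(1)
forecasts = 10_000
upsets = sum(random.random() < 0.30 for _ in range(forecasts))
print(f"Underdog won {upsets} of {forecasts} races forecast at 70/30")
```

A single upset tells you almost nothing about whether the 70/30 call was a good one; only the long run does.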
Suggestion 2: Know where you’re coming from
It’s important to understand your biases, Silver said, but most people don’t. For instance, when male and female candidates with identical qualifications apply for a job, tests show that people who say they have no gender bias actually show more gender bias.
“Avoid group think,” Silver warned. “Diversity of perspectives can mitigate risk.”
The right kind of experience can help. Silver suggested that one reason FiveThirtyEight.com was “less wrong” about the election than its competitors is that his outfit has more hands-on experience in this arena than many others. Forecasters who have made predictions with money on the line, such as poker players or sports analysts—both of which Silver has been—tend to do better than academics who haven’t learned the hard way that they need to challenge their assumptions.
Suggestion 3: Try, and err
Learning and refining your models is critical to improving your predictions, Silver said, but that works best in a data-rich environment. Weather forecasting has improved dramatically, he said, because meteorologists get a lot of practice: “If you have a bad model, you get it beat out of you pretty fast. Presidential election forecasters only get one trial every four years.”
The key, Silver concluded, is never to be satisfied with where you stand. While it’s easy for your models to be overconfident, “real data is a reality check.”