The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t (Predictions)

People make predictions all the time. They predict that their team will win the Super Bowl, or that they’ll win the lottery. These predictions are based on little more than hope. The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t seeks to set us on the right path to understanding what we can learn from data, what we can infer from data, and what we can’t. By looking at both the power and the weaknesses of statistics, including the pitfalls of using the wrong model and supplying bad data, we can see how statistics can improve our lives through productive forecasts and predictions.

In this part of the two-part review, we’ll look at predictions.

Forecasts and Predictions

Sometimes in our rush to be amazed at something, we simplify the questions we ask. We fail to recognize that our brain has simplified the thing that we’re trying to sort out (see Thinking, Fast and Slow for more on substitution). In the case of looking into the future, what we really want is prediction, and what statistics gives us most frequently is a forecast. Forecasts necessarily have a certain amount of error and involve statistical relationships. Forecasts become predictions when they become specific and precise.

Each day when we look at the weather, what we want is a soothsayer to predict what the weather will be like. However, what they offer us is a forecast based on models that result in a chance of rain somewhere between zero and 100%. We look at economists and seek the answer about whether we’ll make more money next year – or not. We want to know whether a risky investment will be worth it. However, economists and meteorologists are subject to the same rules as any other statistician.

While it’s true that statistics can predict events in the future, in the general sense of the word, there must always be some uncertainty as to whether the event will happen. Predictions are just an attempt to refine forecasts into specific, tangible, probable outcomes. Sometimes that process is successful, but often it is not.

Falsifiable by Prediction

Karl Popper suggested that every forecast should be falsifiable via prediction. To test a model, you need to be able to make a prediction with it that could then be proven false. In this way, you can create a test to ensure that your model is accurate and useful. A model that doesn’t forecast appropriately, and that you can’t make a prediction from, doesn’t do much good.

Everything Regresses to the Mean

One limitation of statistics is that it can tell you, with relative authority, the things you want to know, but with less precision than is useful. Statisticians can forecast the economy but can’t predict whether you will get a raise. The Black Swan artfully points out the challenges of statistics and modeling when the sample size is insufficient. Until you’ve seen a black swan, you’ve not sampled enough to make the statistical models work. Until you’ve sampled enough, the noise will dramatically pull your results askew.

With large sample sizes, everything regresses to the mean. We no longer see the outlier as something distinct that does happen; rather, it gets lost in the law of averages. Tragic events like 9/11 are never forecast when we use the wrong model. They’re not perceived as possible if they’re averaged into the data. It’s like the proverbial statistician drowning in a river that is, on average, only 3 feet deep – all the depth of the data was averaged out.
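The river joke can be put into numbers. The sketch below is only an illustration: the depth readings are made up for the example and chosen so that the average works out to 3 feet.

```python
from statistics import mean

# Hypothetical depth soundings (in feet) taken while crossing a river.
# These numbers are invented for illustration only.
depths_ft = [1, 1, 1, 2, 2, 11]

average_depth = mean(depths_ft)  # the single number the "average" reports
deepest_point = max(depths_ft)   # the outlier that the average hides

print(f"average depth: {average_depth} ft")
print(f"deepest point: {deepest_point} ft")
```

The average is perfectly accurate and still useless to someone who has to wade through the deepest point.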

Right Model, Right Results

Perhaps the most difficult challenge when working with data is not the data collection process. Collecting data is tedious and needs to be done with meticulous attention to detail; however, it’s not necessarily imaginative, creative, or insightful. It’s the work that must be done to get to the magic moment when the right model is uncovered for working with data. Though statisticians have ways of evaluating different models for their ability to predict the data, they must see some inherent signal in the noise.

For a long time, we couldn’t find planets outside of our solar system. One day, someone identified a detection model – that is, they discovered a theory for the strange oscillations in the light frequency from distant stars. The theory proposed that super-massive planets in close orbit were causing the star to move. This created a Doppler effect with the light from the star causing what we perceived as light frequency oscillations. Consensus coalesced, and the scientific community agreed that this was indeed what was happening. We had found the first extra-solar planet. Almost immediately, we found nearly a dozen more.

These super-massive planets were hiding in the data we already had. We had already captured and recorded the data that indicated the presence of other planets, but we didn’t have a model that allowed us to understand it.

Plenty of ideas, theories, and models were tried to explain the light variations, but it wasn’t until someone considered a super-massive planet that we settled on a model that was right.

The Failure of Predictions

We got lucky finding extra-solar planets. It was the right idea at the right time, and a model that fit well, but it wasn’t a specific prediction. With predictions, our luck is very, very poor. The old joke goes, “Economists have predicted nine of the last six recessions.” That is, they predicted recessions that never happened. Earthquakes and other cataclysmic events are predicted with startling frequency. It seems that everyone has some prediction of something. Sometimes the predictions are harmless enough, like whose team will win the Super Bowl. Sometimes the consequences are far more dire.

Disease

When you think in systems, delays are a very bad thing. Delays make it harder for the system to react to a change in circumstances. In the case of the SR-71 Blackbird, the delays in a mechanical system made engine unstarts a regular occurrence. Reduce the delay with electronic controls and the unstart problem is dramatically reduced. (See The Complete Book of the SR-71 Blackbird for more.) In the creation of vaccines, the delay is great: it takes six months to scale up production and get enough doses for the country.

What makes the vaccination “game” worse is that vaccines are designed to target specific viral strains. If the virus mutates, the hard work of creating the vaccine may be wasted, as it may become ineffective at protecting against the new strain. Each year, the vaccine makers attempt to predict which variations of influenza will be the most challenging. They start cooking up batches of vaccines to combat the most virulent.

What happens, however, when you get noise in the identification of the influenza that will be the most impactful? From 1918 to 1920, an H1N1 influenza pandemic (the same subtype as swine flu) afflicted roughly one-third of humanity and killed over 50 million. So when there was an apparent outbreak of an H1N1 strain at Fort Dix in 1976, who can blame President Ford for encouraging the vaccine industry to create a vaccine for it and encouraging every American to do their part in preventing the spread of the disease by getting vaccinated – and hopefully increasing the herd immunity?

It turns out it was all a bad call. Issues with the vaccine caused Guillain–Barré syndrome in some recipients. The virus strain turned out to not be that virulent. The noise at Fort Dix that had produced the scare wasn’t a result of the virus’s potential but was instead a result of environmental and cultural factors that allowed the disease to spread at Fort Dix but weren’t generalizable to the population.

SIR

A classic way of modeling diseases is the SIR model, named for its three groups: susceptible, infected, and recovered. The model assumes that everyone who has recovered is no longer susceptible, and that everyone starts with an equal level of susceptibility. This simplified model works reasonably well for measles, but it fails to account for natural variations in susceptibility in humans. More importantly, the model fails to account for the connections that we have with each other. It fails to account for how we interact.
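To make the SIR assumptions concrete, here is a minimal sketch of the standard textbook form of the model, stepped forward with simple Euler integration. It isn't taken from the book. The function name `simulate_sir` and every parameter value (a population of 1,000, a transmission rate `beta` of 0.3 per day, a recovery rate `gamma` of 0.1 per day) are assumptions chosen only for illustration.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Euler-integrate the basic SIR equations; return one (S, I, R) tuple per day."""
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infected
    r = 0.0                                   # recovered (assumed immune for good)
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            # Everyone mixes evenly: each susceptible person is equally
            # likely to meet each infected person.
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters, chosen only to produce a visible outbreak.
history = simulate_sir(population=1000, initial_infected=1,
                       beta=0.3, gamma=0.1, days=160)
peak_day, peak_state = max(enumerate(history), key=lambda item: item[1][1])
print(f"infections peak on day {peak_day} with about {peak_state[1]:.0f} people sick")
print(f"never infected by day 160: about {history[-1][0]:.0f} people")
```

Notice what the code leaves out: there is one `beta` for everybody and no notion of who knows whom. Those are exactly the simplifications described above.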

Another classic example of disease modeling is cholera in London, where the cases at first didn’t seem to have any connection to one another. There was no discernible pattern – that is, until John Snow traced the cases to the Broad Street well and had the pump handle removed. The disease slowly dissipated, as Snow had correctly identified the root cause. However, his job wasn’t easy, because people who were far away from the pump were getting sick. Those who weren’t close to the Broad Street pump had hidden connections. Sometimes they had lived near the pump in the past and still used it as their main water source; in other cases, they had relatives close by. The problem with forecasting diseases is the hidden patterns that make it hard to see the root cause. To forecast correctly, we need to find and then use a correct model.

An Inconvenient Truth

It’s an inconvenient truth that, in the decade when An Inconvenient Truth was released, there was no substantial change in temperatures across the planet – in truth, there was an infinitesimal reduction in temperature from 2001 to 2011. However, Gore wasn’t the first to claim that there were problems. In 1968, Paul and Anne Ehrlich wrote The Population Bomb. It was 1972 when Donella Meadows (who also wrote Thinking in Systems), Jørgen Randers, and Dennis Meadows first published Limits to Growth. (It’s still on my reading list.) Both books sought to predict our future, one that deeply concerned their authors. Of course, population is increasing, but it’s far from a bomb, and we’ve not yet reached the feared limits to growth.

These predictions missed what Everett Rogers discovered when working with innovations. In Diffusion of Innovations, he talks about the breakdown of society created by the introduction of steel axe heads in Aboriginal tribes in Australia. The predictions missed the counter-balancing forces that cause us to avoid catastrophe. However, presenting a balanced and well-reasoned point of view isn’t sensational, and therefore doesn’t sell books, nor does it make TV exciting. The McLaughlin Group pundits’ forecasts about political elections are not at all well-reasoned, balanced, or even accurate – but that doesn’t stop people from tuning in to what amounts to a circus performance every week.

So the real inconvenient truth is that our predictions fail. We overestimate, and we ignore competing forces that attempt to bring a system into balance. In fairness to Gore, the global temperature on a much longer trend seems to be climbing at 1.5 degrees Celsius per century. It’s just that there’s so much noise in the signal of temperatures that it’s hard to see – even over the course of a decade. We need to be concerned, but the sky isn’t falling.

Watching the Weather

If you want to find a prediction that’s guaranteed to be wrong, it’s got to be the weather. The oft-quoted remark “What job can you be wrong most of the time and still keep your job?” refers to meteorologists. However, in truth, forecasts are substantially better than they were even a decade ago. Meteorologists have done a startlingly good job of eliminating the problems with the mathematical models that generate weather forecasts. Increases in processing power have made it possible to create more accurate and more precise forecasts. And they’re still frequently wrong. A wise weatherman goes outside and looks at the sky before going on air to share their predictions, because they know that the computer models can be wrong.

The problem isn’t the model. The problem isn’t our ability to model what will happen with the forces of nature. The problem is in our ability to measure precisely the inputs for the model and the inherent dynamic instability of the systems. It was Lorenz who first started the conversation about the butterfly effect: the idea that a butterfly in Brazil can set off a tornado in Texas. That’s a mighty powerful butterfly – or the result of an inherently unstable and dynamic system. A very small change in input produces a very large change in output.
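The butterfly effect is easy to reproduce. The sketch below uses the three-variable Lorenz system with its commonly quoted parameters (sigma = 10, rho = 28, beta = 8/3), a standard toy problem and not a real weather model. It runs two simulations whose starting points differ by one part in a hundred million. The step size, run length, starting point, and helper names (`rk4_step`, `separation_over_time`) are all assumptions made for this illustration.

```python
import math


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivatives of the Lorenz system at the given (x, y, z) state."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_over_time(perturbation=1e-8, dt=0.01, steps=5000):
    """Distance between two runs whose starting points differ only slightly."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + perturbation, 1.0, 1.0)  # the "butterfly": a tiny measurement error
    distances = []
    for _ in range(steps + 1):
        distances.append(math.dist(a, b))
        a = rk4_step(a, dt)
        b = rk4_step(b, dt)
    return distances


distances = separation_over_time()
print(f"initial gap between the two runs: {distances[0]:.1e}")
print(f"largest gap reached:              {max(distances):.1f}")
```

Both runs use exactly the same equations, so the model is "right" in each case. The only difference is the measurement error in the input, and that alone is enough to send the two forecasts to completely different places.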

As a quick aside, hash algorithms rely on a similar property. We use hash algorithms to ensure that messages aren’t tampered with. They work because a small change in input results in a large change in the output.
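Here is a small sketch of that property using SHA-256 from Python’s standard `hashlib` module. The two messages and the helper name `differing_bits` are made up for the example. The messages differ by a single character, and the code counts how many of the 256 output bits differ as a result.

```python
import hashlib


def differing_bits(message_a: bytes, message_b: bytes) -> int:
    """Count how many bits differ between the SHA-256 digests of two messages."""
    digest_a = hashlib.sha256(message_a).digest()
    digest_b = hashlib.sha256(message_b).digest()
    xor = int.from_bytes(digest_a, "big") ^ int.from_bytes(digest_b, "big")
    return bin(xor).count("1")


# Hypothetical messages that differ by one character.
original = b"Pay Alice $100"
tampered = b"Pay Alice $900"

changed = differing_bits(original, tampered)
print(f"{changed} of 256 digest bits differ")
```

For a well-designed hash, roughly half of the bits should change, which is why even a tiny edit to a message is obvious to anyone comparing digests.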

The problem with predicting the weather, then, isn’t that we don’t know how to process the signal and arrive at the desired outcome. The problem is that we can’t get a precise enough signal to eliminate all the noise.

Overfitting and Underfitting

In attempts to find the models that perfectly describe the data, we run two opposite risks that are two sides of the same coin. On the one hand, we can overfit the data and try to account for every variation in the dataset. On the other, we can look for mathematical purity and simplicity and ignore the outliers – this is “underfitting.”

“Overfitting” mistakes noise for signal. An attempt is made to account for the randomness of noise inside the signal we’re trying to process. The result is that our ultimate predictions try to copy the same randomness that we saw in our sample data. In other words, we’ve mistaken the noise for the signal and could not eliminate it.

Underfitting, on the opposite side of the coin, is the inability to distinguish the signal in the noise. That is, we ignore data that is real signal, because it looks like noise. In a quest for mathematical simplicity, we ignore data that is inconvenient.
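A toy example can make the two errors concrete. In the sketch below, everything is an assumption made for illustration: the "true" signal is a straight line, the noise values are a fixed, made-up list, and the three models are a flat average (underfit), a least-squares line, and a polynomial forced through every training point (overfit). Each model is scored on how well it recovers the noise-free signal at points it has not seen.

```python
def true_signal(x):
    """The hidden relationship we are trying to recover (assumed for this demo)."""
    return 2.0 * x + 1.0


# Training data: the signal plus a fixed, made-up list of noise values.
train_x = [float(x) for x in range(8)]
noise = [0.3, -0.4, 0.5, -0.2, 0.1, -0.5, 0.4, -0.3]
train_y = [true_signal(x) + n for x, n in zip(train_x, noise)]

# Unseen points halfway between the training points, scored against the signal.
test_x = [x + 0.5 for x in train_x[:-1]]
test_y = [true_signal(x) for x in test_x]


def fit_constant(xs, ys):
    """Underfit: ignore the trend entirely and predict the average."""
    level = sum(ys) / len(ys)
    return lambda x: level


def fit_line(xs, ys):
    """Ordinary least-squares straight line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_interpolant(xs, ys):
    """Overfit: a Lagrange polynomial that passes through every training point."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            weight = 1.0
            for k, xk in enumerate(xs):
                if k != j:
                    weight *= (x - xk) / (xj - xk)
            total += yj * weight
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


models = {
    "underfit (constant)": fit_constant(train_x, train_y),
    "straight line": fit_line(train_x, train_y),
    "overfit (interpolant)": fit_interpolant(train_x, train_y),
}
for name, model in models.items():
    print(f"{name:22s} train error {mse(model, train_x, train_y):8.4f}   "
          f"error on unseen points {mse(model, test_x, test_y):8.4f}")
```

The overfit model scores perfectly on the data it was built from, because it has memorized the noise, and then does worse than the plain line on the unseen points. The underfit model does poorly on both, because it threw away real signal.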

Brené Brown speaks of her scientific approach to shame and vulnerability as grounded theory and the need to fit every single piece of data into the framework. (See The Gifts of Imperfection for more.) When I first read this, it stood in stark contrast to what I saw with scientists ignoring data that didn’t fit their model. It seems like too many scientists are willing to ignore the outliers because their theory doesn’t explain them. In other words, most scientists, in my experience, tend to underfit the data. They are willing to allow data to slip through their fingers for the elegance of a simpler model. Brown and those who follow the grounded theory approach may be making the opposite error by overfitting their data.

Statistical Models

In the next part of this review, we’ll talk about models and statistics.