The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t (Statistics and Models)

In the first part of this review, we spoke of how people make predictions all the time. The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t has more to offer than generic input on predictions; it lays out a path through the models and statistics we can use to make better predictions.

All Models are Wrong but Some are Useful

Statistician George Box famously said, “All models are wrong, but some are useful.” The models that we use to process our world are inherently wrong. Every map leaves out details that shouldn’t be important – but might be. We try to simplify our world in ways that are useful and that our feeble brains can process. Models are how we do that simplifying.

Rules of thumb – or heuristics – allow a simple reduction of a complex problem or system. In this reduction, they are, as Box said, wrong. They do not and cannot account for everything. However, at the same time, they can be useful.

Balancing underfitting against overfitting the data is about creating a model that’s more useful and less wrong.

Quantifying Risks

Financial services, including investments and insurance, are tools that humans have designed to make our lives better. The question is, whose lives? Insurance provides a service in a world where we’re disconnected and we don’t have a community mentality where we support each other. In Hutterite communities – a branch of the Anabaptist movement, like the Amish and Mennonites – all property is owned in community. In a large enough community, the loss of one barn or one building is absorbed by the community. However, that level of community support doesn’t exist in many places in the modern world.

Insurance provides an alternative relief for catastrophic losses. If you lose a house or a barn or something of high value, insurance can provide a replacement. To do this, insurance providers must assess risk. That is, they must forecast their risk. The good news is that insurance providers can write many insurance policies with an expected risk and see how close they get to calculating the actual risk.

Starting with a break-even point, the insurance company can then add its desired profit. People and organizations that see good value in the insurance – because of their assessment of the risk or their unwillingness to accept it – buy it. Given that people are more impacted by loss than by reward, it’s no wonder that insurance is a booming business. (See Thinking, Fast and Slow for more on the perceived impact of loss.)
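The break-even-plus-profit arithmetic can be sketched in a few lines. This is only an illustration: the function names, the 1-in-500 loss probability, the $250,000 payout, and the 20% margin are all assumptions chosen for the example, not figures from the book.

```python
def break_even_premium(loss_probability: float, payout: float) -> float:
    """Expected yearly claim cost for one policy (the break-even price)."""
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    return loss_probability * payout


def priced_premium(loss_probability: float, payout: float, margin: float) -> float:
    """Break-even premium marked up by the insurer's desired profit margin."""
    return break_even_premium(loss_probability, payout) * (1.0 + margin)


# Hypothetical numbers: a 1-in-500 yearly chance of losing a $250,000 barn.
base = break_even_premium(1 / 500, 250_000)
quote = priced_premium(1 / 500, 250_000, margin=0.20)
print(f"break-even: ${base:,.2f}, quoted: ${quote:,.2f}")
```

The quoted premium is only as good as the loss probability fed into it, which is why the rest of this section is about estimating that probability.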

The focus then shifts to the insurance company’s ability to quantify its risk. The more accurately it can do this, and take reasonable returns, the more policies it can sell and the more money it can make. Risk, however, is difficult to quantify, even ignoring for the moment black swan events (see The Black Swan for more). You must first separate the signal from the noise: you must be able to tell what the underlying rate of naturally occurring events is, and which events are just normal random deviations from that pattern.

Next, the distribution of the randomness must be assessed. What’s the probability that the outcome will fall outside of the model? When referring to the stock markets, John Maynard Keynes said, “The market can stay irrational longer than you can stay solvent.” The same applies to insurance: you must be able to weather the impact of a major disaster and still stay solvent. Whether it’s a particularly difficult tornado season or a very bad placement of a hurricane, the perceived degree of randomness matters.
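To make the solvency question concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration: the function name, the policy counts, the dollar amounts, and above all the premise that claims are independent of each other. A bad tornado season or a hurricane violates that premise, which is exactly why real tail risk tends to be worse than a sketch like this suggests.

```python
import random


def ruin_probability(policies, loss_probability, payout, premium,
                     reserves, trials=1_000, seed=42):
    """Estimate how often one year's claims exceed premiums plus reserves."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    available = policies * premium + reserves
    ruined = 0
    for _ in range(trials):
        # Each policy independently produces a claim (a strong assumption).
        claims = sum(1 for _ in range(policies)
                     if rng.random() < loss_probability)
        if claims * payout > available:
            ruined += 1
    return ruined / trials


# Hypothetical book of 1,000 policies priced at $600 each.
thin = ruin_probability(1_000, 0.002, 250_000, 600, reserves=0)
thick = ruin_probability(1_000, 0.002, 250_000, 600, reserves=1_000_000)
print(f"no reserves: {thin:.1%} of simulated years end insolvent")
print(f"$1M reserves: {thick:.1%} of simulated years end insolvent")
```

The premium covers the average year with room to spare, yet without reserves a meaningful share of simulated years still end in insolvency. The width of the distribution matters, not just its average.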

Then you have the black swan events, the events that you’ve never seen before. These are the events that some say should never happen. However, many of the times when the label has been applied, the risk was well-known and discussed. A hurricane hitting New Orleans was predicted and even at some level prepared for – though admittedly not prepared for well enough. This is not a true black swan, that is, a completely unknown and unpredictable event. It and other purported black swan events were, in fact, predicted in the data.

When predicting risks, you have the known risks and the unknown risks. The black swan idea focuses on the unknown risks, those for which there’s no data that can be used to predict the possibility. However, when we look closely, many of these risks are predictable – we just choose to ignore them, because they’re unpleasant. The known risks – or, more precisely, the knowable risks – are the ones that we accept as a part of the model. The real problem comes in when we believe we’ve got a risk covered, but, in reality, we’ve substantially misrepresented it.

Earthquakes and Terrorist Attacks

Insurance can cover the threat of earthquakes and the threat of terrorist attacks. However, how can we predict the frequency and severity of both? It turns out that both obey a similar pattern. Though most people are familiar with Charles Richter’s scale for earthquake magnitude, few realize that it’s a logarithmic scale. That is, the difference between a 4.1 and a 5.1 earthquake isn’t 25% more shaking, it’s 10 times the measured amplitude, which works out to roughly 32 times the energy released. Thus, the difference between a magnitude 6.1 and an 8.1 earthquake is 100 times the amplitude and roughly 1,000 times the energy released.

This simple base-10 power rule is an elegant way to describe releases of energy that can be dramatically different. What’s more striking is that, plotted on this scale, the frequency of earthquakes falls along a line that runs from the smaller earthquakes to the larger ones. That line forecasts several large earthquakes for a given period of time. Of all the energy released in all the earthquakes from 1906 to 2005, just three large earthquakes – the Chilean earthquake of 1960, the Alaskan earthquake of 1964, and the Great Sumatra Earthquake of 2004 – accounted for almost half the total energy release of all earthquakes in the world. They don’t happen frequently, but these earthquakes make sense when you extend the line from the frequency of smaller earthquakes.
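The line relating frequency to magnitude is what seismologists call the Gutenberg-Richter relation: the logarithm of the number of earthquakes at or above a magnitude falls off linearly as the magnitude rises. A minimal sketch follows. The function name and the parameters a = 8 and b = 1 are illustrative assumptions, not values fitted to any catalog of earthquakes.

```python
def expected_quakes_per_year(magnitude: float, a: float = 8.0, b: float = 1.0) -> float:
    """Expected yearly count of quakes at or above `magnitude`.

    Follows log10(N) = a - b * M. The defaults for a and b are
    illustrative only; real values are fitted per region from data.
    """
    return 10 ** (a - b * magnitude)


for m in (5.0, 6.0, 7.0, 8.0):
    rate = expected_quakes_per_year(m)
    print(f"magnitude {m:.0f} or greater: about {rate:,.0f} per year")
```

With b = 1, each step up in magnitude makes an earthquake ten times rarer, but never impossible. Observing the frequent small earthquakes pins down the line, and extending the line gives a forecast for the rare large ones.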

Strikingly, terrorist attacks follow the same power law. The frequency falls as the severity rises. The 9/11 attacks are predictable within the larger framework of terrorism in general. There will be, from time to time, larger terrorist attacks. While the specific vector from which an attack will come or the specific fault line that will cause an earthquake will be unknown, we know that there’s a decreasing frequency of large events.

Industrial and Computer Revolutions

If you were to map gross domestic product per person, the per-person output would move imperceptibly up over the long history of civilization, right up to the point of the industrial revolution, when something changed. Instead of all of us struggling to survive, we started to produce more value each year.

Suddenly, we could harness the power of steam and mechanization to improve our lives and the lives of those we care about. We were no longer reduced to living in one-room houses as large, extended families and began to have a level of escape from the threat of death. (See The Organized Mind for more on the changes in our living conditions.) Suddenly, we had margin in our lives to pursue further timesaving tools and techniques. We invested some of our spare capacity into making our lives in the future better – and it paid off.

Our ability to generate data increased as our prosperity did. We moved from practical, material advances to an advance in our ability to capture and process data with the computer revolution. After a brief dip in overall productivity, we started leveraging our new-found computer tools to create even more value.

Now the problem isn’t capturing data. The Internet of Things (IoT) threatens to create mountains of data. The problem isn’t processing capacity. Moore’s law suggests the processing capacity of an individual microchip doubles roughly every 18 months. While this pattern (it’s more of a pattern and less of a law) is not holding as neatly as it was, processing capacity far outstrips our capacity to leverage it. The problem isn’t data and processing. The problem is our ability to identify and create the right models to process the information with.

Peer Reviewed Paucity

The gold standard for a research article is publication in a peer-reviewed journal. The idea is that if you can get your research published in a peer-reviewed journal, then it should be good. The idea is, however, false. John Ioannidis published a controversial article, “Why Most Published Research Findings Are False,” which shared how research articles are often wrong. Bayer Laboratories lent support to this claim when it discovered that it could not replicate two-thirds of the findings it tried to reproduce.

Speaking as someone who has published a peer-reviewed journal article, the reviews are primarily for specificity and secondarily for clarity. The findings – unless you make an obvious statistical error – can’t be easily verified. By contrast, in the thousands of pages of technical editing I have done over the years, I could verify the author’s work because I could test their statements easily. For the most part, being a technical editor means verifying that what the author is saying isn’t false and making sure that the code they wrote compiles and runs.

However, I did make a big error once. We were working on a book that was being converted from Visual Basic to Visual C++. The original book was about developing in Visual Basic and how Visual Basic can be used with Office via Visual Basic for Applications. In a section of the introduction, a search and replace done by the author had produced a reference to Visual C++ for Applications. Without anything to verify it against, and since the book was being written against a beta of the software for which limited information was available, I let it go without a thought. The problem is that there is no Visual C++ for Applications. I should have caught it. I should have noticed that it wasn’t something that made sense, but I didn’t.

Because validation wasn’t easy – I couldn’t just copy code and run a program – I failed to validate the information. Peer-reviewed journals are much the same. It’s not easy to replicate experimental conditions. Even if you could replicate experimental conditions, you’re likely to not get exactly the same results. Consequently, reviewers don’t try to replicate the results, and that means we don’t really know whether the results can be replicated – particularly using the factors that the researcher specifies.

On Foxes and Hedgehogs

There’s a running debate on whether you should be a fox – that is, know a little about many things – or a hedgehog – that is, know a lot about one thing. Books like Peak tell of the advantages of focused work on one thing. The Art of Learning follows this pattern in sharing Josh Waitzkin’s rise in both chess and martial arts. However, when we look at books on creativity and innovation like Creative Confidence, The Medici Effect, and The Innovator’s DNA, the answer is the opposite. You’re encouraged to take a bite out of life’s sampler platter – rather than roasting a whole cow.

When it comes to making predictions, foxes with their broad experiences have a definite advantage. They seem to be able to consider multiple approaches to the forecasting problem and look for challenges that the hedgehogs can’t see. I don’t believe that the ability to accurately forecast is a reason to choose one strategy over another – but it’s interesting. Foxes seem to be able to see the world more broadly than the hedgehogs.

The Danger of a Lack of Understanding

There’s plenty of blame to go around for the financial meltdown of 2008. There’s the enforcement of the Community Reinvestment Act (CRA) and the development of derivatives. (I covered correlation and causation and the impact on the meltdown in my review of The Halo Effect.) The problem that started with some bad home loans ended with bankruptcies as financial services firms created derivatives from the mortgages.

These complicated instruments were validated by ratings agencies, but were sufficiently complex that many of the buyers didn’t understand what they were buying. This is always a bad sign. When you don’t understand what you’re buying, you end up relying on third parties to ensure that your purchase is a good one – and when they fail, the world comes crashing down, with you left holding the bag.

The truth is that there is always risk in any prediction. Any attempt to see if there’s going to be profit or loss in the future is necessarily filled with risk. We can’t believe anyone who says that there is no risk.

Bayes’ Theorem

I’m not a statistician. However, I can follow a simple, iterative formula to continue to refine my estimates. It’s Bayes’ theorem, and it can be simplified to:

Prior Probability	(Variable)
Initial estimate of probability	x
New Event
Probability of event if yes	y
Probability of event if no	z
Posterior Probability
Revised estimate	xy / (xy + z(1 – x))

You can use the theorem over and over again as you get more evidence and information. Ultimately, it allows you to refine your estimates as you learn more information. It is, however, important to consider the challenge of anchoring, as discussed in Thinking, Fast and Slow and How to Measure Anything.
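The repeated use of the theorem can be sketched as a small loop, where each revised estimate becomes the next initial estimate. The revised estimate is xy divided by xy + z(1 – x), with x, y, and z as in the table. The function name and the starting numbers (4%, 50%, 5%) are assumptions picked for illustration, and the loop treats each new piece of evidence as independent of the last, which real evidence often isn’t.

```python
def bayes_update(prior: float, p_event_if_true: float, p_event_if_false: float) -> float:
    """Return the revised estimate xy / (xy + z(1 - x)).

    prior is x, p_event_if_true is y, p_event_if_false is z.
    """
    numerator = prior * p_event_if_true
    denominator = numerator + p_event_if_false * (1.0 - prior)
    if denominator == 0:
        raise ValueError("the event has zero probability in both cases")
    return numerator / denominator


# Hypothetical: start at 4%, then see the same kind of evidence three times.
estimate = 0.04
for _ in range(3):
    # Yesterday's posterior is today's prior.
    estimate = bayes_update(estimate, p_event_if_true=0.50, p_event_if_false=0.05)
    print(f"revised estimate: {estimate:.1%}")
```

Evidence that is ten times more likely if the idea is true than if it is false moves a 4% starting estimate a long way in just a few rounds. Where you start still matters, which is where the anchoring caution comes in.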

The Numbers Do Not Speak for Themselves

Despite the popular saying, the numbers do not, and never do, speak for themselves. We’re required to apply meaning to the numbers and to speak for them. Whatever we do, however we react, we need to understand that it’s our insights that we’re applying to the data. If we apply our tools well, we’ll get valuable information. If we apply our tools poorly, we’ll get information without value. Perhaps if you have a chance to read the book, you’ll be able to separate The Signal and the Noise.