Superforecasting: The Art and Science of Prediction

Nearly everyone has fantasized about getting a winning lottery ticket, even if we don’t play the lottery. The number one desire for a time traveling machine may be to go into the future and get the list of lottery numbers, so we can come back and win. While these are not the kinds of approaches that are addressed in Superforecasting: The Art and Science of Prediction (because time travel is still fictional), there are lessons to be learned and skills to be developed to allow us to more accurately predict non-random events.

Foxes and Hedgehogs

There’s an age-old debate about whether one should focus their knowledge, becoming the ultimate expert on one thing, or become a generalist and know about many things. I wrote about this debate most recently in my post Should You Be the Fox or the Hedgehog? I challenged the notion that being the expert at one thing was the right answer. Other works, like Range, reached a similar conclusion that being more generally minded and flexible was a better strategy.

Embedded in this idea of being a generalist, however, is the expectation that you still develop some level of mastery, that you exceed the competence bar. Michelangelo wasn’t a mediocre sculptor or painter. He developed serial mastery. (See The Medici Effect for more on how what we now call the Renaissance man was shaped by the environment.)

Philip Tetlock and Dan Gardner didn’t set out to answer the question about generalists or specialists. The specific goal was to find ways to improve judgement – or, said differently, predictions.

Expert Judgements and Dart-Throwing Chimpanzees

Imagine your average political pundit in one room and a dart-throwing chimpanzee in another. Put a set of possible outcomes on the wall and ask them to pick one – the pundit based on experience, knowledge, and foresight, and the chimpanzee based on dart-throwing abilities or lack thereof. The predictive capacity of both is roughly equivalent. This was the joke made about Tetlock’s previous work, which tried to determine the predictive capacity of experts.

Of course, some political pundits did better than the hypothetical chimps, but not many of them. Too few people were able to demonstrate predictive capacity, and a pundit’s predictive capacity was inversely correlated with their popularity. That is, the more likely a pundit was to have a regular spot on the news, the less likely their predictions were to be accurate.

While this makes an interesting story, it doesn’t explain why we can’t predict, nor does it help us teach people how to predict better.

The Killer Butterfly

It was Edward Lorenz’s 1972 paper “Predictability: Does the Flap of a Butterfly’s Wings in Brazil Set Off a Tornado in Texas?” that raised awareness that some things are not predictable – at least not into the indefinite future. The problem is that the amount of knowledge necessary to predict scales exponentially as we try to predict further into the future. The weatherman may be reasonably accurate about tomorrow, but predicting what the weather will be on a given day next year – or even next month – is well beyond our capacity at the moment.

Some situations are sufficiently complex that predicting their outcome at a distance is difficult or impossible. This goes beyond the “black swan” effect, in which we fail to anticipate an event because we’ve never seen one like it. (See The Black Swan for more.) Consider the kinds of questions that the intelligence community has to answer every day to protect our nation’s interests.

Intelligence Community

Nestled on an Army base in Maryland is the National Security Agency (NSA) main campus. Here, many agencies that make up the intelligence community regularly congregate to discuss the matters that threaten our nation’s interests. Part of their job is to identify potentially threatening situations and their probabilities.

Whether it’s the probability of the Soviet Union deploying missiles to Cuba or the probability of Iraq possessing weapons of mass destruction, it’s the intelligence community’s job to estimate and make recommendations based on those predictions. The problem is that the cost of inaccurate predictions is often measured in human lives.

The missiles were already in Cuba when we estimated they’d not be deployed and, as we found out, Iraq didn’t have any weapons of mass destruction – however, that didn’t stop the invasion and the resulting deaths of service men and women nor the disruption to a country, the ripples of which still continue.

At a logical level, it makes sense to improve predictions as much as possible, even if emotionally it’s threatening to the ego to find out that your best predictors aren’t that good. IARPA – Intelligence Advanced Research Projects Activity – funded a competition to assess the kinds of predictions they needed to make every day.

Measuring Predictions

Most predictions are remarkably vague. Statements like “I predict things will get better” lack both the criteria for what “better” means and the timeframe in which these results will appear. That’s why the questions posed to the forecasters were much more precise. What specific action will happen over exactly what timeframe? Will the ruler of Egypt be removed or replaced in the next six months? Estimate a probability that it will happen, from zero percent (not possible) to 100 percent (certain). These specific predictions could be proven successful – or unsuccessful. It’s an objective – rather than subjective – measure to hit.

Thanks to a scoring rule devised by Glenn Brier in 1950, there was a way to aggregate the results of probability-based predictions into a single score that can be used to compare the relative accuracy of various predictors over many predictions. In its common two-outcome form, a Brier score of zero is perfect and a score of one is the worst possible score; Brier’s original formulation, which sums over every possible outcome, runs from zero to two. Either way, the goal is to get as close to zero as possible.
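To make the scoring concrete, here is a minimal sketch of the two-outcome Brier score: the average squared gap between each forecast probability and what actually happened (1 if the event occurred, 0 if it didn’t). The function name and the sample forecasts are my own illustrations, not taken from the book or the tournament.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Two-outcome Brier score: mean squared difference between each
    forecast probability (0.0 to 1.0) and the outcome (1 = happened, 0 = didn't).
    0.0 is perfect; 1.0 is the worst possible score in this form.
    """
    if len(forecasts) == 0 or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and the same length")
    total = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes))
    return total / len(forecasts)


# Hypothetical forecasters answering the same two yes/no questions.
outcomes = [1, 0]                                   # first event happened, second did not
confident_right = brier_score([0.9, 0.1], outcomes)  # close to 0.01
always_hedging = brier_score([0.5, 0.5], outcomes)   # 0.25
confident_wrong = brier_score([0.0, 1.0], outcomes)  # 1.0

print(confident_right, always_hedging, confident_wrong)
```

Notice that a forecaster who always says 50 percent lands at 0.25 in this form, so a lower score takes being both confident and right.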

Good Judgment Project

What if you took a large number of volunteers who had nothing to gain from the project but a $250 gift card and asked them to make prediction after prediction about a variety of topics, far too many for any one person to be an expert in all of them? The answer is that some of them would do substantially better at predicting events than even seasoned predictors who had access to classified information.

This was, of course, the goal of the IARPA competition: in the short term, to show how poor the predictions were, but, more importantly, to learn from what others were doing right and become better.

Makeup of a Superforecaster

Of the forecasters who were exceptionally good, what characteristics could be identified? What was making the superforecasters so good? It wasn’t one single answer. There were many things it wasn’t about, including expertise. Instead, it seems as if the superforecasters were the very kind of people that Harry Truman wouldn’t have liked. He’s often quoted as having said, “Give me a one-handed economist. All my economists say ‘on one hand…’; then ‘but on the other…’” Truman’s economists were considering multiple perspectives, multiple factors, and multiple frameworks.

The superforecasters, it seems, were able to do this for the kinds of problems they were being asked to predict. They’d do the research, look at the news, and then they’d synthesize many different points of view into a single probability about how likely something was to happen. It’s like they were able to pull in a chorus of voices inside their head. (See The Difference for more about how diverse teams make better decisions.)

Working in Teams

If superforecasters were effective on their own by bringing together a set of diverse perspectives, how then might they do in teams? As it turns out, even better. Though there’s always concern for social loafing – that is, living on the backs of others – that didn’t seem to happen much. (See The Evolution of Cooperation and Collaboration for more on social loafing.) To some extent, it isn’t surprising that the Good Judgment Project (Tetlock’s project for the IARPA competition) didn’t see social loafing, since most of the people participating were effectively doing it for fun, not reward.

It seems that the team’s ability to gather more research and more completely flesh out paths of thought allowed the team to make better predictions, even if the team never had a chance to meet face to face. That’s great news for the kinds of remote teams we see in organizations today.

The Secret to Success

The real secret to the Good Judgment Project – and to IARPA’s exercise – was finding a way to keep score. By addressing specific measurable predictions and avoiding the vagaries that exist in horoscopes and most television pundits’ proclamations, it was possible for everyone to see what was working and what wasn’t. It was possible to find the truth in the middle of the uncertain world we all operate in.

The Truth

So, can superforecasters predict the future? With some accuracy in the short term, yes. With accuracy in the long term, no. Where, then, does that leave us in terms of our best course of action? In short, we cannot hope to know the future. We must retain the capacities that have made us the dominant lifeform on Earth. We must ensure that we do not lose our ability to learn, grow, and adapt to the changing conditions of the future. Even Superforecasting isn’t enough to ensure future success. But it may be a start.