Monday, March 7, 2011

How To Measure The Accuracy of Expert Predictions?

Dan Gardner provides a thorough explanation in Future Babble (emphasis mine):
The first thing the experiment needs is a very large group of experts. The group should be as diverse as possible, with experts from different fields, different political leanings, different institutional affiliations, and different backgrounds. At the very beginning of the experiment, the experts should answer a battery of questions designed to test political orientation, world view, personality, and thinking style.
The experts must be asked clear questions whose answers can later be shown to be indisputably true or false. That means vague pronouncements about “weakening state authority” or “growing public optimism” won’t do. Even a question like “Will relations between India and Pakistan be increasingly strained?” – which is the standard language of TV pundits – isn’t good enough. Questions have to be so precise that no reasonable person would argue about what actually happened – which means asking questions like “Will the official unemployment rate be higher, lower, or the same a year from now?” and “Will India and Pakistan go to war within the next five years?”
For each prediction, experts must state how likely they think it is to actually happen. If they are dead certain something will happen, that is a 100 per cent probability. If they are sure it won’t happen, it’s a zero per cent probability. In between these extremes, experts will be required to attach precise percentages to guesses rather than use vague terms like “improbable” or “very likely.” There’s no room for fudging when someone says, “There is a 30 per cent chance India and Pakistan will go to war within the next five years.”
The experiment must obtain a very large number of predictions from each expert in order to allow statistical analysis that can expose lucky hits for what they are. It also allows us to get past the problem of judging predictions in which the expert says the chance of something happening is, for example, “70 per cent.” If the expert is perfectly accurate, then a broad survey of his predictions will show that in 70 per cent of the cases in which he said there was a 70 per cent chance of something happening, it actually happened. Similarly, 60 per cent of the outcomes said to have a 60 per cent chance of happening should have happened. This measure of accuracy is called “calibration.” But there’s more to the story than calibration. After all, someone who sat on the fence with every prediction – “Will it happen? I think the odds are 50/50” – would likely wind up with a modestly good calibration score. We can get predictions like that from a flipped coin.  
What we want in a forecaster, ideally, is someone with a godlike ability to predict the future. The gods don’t bother with middling probabilities and they certainly don’t say, “The odds are 50/50.” The gods say, “This will certainly happen” or “This is impossible.” So there must be a second measure of accuracy to go along with calibration. Experts should be scored by confidence. This means that an expert who said there is a 100 per cent chance of something happening that actually did happen would score more points than another expert who had said there was only a 70 per cent chance of it happening. This measure is called “discrimination.”  
A third measure must also be generated by answering the same questions that are put to the experts using a variety of simple and arbitrary rules. For example, there is the “no change” rule: No matter what the question is, always predict there will be no change. These results will create benchmarks against which the experts’ results can be compared. And finally, the experiment must continue over the course of many years. That will allow for questions involving time frames ranging from the short term – one to two years – to longer-term predictions covering five, ten, even twenty years, ensuring that the experiment will require experts to make predictions in times of stability and surprise, prosperity and recession, peace and war.  
When the passage of time has revealed the correct answers, the experts should be shown how well they did and be given the opportunity to explain the results. It’s difficult to exaggerate how demanding this experiment would be. It would be expensive, complicated, and require the patience of Job. But most of all, it would require a skilled and devoted researcher prepared to give a big chunk of his life to answering one question: How accurate are expert predictions?
Fortunately, there is such a researcher. He is Philip Tetlock.
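
To make the calibration and discrimination measures a little more concrete, here is a minimal sketch of how one might score a batch of probabilistic forecasts. This is not Gardner's or Tetlock's actual scoring code; it uses the standard Murphy decomposition of the Brier score, in which the "reliability" term plays the role of calibration and the "resolution" term plays the role of discrimination. The function and variable names are my own.

```python
from collections import defaultdict

def score_forecasts(forecasts, outcomes):
    """Score probabilistic forecasts via the Murphy decomposition of the
    Brier score. `forecasts` are stated probabilities in [0, 1];
    `outcomes` are 1 if the event happened, 0 if it did not."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Group predictions by the exact probability the expert stated
    # (e.g. all the "70 per cent" predictions land in one bucket).
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[round(f, 2)].append(o)

    reliability = 0.0  # calibration: 0 is perfect
    resolution = 0.0   # discrimination: higher is better
    for stated_p, hits in bins.items():
        observed_freq = sum(hits) / len(hits)
        weight = len(hits) / n
        reliability += weight * (stated_p - observed_freq) ** 2
        resolution += weight * (observed_freq - base_rate) ** 2

    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return {"brier": brier,
            "calibration": reliability,
            "discrimination": resolution,
            "uncertainty": base_rate * (1 - base_rate)}

# A fence-sitter who always says 50/50 versus a sharper forecaster,
# scored on the same eight (hypothetical) outcomes.
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
print(score_forecasts([0.5] * 8, outcomes))
print(score_forecasts([0.9, 0.1, 0.8, 0.9, 0.2, 0.1, 0.8, 0.2], outcomes))
```

The fence-sitter in the example comes out perfectly calibrated but with zero discrimination, which is exactly the coin-flip problem Gardner describes: calibration alone cannot separate a genuine forecaster from someone who hedges every answer, so both measures (plus the simple-rule benchmarks) are needed.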
