Developing expert political judgment: The impact of training and practice on judgmental accuracy in geopolitical forecasting tournaments (pdf)

http://journal.sjdm.org/16/16511/jdm16511.pdf

Judgment and Decision Making, Vol. 11, No. 5, September 2016, pp. 509–526

Welton Chang*, Eva Chen†, Barbara Mellers†, Philip Tetlock†

Abstract

The heuristics-and-biases research program highlights reasons for expecting people to be poor intuitive forecasters. This article tests the power of a cognitive-debiasing training module (“CHAMPS KNOW”) to improve probability judgments in a four-year series of geopolitical forecasting tournaments sponsored by the U.S. intelligence community. Although the training lasted less than one hour, it consistently improved accuracy (Brier scores) by 6 to 11% over the control condition. Cognitive ability and practice also made largely independent contributions to predictive accuracy. Given the brevity of the training tutorials and the heterogeneity of the problems posed, the observed effects are likely to be lower-bound estimates of what could be achieved by more intensive interventions. Future work should isolate which prongs of the multipronged CHAMPS KNOW training were most effective in improving judgment on which categories of problems.

Keywords: forecasting, probability judgment, training, practice, cognitive debiasing

1 Introduction

Research in judgment and choice has found numerous flaws in people’s intuitive understanding of probability (Bar-Hillel, 1980; Kahneman & Tversky, 1973, 1984; Lichtenstein, Slovic, Fischhoff, Layman & Combs, 1978; Slovic & Fischhoff, 1977; Tversky & Kahneman, 1974). We often make errors in prediction tasks by using effort-saving heuristics that are either insensitive to factors that normative theories say we should take into account or sensitive to factors that we should ignore (Kahneman & Tversky, 1977, 1982; Morewedge & Kahneman, 2010; Tversky & Kahneman, 1974).
These results have sparked interest in interventions that can improve judgments (Arkes, 1991; Croskerry, Singhal & Mamede, 2013a, 2013b; Fischhoff, 1982; Lilienfeld, Ammirati & Landfield, 2009; Miller, 1969), but it remains true that significantly less attention has been paid to “debiasing” than to biases (Arkes, 1991; Graber et al., 2012; Lilienfeld et al., 2009). Moreover, few organizations have embraced the debiasing methods that have been developed (Croskerry, 2003; Graber et al., 2012; Lilienfeld et al., 2009).

The authors thank Lyle Ungar and Angela Duckworth for their comments as well as Pavel Atanasov, Philip Rescober and Angela Minster for their help with data analysis. Pavel Atanasov, Terry Murray and Katrina Fincher were instrumental in helping us develop the training materials as well. This research was supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Department of Interior National Business Center contract number D11PC20061. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions expressed herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.
Copyright: © 2016. The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
* Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104.
† University of Pennsylvania

Accurate probability judgments are important in domains such as law, finance, medicine and politics (Croskerry et al., 2013b; Jolls & Sunstein, 2005). For example, the U.S. justification for invading Iraq in 2003 hinged on intelligence estimates that stated with high confidence that Iraq possessed Weapons of Mass Destruction (WMD) (Director of Central Intelligence, 2002).
Two years later, a bipartisan commission determined that there were no WMD in Iraq. The commission called the prewar intelligence “dead wrong,” placing the blame on the intelligence community and on the politicization of the available information by a subset of policymakers (Commission on the Intelligence Capabilities of the United States Regarding Weapons of Mass Destruction, 2005). The United States would continue its involvement in the country for over a decade at an estimated cost of between $4 and $6 trillion and thousands of casualties, numbers that underscore the dangers of over-confident “slam-dunk” assessments of ambiguous evidence (Bilmes, 2014).

The intelligence community responded, in part, by creating a research division devoted to exploring methods of improving intelligence analysis, IARPA. The research reported here was part of four years of forecasting tournaments in which our team, the Good Judgment Project, was a competitor. Five university-based teams competed to submit the most accurate daily probability forecasts possible on a range of political and economic questions, which included improving human judgments with algorithms. Additional details on the forecasting tournament, the competitors and the Good Judgment Project’s winning methods were previously reported in Mellers et al. (2014) and Tetlock, Mellers, Rohrbaugh and Chen (2014). We experimentally tested the efficacy of a variety of tools for improving judgment, including a cognitive-debiasing and political knowledge training regimen called “CHAMPS KNOW”.

[...] investigated individual-difference moderators. Our study also represents one of the most rigorous tests of debiasing methods to date. The open-ended experimental task, forecasting a wide range of political and economic outcomes, is widely recognized as difficult (Jervis, 2010; Tetlock, 2005).
Some political experts and commentators have portrayed it as impossible (Atkins, 2015; Taleb & Blyth, 2011). Our work does not correct all of the aforementioned conceptual and methodological problems, but we can address a significant fraction of them.

The analysis reported here builds on Mellers et al. (2014). The previous article examined the first two years of the forecasting tournament and discussed several drivers of performance. Here, we focus on the effects of training and include a more in-depth analysis of all four years of the experiment. We also examine mediational mechanisms and moderator variables to understand individual differences.

1.1 Literature review

A number of studies have shed light on how probability estimates and judgments can be improved (Fischbein & Gazit, 1984; Fischhoff & Bar-Hillel, 1984; Stewart, 2001; Tetlock, 2005; Whitecotton, Sanders & Norris, 1998). However, past work suffers from at least six sets of limitations: 1) overreliance on student subjects who are often neither intrinsically nor extrinsically motivated to master the task (Anderson, 1982; Petty & Cacioppo, 1984; Sears, 1986); 2) one-shot experimental ta (...truncated)