The future of societal impact assessment using peer review: pre-evaluation training, consensus building and inter-reviewer reliability

https://www.nature.com/articles/palcomms201740.pdf

ARTICLE
Received 5 Apr 2016 | Accepted 25 Apr 2017 | Published 23 May 2017 | DOI: 10.1057/palcomms.2017.40 | OPEN

Gemma Derrick¹ and Gabrielle Samuel¹

ABSTRACT

There are strong political reasons underpinning the desire to achieve a high level of inter-reviewer reliability (IRR) within peer review panels. Achieving a high level of IRR is synonymous with an efficient review system, and the wider perception of a fair evaluation process. Therefore, there is an arguable role for a more structured approach to the peer review process during a time when evaluators are effectively novices in applying the criterion, as with societal impact. This article explores the consequences of a structured peer review process that aimed to increase inter-reviewer reliability within panels charged with assessing societal impact. Using a series of interviews with evaluators from the UK’s Research Excellence Framework conducted before (pre-evaluation) and then again after the completion of the process (post-evaluation), it explores evaluators’ perceptions about how one tool of a structured evaluation process, pre-evaluation training, influenced their approaches to achieving a consensus within the peer review panel. Building on lessons learnt from studies on achieving inter-reviewer reliability and from consensus building with peer review groups, this article debates the benefits of structured peer review processes in cases when the evaluators are unsure of the criterion (as was the case with the Impact criterion), and therefore the risks of a low IRR are increased. In particular, this article explores how individual approaches to assessing Impact were normalized during group deliberation around Impact and how these relate to evaluators’ perceptions of the advice given during the pre-evaluation training.
This article is published as part of a collection on the future of research assessment.

¹ Centre for Higher Education Research and Evaluation, Educational Research, Lancaster University, Lancaster, UK

Correspondence: (e-mail: )

PALGRAVE COMMUNICATIONS | 3:17040 | DOI: 10.1057/palcomms.2017.40 | www.palgrave-journals.com/palcomms

Introduction

As public expectations from research evolve to include societal outcomes as well as academic achievements, so too does the framework used to evaluate these notions of research excellence. One method of promoting the legitimacy and authority of new and untested criteria such as societal impact (Derrick and Samuel, 2016b) is to employ traditional academic peer review for such assessments. As stated by the British Academy, “the essential principle of peer review is simple to state: it is that judgements about the worth or value of a piece of research should be made by those with demonstrated competence to make such a judgement” (Academy 2007). However, the employment of peer review comes with a risk that not all “experts” will be able to reach a consensus on their assessment of impact, as differing research traditions, worldviews and even methodologies alter an expert’s notion of excellence. Difficulty in reaching a group consensus results in low inter-reviewer reliability (IRR) and a longer, less efficient peer review process, as assessors spend more assessment time negotiating a consensus and building the committee culture (Olbrecht and Bornmann, 2010) that is necessary for guiding group evaluations. However, for evaluation processes that determine the allocation of public funds, a high degree of IRR gives the public reassurance that the evaluation process was sufficiently objective and accurately reflected agreed-upon priorities (Tan et al., 2015).
Reaching a desirable level of IRR through group consensus is therefore both a political and public necessity for public research funding agencies keen to promote the legitimacy of their assessment process and criteria.

The evaluation of “impact” under the UK’s Research Excellence Framework 2014 (REF2014) represented the world’s first formal, ex-post (after the event) assessment of how research has had an impact on society beyond academia. Debate before the REF2014 focused on the variety of definitions of impact, and the consequent issues associated with its formal assessment, where there was a lack of consensus about what constituted excellence in impact. Furthermore, the inclusion of assessments of the societal outcomes from research has also necessitated that research extend its definition of who represents a research “peer” to include non-academic actors. These new peers aim to contribute differing expertise and experience, and to provide a non-academic perspective about the assessment of how research influences society (Derrick and Samuel, 2016a). For many evaluation processes, the REF2014 included, there are a variety of new, non-academic actors incorporated into the panels, especially for societal impact assessment. However, extending the range of expertise available to panels to guide assessments also increases the number of conflicting values and definitions of excellence, and therefore makes a group consensus more difficult to achieve, especially if the criterion is new, such as with impact (Langfeldt, 2001; Samuel and Derrick, 2015; Derrick and Samuel, 2016b). Indeed, previous research has highlighted the difficulty associated with peer review panels resolving other, unfamiliar and potentially ambiguous criteria such as “interdisciplinary research” in light of different interpretations of the concept within the panel (Lamont, 2009; Huutoniemi, 2012).
This increased level of “noise” (Danziger et al., 2011) during the evaluation process lowers IRR, which in turn means that decisions take a long time to reach, inappropriate proxies are used as the basis for evaluations, or assessments are not made (Cicchetti, 1991). Reviewer training before the assessment is one technique that reduces panel “noise” in session (Cicchetti, 1991) and ultimately improves IRR (Sattler et al., 2015) and the efficiency of the evaluation process. Very little research has focused on achieving improved IRR in peer review panels, and even less has done so using qualitative methods, since previous research on IRR within peer review panels has focused solely on producing quantitative indicators of evaluator concord. In contrast to previous studies of IRR, this article explores the consequences of a structured peer review process that aims to increase IRR and ensure an efficient evaluation. It does this by utilizing a series of interviews conducted with evaluators from the REF2014 prior to (pre-evaluation interviews) and then again after the evaluation process had taken place (post-evaluation interviews). It explores evaluators’ perceptions about how a tool of a structured evaluation proce (...truncated)