The future of societal impact assessment using peer review: pre-evaluation training, consensus building and inter-reviewer reliability
ARTICLE
Received 5 Apr 2016 | Accepted 25 Apr 2017 | Published 23 May 2017
DOI: 10.1057/palcomms.2017.40
OPEN
Gemma Derrick1 and Gabrielle Samuel1
ABSTRACT There are strong political reasons underpinning the desire to achieve a high
level of inter-reviewer reliability (IRR) within peer review panels. Achieving a high level of IRR
is synonymous with an efficient review system, and the wider perception of a fair evaluation
process. Therefore, there is an arguable role for a more structured approach to the peer
review process during a time when evaluators are effectively novices in practice with the
criterion, such as with societal impact. This article explores the consequences of a structured
peer review process that aimed to increase inter-reviewer reliability within panels charged
with assessing societal impact. Using a series of interviews with evaluators from the UK’s
Research Excellence Framework conducted before (pre-evaluation) and then again after the
completion of the process (post-evaluation), it explores evaluators’ perceptions about how
one tool of a structured evaluation process, pre-evaluation training, influenced their
approaches to achieving a consensus within the peer review panel. Building on lessons learnt
from studies on achieving inter-reviewer reliability and from consensus building with peer
review groups, this article debates the benefits of structured peer review processes in cases
when the evaluators are unsure of the criterion (as was the case with the Impact criterion),
and therefore the risks of a low IRR are increased. In particular, this article explores how
individual approaches to assessing Impact were normalized during group deliberation around
Impact and how these relate to evaluators’ perceptions of the advice given during the
pre-evaluation training. This article is published as part of a collection on the future of
research assessment.
1 Centre for Higher Education Research and Evaluation, Educational Research, Lancaster University, Lancaster, UK
Correspondence: (e-mail: )
PALGRAVE COMMUNICATIONS | 3:17040 | DOI: 10.1057/palcomms.2017.40 | www.palgrave-journals.com/palcomms
Introduction
As public expectations from research evolve to include
societal outcomes as well as academic achievements, so
too does the framework used to evaluate
these notions of research excellence. One method used to
promote the legitimacy and authority of new and untested
criteria such as societal impact (Derrick and Samuel, 2016b) is to
employ traditional academic peer review for such assessments.
As stated by the British Academy, “the essential principle of peer
review is simple to state: it is that judgements about the worth or
value of a piece of research should be made by those with
demonstrated competence to make such a judgement” (British Academy,
2007). However, employing peer review carries the risk
that not all “experts” will be able to reach a consensus on their
assessment of impact, as differing research traditions, worldviews
and even methodologies alter an expert’s notion of excellence.
Difficulty in reaching a group consensus results in low inter-reviewer reliability (IRR) and a longer, less efficient peer review
process as assessors spend more assessment time negotiating a
consensus and building a committee culture (Olbrecht and
Bornmann, 2010) which is necessary for guiding group evaluations. However, for evaluation processes that determine the
allocation of public funds, a high degree of IRR gives the public
reassurance that the evaluation process was sufficiently objective
and accurately reflected agreed-upon priorities (Tan et al., 2015).
Reaching a desirable level of IRR through group consensus is
therefore both a political and public necessity for public research
funding agencies keen to promote the legitimacy of their
assessment process and criteria.
The evaluation of “impact” under the UK’s Research Excellence
Framework 2014 (REF2014) represented the world’s first formal,
ex-post (after the event) assessment of how research has had an
impact on society beyond academia. Debate before the REF2014
focused on the variety of definitions of impact, and the
consequent issues associated with its formal assessment, where
there was a lack of consensus about what constituted excellence in
impact. Furthermore, the inclusion of assessments of the societal
outcomes from research has also necessitated that research extend
its definition of who represents a research “peer” to include
non-academic actors. These new peers aim to contribute differing
expertise and experience, and to provide a non-academic
perspective about the assessment of how research influences
society (Derrick and Samuel, 2016a). For many evaluation
processes, the REF2014 included, there are a variety of new,
non-academic actors incorporated into the panels, especially for
societal impact assessment.
However, extending the range of expertise available to panels to
guide assessments also increases the number of conflicting values
and definitions of excellence, and therefore makes a group
consensus more difficult to achieve—especially if the criterion is
new, such as with impact (Langfeldt, 2001; Samuel and Derrick,
2015; Derrick and Samuel, 2016b). Indeed, previous research has
highlighted the difficulty associated with peer review panels
resolving other, unfamiliar and potentially ambiguous criteria
such as “interdisciplinary research” in light of different
interpretations of the concept within the panel (Lamont, 2009;
Huutoniemi, 2012). This increased level of “noise” (Danziger
et al., 2011) during the evaluation process produces a low level of
IRR, which in turn leads to decisions taking a long time to reach,
inappropriate proxies being used as a basis for evaluations, or
assessments not being made (Cicchetti, 1991). Reviewer training
before the assessment is one technique that reduces panel “noise”
in session (Cicchetti, 1991) and ultimately improves IRR (Sattler
et al., 2015) and the efficiency of the evaluation process. Very
little research has focused on achieving improved IRR in peer
review panels, and even less research has done so using qualitative
methods since previous research of IRR within peer review panels
has focused solely on producing quantitative indicators of
evaluator concord.
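To make concrete what a quantitative indicator of evaluator concord can look like, the following is a minimal sketch of Cohen's kappa for two reviewers. It is offered only as an illustration: the article does not name a particular statistic, so the choice of kappa is an assumption, as are the function name `cohens_kappa` and the example scores, which are hypothetical and not drawn from REF2014 data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two reviewers scoring the same items.

    Illustrative only: the statistic and this helper are assumptions,
    not a method taken from the article.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both reviewers must score the same non-empty set of items")

    n = len(ratings_a)

    # Proportion of items on which the two reviewers gave the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each reviewer's score frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    # If both reviewers used a single identical score, agreement is total.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical scores from two reviewers for eight case studies (scale 1 to 4).
reviewer_a = [4, 3, 3, 2, 4, 1, 3, 2]
reviewer_b = [4, 3, 2, 2, 4, 1, 3, 3]

kappa = cohens_kappa(reviewer_a, reviewer_b)
print(f"Cohen's kappa: {kappa:.3f}")
```

In this hypothetical case the reviewers agree on six of eight items, yet kappa is about 0.65 once chance agreement is discounted. A figure of this kind summarizes the level of concord but says nothing about how evaluators arrived at it, which is the gap a qualitative approach addresses.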
In contrast to previous studies of IRR, this article explores the
consequences of a structured peer review process that aims to
increase IRR and ensure an efficient evaluation. It does this by
utilizing a series of interviews conducted with evaluators from the
REF2014 prior to (pre-evaluation interviews) and then again after
the evaluation process had taken place (post-evaluation interviews).
It explores evaluators’ perceptions about how a tool of a structured
evaluation proce (...truncated)