The emergence of a field: a network analysis of research on peer review

Scientometrics, Oct 2017

This article provides a quantitative analysis of peer review as an emerging field of research by revealing patterns and connections between authors, fields and journals from 1950 to 2016. By collecting all available sources from Web of Science, we built a dataset that included approximately 23,000 indexed records and reconstructed collaboration and citation networks over time. This allowed us to trace the emergence and evolution of this field of research by identifying relevant authors, publications and journals and revealing important development stages. Results showed that while the term “peer review” itself was relatively unknown before 1970 (“referee” was more frequently used), publications on peer review grew significantly, especially after 1990. We found that the field was marked by three development stages: (1) before 1982, in which most influential studies were made by social scientists; (2) from 1983 to 2002, in which research was dominated by biomedical journals; and (3) from 2003 to 2016, in which specialised journals on science studies, such as Scientometrics, gained momentum by frequently publishing research on peer review and so became the most influential outlets. The evolution of citation networks revealed a body of 47 publications that form the main path of the field, i.e., sources cited in all the most influential publications. They could be viewed as the main corpus of knowledge for any newcomer in the field.

https://link.springer.com/content/pdf/10.1007%2Fs11192-017-2522-8.pdf

Vladimir Batagelj, Anuška Ferligoj, Flaminio Squazzoni

Faculty of Social Sciences, University of Ljubljana, Kardeljeva pl. 5, 1000 Ljubljana, Slovenia; Andrej Marušič Institute, University of Primorska, Muzejski trg 2, 6000 Koper, Slovenia; Department of Theoretical Computer Science, Institute of Mathematics, Physics and Mechanics, Jadranska 19, 1000 Ljubljana, Slovenia; Department of Economics and Management, University of Brescia, Via San Faustino 74/B, 25122 Brescia, Italy

Introduction

Peer review is key to ensuring the rigour and quality of scholarly publications, establishing standards that differentiate scientific discoveries from other forms of knowledge and maintaining the credibility of research inside and outside the scientific community (Bornmann 2011). Although many believe that its roots trace back centuries, historical analysis has shown that the idea and practices of peer review that predominate in scholarly journals today are recent. Indeed, peer review developed in the post-World War II decades, when science expanded tremendously and the “publish or perish” culture, with the competitive symbolism we all know, definitively gained momentum (Fyfe et al. 2017). Unfortunately, although this mechanism determines resource allocation, scientists’ reputations and academic careers (Squazzoni et al. 2013), a large-scale quantitative analysis of the emergence of peer review as a field of research, one that could reveal patterns and connections and identify milestones and developments, is still missing (Squazzoni and Takács 2011).
This paper aims to fill this gap by providing a quantitative analysis of peer review as an emerging field of research that reveals patterns and connections between authors, fields and journals from 1950 to 2016. We collected all available sources from Web of Science (WoS) by searching for all records including “peer review” among their keywords. Using the program WoS2Pajek (Batagelj 2007), we transformed these data into a collection of networks to reconstruct citation networks and different two-mode networks, including works by authors, works by keywords and works by journals. This permitted us to trace the most important stages in the evolution of the field. Furthermore, by performing a ‘main path’ analysis, we tried to identify the most relevant body of knowledge that this field developed over time. Our effort has a twofold purpose. First, it aims to reconstruct the field by quantitatively tracking the formation and evolution of the community of experts who studied peer review. Second, it aims to reveal the most important contributions and their connections in terms of citations and knowledge flow, so as to provide important resources for all newcomers in the field. By recognizing the characteristics and boundaries of the field, we aim to inspire further research on this important institution, which is constantly under the spotlight and subject to reform attempts, often without relying on robust evidence (Edwards and Roy 2016; Squazzoni et al. 2017). For standard theoretical notions on networks, we use the terminology and definitions from Batagelj et al. (2014). All network analyses were performed using Pajek, a program for the analysis and visualization of large networks (De Nooy et al. 2011).

Data

Data collection

In May and June 2015, we searched WoS, Clarivate Analytics’ multidisciplinary databases of bibliographic information, for any record containing “peer review*”. We obtained 17,053 hits, plus an additional 2867 hits by searching for “refereeing”.
Figure 1 reports an example of the records we extracted. We limited the search to the WoS Core Collection because the CR fields (containing citation information) could not be exported from other WoS databases. Using WoS2Pajek (Batagelj 2007), we transformed the data into a collection of networks: the citation network Cite (from the field CR), the authorship network WA (from the field AU), the journalship network WJ (from the field CR or J9) and the keywordship network WK (from the field ID, DE or TI). An important property of all these networks is that they share the same first node set W, the set of works (papers, reports, books, etc.). The citation network Cite is based on the citing relation Ci:

$$w \,\mathrm{Ci}\, z \iff \text{work } w \text{ cites work } z$$

Works that appear in the descriptions were of two types:

• Hits: works with a WoS description;
• Only cited works: works listed in CR fields but not contained in the hits.

These data were stored in a partition DC, where $DC[w] = 1$ iff work $w$ had a WoS description, and $DC[w] = 0$ otherwise. Another partition, year, contained the work’s publication year from the field PY or CR. We also obtained a vector NP, where $NP[w]$ is the number of pages of work $w$. We built a CSV file, titles, with basic data about works with $DC = 1$, to be used when listing results. Details about the structure of names in the constructed networks are provided in the section “The structure of names in constructed networks”. The dataset was updated in March 2016 by adding hits for the years 2015 and 2016. We manually prepared short descriptions of the most cited works (fields AU, PU, TI, PY, PG, KW, but without CR data) and assigned them the value $DC = 2$. A first preliminary analysis performed in 2015 revealed that many works without a WoS description had large indegrees in the citation network. We manually searched for each of them with indegree of at least 20 and, when possible, added them to the dataset.
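To make the network construction concrete, here is a minimal Python sketch of how bibliographic records could be turned into the citation network Cite, the two-mode networks WA and WJ, and the partition DC. It is not WoS2Pajek: the record layout (dicts with hypothetical keys `id`, `AU`, `J9` and `CR`, where `id` and the `CR` entries are already normalized work names) is an assumption, and journals are taken only from J9.

```python
def build_networks(records):
    """Build Cite, WA, WJ and the DC partition from simplified records.

    Each record is a dict with hypothetical keys: 'id' (normalized work name),
    'AU' (list of author names), 'J9' (journal abbreviation) and
    'CR' (list of normalized names of cited works).
    """
    works = set()
    cite, wa, wj = set(), set(), set()  # sets silently drop multiple links
    dc = {}
    for rec in records:
        w = rec["id"]
        works.add(w)
        dc[w] = 1  # a hit: the work has a WoS description
        for author in rec.get("AU", []):
            wa.add((w, author))
        if rec.get("J9"):
            wj.add((w, rec["J9"]))
        for z in rec.get("CR", []):
            if z != w:  # drop loops (a work "citing" itself)
                cite.add((w, z))
                works.add(z)
    for w in works:
        dc.setdefault(w, 0)  # only cited: appears in CR fields but is not a hit
    return works, cite, wa, wj, dc


records = [
    {"id": "A", "AU": ["X", "Y"], "J9": "J1", "CR": ["B", "C"]},
    {"id": "B", "AU": ["X"], "J9": "J2", "CR": ["C"]},
]
works, cite, wa, wj, dc = build_networks(records)
print(sorted(cite))  # (citing, cited) arcs
print(dc)            # 1 for hits, 0 for only cited works
```

Because all networks are keyed by the same work names, they share the first node set W, which is what later allows restricting every network to the same subset of works.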
Earlier papers, which had a significant influence on the literature, often did not use the now established terminology (e.g., in keywords) and were therefore missed by our queries. After some iterations, we constructed the final dataset used in this paper. The final run of WoS2Pajek produced networks with sets of the following sizes: works $|W| = 721{,}547$, authors $|A| = 295{,}849$, journals $|J| = 39{,}988$ and keywords $|K| = 36{,}279$. In both phases, 22,981 records were collected. There were 887 duplicates (each counted only once). We removed multiple links and loops (resulting from homonyms) from the networks. The cleaned citation network CiteAll had $n = 721{,}547$ nodes and $m = 869{,}821$ arcs.

Figure 2 shows the schematic structure of a citation network. The circular nodes correspond to the query hits. The works cited in hits are represented by triangular nodes. In the following phase (the search for often-cited works), some of them were converted into squares (works found in WoS by our secondary search), which introduced new cited nodes, represented as diamonds.

The age of a work was determined by its publication year. In a citation network, a cycle can only arise if an “older” work cites a “younger” work or one of the same age. Because this rarely happens, citation networks are usually (almost) acyclic. The nodes of an acyclic network can be assigned levels such that, for each arc, the level of its initial node is higher than the level of its terminal node. In an acyclic citation network, the publication date of a work is an example of such a level. Therefore, acyclic networks can be visualized by levels, with the vertical axis representing the level and all arcs pointing in the same direction (downward in Fig. 2). In the following section, we look at some statistical properties of the obtained networks.

Distributions

The left panel of Fig. 3 shows the growth of the proportion q, the number of papers on peer review ($DC > 0$) divided by the total number of papers in WoS, by year (proportions were multiplied by 1000). Peer review received growing interest in the literature, especially after 1990. For instance, in 1950 WoS listed only 6 works on peer review among 97,529 registered works published in that year, so $q_{1950} = 6/97{,}529 \approx 0.6152 \times 10^{-4}$. In 2015, we found 2583 works on peer review among 2,641,418 registered works, so $q_{2015} \approx 0.9779 \times 10^{-3}$. The right panel of Fig. 3 shows the distribution of all works (hits + only cited) by year. Interestingly, this distribution can be fitted by a log-normal distribution (Batagelj et al. 2014, pp. 119–121):

$$\mathrm{dlnorm}(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}$$

[Figure: indeg distribution and outdeg distribution (axis: freq)]

… Science, 1973; (97) Lock, S: A Difficult Balance, 1985; (72) Hedges, LV, Olkin, I: Statistical methods for meta-analysis, 1985; (173) Cohen, J: Statistical power analysis, 1988; (87) Chubin, D, Hackett, EJ: Peerless Science, 1990; (60) Boyer, EL: Scholarship reconsidered, 1990; (51) Daniel, H-D: Guardians of science, 1993; (55) Miles, MB, Huberman, AM: Qualitative data analysis, 1994; (64) Gold, MR, et al.: Cost-effectiveness in health and medicine, 1996; (53) Lipsey, MW, Wilson, DB: Practical meta-analysis, 2001; (58) Weller, AC: Editorial peer review, 2001; (69) Higgins, JPT, Green, S: Systematic reviews of interventions, 2008; (130) Higgins, JPT, Green, S: Systematic reviews of interventions, 2011.

We also found that works with the largest outdegree (the most citing works) were usually overview papers. Most of them were published recently (in the last ten years). Among the first 50 works that cited works on peer review most frequently, only two were published before 2000, one in 1998 and the other in 1990. However, neither of them was on peer review, so we do not report them here.
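The proportions quoted above are plain ratios, and the log-normal density is straightforward to evaluate. The sketch below reproduces q1950 and q2015 from the counts given in the text and implements dlnorm. The maximum-likelihood fit is an assumption on our part: the text does not say how the distribution was fitted, and the helper names are illustrative.

```python
import math


def proportion(peer_review_works: int, all_works: int) -> float:
    """Share q of works on peer review among all works registered in a year."""
    return peer_review_works / all_works


def dlnorm(x: float, mu: float, sigma: float) -> float:
    """Log-normal density with parameters mu (meanlog) and sigma (sdlog)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma * x)


def fit_lognormal(xs):
    """Maximum-likelihood estimates of (mu, sigma) for positive samples xs."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


q1950 = proportion(6, 97_529)
q2015 = proportion(2583, 2_641_418)
print(f"q1950 = {q1950:.4e}, q2015 = {q2015:.4e}")
```

For a log-normal variable, the maximum-likelihood estimates are simply the mean and (population) standard deviation of the logarithms, which is why the fit needs no iterative optimizer.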
The boundary problem

Considering the indegree distribution in the citation network CiteAll, we found that most works were referenced only once. Therefore, we decided to remove all ‘only cited’ nodes with indegree smaller than 3 ($DC = 0$ and $\mathrm{indeg} < 3$), which addresses the boundary problem (Batagelj et al. 2014). We also removed all only cited nodes whose names start with the strings “[ANONYM”, “WORLD_”, “INSTITUT_”, “U_S”, “*US”, “WHO_”, “*WHO”, “WHO(”, “AMERICAN_”, “DEPARTME_”, “*DEP”, “NATIONAL_”, “UNITED_”, “CENTERS_”, “INTERNAT_” and “EUROPEAN_”. The final ‘bounded’ set of works WB included 45,917 works. By restricting the two-mode networks WA, WJ and WK to the set WB and removing from their second sets the nodes with indegree 0, we obtained the basic networks WAB, WJB and WKB, with reduced sets of the following sizes: $|AB| = 62{,}106$, $|KB| = 36{,}275$ and $|JB| = 6716$.

Unfortunately, some information (e.g., co-authors, keywords) was available only for works with a full WoS description. In these cases, we limited our analysis to the set of works with a description

$$WD = \{w \in WB : DC[w] > 0\}$$

Its size was $|WD| = 22{,}104$. By restricting the basic networks to the set WD, we obtained the subnetworks WAD, WKD and WJD.

Table 1 (partially recovered; titles and venues of the most cited works):

• Cochrane handbook for systematic reviews of interventions. Cochrane
• Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Plos Med
• Measuring inconsistency in meta-analyses. Brit Med J
• The reliability of peer-review for manuscript and grant submissions... Behav Brain Sci
• An index to quantify an individual’s scientific research output. Proc Natl Acad Sci Usa
• Publication prejudices: an experimental study of confirmatory bias... Cognitive Therapy and Research
• Effect of open peer review on quality of reviews and on reviewers’ recommendations:... Brit Med J
• Publication bias in clinical research (1991). Lancet
• Measurement of observer agreement for categorical data. Biometrics
• Effect on the quality of peer review of blinding reviewers and asking them to sign their reports... JAMA
• The philosophical basis of peer-review and the suppression of innovation. JAMA
• Preferred reporting items for systematic reviews and meta-analyses: PRISMA. Ann Intern Med
• Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Control Clin Trials
• The effects of blinding on the quality of peer-review: a randomized trial. JAMA
• Chance and consensus in peer-review. Science
• Improving the quality of reports of meta-analyses of randomised controlled trials: QUOROM. Lancet
• Does masking author identity improve peer review quality? A randomized controlled trial. JAMA
• A difficult balance: editorial peer review in medicine. Nuffield Trust
• Effect of blinding and unmasking on the quality of peer review: a randomized trial. JAMA
• What makes a good reviewer and a good review for a general medical journal? JAMA
• Full publication of results initially presented in abstracts: a meta-analysis. JAMA
• Quantifying heterogeneity in a meta-analysis (Higgins, JPT, 2002). Stat Med

A temporal network $\mathcal{N}$ is obtained when time $\mathcal{T}$ is attached to an ordinary network, where $\mathcal{T}$ is a set of time points $t \in \mathcal{T}$. In a temporal network, nodes $v \in \mathcal{V}$ and links $l \in \mathcal{L}$ are not necessarily present or active at all time points. The node activity sets $T(v)$ and link activity sets $T(l)$ are usually described as sequences of time intervals. If a link $l(u, v)$ is active at time point $t$, then its endnodes $u$ and $v$ must also be active at $t$. The time $\mathcal{T}$ is usually either a subset of integers, $\mathcal{T} \subseteq \mathbb{Z}$, or a subset of reals, $\mathcal{T} \subseteq \mathbb{R}$. We denote the network consisting of the links and nodes active at time $t \in \mathcal{T}$ by $\mathcal{N}(t)$ and call it the (network) time slice or footprint of $t$. Let $\mathcal{T}' \subseteq \mathcal{T}$ (for example, a time interval). The notion of a time slice extends to $\mathcal{T}'$: the time slice $\mathcal{N}(\mathcal{T}')$ is the network consisting of the links and nodes of $\mathcal{N}$ active at some time point $t \in \mathcal{T}'$.
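The boundary step and the restriction to the described works WD can be sketched as follows. This is a minimal illustration, assuming works are identified by their short names, Cite is a set of (citing, cited) pairs and DC is a dict; the function names are hypothetical and not part of WoS2Pajek or Pajek.

```python
from collections import Counter

# Name prefixes of "only cited" nodes that were removed (institutional or
# anonymous authors), as listed in the text.
EXCLUDED_PREFIXES = (
    "[ANONYM", "WORLD_", "INSTITUT_", "U_S", "*US", "WHO_", "*WHO", "WHO(",
    "AMERICAN_", "DEPARTME_", "*DEP", "NATIONAL_", "UNITED_", "CENTERS_",
    "INTERNAT_", "EUROPEAN_",
)


def bounded_works(works, cite, dc, min_indeg=3):
    """Return WB: drop only-cited works (DC = 0) that are cited fewer than
    min_indeg times or whose name starts with an excluded prefix."""
    indeg = Counter(z for _, z in cite)  # number of citations received
    wb = set()
    for w in works:
        if dc[w] == 0 and (indeg[w] < min_indeg
                           or w.startswith(EXCLUDED_PREFIXES)):
            continue
        wb.add(w)
    return wb


def restrict(two_mode, subset):
    """Keep only the links of a two-mode network whose work lies in subset;
    second-set nodes left without links simply disappear."""
    return {(w, x) for (w, x) in two_mode if w in subset}


def described_works(wb, dc):
    """WD = {w in WB : DC[w] > 0}."""
    return {w for w in wb if dc[w] > 0}


dc = {"A": 1, "B": 1, "E": 1, "C": 0, "D": 0, "WHO_REPORT": 0}
cite = {("A", "C"), ("B", "C"), ("E", "C"), ("A", "D"),
        ("A", "WHO_REPORT"), ("B", "WHO_REPORT"), ("E", "WHO_REPORT")}
wb = bounded_works(set(dc), cite, dc)
wd = described_works(wb, dc)
```

Here C survives because it is cited three times, D is dropped for being cited once, and WHO_REPORT is dropped by its prefix despite being cited three times.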
Here, we present a simple analysis of changes in the sets of main authors, main journals and main keywords over time (Tables 2, 3, 4, 5). Our analysis was based on temporal versions of the subnetworks WAD, WKD and WJD, where activity times were determined by the publication year of the corresponding work. Because interest in peer review grew over time (see the left panel of Fig. 3), we split the timeline into the intervals [1900, 1970], [1971, 1980], [1981, 1990], [1991, 2000], [2001, 2005], [2006, 2010] and [2011, 2015].

Most cited works, main works, journals and keywords

The left panel of Table 2 shows the authors with the largest number of co-authored works (WAD indegree), while the right panel shows the authors with the largest fractional contribution of works (weighted indegree in the normalized WAD). Comparing the authors in Table 2 with the list of the most cited works in Table 1 shows that the two rankings are very different. Only three of the 25 authors with the largest number of works published a work that is on the list of the 31 most cited works: J. Cohen, D. Moher (with two publications) and R. Smith. This is in line with the classic study by Cole and Cole (1973), which analyzed several aspects of the communication process in science. They used bibliometric and survey data on university physicists to study the conditions that make a scientist’s work highly visible. They found four determinants of visibility: the quality of the work, measured by citations; the honorific awards received for it; the prestige of the scientist’s department; and the specialty. In short, the quantity of output had no effect on visibility. Note that we did not check the listed author names for homonyms.
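The interval-based analysis amounts to taking time slices of the temporal two-mode network WAD, one per interval. A minimal sketch, assuming each work-author link is active only in the publication year of the work and using plain counts (WAD indegree) rather than the fractional variant; all names are illustrative.

```python
from collections import Counter

# Time intervals used in the text.
INTERVALS = [(1900, 1970), (1971, 1980), (1981, 1990), (1991, 2000),
             (2001, 2005), (2006, 2010), (2011, 2015)]


def time_slice(links, activity, period):
    """Time slice N(T'): the links active at some time point of `period`.

    `activity` maps each link to its set of active time points.
    """
    return {l for l in links if activity[l] & period}


def top_authors_by_interval(wa, year, k=3):
    """Authors with the most works (WAD indegree) in each interval's slice."""
    # A work-author link is active only in the publication year of the work.
    activity = {(w, a): {year[w]} for (w, a) in wa}
    result = {}
    for lo, hi in INTERVALS:
        slice_links = time_slice(wa, activity, set(range(lo, hi + 1)))
        counts = Counter(a for _, a in slice_links)
        result[(lo, hi)] = counts.most_common(k)
    return result


wa = {("w1", "X"), ("w2", "X"), ("w3", "Y"), ("w4", "X")}
year = {"w1": 1965, "w2": 1968, "w3": 1965, "w4": 2012}
top = top_authors_by_interval(wa, year)
```

Works published in 2016 fall outside all seven intervals in this sketch and are therefore ignored, mirroring the interval list given above.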
Table 2 (partially recovered author column; counts not recoverable): BORNMANN_L, DANIEL_H, SMITH_R, ALTMAN_D, MARSHALL_E, GARFIELD_E, SMITH_J, RENNIE_D, SQUIRES_B, CHENG_J, THOENNES_M, COHEN_J, JOHNSON_C, REYES_H, LEE_J, WELLER_A, BJORK_B, BROWN_D, BROWN_C, MERVIS_J, CALLAHAM_M, JONES_R, MOHER_D, HARNAD_S, BEREZIN_A

To calculate the authors’ contributions shown in Table 2, we used the normalized authorship network $N = [n_{pv}]$, in which the contributions of each paper $p$ sum to one: $\sum_v n_{pv} = 1$. Because we did not have information about each author’s real contribution, we used the so-called fractional approach (Gauffriau et al. 2007; Batagelj and Cerinšek 2013) and set

$$n_{pv} = \frac{wa_{pv}}{\mathrm{outdeg}(p)}$$

This means that the contribution of an author $v$ to the field is equal to its weighted indegree

$$\mathrm{windeg}(v) = \sum_{p \in W} n_{pv}$$

Journals in the extracted table residue (partially recovered): BMJ OPEN, JAMA-J AM MED ASSOC, PLOS ONE, NATURE, SCIENTOMETRICS, BRIT MED J, SCIENCE, *****, ACAD MED, LANCET, SCIENTIST, LEARN PUBL, J AM COLL RADIOL, PHYS TODAY, ARCH PATHOL LAB MED, J UROLOGY, J ASSOC OFF AGR CHEM, CAN MED ASSOC J, ANN INTERN MED, ABSTR PAP AM CHEM S

The first rows of Table 3 indicate the top authors in each time interval. Restricting our attention to the authors who remained in the leading group for at least two time periods, we find a sequence starting with R. Merton (until 1980) and E. Garfield (until 1990), followed by D. Chubin and T. Chalmers (1971–1990), B. Squires, E. Marshall and G. Lundberg (1981–2000), and D. Rennie (1981–2005) and H. Reyes (1991–2005). D. Altman, R. Smith and D. Moher remained in the leading group for four periods (1991–2015). C. Castagna and H. Daniel were very active in 2001–2010. Later, the leading authors were L. Bornmann (2001–2015), M. Thoennessen, J. Lee and K. Curtis (2006–2015). The ambiguity of short names started to emerge in 1991–2000, as the number of different authors grew: for example, Smith_R (R, RD, RA, RC) and Johnson_D (DM, DAW, DR, DL). In 2006–2015, we found an increasing presence of Chinese (and Korean) authors: Lee_J, Zhang_L, Lee_S, Wang_J, Wang_Y and Wang_H. Because of the “three Zhang, four Li” effect (the 100 most common Chinese family names are shared by 85% of the population; Wikipedia 2016), each of these names represents a group of authors. For example, Lee_J stands for Jaegab, Jaemu, Jae Hwa, Janette, Jeong Soon, Jin-Chuan, Ji-hoon, JongKwon, Joong, Joseph, Joshua, Joy L, Ju, Juliet, etc., and Zhang_L for L X, Lanying, Lei, Li, Lifeng, Lihui, Lin, Lina, Lixiang and Lujun. More interestingly, our analysis showed that researchers in medicine were more active in studying peer review, although this may simply reflect the larger size of this community.
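The fractional approach can be sketched in a few lines of Python, assuming the authorship network WA is given as a set of (paper, author) pairs with unit weights $wa_{pv} = 1$. This illustrates the formulas for $n_{pv}$ and $\mathrm{windeg}(v)$ above; it is not Pajek's implementation, and the function name is hypothetical.

```python
from collections import Counter


def fractional_contributions(wa):
    """windeg(v) = sum over papers p of n_pv, with n_pv = wa_pv / outdeg(p).

    `wa` is a set of (paper, author) links, i.e. binary weights wa_pv = 1,
    so each paper distributes a total contribution of 1 among its authors.
    """
    outdeg = Counter(p for p, _ in wa)  # number of authors of each paper
    windeg = Counter()
    for p, v in wa:
        windeg[v] += 1 / outdeg[p]
    return windeg


contrib = fractional_contributions({("p1", "A"), ("p1", "B"), ("p2", "A")})
```

Since every paper contributes exactly 1, the contributions of all authors sum to the number of papers, which makes fractional counts comparable across fields with different co-authorship habits.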
Out of 47 top journals publishing papers on peer review, 23 were in medicine (see Table 4). These top journals also include Nature, Science and Scientist, as well as specialized journals on science studies such as Scientometrics. The third on the list is PLoS ONE, a relatively new (launched in 2006) open access journal.

[Figure residue: series “referee” and “peer review”; horizontal axis 1–7]

A citation network is usually (almost) acyclic. If it contains only small strong components (cyclic parts), it can be transformed into a corresponding acyclic network using the preprint transformation. This transformation replaces each work $u$ from a strong component by a pair: the published work $u$ and its preprint version $u'$. A published work can cite only preprints. Each strong component is thus replaced by a corresponding complete bipartite graph on the pairs (see Fig. 6 and Batagelj et al. 2014, p. 83). We determined the importance of arcs (citations) and nodes (works) using SPC (Search Path Count) weights, which require an acyclic network as input. Using SPC weights, we identified important subnetworks with different methods: main path(s), cuts and islands. Details are given in the following subsections.
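A minimal sketch of the preprint transformation in Python. Strong components are found with Tarjan's algorithm, and each non-trivial component is replaced by the complete bipartite graph in which every published work u cites every preprint of the component (including its own). How arcs leaving the component are rewired is our assumption: we let them start from the preprints, which keeps the result acyclic, but Pajek's exact wiring may differ. Preprint names get a leading `=`, following the naming used for the main path figure.

```python
from collections import defaultdict


def strong_components(nodes, arcs):
    """Tarjan's algorithm (recursive; fine for small illustrative networks)."""
    succ = defaultdict(list)
    for u, v in arcs:
        succ[u].append(v)
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a strong component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            comps.append(comp)

    for v in nodes:
        if v not in index:
            visit(v)
    return comps


def preprint(v):
    """Name of the preprint version of work v ('=' prefix)."""
    return "=" + v


def preprint_transform(nodes, arcs):
    comp_id = {}
    for i, comp in enumerate(strong_components(nodes, arcs)):
        if len(comp) > 1:  # only cyclic parts need to be transformed
            for v in comp:
                comp_id[v] = i
    new_arcs = set()
    for u, v in arcs:
        if u in comp_id and comp_id.get(v) == comp_id[u]:
            continue  # internal arcs are replaced by the bipartite graph below
        # Assumption: citations leaving a component start from the preprint.
        new_arcs.add((preprint(u) if u in comp_id else u, v))
    members = defaultdict(list)
    for v, i in comp_id.items():
        members[i].append(v)
    for comp in members.values():
        for u in comp:
            for v in comp:
                new_arcs.add((u, preprint(v)))  # published u cites preprint v
    new_nodes = list(nodes) + [preprint(v) for v in comp_id]
    return new_nodes, new_arcs


new_nodes, new_arcs = preprint_transform(["a", "b", "c"],
                                         [("a", "b"), ("b", "a"), ("b", "c")])
```

In the usage example, the mutual citation between a and b forms a strong component of size 2, the same kind of small component found in CiteB.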
Alternative approaches have been proposed by Eck and Waltman (2010, 2014) and Leydesdorff and Ahrweiler (2014). We first restricted the original citation network Cite to its ‘boundary’ (45,917 nodes). This network, CiteB, had one large weak component (39,533 nodes), 155 small components (the five largest of sizes 191, 46, 32, 31 and 18) and 5589 isolated nodes. The isolated nodes correspond to works with a WoS description that are not connected to the rest of the network because they cite only works that were cited at most twice (and were therefore removed from CiteB). CiteB also includes 22 small strong components (4 of size 3 and 18 of size 2). Figure 7 shows selected strong components. To apply the SPC method, we transformed the citation network into an acyclic network, CiteAcy, using the preprint transformation. To make it connected, we added a common source node $s$ and a common sink node $t$ (see Fig. 8). The network CiteAcy has $n = 45{,}965$ nodes and $m =$ … arcs.

[Figure residue: node labels JEFFERSO_T{2002}287:2786, PAZOL_K{2015}49:S46, ROTH_W{2002}32:215, EISENHAR_M{2002}32:241]

[Fig. 8: Search path count method (SPC)]

Search path count method (SPC)

The search path count (SPC) method (Hummon and Doreian 1989) allowed us to determine the importance of arcs (and also nodes) in an acyclic network based on their position. It computes counters $n(u, v)$ that count the number of different paths from some initial node (or the source $s$) to some terminal node (or the sink $t$) through the arc $(u, v)$. It can be proved that the sum of SPC counters over any minimal arc cut-set equals the same value $F$, the flow through the network. Dividing the SPC counters by $F$, we obtain normalized SPC weights

$$w(u, v) = \frac{n(u, v)}{F}$$

which can be interpreted as the probability that a random $s$–$t$ path passes through the arc $(u, v)$ (see Batagelj 2003 and Batagelj et al. 2014, pp. 75–81; this method is available in Pajek). The normalized SPC weights were calculated in the network CiteAcy.
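The SPC weights, and the main path and CPM path that the following subsection builds on them, can be sketched in Python for a small acyclic network. Assumptions: nodes are hashable names, arcs are (u, v) pairs without duplicates, and the virtual source s and sink t are handled implicitly rather than added as nodes. This illustrates the Hummon and Doreian technique and the critical path idea; it is not Pajek's implementation, and ties between equal weights are broken arbitrarily.

```python
from collections import defaultdict


def topological_order(nodes, arcs):
    """Kahn's algorithm; raises ValueError if the network is not acyclic."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    ready = [v for v in nodes if indeg[v] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("network is not acyclic")
    return order


def spc_weights(nodes, arcs):
    """Normalized SPC weights w(u, v) = n(u, v) / F.

    A virtual source s precedes every node without predecessors and a virtual
    sink t follows every node without successors, so
    n(u, v) = (#paths s -> u) * (#paths v -> t).
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
    order = topological_order(nodes, arcs)
    from_s = {}
    for v in order:  # number of paths from s to v
        from_s[v] = sum(from_s[u] for u in pred[v]) if pred[v] else 1
    to_t = {}
    for v in reversed(order):  # number of paths from v to t
        to_t[v] = sum(to_t[x] for x in succ[v]) if succ[v] else 1
    flow = sum(to_t[v] for v in nodes if not pred[v])  # F = number of s-t paths
    weights = {(u, v): from_s[u] * to_t[v] / flow for u, v in arcs}
    return weights, succ, pred


def main_path(weights, succ, pred):
    """Start at the heaviest arc, then repeatedly follow the heaviest
    adjacent arc forward and backward."""
    u, v = max(weights, key=weights.get)
    path = [u, v]
    while succ[path[-1]]:
        last = path[-1]
        path.append(max(succ[last], key=lambda y: weights[(last, y)]))
    while pred[path[0]]:
        first = path[0]
        path.insert(0, max(pred[first], key=lambda y: weights[(y, first)]))
    return path


def cpm_path(nodes, arcs, weights, pred):
    """Critical Path Method: the path with the maximal sum of SPC weights."""
    best = {}  # v -> (largest weight sum of a path ending at v, predecessor)
    for v in topological_order(nodes, arcs):
        best[v] = max(((best[u][0] + weights[(u, v)], u) for u in pred[v]),
                      key=lambda pair: pair[0], default=(0.0, None))
    end = max(best, key=lambda v: best[v][0])
    path = [end]
    while best[path[0]][1] is not None:
        path.insert(0, best[path[0]][1])
    return path


nodes = ["a", "b", "c", "d", "e"]
arcs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e")]
weights, succ, pred = spc_weights(nodes, arcs)
print(weights[("a", "c")])  # share of the s-t paths that use the arc a -> c
print(main_path(weights, succ, pred))
print(cpm_path(nodes, arcs, weights, pred))
```

In the toy network there are three s-t paths and two of them pass through a -> c, so that arc gets weight 2/3 and both the main path and the CPM path start with it.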
On this basis, we determined the main path, the CPM path, the main paths for the 100 arcs with the largest SPC weights (see section “Main paths”) and the link islands with sizes in [20, 200] (see section “Cuts and islands”).

Main paths

To determine important subnetworks based on SPC weights, Hummon and Doreian (1989) proposed the main path method. The main path starts at the link with the largest SPC weight and expands in both directions, each time following the adjacent new link with the largest SPC weight. The CPM path is determined using the Critical Path Method from Operations Research (the path whose sum of SPC weights is maximal).

[Figure residue: node labels RODRIGUE_R{2016}273:645, MOUSTAFA_K{2015}105:2271, CRANE_D{1967}2:195, MERTON_R{1957}22:635, STORER_N{1966}:, COLE_S{1967}32:377, POLANYI_M{1958}:, BAYER_A{1966}39:381, CARTTER_A{1966}:, CRANE_D{1965}30:699, DENNIS_W{1954}79:180, MELTZER_B{1949}55:25]

In Pajek, these procedures are found under Network/Acyclic Network/Create (Sub)Network/Main Paths, with several suboptions for computing local and global main paths and for searching for the Key-Route main path in acyclic networks (Liu and Lu 2012). In the Key-Route approach, the procedure begins with a set of selected seed arcs and expands them in both directions as in the main path procedure. Both the main path and the CPM procedure gave the same main path network, presented in Fig. 9. Nodes whose names start with = (for example, =JEFFERSO_T{2002}287:2786 in Fig. 9) correspond to the preprint version of a paper. Fig. 10 presents the main paths for the 100 seed arcs with the largest SPC weights. The main path was included in this […] There were additional 47 […] (e.g., Rennie, Cicchetti, Altman, […]) […] had the works on parallel paths. Many of these additional […] Bornmann, also among the […] not appear on the […] publications, but he did […]

[Fig. 10: Main paths for 100 largest weights; node labels include RODRIGUE_R{2016}273:645, BORNMANN_L{2011}174:857, BORNMANN_L{2010}32:5]

Main path publication pattern

Our analysis found 48 works on the main path. After examining all these works in detail, we classified them into three groups by time period:

• Before 1982: works published mostly in social science and philosophy journals and social science books;
• From 1983 to 2002: works published almost exclusively in biomedical journals;
• From 2003: works published in specialized science studies journals.

The main path until 1982

This period includes important social science journals, such as American Journal of Sociology, American Sociologist, American Psychologist and Sociology of Education, and three foundational books. The most influential authors were Meltzer (1949), Dennis (1954), Merton (1957), Polanyi (1958), Crane (1965, 1967), Bayer and Folger (1966), Storer (1966), Cartter (1966), Cole and Cole (1967), Zuckerman and Merton (1971), Ingelfinger (1974), Cicchetti (1980) and Peters and Ceci (1982). The most popular topics were scientific productivity, bibliographies, knowledge, citation measures as measures of scientific accomplishment, scientific output and recognition, evaluation in science, the referee system, journal evaluation, the peer-evaluation system, the review process and peer review practices.

The main path from 1983 to 2002

This period includes biomedical journals, mainly JAMA. It is worth noting that JAMA has published many papers presented at the International Congress on Peer Review and Biomedical Publication since 1986. Among the more influential authors were Rennie (1986, 1992, 1993, 1994, 2002), Smith (1994, 1999), and Jefferson with his collaborators Demicheli, Drummond, Smith, Yee, Pratt, Gale, Alderson, Wager and Davidoff (1995, 1998, 2002).
The most popular topics were the effects of blinding on review quality, research into peer review, guidelines for peer reviewing, monitoring peer review performance, open peer review, bias in the peer review system, measuring the quality of editorial peer review and the development of meta-analysis and systematic review approaches.

The main path from 2003

Here, the situation changed again. Some specialized journals on science studies gained momentum, such as Scientometrics, Research Evaluation, Journal of Informetrics and JASIST. The most influential authors were Bornmann and Daniel (2005, 2006, 2007, 2008, 2009, 2011) and Garcia, Rodriguez-Sanchez and Fdez-Valdivia (4 papers in 2015 and 2016). Other popular publications were Lee et al. (2013) and Moustafa (2015). Research interest shifted to the peer review of grant proposals, bias, referee selection and editor-referee/author links.

Cuts and islands

Cuts and islands are two approaches to identifying important groups in a network. Importance is expressed by a selected property of nodes or links. If we represent a given or computed property of nodes/links as their height and immerse the network in water up to a selected threshold level, we obtain a cut (see the left picture in Fig. 11). By varying the level, we can obtain different islands: maximal connected subnetworks such that the values of the selected property inside the island are larger than the values on the island’s neighbors and the size (number of the island’s nodes) is within a given range [k, K] (see the right picture in Fig. 11). An island is simple iff it has a single peak (for details, see Batagelj et al. 2014, pp. 54–61). Zaveršnik and Batagelj (2004) developed very efficient algorithms to determine the island hierarchy and list all the islands of selected sizes. They are available in Pajek.

[Fig. 12: SPC islands [20,200]; node labels include BORNMANN_L{2008}2:217, BORNMANN_L{2007}73:139, BORNMANN_L{2010}32:5, BORNMANN_L{2011}174:857]

[Figure residue: SPC link Island 1 [100]; node labels include BORNMANN_L{2011}45:199, SEN_C{2012}16:293, WALTMAN_L{2011}88:1017, FRANCESC_M{2011}5:275]

Islands also allow us to overcome a typical problem of the main path approach: the selection of seed arcs. Here, we simply determined all islands and looked at the maximal SPC weight in each island, which expresses the importance of the island. Searching for SPC link islands with between 20 and 200 nodes (and between 20 and 100 nodes), we found 26 link islands (see Fig. 12). Many of these islands have a very short longest path, often with a star-like structure (a node with its neighbors). Such islands are not very interesting for our purpose. We visually identified “interesting” islands and inspected them in detail. The following list presents basic information for each selected island: the number of nodes for the 20–200 selection (and for the 20–100 selection), the maximal SPC weight in the island and a short description of the island:

• Island 1: $n = 191$ (99), 0.297. Peer review.
• Island 2: $n = 191$ (96), $0.211 \times 10^{-8}$. Discovery of different isotopes.
• Island 3: $n = 178$, $0.165 \times 10^{-8}$. Biomass.
• Island 7: $n = 42$, $0.425 \times 10^{-8}$. Athletic trainers.
• Island 8: $n = 36$, $0.191 \times 10^{-4}$. Sport refereeing and decision-making.
• Island 9: $n = 32$, $0.793 \times 10^{-10}$. Environment pollution.
• Island 13: $n = 29$, $0.451 \times 10^{-10}$. Toxicity testing.
• Island 23: $n = 22$, $0.344 \times 10^{-8}$. Peer review in psychological sciences.
• Island 24: $n = 21$, $0.487 \times 10^{-10}$. Molecular interaction.

Only Island 1 and Island 23 dealt directly with peer review. The other islands represented collateral stories. Island 1 on peer review was the most important: its maximal SPC weight was more than 10,000 times higher than that of the next one, Island 8 on sport refereeing.
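Link islands can be illustrated with a deliberately naive Python sketch. It is not the efficient Zaveršnik and Batagelj algorithm used in Pajek: it simply lowers the water level one distinct link value at a time, tracks connected components with a union-find structure, and keeps the components whose size lies in [k, K], together with their peak (maximal link value). Link direction is ignored, and all names are illustrative.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find over node names, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def link_islands(weights, k, K):
    """Naive enumeration of link islands with k <= size <= K.

    `weights` maps a link (u, v) to its value (e.g. an SPC weight).  Each
    connected component of a level cut (links with value >= level) is a link
    island: its internal spanning links are above the level and its boundary
    links below it.  Returns {frozenset(nodes): peak value}.
    """
    by_level = defaultdict(list)
    for link, value in weights.items():
        by_level[value].append(link)
    ds = DisjointSet()
    found = set()
    for level in sorted(by_level, reverse=True):  # lower the water level
        for u, v in by_level[level]:
            ds.union(u, v)
        groups = defaultdict(set)
        for node in list(ds.parent):
            groups[ds.find(node)].add(node)
        for members in groups.values():
            if k <= len(members) <= K:
                found.add(frozenset(members))
    # Peak of an island: the largest link value inside it.
    return {
        island: max(val for (u, v), val in weights.items()
                    if u in island and v in island)
        for island in found
    }


links = {("a", "b"): 5, ("b", "c"): 4, ("c", "d"): 1,
         ("d", "e"): 3, ("e", "f"): 2}
islands = link_islands(links, 2, 3)
```

Ranking islands by their peak is what lets the method pick out the important region without choosing seed arcs. Because every distinct level triggers a full regrouping, this sketch suits only small examples, whereas the [20, 200] search in the text relies on Pajek's efficient procedure.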
For the sake of readability, we extracted from Island 1 a sub-island with size in the range [20, 100], which is shown in Fig. 13. It contains the main path and strongly overlaps with the main paths in Fig. 8. The list of all publications from the main path (coded 1), the main paths (coded 2) and the SPC link island (20–100) (coded 3) is shown in Table 6 in the Appendix. We found 105 works in the joint list. Only 9 publications were exclusively on the main paths and only 10 publications were exclusively in the SPC link island. The three-group typology of works also held for the list of all 105 publications.

Conclusions

This article provided a quantitative analysis of peer review as an emerging field of research by revealing patterns and connections between authors, fields and journals from 1950 to 2016. By collecting all available sources from WoS, we were able to trace the emergence and evolution of this field of research by identifying relevant authors, publications and journals, and revealing important development stages. By constructing several one-mode networks (e.g., co-authorship and citation networks) and two-mode networks, we found connections and collective patterns. However, our work has certain limitations. First, given that data were extracted from WoS, works from disciplines and journals less covered by this database could be underrepresented. This especially holds for the humanities and social sciences, which are less comprehensively covered by WoS than by Scopus and, even more so, by Google Scholar (e.g., Halevi et al. 2017), which also lists books and book chapters (e.g., Halevi et al. 2016). However, given that Google Scholar does not permit large-scale data collection, validating our findings with Scopus would be more feasible.
Furthermore, given that data were obtained using the queries “peer review*” and refereeing, and that these terms are used in many fields, e.g., sports, our dataset included some works that probably had little to do with peer review as a research field. For example, when reading the abstracts of certain works in our dataset, we found works reporting ‘Published by Elsevier Ltd. Selection and/or peer review under responsibility of’. An extra (unfortunately almost prohibitive) effort to clean the dataset manually would help filter out irrelevant records. However, by using the main path and island methods, we successfully identified the most important and relevant publications on peer review without incurring excessive data-cleaning costs or significantly biasing our findings. Second, we did not address author name disambiguation, as is evident in Table 3. This could be at least partially solved by developing automatic disambiguation procedures, although the proper solution would be for WoS and publishers to adopt standards such as ResearcherID and ORCID, which allow authors to be clearly identified from the outset. To control for this, we could add options to WoS2Pajek for creating short author names that would allow manual correction of the names of critical authors. With all these caveats, our study allowed us to circumscribe the field, capture its emergence and evolution, and identify the most influential publications. Our main path procedures and islands method used SPC weights on citation arcs. It is important to note that the 47 publications from the main path were found in all the other obtained lists of the most influential publications. They could be considered the main corpus of knowledge for any newcomer to the field.
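Because both the main path and the island analyses rest on SPC weights, a minimal sketch of how such weights can be computed may be useful. It assumes an acyclic citation network given as (u, v) pairs oriented from the cited to the citing work and uses the search path count idea: the weight of arc (u, v) is the number of source-to-sink paths through it, N⁻(u)·N⁺(v), here normalized by the total number of source-to-sink paths. The main_path function is one simple greedy variant, grown in both directions from the arc with the largest weight (the "seed arc"); it illustrates the idea and is not necessarily the procedure implemented in Pajek. All function names and the toy network are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def spc_weights(arcs):
    """Normalized SPC weights of the arcs of an acyclic network."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
    nodes = set(succ) | set(pred)
    # static_order() lists every node after all of its predecessors
    order = list(TopologicalSorter({x: pred[x] for x in nodes}).static_order())
    n_minus = {}  # number of paths from any source to the node
    for x in order:
        n_minus[x] = sum(n_minus[p] for p in pred[x]) if pred[x] else 1
    n_plus = {}  # number of paths from the node to any sink
    for x in reversed(order):
        n_plus[x] = sum(n_plus[s] for s in succ[x]) if succ[x] else 1
    total = sum(n_minus[x] for x in nodes if not succ[x])  # all source-sink paths
    return {(u, v): n_minus[u] * n_plus[v] / total for u, v in arcs}


def main_path(weights):
    """Greedy path grown from the heaviest arc; ties go to the larger label."""
    out_arcs, in_arcs = defaultdict(list), defaultdict(list)
    for (u, v), w in weights.items():
        out_arcs[u].append((w, v))
        in_arcs[v].append((w, u))
    path = list(max(weights, key=weights.get))  # seed arc
    while out_arcs[path[-1]]:  # follow the heaviest outgoing arc to a sink
        path.append(max(out_arcs[path[-1]])[1])
    while in_arcs[path[0]]:  # follow the heaviest incoming arc back to a source
        path.insert(0, max(in_arcs[path[0]])[1])
    return path


arcs = [("a", "b"), ("a", "c"), ("h", "c"), ("b", "d"), ("c", "d"), ("d", "e"), ("c", "f")]
w = spc_weights(arcs)
print(w[("d", "e")])  # 0.6
print(main_path(w))   # ['h', 'c', 'd', 'e']
```

In this toy network there are five source-to-sink paths and arc d→e lies on three of them, so its normalized weight is 3/5. The arcs entering the sinks always sum to 1, which is a quick sanity check on the normalization; the tie between a→c and h→c shows why the choice of seed and tie-breaking rule matters for greedy main paths.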
More importantly, for a dynamic picture of the field, we found that these 47 publications fall into three phases corresponding to three time periods: before 1982, with works mostly published in social science journals (sociology, psychology and education); from 1983 to 2002, with works published almost exclusively in biomedical journals, mainly JAMA; and from 2003 onwards, with works published predominantly in science studies journals (e.g., Scientometrics, Research Evaluation, Journal of Informetrics). This typology indicates the emergence and evolution of peer review as a research field. Initiatives to promote data sharing on peer review in scholarly journals and funding agencies (e.g., Casnici et al. 2017; Squazzoni et al. 2017), as well as the establishment of regular funding schemes to support research on peer review, would help to strengthen the field and promote tighter connections between specialists. Results also showed that while the term “peer review” itself was relatively unknown before 1970 (“referee” was more frequently used), publications on peer review grew significantly, especially after 1990.

Acknowledgements

This work was partly supported by the Slovenian Research Agency (Research Program P1-0294 and Research Projects J1-5433 and J5-5537) and was based upon work from COST Action TD1306 “New frontiers of peer review” (PEERE). Previous versions benefited from comments and suggestions by many PEERE members, including Francisco Grimaldo, Daniel Torres-Salinas, Ana Marusic and Bahar Mehmani.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Appendix

The structure of names in constructed networks

The usual ISI name of a work, as used in the CR field, has the following structure, where AU1 is the first author's name and SO[:20] is the string of the initial (up to) 20 characters in the SO field. In WoS records, the same work can have different ISI names. To improve precision, the program WoS2Pajek also supports short names [similar to the names used in HISTCITE output (Garfield et al. 2003)]. They have a fixed format; for example: TREGENZA_T(2002)17:349. For last names with prefixes such as VAN and DE, the space is deleted. Unusual names start with the character * or $. The name [ANONYMOUS] is used for anonymous authors. This construction of names of works provides a good balance between the synonymy problem (different names designating the same work) and the homonymy problem (one name designating different works). We treated the remaining synonyms and homonyms in the network data as noise. If their effect surfaced in the final results, we either corrected our copy of the WoS data and repeated the analysis or, if the correction required excessive work, simply reported the problem. A typical such case was the author name [ANONYMOUS] or combinations with some very frequent last names: in MathSciNet there are 85 mathematicians corresponding to the short name SMITH_R and 1792 mathematicians corresponding to the short name WANG_Y. Compound keywords were decomposed into single words, e.g., ‘peer review’ into ‘peer’ and ‘review’. We applied lemmatization (using the Monty Lingua library) to keywords obtained from the titles of works. The name ***** denotes a missing journal name.

Details about important works

Tables 6, 7 and 8 list the works on the main path (1), the main paths (2) and the island (3). Only the first authors are listed.
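A minimal sketch of how such short names might be built. It assumes the pattern LASTNAME_I(YEAR)VOLUME:PAGE read off the example TREGENZA_T(2002)17:349, with the first author's last name upper-cased, stripped of spaces (as described for prefixes such as VAN and DE) and truncated to 8 characters. The truncation length and the function name short_name are assumptions for illustration, not taken from the WoS2Pajek manual.

```python
import re


def short_name(last_name, initial, year, volume, first_page, width=8):
    """Build a hypothetical WoS2Pajek-style short name of a work.

    Assumed pattern: LASTNAME_I(YEAR)VOLUME:PAGE, where LASTNAME is the
    first author's last name upper-cased, with spaces removed (e.g. the
    prefix in "van Eck") and cut to `width` characters (assumed 8).
    """
    name = re.sub(r"\s+", "", last_name).upper()[:width]
    return f"{name}_{initial.upper()}({year}){volume}:{first_page}"


print(short_name("Tregenza", "T", 2002, 17, 349))  # TREGENZA_T(2002)17:349
print(short_name("van Eck", "N", 2010, 84, 523))   # VANECK_N(2010)84:523
```

Because such a name keeps only the last name, one initial, year, volume and first page, variant spellings of the source title no longer split one work into several names, while two distinct works collide only if all of these fields coincide; this is the synonymy/homonymy trade-off described above.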
Priorities in scientific discovery - a chapter in the sociology of science
Personal knowledge: towards a post-critical philosophy
Scientists at major and minor universities
Some correlates of citation measure of productivity in science
The social system of science
An Assessment of quality in graduate education
Gatekeepers of science - some factors affecting selection of articles...
Scientific output and recognition - study in operation of reward system...
Patterns of evaluation in science - ...of referee system
Citation analysis as a tool in journal evaluation - journals can be ranked...
Peer review in biomedical publication
70th annual-meeting of american-society-for clinical-investigation,...
Evaluating psychological-research reports - ...of quality judgments
The fallacy of peer-review - judgment without science and a case-history
Reliability of reviews for the american psychologist...
Peer-review practices of psychological journals - the fate...
Journal peer-review - the need for a research agenda
Anonymous authors, anonymous referees - an editorial exploration
Guarding the guardians - a conference on editorial peer-review
The peer-review of manuscripts in need for improvement
Peer-review in medical journals
Blind versus nonblind review - survey of selected medical journals
The effects of blinding on the quality of peer review - a randomized trial
Editorial peer-review in biomedical publication - the 1st-international-congress
A cohort study of summary reports of controlled trials
What makes a good reviewer and a good review for a general medical journal?
Peer review for journals as it stands today - Part 1
Evaluating the BMJ guidelines for economic submissions...
Peer review in Prague
Perceived value of providing peer reviewers with abstracts and preprints...
Does quality of reports of randomised trials affect estimates of intervention efficacy...
Positive-outcome bias and other limitations in the outcome of research abstracts...
Effect of open peer review on quality of reviews and on reviewers' recommendations...
Opening up BMJ peer review - a beginning that should lead to complete transparency
Evidence on peer review - scientific quality control or smokescreen?
Improving the quality of reports of metaanalyses of randomised controlled trials: QUOROM
Open peer review: a randomised controlled trial
Meta-analysis of observational studies in epidemiology - A proposal for reporting
The revised CONSORT statement for reporting randomized trials...
The CONSORT statement: revised recommendations for improving the quality of reports...
Effects of editorial peer review - a systematic review
Measuring the quality of editorial peer review
Fourth international congress on peer review in biomedical publication
The peer-review process
The significance of the peer review process against the background of bias...
Impartial judgment by the “gatekeepers” of science:...
Post-publication filtering and evaluation: Faculty of 1000
A multilevel modelling approach to investigating the...of editorial decisions:...
The first Italian research assessment exercise: A bibliometric perspective
Selecting manuscripts for a high-impact journal through peer review...
The effectiveness of the peer review process: inter referee agreement...
Latent Markov modeling applied to grant peer review
Are there better indices for evaluation purposes than the h index?...
The luck of the referee draw: the effect of exchanging reviews
The Hirsch-index: a simple, new tool for the assessment of scientific output...
Fraud and misconduct in science: the stem cell seduction
The influence of the applicants' gender on the modeling of a peer review...
The manuscript reviewing process: Empirical research on review...

References

Batagelj, V. (2003). Efficient algorithms for citation network analysis. http://arxiv.org/abs/cs/0309023.
Batagelj, V. (2007). WoS2Pajek. Manual for version 1.4, July 2016. http://vladowiki.fmf.uni-lj.si/doku.php?id=pajek:wos2pajek.
Batagelj, V., & Cerinšek, M. (2013). On bibliographic networks. Scientometrics, 96(3), 845–864.
Batagelj, V., Doreian, P., Ferligoj, A., & Kejžar, N. (2014). Understanding large temporal networks and spatial networks: Exploration, pattern searching, visualization and network evolution. London: Wiley series in computational and quantitative social science, Wiley.
Bornmann, L. (2011). Scientific peer review. Annual Review of Information Science and Technology, 45(1), 197–245.
Casnici, N., Grimaldo, F., Gilbert, N., & Squazzoni, F. (2017). Attitudes of referees in a multidisciplinary journal: An empirical analysis. Journal of the Association for Information Science and Technology, 68(7), 1763–1771.
Cole, S., & Cole, J. R. (1973). Visibility and the structural bases of awareness of scientific research. American Sociological Review, 33, 397–413.
De Nooy, W., Mrvar, A., & Batagelj, V. (2011). Exploratory social network analysis with Pajek. Structural analysis in the social sciences (Revised and Expanded Second ed.). Cambridge: Cambridge University Press.
Edwards, M. A., & Roy, S. (2016). Academic research in the 21st century: Maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environmental Engineering Science, 34(1), 51–61.
Fyfe, A., Coate, K., Curry, S., Lawson, S., Moxham, N., & Røstvik, C. M. (2017). Untangling academic publishing: A history of the relationship between commercial interests, academic prestige and the circulation of research. Zenodo project. doi:10.5281/zenodo.546100.
Garfield, E., Pudovkin, A. I., & Istomin, V. S. (2003). Why do we need algorithmic historiography? Journal of the American Society for Information Science and Technology, 54(5), 400–412.
Gauffriau, M., Larsen, P. O., Maye, I., Roulin-Perriard, A., & von Ins, M. (2007). Publication, cooperation and productivity measures in scientific research. Scientometrics, 73(2), 175–214.
Gross, C. (2016). Scientific misconduct. Annual Review of Psychology, 67(1), 693–711.
Halevi, G., Moed, H., & Bar-Ilan, J. (2017). Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation-Review of the literature. Journal of Informetrics, 11(3), 823–834.
Halevi, G., Nicolas, B., & Bar-Ilan, J. (2016). The complexity of measuring the impact of books. Publishing Research Quarterly, 32(3), 187–200.
Hummon, N. P., & Doreian, P. (1989). Connectivity in a citation network: The development of DNA theory. Social Networks, 11, 39–63.
Leydesdorff, L., & Ahrweiler, P. (2014). In search of a network theory of innovations: Relations, positions, and perspectives. Journal of the American Society for Information Science and Technology, 65(11), 2359–2374.
Liu, J. S., & Lu, L. Y. Y. (2012). An integrated approach for main path analysis: Development of the Hirsch index as an example. Journal of the American Society for Information Science and Technology, 63, 528–542.
Squazzoni, F., Bravo, G., & Takács, K. (2013). Does incentive provision increase the quality of peer review? An experimental study. Research Policy, 42(1), 287–294.
Squazzoni, F., Grimaldo, F., & Marušić, A. (2017). Publishing: Journals could share peer-review data. Nature, 546(7658), 352.
Squazzoni, F., & Takács, K. (2011). Social simulation that 'peers into peer review'. Journal of Artificial Societies and Social Simulation, 14(4), 3.
van Eck, N. J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538.
van Eck, N. J., & Waltman, L. (2014). CitNetExplorer: A new software tool for analyzing and visualizing citation networks. Journal of Informetrics, 8(4), 802–823.
Zaveršnik, M., & Batagelj, V. (2004). Islands. XXIV International Sunbelt Social Network Conference, Portorož, May 12–16. http://vlado.fmf.uni-lj.si/pub/networks/doc/sunbelt/islands.pdf.



Vladimir Batagelj, Anuška Ferligoj, Flaminio Squazzoni. The emergence of a field: a network analysis of research on peer review, Scientometrics, 2017, 503-532, DOI: 10.1007/s11192-017-2522-8