Identifying “hot papers” and papers with “delayed recognition” in large-scale datasets by using dynamically normalized citation impact scores

Scientometrics, May 2018

“Hot papers” (HPs) are papers which received a boost of citations shortly after publication. Papers with “delayed recognition” (DRs) received scarcely any impact over a long time period before a considerable citation boost started. DRs have attracted a lot of attention in scientometrics and beyond. Based on a comprehensive dataset with more than 5,000,000 papers published between 1980 and 1990, we identified HPs and DRs. In contrast to many other studies on DRs, which are based on raw citation counts, we calculated dynamically field-normalized impact scores for the search for HPs and DRs. This study investigates the differences between HPs (n = 323) and DRs (n = 315). The investigation of the journals which have published HPs and DRs revealed that some journals (e.g. Physical Review Letters and PNAS) were able to publish significantly more HPs than other journals. This pattern did not appear for DRs. Many HPs and DRs have been published by authors from the USA; however, in contrast to other countries, authors from the USA have published statistically significantly more HPs than DRs. Whereas “Biochemistry & Molecular Biology,” “Immunology,” and “Cell Biology” have published significantly more HPs than DRs, the opposite result was found for “Surgery” and “Orthopedics.” The results of the analysis of certain properties of HPs and DRs (e.g. number of pages) suggest that the emergence of DRs is an unpredictable process.



Lutz Bornmann, Adam Y. Ye, Fred Y. Ye

Affiliations: Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing University, Nanjing 210023, China; Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China; Division for Science and Innovation Studies, Administrative Headquarters of the Max Planck Society, Hofgartenstr. 8, 80539 Munich, Germany
Keywords: Hot paper; Paper with delayed recognition; Field-normalized impact scores

Introduction

In most evaluations of researchers, research groups, and academic institutions, bibliometric indicators (especially citation impact scores) are used in an informed peer review process (Bornmann et al. 2014). A frequent problem of the application of citation impact scores in these processes is that the evaluations focus, as a rule, on the recent performance of the evaluated units (e.g. the last 3 years). However, the “true” impact of a publication can be determined only after a longer time period in several disciplines: “A 3-year time window is sufficient for the biomedical research fields and multidisciplinary sciences, while a 7-year time window is required for the humanities and mathematics” (Wang 2013, p. 866). Thus, a strength of bibliometrics lies in identifying outstanding publications (or the corresponding outstanding researchers, research groups, and institutions, respectively) in the long term.

In recent years, several bibliometric studies have dealt with the investigation of a subgroup of publications showing a specific long-term citation impact: papers with delayed recognition (DRs). Publications are denoted as DRs if they received only a few or no citations over many years (e.g., 10 years after their appearance) and then experienced a significant boost in citations. For example, Van Calster (2012) shows that Charles Sanders Peirce's (1884) note in Science on “The Numerical Measure of the Success of Predictors” is a typical case of a DR. The note received “less than 1 citation per year in the decades prior to 2000, 3.5 citations per year in the 2000s, and 10.4 in the 2010s” (p. 2342).
Marx (2014) demonstrates that the initial reception of the paper “Detailed Balance Limit of Efficiency of P–N Junction Solar Cells” by Shockley and Queisser (1961) was hesitant; after several years, the paper became a highly cited paper in its field. Gorry and Ragouet (2016) present a landmark paper in interventional radiology which can be characterized as a DR. In the “Literature review” section, we explain the different methods which have been introduced in scientometrics to identify these and other DRs in bibliometric databases. Based on these methods, Ye and Bornmann (2018) propose the citation angle, which can be used to distinguish between “hot papers” (HPs) and DRs. In contrast to DRs, HPs received a boost of citations shortly after publication (and not after several years as DRs).

In this study, we searched for HPs and DRs among all papers published between 1980 and 1990. Since citation counts should be normalized with regard to publication year and subject category (of the cited publication), we generated dynamically normalized citation impact scores (DNIC), which are annually field-normalized impact scores based on OECD minor codes(1) for field delineation. We used these scores for the search for HPs and DRs. The objective of this study is to analyze systematic differences between papers which became HPs or DRs later on. Factors which have been identified in recent years as correlates of citations (Bornmann & Leydesdorff, 2017; Tahamtan, Safipour Afshar, & Ahamdzadeh, 2016) are used to determine different characteristics of both paper groups. As factors, this study focuses on the publication year, the number of authors, countries, references, and pages of a publication as well as its interdisciplinarity (measured by the number of subject categories).

(1) See http://www.oecd.org/science/inno/38235147.pdf.

Literature review

Ke et al. (2015) introduced the “beauty coefficient” B for identifying DRs (“sleeping beauties”):

B = Σ_{t=0}^{t_m} [((c_m − c_0)/t_m) · t + c_0 − c_t] / max(1, c_t)  (1)

where c_t is the citation count received in the t-th year after publication and t the age of a paper.
A paper reaches the maximum number c_m of annual citations at time t_m. The equation of the straight line (l) which connects the two points (0, c_0) and (t_m, c_m) in the annual citation curve is defined as

l: c = ((c_m − c_0)/t_m) · t + c_0.  (2)

Cressey (2015) assumes that the coefficient B is an elegant and effective method for DR retrieval in big datasets. Ye and Bornmann (2018) reveal its dynamic characteristics and extend B by an HP component. Furthermore, they introduced the citation angle for unifying the approaches of identifying instant and delayed recognition.

The distinction between DRs and HPs follows Baumgartner and Leydesdorff (2014), who introduced two groups of papers: (1) “Citation classics” or “sticky knowledge claims” have a lasting impact on a specific field. DRs are a sub-group among citation classics, whose lasting impact is not combined with early citation impact. (2) The other paper group (“transient knowledge claims”) has an early boost of citation impact followed by a fast impact decrease shortly after publication. According to Baumgartner and Leydesdorff (2014), the papers in this group are contributions at the research front. Comins and Leydesdorff (2016) investigated the existence of both paper types empirically.

van Raan (2015) demonstrated that many DRs are application-oriented and thus are potential “sleeping innovations”. In a follow-up study, van Raan (2016) analyzed characteristics of DRs which are cited in patents. The results show that patent citations occur before or after the delayed recognition started. The citation rate during the period of sleep is not related to the later scientific or technological impact of the DRs. The comparison of DRs with “normal” papers reveals that DRs are more frequently cited in patents than “normal” papers.
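The coefficient B and the line l described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function name and example citation series are ours:

```python
# Minimal sketch of the beauty coefficient B (Ke et al. 2015): for each year
# up to the citation peak, it measures how far the straight line l connecting
# (0, c_0) and (t_m, c_m) lies above the actual citation curve, scaled by
# max(1, c_t). Function name and example series are illustrative.

def beauty_coefficient(c):
    """c: list of annual citation counts c_0, c_1, ..., starting at publication."""
    t_m = max(range(len(c)), key=c.__getitem__)  # year of the citation peak
    if t_m == 0:                                 # peak in the first year: no sleep
        return 0.0
    c0, cm = c[0], c[t_m]
    slope = (cm - c0) / t_m                      # slope of the line l (Eq. 2)
    return sum((slope * t + c0 - c[t]) / max(1, c[t]) for t in range(t_m + 1))

# A long sleep followed by a late boost yields a large B; a citation curve
# that peaks immediately, or rises linearly, yields B close to zero.
print(beauty_coefficient([0, 0, 0, 0, 0, 0, 0, 0, 20]))  # 70.0
print(beauty_coefficient([20, 10, 5, 2, 1]))             # 0.0
```

The scaling by max(1, c_t) keeps uncited years from producing a division by zero while still rewarding long periods of near-zero impact.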
Methods

Definitions of “hot papers” (HPs) and papers with “delayed recognition” (DRs)

Following the definitions of HPs and DRs hitherto, the typical DR is defined as a publication with a late citation peak and prior annual citations which are much lower than the peak citations, while a typical HP is defined as a publication with an early citation peak and later annual citations which are much lower than the early peak. In contrast to the other studies, which used raw citation counts to identify DRs (see the “Literature review” section), this study is based on (dynamically) field- and time-normalized citation impact scores, the standard impact measure in bibliometrics (Vinkler, 2010). The dynamically normalized impact of citations (DNIC) is defined as

DNIC_ij = C_ij / E_kj, k = f(i)  (3)

E_kj = (1/N_kj) Σ_{i | k = f(i)} C_ij  (4)

where i = 1, 2, … are publications, j = 1, 2, … are citing years, and k = 1, 2, … are different fields (here defined by OECD minor codes). C_ij denotes the citations received by publication i in year j, and E_kj denotes the mean (received) citations of all publications in field k and year j (i.e. E_kj is the expected value). N_kj is the number of cited publications in field k and year j (note: N_kj is based on publications with non-zero citations), and k = f(i) denotes the field of a given publication. The indicator follows the standard approach in bibliometrics with both field- and time-normalized citations (Waltman 2016). The only difference to the standard approach is that the calculation is based on annual citations (dynamically), and not on the citations between the publication year and a fixed later point in time. If C_ij = 0, then DNIC_ij = 0.

All points of DNIC_ij = 1 in field k yield the field- and time-normalized line LN (see the distribution in theory of DNIC in Fig. 1). If DNIC_ij > 1, the citation impact of the publications is higher than the average in the corresponding fields and publication years, as shown with line LA.
If DNIC_ij < 1, the impact is lower than the average, as shown with the line LU. In practical terms, however, the citation counts C_ij and expected values E_kj are variable terms. The DNIC distribution of many papers changes from year to year (see the distribution in practice in Fig. 1). Therefore, when using DNIC for the impact normalization of papers in this study, we need rules for identifying HPs and DRs. We oriented these rules towards the rules of thumb defined by van Raan (2004a, 2008) for interpreting field-normalized citation scores. DNIC_ij is a dynamic series of annually normalized impact scores. We suggest identifying HPs and DRs with the criteria given in Table 1. In Table 1, a peak with t < th denotes that the peak is located in the early half of the citation impact distribution (covering ± 2 years); a peak with t > th denotes that the peak is located in the late half (covering ± 2 years). DNIC_a_peak_t refers to all DNIC_ij after the peak (+ 2 years), and DNIC_b_peak_t refers to all DNIC_ij before the peak (− 2 years). In this study, th = 13. We have data covering 36 citing years (1980–2015) and needed to compare the years 1980–1990 dynamically. Thus, we selected 16 years as the time span of citations for each publication, for example 1980–1995 for the papers from 1980 and 1981–1996 for the papers from 1981.

Used datasets

This study is based on an in-house database of the Max Planck Society, which is derived from the Web of Science (Clarivate Analytics, formerly Thomson Reuters). From the in-house database, we selected only papers with the document type “article” to have comparable citable units. The DNIC scores for each paper refer to the period from its publication year until the end of 2015. Using the methods explained in the “Definitions of ‘hot papers’ (HPs) and papers with ‘delayed recognition’ (DRs)” section, we found the numbers of HPs and DRs in the dataset as reported in Table 2. Since HPs and DRs have been identified by using normalized impact scores within single fields and many papers belong to more than one field, there are duplicates among the HPs and DRs.
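The DNIC scores of Eqs. (3) and (4) and the peak-position criterion can be sketched as follows. Note that Table 1's full criteria are not reproduced in the text above; this sketch applies only the early-half/late-half rule with th = 13, and all function names are ours:

```python
# Sketch of the DNIC scores (Eqs. 3-4) and the peak-position rule for HPs and
# DRs. Only the early-half/late-half criterion with th = 13 is applied here;
# the further thresholds of Table 1 are not reproduced. Names are illustrative.

def expected_value(field_counts):
    """E_kj (Eq. 4): mean citations over the N_kj papers with non-zero citations."""
    nonzero = [c for c in field_counts if c > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

def dnic_series(paper_counts, field_counts_per_year):
    """DNIC_ij = C_ij / E_kj per citing year j (Eq. 3); 0 if C_ij = 0."""
    return [c / expected_value(f) if c else 0.0
            for c, f in zip(paper_counts, field_counts_per_year)]

def classify(dnic, th=13):
    """'HP' if the DNIC peak lies in the early half of the window, else 'DR'."""
    peak_t = max(range(len(dnic)), key=dnic.__getitem__)
    return "HP" if peak_t < th else "DR"

# A paper cited 6, 0, 2 times in three citing years; per-year citation counts
# of all papers in its field supply the expected values.
scores = dnic_series([6, 0, 2], [[2, 4], [1], [1, 3]])
print(scores)            # [2.0, 0.0, 1.0]
print(classify(scores))  # HP (peak in the first citing year)
```

Because the expected value E_kj is recomputed for every citing year, a paper's DNIC series can cross the normalized line LN repeatedly, which is exactly the "distribution in practice" behaviour described above.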
Thus, 191 duplicates were deleted from the 2636 DRs and HPs (147 papers appeared twice and 44 papers three times in the dataset). Figure 2 demonstrates clear differences in the citation profiles of HPs and DRs, following the definitions of both groups in the “Definitions of ‘hot papers’ (HPs) and papers with ‘delayed recognition’ (DRs)” section. Both HPs and DRs are groups of papers with extreme citation profiles (see Fig. 2). In order to reveal how these extreme groups differ from “normal” papers in certain properties, we drew a random sample from the in-house database with n = 323 papers (date December 8, 2016). The random sample has been drawn from those WoS subject categories in which most of the DRs and HPs were published (i.e., the ten subject categories with the most DRs and HPs). The population of the random sample (N = 1,198,843) contains papers from 1980 to 1990 and is restricted to the document type “article”. The size of the random sample with n = 323 papers has been determined by a power analysis. Its results showed that we need 323 papers in each group to detect a very small effect, f = .1 (Cohen, 1988), as statistically significant at the α = .05 level with a power of .8 (Acock 2016). Considering the third group of randomly selected papers (RANs), the dataset (n = 2768) of this study consists of 2130 HPs (77%), 315 DRs (11%), and 323 RANs (12%). In order to have three groups of papers with a more or less balanced set of case numbers, we drew a random sample of 323 papers from the 2130 HPs, following the results of the power analysis. Thus, the final dataset (n = 961) consists of 323 HPs (33.6%), 315 DRs (32.8%), and 323 RANs (33.6%).

Statistical methods

This study tests whether the mean values (e.g., the mean number of authors or pages) from k groups (HP, DR, and RAN) are the same or not. With the analysis of variance (ANOVA), any overall difference between the k groups can be tested for statistical significance.
The ANOVA separates the variance into two components: that due to mean differences and that due to random influences (Riffenburgh, 2012). There are three general assumptions for calculating the ANOVA: (1) The data are independent of each other. (2) The distribution of the data is normal. (3) The standard deviation of the data is the same for all groups (HP, DR, and RAN). Although these assumptions are violated here, the ANOVA is still applied: according to Riffenburgh (2012), the ANOVA “is fairly robust against these assumptions” (p. 265), especially in those studies in which the sample size is high. In order to counter-check the results of the ANOVA, the Kruskal–Wallis rank test (KW test) has been additionally applied as the non-parametric alternative (Acock 2016).

The effect size eta squared (η²) is calculated in addition to the ANOVA; it is a measure of the practical significance of the results (Acock 2016). Eta squared is the sum of squares for a factor (here: three groups of papers with different citation profiles) divided by the total sum of squares. The effect size shows how much of the variation in the sample of papers (e.g. with respect to the number of authors) is explained by the factor. According to Cohen (1988), a value of η² = .01 means a small effect, η² = .06 a medium effect, and η² = .14 a large effect. The consideration of the practical significance is especially important in studies in which the case numbers are high (Kline 2004). There is a risk in these studies that the results of statistical tests are significant although the effects (e.g., mean differences between k groups) are small.

Beyond the ANOVA, the t test is applied in this study to undertake pairwise comparisons of group means. Thus, not only the mean differences between all k groups (where k > 2) are tested for statistical significance, but also the mean differences between specific pairs of groups.
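The two effect sizes described above can be computed directly from the group data. A minimal sketch using the standard textbook formulas (the helper names are ours, not from the paper):

```python
# Sketch of the two effect sizes used in the study: eta squared for the ANOVA
# factor and Cohen's d for a pairwise comparison, plus the Bonferroni-adjusted
# significance level for three pairwise comparisons. Helper names are ours.
from math import sqrt
from statistics import mean, variance

def eta_squared(groups):
    """Between-group sum of squares divided by the total sum of squares."""
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_total = sum((x - grand) ** 2 for g in groups for x in g)
    return ss_between / ss_total

def cohens_d(a, b):
    """Mean difference divided by the pooled standard deviation."""
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) \
                 / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Three pairwise comparisons (HP-DR, HP-RAN, DR-RAN) at alpha = .05:
bonferroni_alpha = 0.05 / 3   # = .017 after rounding, as in the text
```

Interpreting the outputs against Cohen's (1988) benchmarks (η² = .01/.06/.14, d = .2/.5/.8) gives the practical significance that the study reports alongside each p value.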
The t test is seen as a very robust statistic; for the t test, however, the same assumptions hold as for the ANOVA (see above). Since the assumptions are not fulfilled in every calculation here, the non-parametric alternative, the Mann–Whitney two-sample rank-sum test, is additionally used (Acock 2016). For multiple pairwise comparisons, the likelihood of incorrectly rejecting the null hypothesis increases. Thus, the Bonferroni correction is used, which compensates for this by testing each pairwise comparison at a significance level of .05/3 = .017 (.05 is the alpha level and 3 is the number of pairwise comparisons). As a measure of effect size in addition to the t test, Cohen's d is applied. For Cohen (1988), d = .2 is a small effect, d = .5 a moderate effect, and d = .8 a large effect.

The chi-square test of independence is used in this study to determine whether there is a significant association between two nominal (categorical) variables. The frequency of a specific nominal variable is compared with different values of a second nominal variable. The required data can be shown in an R × C contingency table, where R is the number of rows and C the number of columns.

Factors with an influence on citation counts (FICs)

In recent years, many different factors have been identified which may influence the number of citations a publication receives. Although these factors turn out to be correlated with citations and causality cannot be assumed (Bornmann and Leydesdorff 2017), they are generally considered to be influencing factors. On a given time axis, the citations follow the appearance of a publication with specific characteristics (e.g., a specific number of authors or pages). However, one should bear in mind for this perspective on the factors that moderating factors might exist.
For example, the JIF might count as a FIC; however, high citation counts for papers published in high-impact journals could also be the result of such a moderating factor.

Results

Before we come to the FICs and their relationship to HPs and DRs in the “Factors with an influence on citation counts (FICs)” section, we show in the “Publishing journals and overall citation impact” section possible differences between both groups concerning their publishing journal and overall citation impact.

Publishing journals and overall citation impact

(Table: numbers and percentages of papers in the journals with the most HPs and DRs, including Journal of Biological Chemistry, Physical Review Letters, PNAS, Journal of Immunology, Genomics, and Clinical Orthopaedics and Related Research. Test statistics: MNCS (field): F(2, 960) = 125.61, p = .000, η² = .21 [.16, .25], χ²(2) = 518.91, p = .000; pairwise comparisons: (a) t(1, 636) = −9.12, p = .000, d = −.72 [−.88, −.56], z = −12.09, p = .000; (b) t(1, 644) = 9.92, p = .000, d = .78 [.62, .94], z = 17.46, p = .000; (c) t(1, 636) = 12.96, p = .000, d = 1.03 [.86, 1.19], z = 19.07, p = .000. MNCS (journal): F(2, 960) = 81.59, p = .000, η² = .15 [.11, .19], χ²(2) = 421.25, p = .000; pairwise comparisons: (a) t(1, 636) = −8.38, p = .000, d = −.66 [−.82, −.50], z = −13.48, p = .000; (b) t(1, 644) = 4.95, p = .000, d = .39 [.23, .55], z = 12.51, p = .000; (c) t(1, 636) = 10.00, p = .000, d = .79 [.63, .95], z = 17.80, p = .000.)

Factors with an influence on citation counts (FICs)

With publication year, number of pages, number of references, number of authors, number of countries, and number of subject categories, factors are considered here which have been (frequently) investigated in former studies. Overviews of studies investigating FICs can be found in Peters and van Raan (1994), Onodera and Yoshikane (2014), Didegah and Thelwall (2013), and Bornmann and Daniel (2008). The results of these studies indicate that publication year, number of pages, number of references, number of authors, number of countries, and number of subject categories are regarded as possible FICs.
The first FIC which we look at in this study is the publication year of the cited paper (Ruano-Ravina and Alvarez-Dardet 2012). Besides the journal or field, respectively, in which a publication appeared, the publication year is generally considered in the normalization of citations (Waltman 2016). Since DRs emerge in the long term, we expected an earlier mean publication year for DRs than for HPs. However, the results in Table 5 show that the empirical evidence looks different: with M = 1985.2, HPs have been published on average at about the same time as DRs (M = 1985.6). Furthermore, the differences between the three groups (HP, DR, and RAN) are statistically not significant and the effect sizes are very low. The negligible differences in Table 5 are certainly the result of the use of normalized impact scores for the identification of HPs and DRs. Thus, the results in the table confirm the effectiveness of the normalization procedure used in this study.

Table 6 shows the differences in the number of pages between HPs, DRs, and RANs. DRs (M = 9.6, MDN = 8) have more pages than HPs (M = 8.2, MDN = 7) and RANs (M = 7.3, MDN = 6). However, the reported effect sizes in the table are small in general. (Table: means, standard deviations, medians, minima, and maxima per group. Test statistics: F(2, 957) = 40.15, p = .000, η² = .08 [.05, .11], χ²(2) = 104.85, p = .000; pairwise comparisons: (a) t(1, 636) = 6.80, p = .000, d = .54 [.38, .70], z = 9.37, p = .000; (b) t(1, 643) = 6.38, p = .000, d = .50 [.35, .66], z = 8.61, p = .000; (c) t(1, 635) = −1.02, p = .31, d = −.08 [−.24, .07], z = −.64, p = .52.)

Most HPs and DRs have been published by authors from the USA, in agreement with most other country-specific statistics including all publications (National Science Board 2016). The USA is followed by Great Britain (n = 76), Japan (n = 42), and Germany (n = 39).
The USA is the only country in Table 9 with a statistically significant difference in the number of HPs and DRs: with n = 194, significantly more HPs have been published by authors from the USA than DRs (n = 139).

Table 10 shows the mean differences in the number of countries between HPs, DRs, and RANs. We tested the mean difference since there is evidence that the number of countries is related to the number of citations (see above). However, our results in Table 10 reveal that the number of countries does not discriminate between the three groups. (Table: means, standard deviations, medians, minima, and maxima per group. Test statistics: F(2, 943) = 4.24, p = .02, η² = .01 [.000, .02], χ²(2) = 1.63, p = .44; pairwise comparisons: (a) t(1, 632) = 1.85, p = .06, d = .15 [−.01, .30], z = 1.49, p = .14; (b) t(1, 633) = 2.69, p = .01, d = .21 [.06, .37], z = 2.35, p = .02 (n.s.); (c) t(1, 621) = .18, p = .35, d = .08 [−.08, .23], z = .87, p = .38.) The practical significances are small.

As a last FIC in this study, we investigated the number of subject categories. The number of subject categories of a paper can be used as an indicator of interdisciplinarity. We used the WoS subject categories which have been assigned by Clarivate Analytics to the papers on the basis of the publishing journals. Table 11 shows the mean differences in the number of subject categories between HPs, DRs, and RANs. As the results reveal, the differences are of no practical relevance.

Table 12 reports the ten WoS subject categories with the most HPs and DRs: “Biochemistry & Molecular Biology” (n = 68) and “Physics, Multidisciplinary” (n = 42) are the categories to which most of the papers from both groups belong. The table also reports the results of statistical significance tests for subject category differences between HPs and DRs. There are five statistically significant results. “Biochemistry & Molecular Biology” (HP = 59, DR = 9), “Immunology” (HP = 34, DR = 6), and “Cell Biology” (HP = 22, DR = 4) published more HPs than DRs.
In contrast, the subject categories “Surgery” (HP = 3, DR = 37) and “Orthopedics” (HP = 0, DR = 33) are more strongly related to DRs than to HPs.

Discussion and conclusions

The existence of DRs has attracted a lot of attention in scientometrics and beyond. People are fascinated by the fact that researchers publish results which are ahead of their time. Studies on DRs dealt either with specific cases of DRs (e.g., Marx 2014) or with methods of detecting DRs (e.g., Ke et al. 2015). Citation profiles showing other typical distributions than DRs have also been proposed. For example, Ye and Bornmann (2018) define the citation angle, distinguishing between HPs and DRs. HPs are highly cited initially, but the impact decreases quickly. Based on a comprehensive dataset of papers published between 1980 and 1990, we searched for HPs and DRs for further analyses in this study. In contrast to many other studies on DRs, we calculated DNIC values and used these scores for the search for HPs and DRs instead of raw citation counts. In this study, we were interested in identifying systematic differences between HPs and DRs. (Table: absolute and relative numbers per group (HP, DR, RAN, total) as well as Pearson χ² values with Bonferroni-adjusted p values (statistically significant results are printed in bold). Group statistics: F(2, 958) = 4.88, p = .01, η² = .01 [.001, .03], χ²(2) = 7.05, p = .03 (n.s.); pairwise comparisons: (a) t(1, 636) = −3.10, p = .002, d = −.25 [−.40, −.09], z = −2.78, p = .006; (b) t(1, 644) = −.93, p = .35, d = −.07 [−.23, .08], z = −.29, p = .78; (c) t(1, 636) = 2.04, p = .04 (n.s.), d = .16 [.01, .32], z = 2.40, p = .0165.)

The investigation of several variables brought about some interesting results. Since this is the first study investigating differences between HPs and DRs, the results cannot be compared with those of other studies. The investigation of the journals which have published HPs and DRs revealed that some journals (e.g.
Physical Review Letters and PNAS) were able to publish significantly more HPs than other journals. This pattern did not appear for DRs in this study; here, the distribution of papers across journals is similar to that in a random sample. However, this result does not agree with the results of van Raan (2015), who found specific patterns also for DRs. He identified institutions (e.g. MIT) that have more DRs than can be expected based on their relative contribution to the field (in his case: physics). The same was found for journals, particularly Physical Review B and Nuclear Physics B. Based on the results, van Raan (2015) stated that “a new and interesting question arises whether this type of observations could say something about institutions which are more prone than other institutions to accepting (and publishing) out-of-the-box work”.

In terms of the MNCS (based on single journals or fields), HPs and DRs received impact scores which are significantly above average. However, the citation impact of the DRs is significantly higher than that of the HPs. Many HPs and DRs have been published by authors from the USA; however, in contrast to other countries, authors from the USA have published statistically significantly more HPs than DRs. For other countries, the differences between HPs and DRs are statistically not significant. The WoS subject categories in which the most HPs and DRs have been published are “Biochemistry & Molecular Biology” and “Physics, Multidisciplinary.” Whereas “Biochemistry & Molecular Biology,” “Immunology,” and “Cell Biology” have published significantly more HPs than DRs, the opposite result was found for “Surgery” and “Orthopedics.” The investigation of HPs and DRs with regard to FICs (e.g., the number of authors) shows that HPs have significantly more authors and more (linked) references than DRs/RANs. The results of this study indicate that especially HPs differ from RANs with respect to certain properties (e.g.
the number of authors), but DRs do not necessarily. Our results therefore suggest that the emergence of DRs is an unpredictable process which cannot be anticipated from certain properties of the papers. With HPs, this prediction might be possible to a certain extent (Yu et al. 2014). However, this study was a first step in analyzing HPs and DRs in comparison. It would be interesting if future studies addressed the topic of differences between both groups by using data from other bibliometric databases (especially subject-specific databases, such as the chemistry-related CA database or the economics RePEc database). These studies could investigate similar variables to those in this study in order to test whether the results of this study can be confirmed. The inclusion of additional variables could reveal further insights into both phenomena: HPs and DRs. Of special interest are variables which cannot be gathered in the WoS. Thus, it could be tested whether the publication of HPs and DRs is related to certain characteristics of authors (e.g. their gender or nationality) or their institutions. Are there certain groups of authors which have published more DRs in the past than can be expected?

In this study, we used field-normalized scores to identify HPs and DRs. Many papers in the WoS database belong not only to one but to several fields. Thus, it would be interesting in future studies to identify those papers which are “normal” in one field, but DRs or HPs, respectively, in another.

Acknowledgements Open access funding provided by the Max Planck Society. We acknowledge the National Natural Science Foundation of China Grant No. 71673131. We thank Simon S. Li for support in program coding and computing.
The bibliometric data used in this paper are from an in-house database developed and maintained by the Max Planck Digital Library (MPDL, Munich) and derived from the Science Citation Index Expanded (SCI-E), Social Sciences Citation Index (SSCI), and Arts and Humanities Citation Index (AHCI) prepared by Clarivate Analytics, formerly the IP & Science business of Thomson Reuters.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

Acock, A. C. (2016). A gentle introduction to Stata (5th ed.). College Station: Stata Press.
Baumgartner, S. E., & Leydesdorff, L. (2014). Group-based trajectory modeling (GBTM) of citations in scholarly literature: Dynamic qualities of "transient" and "sticky knowledge claims". Journal of the Association for Information Science and Technology, 65(4), 797–811. https://doi.org/10.1002/asi.23009.
Beaver, D. B. (2004). Does collaborative research have greater epistemic authority? Scientometrics, 60(3), 399–408.
Bornmann, L., Bowman, B. F., Bauer, J., Marx, W., Schier, H., & Palzenberger, M. (2014). Bibliometric standards for evaluating research institutes in the natural sciences. In B. Cronin & C. Sugimoto (Eds.), Beyond bibliometrics: Harnessing multidimensional indicators of scholarly impact (pp. 201–223). Cambridge: MIT Press.
Bornmann, L., & Daniel, H.-D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80. https://doi.org/10.1108/00220410810844150.
Bornmann, L., & Leydesdorff, L. (2017). Skewness of citation impact data and covariates of citation distributions: A large-scale empirical analysis based on Web of Science data. Journal of Informetrics, 11(1), 164–175.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Lawrence Erlbaum Associates.
Comins, J. A., & Leydesdorff, L. (2016). Identification of long-term concept-symbols among citations: Can documents be clustered in terms of common intellectual histories? Retrieved January 5, 2016, from http://arxiv.org/abs/1601.00288.
Costas, R., van Leeuwen, T. N., & van Raan, A. F. J. (2010). Is scientific literature subject to a 'Sell-By Date'? A general methodology to analyze the 'durability' of scientific documents. Journal of the American Society for Information Science and Technology, 61(2), 329–339. https://doi.org/10.1002/asi.21244.
Cressey, D. (2015). 'Sleeping beauty' papers slumber for decades. Research identifies studies that defy usual citation patterns to enjoy a rich old age. Retrieved April 26, 2016, from http://www.nature.com/news/sleeping-beauty-papers-slumber-for-decades-1.17615.
Didegah, F., & Thelwall, M. (2013). Determinants of research citation impact in nanoscience and nanotechnology. Journal of the American Society for Information Science and Technology, 64(5), 1055–1064. https://doi.org/10.1002/asi.22806.
Fok, D., & Franses, P. H. (2007). Modeling the diffusion of scientific publications. Journal of Econometrics, 139(2), 376–390. https://doi.org/10.1016/j.jeconom.2006.10.021.
Garfield, E. (1970). Would Mendel's work have been ignored if the Science Citation Index was available 100 years ago? Essays of an Information Scientist, 1, 69–70.
Garfield, E. (1980). Premature discovery or delayed recognition - why? Current Contents, 21, 5–10. (Reprinted in: Garfield, E. Essays of an information scientist. Philadelphia: ISI Press, 1979–1980, Vol. 4, 488–493).
Garfield, E. (1989a). Delayed recognition in scientific discovery: Citation frequency analysis aids the search for case histories. Current Contents, 23, 3–9.
Garfield, E. (1989b). More delayed recognition. 1. Examples from the genetics of color-blindness, the entropy of short-term memory, phosphoinositides, and polymer rheology. Current Contents, 38, 3–8.
Garfield, E. (1990). More delayed recognition. 2. From inhibin to scanning electron microscopy. Current Contents, 9, 3–9.
Gillmor, C. S. (1975). Citation characteristics of JATP literature. Journal of Atmospheric and Terrestrial Physics, 37(11), 1401–1404.
Glänzel, W., & Garfield, E. (2004). The myth of delayed recognition. Scientist, 18(11), 8.
Glänzel, W., Schlemmer, B., & Thijs, B. (2003). Better late than never? On the chance to become highly cited only beyond the standard bibliometric time horizon. Scientometrics, 58(3), 571–586.
Gorry, P., & Ragouet, P. (2016). "Sleeping beauty" and her restless sleep: Charles Dotter and the birth of interventional radiology. Scientometrics, 107(2), 773–784. https://doi.org/10.1007/s11192-016-1859-8.
Haustein, S., Larivière, V., & Börner, K. (2014). Long-distance interdisciplinary research leads to higher citation impact. In P. Wouters (Ed.), Proceedings of the science and technology indicators conference 2014 Leiden "Context Counts: Pathways to Master Big and Little Data" (pp. 256–259). Leiden, The Netherlands: University of Leiden.
Hegarty, P., & Walton, Z. (2012). The consequences of predicting scientific impact in psychology using journal impact factors. Perspectives on Psychological Science, 7(1), 72–78. https://doi.org/10.1177/1745691611429356.
Huang, T. C., Hsu, C., & Ciou, Z. J. (2015). Systematic methodology for excavating sleeping beauty publications and their princes from medical and biological engineering studies. Journal of Medical and Biological Engineering, 35(6), 749–758. https://doi.org/10.1007/s40846-015-0091-y.
Iribarren-Maestro, I., Lascurain-Sanchez, M. L., & Sanz-Casado, E. (2007). Are multi-authorship and visibility related? Study of ten research areas at Carlos III University of Madrid. In D. Torres-Salinas & H. F. Moed (Eds.), Proceedings of the 11th conference of the international society for scientometrics and informetrics (Vol. 1, pp. 401–407). Madrid, Spain: Spanish Research Council (CSIC).
Ke, Q., Ferrara, E., Radicchi, F., & Flammini, A. (2015). Defining and identifying sleeping beauties in science. Proceedings of the National Academy of Sciences, 112(24), 7426–7431. https://doi.org/10.1073/pnas.1424329112.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.
Vanclay, J. K. (2013). Factors affecting citation rates in environmental science. Journal of Informetrics, 7(2), 265–271. https://doi.org/10.1016/j.joi.2012.11.009.
Vinkler, P. (2010). The evaluation of research by scientometric indicators. Oxford: Chandos Publishing.
Waltman, L. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10(2), 365–391.
Waltman, L., van Eck, N., van Leeuwen, T., Visser, M., & van Raan, A. (2011a). Towards a new crown indicator: An empirical analysis. Scientometrics, 87(3), 467–481. https://doi.org/10.1007/s11192-011-0354-5.
Waltman, L., van Eck, N. J., van Leeuwen, T. N., Visser, M. S., & van Raan, A. F. J. (2011b). Towards a new crown indicator: Some theoretical considerations. Journal of Informetrics, 5(1), 37–47. https://doi.org/10.1016/j.joi.2010.08.001.
Wang, J. (2013). Citation time window choice for research impact evaluation. Scientometrics, 94(3), 851–872. https://doi.org/10.1007/s11192-012-0775-9.
Webster, G. D., Jonason, P. K., & Schember, T. O. (2009). Hot topics and popular papers in evolutionary psychology: Analyses of title words and citation counts in Evolution and Human Behavior, 1979–2008. Evolutionary Psychology, 7(3), 348–362.
Wesel, M., Wyatt, S., & Haaf, J. (2013). What a difference a colon makes: How superficial factors influence subsequent citation. Scientometrics. https://doi.org/10.1007/s11192-013-1154-x.
Ye, F. Y., & Bornmann, L. (2018). "Smart girls" versus "sleeping beauties" in the sciences: The identification of instant and delayed recognition by using the citation angle. Journal of the Association for Information Science and Technology, 69(3), 359–367.
Yu, T., Yu, G., Li, P.-Y., & Wang, L. (2014). Citation impact prediction for scientific papers using stepwise regression analysis. Scientometrics, 101(2), 1233–1252. https://doi.org/10.1007/s11192-014-1279-6.



Lutz Bornmann, Adam Y. Ye, Fred Y. Ye. Identifying “hot papers” and papers with “delayed recognition” in large-scale datasets by using dynamically normalized citation impact scores, Scientometrics, 2018, 1-20, DOI: 10.1007/s11192-018-2772-0