Country over-citation ratios

Scientometrics, Aug 2017

There is a clear tendency for authors of scientific papers to over-cite the papers by their fellow countrymen (and countrywomen) relative to the percentage presence of their papers in world output in the same field. We investigated the Over-Citation Ratio (OCR) as a function of this percentage, and the effects of different scientific fields and publication years. For cancer research, we also compared clinical with basic research. We found that the OCR for a given percentage presence has been decreasing over the period 1980–2010, probably because of better communications. It is greater for fields of relatively more national interest (chemistry, ornithology) and less for those of international concern (astronomy, diabetes, cancer). It may also be slightly greater for basic cancer research than for clinical work. The OCR values given allow other types of citation, such as the references on clinical practice guidelines and papers featured in newspaper stories, to be put in context: are they unusually nationalistic, or typical of normal citation behaviour?

Victoria Bakare, Grant Lewison (corresponding author)
Department of Cancer Studies, Guy's Hospital, King's College London, Great Maze Pond, London SE1 9RT, UK

Keywords: Citing papers; Country data; Self-citations

Introduction

This paper is concerned with the practice whereby researchers from a given country tend to over-cite research by their fellow-countrymen (and, of course, their fellow-countrywomen). Specifically, we investigate how often research by a given country is cited by papers from the same country. One might suppose that if a country's research in a particular field of science represented (say) 1% of world output, then a "fair" quota of citations from that country would be 1% of the total. In practice it is almost always more than this, so we define the over-citation ratio (OCR) as the percentage of the papers citing the country's work that come from the same country, divided by the country's percentage presence in the field.
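As a minimal sketch of the definition above, the OCR is simply a ratio of two percentages. The function name and the counts in the example are hypothetical, chosen only for illustration.

```python
def over_citation_ratio(own_citing: int, total_citing: int,
                        own_papers: int, world_papers: int) -> float:
    """OCR = (% of citing papers from the country) / (% of world papers from the country)."""
    if total_citing <= 0 or own_papers <= 0 or world_papers <= 0:
        raise ValueError("counts must be positive")
    citing_share = 100.0 * own_citing / total_citing  # % of citing papers from the same country
    presence = 100.0 * own_papers / world_papers      # country's % presence in the field
    return citing_share / presence


# Hypothetical country with 1% of world papers (50 of 5000) whose work is
# cited by 1000 papers, 40 of them (4%) from its own researchers.
print(over_citation_ratio(own_citing=40, total_citing=1000,
                          own_papers=50, world_papers=5000))  # 4.0
```

An OCR of 1 would correspond to the "fair" quota described above; values above 1 indicate over-citation.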
Why is this important? There are two main reasons. First, the other side of the coin, so to speak, is the percentage of citations that come from other countries, which could serve as an indicator of the international impact of the work. This indicator needs to be carefully normalised to account for the various factors that can influence it, and this paper examines these factors in some detail. Second, the OCR for research papers cited by papers in the serial literature provides a benchmark against which other types of citation can be evaluated. For example, Grant et al. (2000) noted that a sample of 15 UK clinical practice guidelines (CPGs) over-cited the British research papers in their evidence base by a factor of 2.5. Was this unusually high or unusually low? The authors did not say, because there was no benchmark to guide them. Similarly, Lewison et al. (2008) found that the British Broadcasting Corporation's website over-cited UK cancer research papers by a factor of six, but the time period and the subject matter differed from those of the papers studied by Grant et al.

In this paper we investigate the country OCR as a parameter of different sets of papers: how it depends on the country of authorship of the papers, their publication year, and their subject matter and research level (from applied or clinical to basic research). The aim is not to provide exhaustive data but to describe a methodology that can be applied to any given set of papers, and to give enough results for the main features of OCR to be discerned and for any given value to be judged higher or lower than expected. We explore two hypotheses: first, that subjects of relatively local or national interest will show higher country OCRs than those of international interest; and second, that in biomedical subjects, clinical papers will show higher OCRs than basic research papers.
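To make the benchmarking idea concrete, the sketch below compares an observed OCR with the value predicted by a power-law baseline of the form OCR = M × p^J (the form fitted later in this paper, with p the country's percentage presence). The 2.5 is the guideline figure reported by Grant et al. (2000); the values of M, J and p are placeholders, not results from this study, so the output says nothing about the actual guidelines. The function names are hypothetical.

```python
def expected_ocr(presence_pct: float, M: float, J: float) -> float:
    """Baseline OCR predicted by the power law OCR = M * p**J (p in percent)."""
    if presence_pct <= 0:
        raise ValueError("presence must be positive")
    return M * presence_pct ** J


def relative_to_baseline(observed_ocr: float, presence_pct: float,
                         M: float, J: float) -> float:
    """Ratio > 1: more nationalistic than normal journal citation; < 1: less."""
    return observed_ocr / expected_ocr(presence_pct, M, J)


# Illustrative only: M, J and the 10% presence are assumed placeholder values.
ratio = relative_to_baseline(observed_ocr=2.5, presence_pct=10.0, M=20.0, J=-0.8)
print(f"{ratio:.2f}")
```

With these placeholder values the baseline OCR at 10% presence is about 3.2, so an observed 2.5 would sit somewhat below typical journal citation behaviour; real comparisons would need M and J for the matching field and period.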
Self-citation can take place at many different levels. The one most commonly addressed in the literature is that of the individual author, who may be thought to boost her ranking by this means. However, author self-citations may be correlated with higher citation by others (Fowler and Aksnes 2007), and they do not account for the increased citation of multinational papers (van Raan 1998). Although some authors argue that self-citations should be discounted (Ferrara and Romero 2013), there is little evidence that removing them would affect most indicators (Glänzel et al. 2006). Self-citation can also occur at the institutional level (Thijs and Glänzel 2006; Hendrix 2009; Gul et al. 2017) and at the language level, especially in the social sciences (Yitzhaki 1998; Egghe et al. 1999). It has been suggested that journals can raise their impact factors by encouraging journal self-citation or by publishing editorials and letters, whose citations count in the numerator of the impact factor calculation but which are not counted in the denominator. This may happen for a few journals (Nisonger 2000; Campanario and Gonzalez 2006), but the practice does not appear to be widespread (Andrade et al. 2009), and journal self-citation rates may even correlate negatively with overall impact factors (Motamed et al. 2002). At the highest level, that of the country, there have been very few studies of self-citation so far (Minasny et al. 2010; Jaffe 2011). The latter paper noted a recent increase in country self-citation rates but, as we will show, there may be a good reason for this, and it does not necessarily betoken a decline in international scientific impact.

Methodology

The analysis is based on data from the Clarivate Analytics Web of Science (WoS) for six scientific fields: astronomy, birds, cancer, chemistry, diabetes and engineering.
The sets of papers comprised articles only, and were taken from seven publication years: 1980, 1985, 1990, 1995, 2000, 2005 and 2010. For each combination of field and year, papers from the top 20 countries (in 2010) were considered. For each cohort, the citation analysis was provided by the WoS, and the collection of citing papers was then examined. Citing papers published in the five years beginning with the publication year of the cited papers were isolated, and the number from the given country was expressed as a percentage of this total. This percentage was divided by the country's percentage presence in the field in the given year to give the over-citation ratio, all counts being integer country counts of papers. However, only countries with 1% or more of the world total of papers in the given year and with an OCR < 100 were retained for our analysis, as there would have been too much scatter for countries with smaller outputs.

The fields of astronomy and birds were defined by means of filters containing a list of specialist journals and a list of title words. Examples of title words for these two fields are:

Astronomy: TI = (ASTEROID or BOOTES or COMPACT-BINARY or DARK-MATTER)
Birds: TI = (AVIAN or BIRDSONG or CORMORANT or DUCKS or EAGLE)

These two fields were chosen to be as international as possible (astronomy) and relatively local or regional (birds). The filters for cancer and diabetes papers were defined similarly but with much longer lists: that for cancer had 185 specialist oncology journals and 323 title words or phrases. The filters for chemistry and engineering, on the other hand, were based on the topic function in the WoS: TS = CHEMISTRY and TS = ENGINEERING.
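The counting steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each citing paper is assumed to carry its publication year and the set of country codes in its addresses, a paper counts once for every country it lists (integer counting), and the names CitingPaper, cohort_ocr and retained are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class CitingPaper:
    """Hypothetical record for one citing paper."""
    year: int
    countries: FrozenSet[str]  # ISO codes found in the paper's addresses


def cohort_ocr(country: str, pub_year: int, citing: List[CitingPaper],
               country_papers: int, world_papers: int,
               window: int = 5) -> Optional[Tuple[float, float]]:
    """Return (presence %, OCR) for one country/field/year cohort, or None if uncited."""
    # Five-year citation window that includes the publication year itself.
    in_window = [c for c in citing if pub_year <= c.year < pub_year + window]
    if not in_window:
        return None
    # Integer counting: a paper counts once for each country in its address list.
    own = sum(1 for c in in_window if country in c.countries)
    citing_pct = 100.0 * own / len(in_window)
    presence_pct = 100.0 * country_papers / world_papers
    return presence_pct, citing_pct / presence_pct


def retained(presence_pct: float, ocr: float) -> bool:
    """Retention rule from the text: presence of at least 1% and OCR below 100."""
    return presence_pct >= 1.0 and ocr < 100.0


# Toy data: citing papers of a hypothetical 2010 New Zealand cohort.
citing = [
    CitingPaper(2010, frozenset({"NZ"})),
    CitingPaper(2011, frozenset({"AU", "NZ"})),
    CitingPaper(2012, frozenset({"US"})),
    CitingPaper(2014, frozenset({"UK"})),
    CitingPaper(2015, frozenset({"NZ"})),  # outside the 2010-2014 window
]
result = cohort_ocr("NZ", 2010, citing, country_papers=20, world_papers=1000)
print(result)  # (2.0, 25.0)
if result is not None:
    print(retained(*result))  # True
```

Reading "the five years following publication, including the publication year" as the publication year plus four further years is an interpretation; the window parameter can be changed if a different convention is intended.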
In cancer, a distinction was made between clinical and basic papers, based on words in their titles (Lewison and Paraje 2004), so that we could see whether country OCR differed between three groups: papers classed as clinical (but not basic), papers classed as basic (but not clinical), and papers classed as both clinical and basic. There were thus effectively eight subject areas in this study: the five other fields plus the three groups of cancer papers.

The OCR for each of the 20 countries was plotted against the country's presence in the field in the given year (p, %). These graphs were on log–log scales, and a best-fit regression line was added with a power law: $\mathrm{OCR} = M \times p^{J}$. For all these graphs the fit was usually very good, typically with $r^2 \approx 0.9$. The multiplier M in the best-fit equation, which represents the OCR for a country publishing 1% of world papers, was then plotted against time. From the best-fit equation we also calculated the estimated OCR for countries publishing 3% and 10% of world papers.

Results

We were initially somewhat surprised that the correlation between a country's percentage presence in research in a given subject area and its OCR was so strong; for years in which several of the top 20 countries were publishing < 1% of the total, the correlation was even stronger when these were omitted from the analysis. Figure 1 shows a fairly typical result, that for research on birds in 2010.

Countries and their ISO codes

Country          ISO    Country          ISO    Country          ISO
Argentina        AR     Spain            ES     Poland           PL
Australia        AU     France           FR     Portugal         PT
Belgium          BE     Hungary          HU     Russia           RU
Brazil           BR     India            IN     Sweden           SE
Canada           CA     Italy            IT     Singapore        SG
Switzerland      CH     Japan            JP     United Kingdom   UK
Chile            CL     Mexico           MX     United States    US
China (P.R.)     CN     Netherlands      NL     South Africa     ZA
Germany          DE     Norway           NO
Denmark          DK     New Zealand      NZ
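The log–log fit can be reproduced with ordinary least squares on the logarithms, since log OCR = log M + J log p is a straight line. The sketch below is one way to do this (the paper does not say which software was used). It fits synthetic points lying exactly on OCR = 20 p^−0.8, so it should recover M = 20 and J = −0.8; the r² it reports is computed on the log scale, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(presence_pct: Sequence[float],
                  ocr: Sequence[float]) -> Tuple[float, float, float]:
    """Fit OCR = M * p**J by least squares on log10 values; return (M, J, r^2)."""
    if len(presence_pct) != len(ocr) or len(ocr) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log10(p) for p in presence_pct]
    ys = [math.log10(o) for o in ocr]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("presence values and OCR values must both vary")
    J = sxy / sxx                    # slope on the log-log plot
    M = 10 ** (mean_y - J * mean_x)  # intercept at log10(p) = 0, i.e. OCR at p = 1%
    r2 = sxy ** 2 / (sxx * syy)      # squared Pearson correlation of the logs
    return M, J, r2


# Synthetic points lying exactly on OCR = 20 * p**-0.8 (not data from the paper).
ps = [1.0, 2.0, 5.0, 10.0, 20.0]
ocrs = [20.0 * p ** -0.8 for p in ps]
M, J, r2 = fit_power_law(ps, ocrs)
print(f"M = {M:.2f}, J = {J:.2f}, r^2 = {r2:.3f}")  # M = 20.00, J = -0.80, r^2 = 1.000
print(f"OCR at 3%: {M * 3 ** J:.1f}, at 10%: {M * 10 ** J:.1f}")  # OCR at 3%: 8.3, at 10%: 3.2
```

Because log10(1) = 0, the fitted intercept gives M directly as the OCR at 1% presence, which is why M can be plotted against time as a single summary number.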
The least-squares regression line was given by a power-law relationship of the form $\mathrm{OCR} = M \times p^{J}$, where M is a multiplier equal to the OCR for p = 1%, p is the country's percentage presence and J is an index. It rapidly became apparent that although the index J varied somewhat from year to year, it tended to become less negative with time. The multiplier M, however, was substantially greater for the earlier years than for the later ones, clearly indicating that country OCR was becoming less pronounced over time. This is probably because improved communication has made the diffusion of knowledge more international. These trends were found for all five scientific fields (ASTRO, BIRDS, CHEM, DIABE, ENGR).

Figures 2, 3 and 4 show how the OCR for a country with 1% (i.e. M), 3% and 10% of world output, respectively, has varied with time for the five fields of research. (Note the change in ordinate scale between the three graphs.) The decline is very evident, although there is some fluctuation from year to year. There is also a fairly consistent pattern: OCR is highest for two fields with particularly local interests (chemistry, birds) and lowest for two with a more international outlook (astronomy, diabetes).

The decline with time also appears for the three sets of cancer research papers (Figs. 5, 6 and 7), and the differences between subject areas are summarised by the mean values of M (OCR at 1%) and J given in Table 2. These three graphs show that there is surprisingly little difference in the over-citation ratio between the three groups of cancer research papers. In the 21st century, for which there are more countries and many more papers, the pattern suggests that basic research is over-cited by researchers from the same country slightly more than clinical work, but the differences are much smaller than those between the scientific fields discussed previously.
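A short worked example shows what the two fitted parameters mean. The values M = 20 and J = −0.8 are illustrative, chosen from within the ranges given in the conclusions (M between 10 and 30, J between −0.68 and −0.90), not taken from any particular field or year:

```latex
\begin{aligned}
\mathrm{OCR}(p) &= M \, p^{J}, \qquad \mathrm{OCR}(1) = M \cdot 1^{J} = M = 20,\\
\mathrm{OCR}(3) &= 20 \times 3^{-0.8} \approx 20 \times 0.415 \approx 8.3,\\
\mathrm{OCR}(10) &= 20 \times 10^{-0.8} \approx 20 \times 0.158 \approx 3.2,\\
\frac{\mathrm{OCR}(10)}{\mathrm{OCR}(1)} &= 10^{J}.
\end{aligned}
```

A fall in M lowers the whole curve in proportion at every p, whereas a less negative J flattens it, narrowing the gap in OCR between small and large producers.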
The mean values of M and J shown in Table 2 indicate that cancer research is rather similar in terms of OCR to diabetes research, another medical specialty, and that there is no obvious correlation with the research level of the papers.

Even with the large numbers of papers in the WoS in the subject areas that we have considered, there is rather a lot of scatter in the results. This can be seen in the figures, where the lines do not form clear patterns either with time or with subject area. Nevertheless, three conclusions are very clear:

• OCR is much greater for countries with small scientific output. There is a negative power relationship of the form $\mathrm{OCR} = M \times p^{J}$, where p is the percentage presence of the country in the field, M is a multiplier between 10 and 30, and J is an index between −0.68 and −0.90;
• OCR has tended to decrease with time between the 1980s and 2005–2010, probably because of easier (and cheaper) international communications;
• OCR is higher for scientific fields of more national interest, such as ornithology (and chemistry), and lower for those that are more universal, such as astronomy and medicine.

The paper describes a methodology that can be applied to any selected field, and it gives results that show the extent of own-country OCR in scientific papers in the serial literature. These values provide a baseline against which the over-citation ratio of other document types can be compared, to show whether they are more, or less, nationalistic than the countries' researchers.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

Andrade, A., Gonzalez-Jonte, R., & Campanario, J. M. (2009). Journals that increase their impact factor at least fourfold in a few years: the role of journal self-citations. Scientometrics, 80(2), 515–528.
Campanario, J. M., & Gonzalez, L. (2006). Journal self-citations that contribute to the impact factor: documents labeled "editorial material" in journals covered by the Science Citation Index. Scientometrics, 69(2), 365–386.
Egghe, L., Rousseau, R., & Yitzhaki, M. (1999). The "own-language preference": measures of relative language self-citation. Scientometrics, 45(2), 217–232.
Ferrara, E., & Romero, A. E. (2013). Scientific impact evaluation and the effect of self-citations: mitigating the bias by discounting the h-index. Journal of the American Society for Information Science and Technology, 64(11), 2332–2339.
Fowler, J. H., & Aksnes, D. W. (2007). Does self-citation pay? Scientometrics, 72(3), 427–437.
Glänzel, W., Debackere, K., Thijs, B., & Schubert, A. (2006). A concise review on the role of author self-citations in information science, bibliometrics and science policy. Scientometrics, 67(2), 263–277.
Grant, J., Cottrell, R., Cluzeau, F., & Fawcett, G. (2000). Evaluating "payback" on biomedical research from papers cited in clinical guidelines: applied bibliometric study. BMJ, 320, 1107–1111.
Gul, S., Shah, T. A., & Shafiq, H. (2017). The prevalence of synchronous self-citation practices at the institutional level. Malaysian Journal of Library & Information Science, 22(1), 1–14.
Hendrix, D. (2009). Institutional self-citation rates: a three year study of universities in the United States. Scientometrics, 81(2), 321–331.
Jaffe, K. (2011). Do countries with lower self-citation rates produce higher impact papers? or does humility pay? Interciencia, 36(9), 694–698.
Lewison, G., & Paraje, G. (2004). The classification of biomedical journals by research level. Scientometrics, 60(2), 145–157.
Lewison, G., Tootell, S., Roe, P., & Sullivan, R. (2008). How do the media report cancer research? A study of the UK's BBC website. British Journal of Cancer, 99, 569–576.
Minasny, B., Hartemink, A. E., & McBratney, A. (2010). Individual, country, and journal self-citation in soil science. Geoderma, 155(3–4), 434–438.
Motamed, M., Mehta, D., Basavaraj, S., & Fuad, F. (2002). Self citations and impact factors in otolaryngology journals. Clinical Otolaryngology, 27(5), 318–320.
Nisonger, T. E. (2000). Use of the Journal Citation Reports for serials management in research libraries: an investigation of the effect of self-citation on journal rankings in library and information science and genetics. College & Research Libraries, 61(3), 263–275.
Thijs, B., & Glänzel, W. (2006). The influence of author self-citations on bibliometric meso-indicators. The case of European universities. Scientometrics, 66(1), 71–80.
van Raan, A. F. J. (1998). The influence of international collaboration on the impact of research results: some simple mathematical considerations concerning the role of self-citations. Scientometrics, 42(3), 423–428.
Yitzhaki, M. (1998). The "language preference" in sociology: measures of "language self-citation", "relative own-language preference indicator", and "mutual use of languages". Scientometrics, 41(1–2), 243–254.



Victoria Bakare, Grant Lewison. Country over-citation ratios, Scientometrics, 2017, 1-9, DOI: 10.1007/s11192-017-2490-z