Using Google Scholar institutional level data to evaluate the quality of university research
John Mingers, Jesse R. O'Hanley, Musbaudeen Okunola

Kent Business School, University of Kent, Canterbury CT2 7FS, UK
In recent years, the extent of formal research evaluation, at all levels from the individual to the multiversity, has increased dramatically. At the institutional level, there are world university rankings based on an ad hoc combination of different indicators. There are also national exercises, such as those in the UK and Australia, that evaluate research outputs and environment through peer review panels. These are extremely costly and time-consuming. This paper evaluates the possibility of using Google Scholar (GS) institutional level data to evaluate university research in a relatively automatic way. Several citation-based metrics are collected from GS for all 130 UK universities. These are used to evaluate performance and produce university rankings, which are then compared with various rankings based on the 2014 UK Research Excellence Framework (REF). The rankings are shown to be credible and to avoid some of the obvious problems of the REF ranking, as well as being highly efficient and cost effective. We also investigate the possibility of normalizing the results for the university subject mix, since science subjects generally produce significantly more citations than the social sciences or humanities.

Keywords: Scientometrics; Research Excellence Framework (REF); Google Scholar; Research excellence
In recent years, there has been a major impetus to measure the quality of research at all levels, from the individual paper right up to the university and multiversity (Daraio et al. 2015; Millot 2015; Mingers and Leydesdorff 2015). Looking particularly at the university
level, there are several approaches. First, there are global lists that rank the major
universities throughout the world, for example the Times Higher World University Rankings
(https://www.timeshighereducation.com/world-university-rankings), the Shanghai
Academic Ranking of World Universities (http://www.shanghairanking.com), and the
Quacquarelli Symonds World University Ranking (https://www.topuniversities.com/qs-world-university-rankings). These are all similar in collecting data on a range of factors such as teaching, research, and status or reputation, and then combining them, often in a subjective way, to arrive at an overall quality metric (Hazelkorn 2015; Holmes 2013).
Second, there are single-country evaluations, usually by peer review. The most well-known and elaborate is the UK's research assessment exercise, currently called the Research Excellence Framework (REF) (http://www.ref.ac.uk) (Broadbent 2010; Moed 2008), although there are similar exercises in Australia—Excellence in Research for Australia (http://www.arc.gov.au/excellence-research-australia), New Zealand—the Performance Based Research Fund (https://moetec.cwp.govt.nz/assets/Forms-templates-and-guides/PBRF-staff-guide.pdf), and Italy (Rebora and Turri 2013), for example. These generally involve university departments submitting samples of their research outputs, which are then evaluated by peer review panels. The process is time-consuming and extremely costly. The 2014 REF, for example, is believed to have cost over £250 million (Else 2015).
All of these methods rely, to a greater or lesser degree, on subjective judgement,
whether it is the peer review of individual papers, judging the status of universities, or deciding how to combine different measures. Also, they are all to some extent opaque in their methodologies—for example, the REF does not give out grades by individual paper or person1 and the worldwide exercises do not reveal their background data. Bibliometric
data, such as citations, is only used to a limited extent (Glänzel and Debackere 2009).
In this paper, we explore the extent to which readily available and free bibliometric data
can be used to generate a credible evaluation of university research in a way that is
transparent, non-subjective, and reasonably speedy. One could argue that the whole idea of
producing such rankings should be avoided (Adler and Harzing 2009), but our view is that
they do exist and are unlikely to go away, so we should try and ensure that they are as fair,
transparent, and harmless as possible.
The data we will use is the institutional level data on citations that is available, although
not widely known about, from Google Scholar (GS). Most academics will be aware that
they may have a profile in GS which documents their institutional affiliation, their papers,
and the citations they have received—if you type your own name into the GS search box
you will see if you have a user profile. But it is also possible to search for an institutional
domain name, e.g., ‘‘kent.ac.uk’’, and then GS will produce data on all the researchers that
are registered to that domain. This, in effect, produces citation data for the whole
institution, which can be used to compare institutions in aggregate.
We collected this data for 130 UK university level institutions, all of which were
included in the 2014 REF. We explore various metrics based on this data and compare them with three metrics available from the REF—GPA, power, and intensity. We compare both in absolute terms and in terms of rankings. We then consider the issue of normalization. It is well-known that the sciences, especially medicine and biology, cite much more frequently than the social sciences or humanities (Bornmann and Marx 2015; Waltman and van Eck 2013). To account for this, citation data should be normalized in some way. For a paper, this would be relative to the level of references or citations in a given field. At the university level, the problem is that universities have different mixes of subjects, some being almost exclusively science-based, others having virtually no science at all. The latter, therefore, would be disadvantaged in terms of absolute citation numbers. We explore a method of correcting the data for this effect.

1 Indeed, after the 2008 exercise, the reviewers were explicitly told to destroy any notes they may have made during the process to avoid FOI requests or litigation.
Methodology and data collection
There are two primary sources of citation data—Web of Science (WoS) and Scopus, which
collect and validate data from a subset of journals, and Google Scholar, which searches the
web looking for citations to specific papers and books. The strengths and weaknesses of the
two sources have been well documented (Adriaanse and Rensleigh 2013; Bornmann et al. 2016; Delgado-López-Cózar and Cabezas-Clavijo 2012; Harzing and Alakangas 2016; Mingers and Lipitakis 2010; Prins et al. 2016). Broadly speaking, WoS and Scopus produce high-quality and comprehensive data for the journals that they cover, but they do not generally include books, and their coverage is incomplete, especially in the social sciences (around 50%) and arts and humanities (around 30%). GS has very good coverage (up to 90%), which is roughly the same across all disciplines. On the other hand, its data can be unreliable, often generating multiple versions of the same paper, and it sometimes includes non-research outputs such as teaching notes and home pages.
The user interface to GS is simplistic and offers few facilities, for example field lists of journals, but because of its coverage it is an important source of data for the social sciences and arts and humanities. There is also little documentation available and, in fact, this paper utilizes a search facility that is little known because it is undocumented.2 With a web browser, one can type the following search into the address bar: https://scholar.google.com/citations?view_op=search_authors&mauthors=xxxxxx.xxx, where xxxxxx.xxx is the domain name of the institution one is interested in. For example, ''kent.ac.uk'' would return the University of Kent in the UK.3
The result is a list of the academics affiliated with the institution (through their email
domain name) that GS has recorded. The results appear in order of total number of GS
citations for each academic, ten per page, but further pages can be searched until the list is
exhausted. Moreover, clicking on an individual gives access to his/her GS profile, which includes a list of papers and further citation statistics, including the h-index (Bornmann et al. 2008; Egghe 2010; Franceschini and Maisano 2010; Xu et al. 2015) and the i10-index (a GS-specific metric that is the number of papers with at least 10 citations), together with alternative versions of each for the past 5 years.
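To make the mechanics concrete, below is a minimal sketch of how this search facility can be scraped (the bespoke R program used for the actual data collection is described later). The rvest package and the CSS selectors are our assumptions about the page markup at the time of writing, not a stable interface, and pagination over subsequent pages is omitted.

# Minimal sketch: scrape the first page of GS's institutional author search
# for a given email domain. The selectors .gs_ai_name and .gs_ai_cby are
# assumptions about the current page markup and may change without notice.
library(rvest)

domain <- "kent.ac.uk"
url <- paste0("https://scholar.google.com/citations?",
              "view_op=search_authors&mauthors=", domain)
page <- read_html(url)

# One entry per author profile (ten per page)
author_names <- page |> html_elements(".gs_ai_name") |> html_text2()

# "Cited by N" strings; keep only the digits to get the citation counts
cited_by <- page |> html_elements(".gs_ai_cby") |> html_text2()
citations <- as.integer(gsub("\\D", "", cited_by))

data.frame(name = author_names, citations = citations)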
2 We are grateful to Isidro Aguillo for this information.
3 One can achieve a similar but not identical result by typing just the domain name into the GS search box
and then click ‘‘user profiles’’.
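For completeness, the two profile metrics just mentioned are straightforward to compute from a vector of per-paper citation counts; the sketch below implements the standard definitions (not GS's internal code).

# h-index: the largest h such that h papers each have at least h citations.
# For a vector sorted in decreasing order, this is the number of positions i
# with citations[i] >= i.
h_index <- function(cites) {
  sum(sort(cites, decreasing = TRUE) >= seq_along(cites))
}

# i10-index: the number of papers with at least 10 citations.
i10_index <- function(cites) sum(cites >= 10)

cites <- c(45, 30, 12, 9, 6, 3, 1, 0)
h_index(cites)    # 5: five papers have at least 5 citations each
i10_index(cites)  # 3: the papers with 45, 30, and 12 citations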
We should at this point consider the accuracy of the Google Scholar data. Numerous
studies have shown that there is a significant degree of error in the data although this is
counterbalanced by the greater coverage in terms of both type of output and discipline
(Delgado-López-Cózar and Cabezas-Clavijo 2012; García-Pérez 2010; Harzing 2013, 2014; Martín-Martín et al. 2014; Prins et al. 2016). The main studies have concerned
errors in the citations themselves—multiple records for essentially the same paper and
citations from a range of non-research sources. This is of some concern but in this study we
are working at a high level of aggregation—whole institutions—and there is no reason to
suspect that it will affect particular institutions differentially. We are concerned with the
relative number of citations, not the absolute number.
Of more concern is a different source of error—the accuracy of the list of academics in
the institutional profile. This depends on scholars actually having a Google profile, which
in turn is related to their email address. There can be errors of two kinds—commission and
omission. The former is when an academic appears in the list but should not, perhaps
because they have left the institution; the latter is when they do not appear in the list but
should, perhaps because they do not have a Google profile. It is not possible to evaluate the
level of error in general because we cannot access institutions’ HR databases but we can do
so for our own institution—the University of Kent.
In terms of commission errors, two staff had retired but were Emeritus Professors, two had left but still had honorary status and thus valid emails, and one had indeed left, but a search showed no subsequent post to replace Kent. Omission errors are harder to spot, as one needs to find researchers at an institution who are not in the list but might have high citations. We used Scopus SciVal to identify researchers at Kent together with their citations (which covered only 2014 to 2016), which gave us a list of highly cited staff. We then looked each one up in Google Scholar to estimate whether they should have been included in the list (this was only an estimate because the error checking was carried out some time after the original data collection). We identified two researchers who we thought should have been included.
Overall, the error rate on this admittedly small sample seems to be only around 5%. We
feel that this is not a high error rate and there is no reason to suppose that there will be
systematic differences between universities. In time, as Google Scholar becomes used
more extensively for research evaluation, we would expect universities to monitor this and
ensure both that their staff have profiles and that out-of-date profiles are removed, thus improving the quality of the data.
The idea of this paper is to calculate a central tendency of these metrics, e.g., the mean
or median, as a measure of the research impact of the institution as a whole and then
compare the resulting rankings with rankings produced by the REF. There are several
decisions in the data collection that one must consider:
How many academics should be used from each institution? Taking a small sample of
universities showed that the number of academics varied widely, from many hundreds
for a large research intensive institution, to only 20 or 30 for a small teaching
institution. There is, in fact, a test version of a worldwide university ranking based on
GS data (http://www.webometrics.info/en/node/169). This, however, uses only the first
page of data, i.e., the top 10 academics, while ignoring the first academic for reasons of
‘‘representativeness’’. It then uses the total of the remaining academics’ GS citations.
Our view was that ten academics were insufficient to give proper representation, especially for universities that did not have much science, since the result could be dominated by a small group in a field with very high citations.
On the other hand, going into the hundreds would not work for those institutions with
only a small number of academics. As a compromise, we used the top 50 academics.
What should be done with institutions having fewer than 50 academics? The issue here is whether to examine fewer than 50, or to add zero entries to reach 50. Given that we will be using the central tendency, we opted for the second option, which is also what the REF did when fewer than the required minimum of four papers were submitted.
Which primary metric should be used? There are six choices—citations, i10-index and h-index, either in total or over the previous 5 years. We rejected the i10 indicator as it is not well researched and is used only by GS. We collected data for the others and the final choice is discussed in the results section.
Which measure of central tendency should be used—mean or median? Again, we investigated both and discuss this in the results. (A sketch combining these data collection choices follows this list.)
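Putting these decisions together, the following is a minimal sketch of the institutional metric they imply; the function name and inputs are our illustration, not the actual program.

# Sketch of the institutional metric implied by the choices above: take the
# top 50 academics by the chosen citation count, pad with zeros when fewer
# than 50 profiles exist, and return a measure of central tendency.
institution_score <- function(citations, n_top = 50, centre = median) {
  top <- head(sort(citations, decreasing = TRUE), n_top)
  if (length(top) < n_top) {
    top <- c(top, rep(0, n_top - length(top)))  # pad small institutions
  }
  centre(top)
}

# Example: a small teaching institution with only 30 registered academics
set.seed(42)
institution_score(rpois(30, lambda = 40))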
We began the data collection manually, but it turned out to be a significant task, as there
were approximately 130 universities and each one required 50 × 6 data items to be
recorded. It was subsequently decided to automate the process by coding a bespoke
program in R to scrape the data from GS. This turned out to be far quicker once the program
was written and tested.
The REF data was obtained from the HEFCE data site (http://www.ref.ac.uk) and from the Times Higher, which produced the ranking tables.
In terms of the REF data, there are three different metrics available. The major one,
which is used to create the main ranking table, is known as the grade point average (GPA).
The 2014 REF methodology was complex. Institutions submitted selected research active
staff (the number decided by each institution) and for each staff member up to four
research outputs were provided. They also submitted a research environment statement and
a set of impact case studies that depended on the number of staff submitted. To give an idea
of the size of the 2014 REF, there were 1911 submissions (departments, schools, research centers, etc.), 50,000 staff, 191,000 research outputs (articles, monographs, books, book
chapters, etc.), and 7000 impact case studies. Each research output was read and graded on
a 5-point scale (0*–4*), where 0* indicated no research content and 4* indicated ‘‘world
leading’’ quality. The case studies and the environment statements were graded on the
same scale. A profile was then created for each unit showing the proportion of 1*–4*
outputs in each of the three categories. These were combined into a weighted average to
arrive at an overall profile for each unit. Unit-level profiles were independent of the number
of staff submitted. The Times Higher subsequently calculated a mean value across units to
give a final GPA for each institution.
Profiles were highly sensitive to the proportion of staff submitted. If a unit or institution
submitted only its very best researchers with top quality research outputs, it could easily
increase its GPA. Conversely, if a unit submitted a large proportion of its staff, it usually
resulted in a lower GPA. A considerable number of departments and universities played
this game very seriously, which our results clearly show. Because of the weakness with
GPA, two other measures were calculated and used for rankings. One, called ‘‘power’’, is
the GPA multiplied by the actual number of staff submitted. This serves as a measure of
the research contribution of a given unit/institution. The other, called ‘‘intensity’’, is the
GPA multiplied by the percentage of eligible staff submitted. Units/institutions with full
submissions, i.e., non-selective, did well on intensity and power if they were large, but
usually poorly on GPA. Those that were highly selective did well on GPA but poorly on
power and intensity. Clearly, this is not an ideal situation, given that there was confusion about which ranking to use and institutions could cherry-pick whichever one was best for them (Mingers and White 2015).
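To make the three REF metrics concrete, here is a small worked example; the quality profile and staff numbers are invented for illustration.

# Illustrative REF metrics for a hypothetical unit. The profile gives the
# percentage of work graded at each star level; all figures are invented.
profile <- c(30, 40, 20, 8, 2)      # % at 4*, 3*, 2*, 1*, 0*
stars   <- c(4, 3, 2, 1, 0)

gpa <- sum(profile * stars) / 100   # grade point average = 2.88

staff_submitted <- 120              # actual number of staff submitted
pct_eligible    <- 80               # % of eligible staff submitted

power     <- gpa * staff_submitted  # rewards absolute size
intensity <- gpa * pct_eligible     # rewards full submissions

c(GPA = gpa, Power = power, Intensity = intensity)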
Exploration of the data
We have two main types of data, that from the REF which has three variables—GPA,
intensity, and power—and that from GS citations—mean and median total citations, mean
and median 5-year citations, and mean and median 5-year h-index. Further explanation and
summary statistics are shown in Table 1.
It is noticeable that the GS citation variables are all highly skewed even though the
values are the means for 50 academics. Median citations are significantly lower than the means, reflecting this skewness.
Table 2 shows the Pearson correlation coefficients and Fig. 1 the matrix of scatter plots
for each research quality indicator. For reasons of clarity, we only show in Fig. 1 one of the
GS citation measures (mean citations) together with the REF metrics. One obvious feature
of many of the plots is their non-linearity since citation data is generally highly skewed.
The exception is power, which is itself exponentially distributed, resulting in an overall linear relationship with citations. This will be examined further later.
Looking at the correlations, we can see the following main features:
The correlations among GS citation measures are all very high (the lowest being 0.90
between median citations and mean 5-year h-index).
Correlations among the REF measures are less strong (0.625–0.772).
[Table residue: variable definitions. GPA: REF quality score (1*–4*); power: GPA × no. of staff submitted; intensity: GPA × % of eligible staff submitted; mean/median total citations, mean/median 5-year citations, and mean/median 5-year h-index, each for the top 50 researchers.]
The Power measure shows a strong correlation with all GS citation metrics since all
these metrics factor in the number of high-quality research outputs.
Intensity shows a stronger correlation with the citation metrics than it does with GPA, indicating that this metric better accounts for quantity than pure GPA.
The h-indices are more highly correlated with the REF measures than the other citation metrics are. In spite of the generally high correlations observed (0.625–0.996), the different metrics nonetheless yield very different rankings, as is generally the case (Mingers and Yang 2017).
We have three REF indicators and six GS indicators so in order to make a sensible
comparison we need to focus on selected ones. The REF indicators each measure different
things—pure research quality independent of size (GPA), research quality relative to the
proportion of research staff (intensity), and total research contribution (power)—while the
GS metrics are essentially different measures of the same thing so it makes the most sense
to reduce these. The h-index is a derived indicator and subject to various criticisms (Egghe 2010; Franceschini and Maisano 2010; Gingras 2014, 2016), so we chose to drop these. It
also seems sensible to pay more attention to recent research (at least for the purposes of the
REF) and so we chose to drop total citations, leaving the 5-year citations. Finally, because
of the extreme skewness for 5-year citations, it was deemed better to use the median rather
than the mean, thus leaving 5-year median citations as the primary GS indicator.
Table 3 shows the rankings for the top 25 UK universities according to the three
research indicators (the full Table is available from the online version of the paper).
Table 3 is ordered with respect to the GS ranking produced using the median 5-year
citations. Prima facie, having Cambridge and Oxford as the top two universities seems
reasonable, as almost all world-wide rankings have these as the top UK universities and it
would seem rather strange for the REF rankings not to have them located there. Next come
Imperial and UCL, which are two major science-based universities in the UK. The rest of
the top 10 more or less follow the power ranking, which is what one would expect from the
correlation results presented earlier. In 5th place is Southampton, another science-intensive
university, which was also 8th for intensity and 11th for power based on its large submission to
the REF. Manchester, a very large university, was significantly lower on both GPA and
intensity. This is slightly anomalous but suggests that its sheer size overcomes some
weakness with regard to research quality and the percentage of staff submitted.
The London School of Economics (LSE) is an interesting case which we will discuss
later when we consider normalizing for disciplinary mix. This is a high-quality institution
but has no science, and science subjects tend to generate higher levels of citations. It was
3rd for GPA and 7th for intensity but only 28th for power, reflecting its small size (only
532 staff submitted, one of the lowest in the top 30). In terms of GS citations, it came 12th,
which is much higher compared to its power ranking. Looking further down the Table, we
can find other instances where the power measure is affected by sheer size, for instance
Nottingham, which is 7th on power but in the 20s for the other indicators.
There are some interesting cases that do not appear in Table 3 because they are not in
the top 30 for GS citations. Perhaps the most glaring is Cardiff, which had a very selective
submission (as the VC made clear in his statement about REF strategy). This led to them
coming 6th out of all UK universities based on GPA. More realistically, it was 18th for
power and only 50th for intensity. Cardiff came 34th for GS citations. Conversely, Brunel
did very poorly on GPA, coming 75th because it had a full submission, in spite of being
40th for intensity and 33rd for both power and citations.
The position of very specialized institutions is also interesting. Although the main REF
results exclude pure single-subject institutions such as art and drama schools or veterinary
colleges, they do include, for example, the Institute of Cancer Research (ICR) and the
London School of Hygiene and Tropical Medicine (LSHTM), both of which are very small
and only entered in two REF units for assessment. ICR was actually top overall on the GPA
measure, which does seem peculiar given its limited coverage of subjects. It also came top
for intensity, although it only submitted 103 staff. In contrast, it came 87th for power and
52nd for GS citations.
The final row of the Table shows the mean absolute difference in ranks (among all 130
universities) between the REF and GS citation metrics. It further confirms that the GS
citation metric is closest to power.
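The statistic in that final row is easy to reproduce: it is the mean absolute difference between two rank vectors, as sketched below with placeholder scores rather than our data.

# Mean absolute difference in ranks between two orderings of the same
# universities (the statistic in the final row of Table 3).
mean_rank_gap <- function(x, y) {
  mean(abs(rank(-x) - rank(-y)))   # rank(-x): rank 1 = highest score
}

set.seed(1)
gs_scores <- runif(130)   # stand-in for median 5-year GS citations
ref_power <- runif(130)   # stand-in for REF power
mean_rank_gap(gs_scores, ref_power)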
Taking account of subject mix
As mentioned previously, it is a notable feature of citations that volumes differ significantly between subjects, with science, especially medicine and biology, producing very much greater levels of citations than social science, and even more so than arts and humanities. A separate but related problem is that citation sources such as Web of Science
and Scopus have a greater coverage of the sciences, a problem that Google Scholar does
not suffer from.
This means that in order to be able to compare papers, journals, or people from different disciplines, the data has to be normalized to account for the disciplinary differences. At the institutional level, however, normalization does not usually take place, presumably on the assumption that any university will have a wide coverage of subjects. In truth,
this is not the case. For instance, within the UK, Imperial College is almost entirely science
with no arts or social science, while LSE is the opposite with only social science. It might,
therefore, be expected that indicators based on citations will be biased in favor of the
science-intensive universities and that the data ought to be corrected for this bias.
The first issue that arises is determining the extent of the mix of science and non-science. One could easily estimate this from the REF data. Overall, the REF was split into
36 field-based sub-panels that were amalgamated into four main panels—Panel A covered
medicine, biology and agriculture, Panel B covered science and engineering, Panel C
covered social science, and Panel D the arts and humanities. So, broadly speaking, Panels
A and B were science and C and D were non-science. Since the REF data included the
number of staff submitted by each university to each panel, we calculated the percentage of
each submission that was classified as science.
We could then see what proportion of the GS citations of an institution could be
explained by the percentage of science using linear regression. A high degree of non-linearity was observed between percent science (independent variable) and GS citations (dependent variable), owing to the large underlying skewness of citations. To correct for this, we took the natural logarithm of GS citations, i.e., ln(citations). This, however, gave rise to a separate problem. Specifically, based on median citations, a number of institutions had values of zero, for which the natural log is undefined. Consequently, these zero values had to be adjusted before the transform could be applied.
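A minimal sketch of this normalization step follows; the toy data and the use of log1p() (i.e., log(1 + x)) to cope with zero medians are our assumptions rather than the exact treatment used in the study.

# Sketch of normalizing GS citations for subject mix via linear regression.
# log1p(x) = log(1 + x) is defined at zero, unlike log(x); this handling of
# zero medians is an assumption, not necessarily the remedy used in the paper.
df <- data.frame(
  median_5yr_citations = c(120, 45, 0, 300, 15),  # toy institutional data
  pct_science          = c(80, 40, 10, 95, 25)    # % of staff in science
)

fit <- lm(log1p(median_5yr_citations) ~ pct_science, data = df)

# Residuals give each institution's citation performance after removing the
# component explained by its science share; institutions can be ranked on these.
df$normalized_score <- residuals(fit)
df[order(-df$normalized_score), ]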
The resulting normalized ranking is more similar to the GS citation ranking. What does seem clear from all this is that the
REF peer review appears to favor science-intensive universities over others.
In this paper, we explore the extent to which readily available citation data from Google
Scholar can be used to form a credible evaluation of a university’s research and compare
this with the UK Research Excellence Framework results.
We argue that the GS citations ranking is credible in that there are no glaring instances where a university is clearly misplaced. More than that, we suggest that in many ways the resultant ranking is actually superior to that produced by the REF, as well as being far less costly and time-consuming.
The REF produced three distinct rankings, each of which has very definite biases that could be, and were, exploited by universities. GPA favored those who were very selective in their submissions, a strategy that has the serious potential to affect the careers of many staff. Moreover, it generated highly anomalous results in some cases, including Cardiff University, which came 6th overall. Intensity went in the opposite direction, favoring those institutions that submitted a high proportion of staff even when overall research quality was less strong. Finally, power favored those with a large submission, whether or not quality was high or the submission intensive. Examples of these are detailed in the paper.
In contrast, the ranking based on GS citations steers a middle path between GPA,
intensity, and power. Although most similar to power, because it does take into account the
absolute amount of research, it does not favor pure size to such an extent, as the examples
of LSE and Nottingham show. In fact, it could be argued that all of the REF rankings,
produced at such a cost, have significant flaws.
Additionally, using GS citations does not allow the kind of game playing that the REF
does in terms of the proportion of staff submitted. No doubt citations can be massaged, but
not in such a direct way that would lead to the anomalies observed with Cardiff. It also
does away with the need for subjective decisions to be made about which staff and outputs
to select and, by extension, the pernicious effects of using journal ranking lists, like that of the Chartered Association of Business Schools (CABS) (Mingers and Willmott 2013). More to
the point, the overall approach is highly transparent, being based on publicly available data
and easy to understand metrics.
Our current work also investigated the possibility of normalizing the data based on the
mixture of subjects at each university through the use of linear regression, i.e., to account
for much higher citation rates observed in the fields of science and medicine. The resulting
ranking did correct for this although the results could be considered extreme.
In terms of limitations, this paper is only a preliminary investigation of the feasibility of
using GS citation data for judging research quality. Further research should look at the
effects of using different numbers of staff—more or less than 50; a wider, possibly
international, comparison involving a larger number of universities, and other non-REF
ranking systems; and a detailed investigation into the accuracy and biases of GS data itself.
It might also be possible to obtain lists of staff from universities to avoid the GS profile errors.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,
and reproduction in any medium, provided you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if changes were made.
References

Adler, N., & Harzing, A.-W. (2009). When knowledge wins: Transcending the sense and nonsense of academic rankings. Academy of Management Learning and Education, 8(1), 72-95.
Adriaanse, L., & Rensleigh, C. (2013). Web of Science, Scopus and Google Scholar. The Electronic Library, 31(6), 727-744.
Bornmann, L., & Marx, W. (2015). Methods for the generation of normalized citation impact scores in bibliometrics: Which method best reflects the judgements of experts? Journal of Informetrics, 9(2), 408-418.
Bornmann, L., Mutz, R., & Daniel, H. D. (2008). Are there better indices for evaluation purposes than the h index? A comparison of nine different variants of the h index using data from biomedicine. Journal of the American Society for Information Science and Technology, 59(5), 830-837.
Bornmann, L., Thor, A., Marx, W., & Schier, H. (2016). The application of bibliometrics to research evaluation in the humanities and social sciences: An exploratory study using normalized Google Scholar data for the publications of a research institute. Journal of the Association for Information Science and Technology, 67(11), 2778-2789.
Broadbent, J. (2010). The UK research assessment exercise: Performance measurement and resource allocation. Australian Accounting Review, 20(1), 14-23.
Daraio, C., Bonaccorsi, A., & Simar, L. (2015). Rankings and university performance: A conditional multidimensional approach. European Journal of Operational Research, 244(3), 918-930.
Delgado-López-Cózar, E., & Cabezas-Clavijo, Á. (2012). Google Scholar metrics: An unreliable tool for assessing scientific journals. El profesional de la información, 21(4), 419-427.
Egghe, L. (2010). The Hirsch index and related impact measures. Annual Review of Information Science and Technology, 44(1), 65-114.
Else, H. (2015). REF 2014 cost almost £250 million. Times Higher Education, 13 July 2015. https://www.timeshighereducation.com/news/ref-2014-cost-250-million. Accessed 25 Sept 2017.
Franceschini, F., & Maisano, D. A. (2010). Analysis of the Hirsch index's operational properties. European Journal of Operational Research, 203(2), 494-504.
García-Pérez, M. A. (2010). Accuracy and completeness of publication and citation records in the Web of Science, PsycINFO, and Google Scholar: A case study for the computation of h indices in Psychology. Journal of the American Society for Information Science and Technology, 61(10), 2070-2085.
Gingras, Y. (2014). Criteria for evaluating indicators. In B. Cronin & C. Sugimoto (Eds.), Beyond bibliometrics: Harnessing multidimensional indicators of scholarly impact (pp. 110-125). London: MIT Press.
Gingras, Y. (2016). Bibliometrics and research evaluation: Uses and abuses. Cambridge, MA: MIT Press.
Glänzel, W., & Debackere, K. (2009). On the 'multi-dimensionality' of rankings: Some methodological and mathematical questions to be solved in university assessment. Paper presented at the international symposium on university ranking, Leiden, Netherlands.
Harzing, A.-W. (2013). A preliminary test of Google Scholar as a source for citation data: A longitudinal study of Nobel prize winners. Scientometrics, 94(3), 1057-1075.
Harzing, A.-W. (2014). A longitudinal study of Google Scholar coverage between 2012 and 2013. Scientometrics, 98(1), 565-575.
Harzing, A.-W., & Alakangas, S. (2016). Google Scholar, Scopus and the Web of Science: A longitudinal and cross-disciplinary comparison. Scientometrics, 106(2), 787-804.
Hazelkorn, E. (2015). Rankings and the reshaping of higher education: The battle for world-class excellence. Berlin: Springer.
Holmes, R. (2013). Watching the rankings. Seattle: CreateSpace Independent Publishing Platform.
Martín-Martín, A., Orduña-Malea, E., Ayllón, J. M., & López-Cózar, E. D. (2014). Does Google Scholar contain all highly cited documents (1950-2013)? arXiv preprint arXiv:1410.8464.
Millot, B. (2015). International rankings: Universities vs higher education systems. International Journal of Educational Development, 40, 156-165.
Mingers, J., & Leydesdorff, L. (2015). A review of theory and practice in scientometrics. European Journal of Operational Research, 246(1), 1-19.
Mingers, J., & Lipitakis, E. (2010). Counting the citations: A comparison of Web of Science and Google Scholar in the field of management. Scientometrics, 85(2), 613-625.
Mingers, J., & White, L. (2015). Throwing out the baby with the bathwater: The undesirable effects of national research assessment exercises on research. arXiv preprint arXiv:1502.00658.
Mingers, J., & Willmott, H. (2013). Taylorizing business school research: On the 'one best way' performative effects of journal ranking lists. Human Relations, 66(8), 1051-1073.
Mingers, J., & Yang, L. (2017). Evaluating journal quality: A review of journal citation indicators and ranking in business and management. European Journal of Operational Research, 257(1), 323-337.
Moed, H. (2008). UK research assessment exercises: Informed judgements on research quality or quantity? Scientometrics, 74(1), 153-161.
Prins, A. A. M., Costas, R., van Leeuwen, T. N., & Wouters, P. F. (2016). Using Google Scholar in research evaluation of humanities and social science programs: A comparison with Web of Science data. Research Evaluation, 25(3), 264-270.
Rebora, G., & Turri, M. (2013). The UK and Italian research assessment exercises face to face. Research Policy, 42(9), 1657-1666.
Waltman, L., & van Eck, N. (2013). A systematic empirical comparison of different approaches for normalizing citation impact indicators. Journal of Informetrics, 7(4), 833-849.
Xu, F., Liu, W. B., & Mingers, J. (2015). New journal classification methods based on the global h-index. Information Processing and Management, 51(2), 50-61.