The h-index as an almost-exact function of some basic statistics

Scientometrics, Sep 2017

As is known, the h-index, h, is an exact function of the citation pattern. At the same time, and more generally, it is recognized that h is “loosely” related to the values of some basic statistics, such as the number of publications and the number of citations. In the present study we introduce a formula that expresses the h-index as an almost-exact function of some (four) basic statistics. On the basis of an empirical study—in which we consider citation data obtained from two different lists of journals from two quite different scientific fields—we provide evidence that our ready-to-use formula is able to predict the h-index very accurately (at least for practical purposes). For comparative reasons, alternative estimators of the h-index have been considered and their performance evaluated by drawing on the same dataset. We conclude that, in addition to its own interest, as an effective proxy representation of the h-index, the formula introduced may provide new insights into “factors” determining the value of the h-index, and how they interact with each other.



Lucio Bertoli-Barsotti, Tommaso Lando

Keywords: h-Index; Journal ranking; Lambert W function
JEL Classification: C
Mathematical Subject Classification: 62P99

Department of Management, Economics and Quantitative Methods, University of Bergamo, Via dei Caniana 2, 24127 Bergamo, Italy
Department of Finance, VŠB-TU Ostrava, Sokolská 33, 70121 Ostrava, Czech Republic

Introduction

The purpose of this paper is to present a formula with which to determine (estimate) the h-index, $h$, under incomplete information conditions (IIC). By IIC we mean the situation in which, for various reasons, we do not know the whole set of citation data, that is, the entire citation profile that would allow us to obtain the exact value of the h-index.
This is the case, for example, when only a few “basic” citation statistics (other than the h-index) are published, or known to us. To be concrete, we will refer to simple citation indicators (to use the words of Hirsch (2005), “single-number criteria commonly used to evaluate scientific output”) such as:

1. total number of citations, $C$;
2. total number of citations for the $t$ ($t \in \{1, 2, 3, \dots\}$) most-cited publications, $C_t$; thus $C_t = \sum_{i=1}^{t} c(i)$, where $c(i)$ represents the number of citations to publication $i$, and where publications are ranked in decreasing order of the number of citations: $c(1) \ge c(2) \ge \dots \ge c(T)$;
3. total number of publications, $T$;
4. total number of “significant” publications, that is, those with at least a predetermined number of citations $k$ each ($k \in \{1, 2, 3, \dots\}$), $T_k$.

In this paper we focus on these indicators in their simplest versions, that is: $C$, $C_1$, $T$ and $T_1$. The purpose of the analysis is twofold: to estimate the h-index (when it cannot be determined directly from the data) and, at the same time, to identify the main factors that influence the level of the h-index. A crucial question is therefore the extent to which the h-index can be satisfactorily predicted from knowledge of only the above basic statistics, i.e. under IIC. More formally, we are searching for a formula

$$\hat{h} = \hat{h}(S_1, \dots, S_r), \tag{1}$$

with $1 \le r \le 4$, $S_j \in S$, $1 \le j \le r$, where $S = \{C, C_1, T, T_1\}$. Note that the formula $\hat{h}$ can be interpreted as a genuine estimator of the h-index, $h$, i.e. $\hat{h} \cong h$, because it does not depend on values of unknown parameters. Possible estimators of the h-index under IIC can be found in the literature:

– A very simple proxy for the h-index is given by $h_H = \sqrt{C/a}$. This model, which can be traced back to Hirsch (2005), is not a genuine estimator of the h-index because $h_H$ is still a function of an unknown parameter, $a$, and the formula itself does not specify how to estimate this parameter in terms of the above basic statistics.
Nevertheless, an estimator for the h-index can be obtained by substituting the unknown parameter $a$ with a fixed constant (Hirsch found “empirically” that $a$ lay between 3 and 5). Redner (2010) found that “$\sqrt{C}$ is essentially equivalent to the h-index, up to an overall factor that is close to 2” (put otherwise, he found that the ratio $\sqrt{C}/2h$ has an empirical distribution “sharply peaked about 1”). This suggests the approximating formula

$$\hat{h} = h_R = \sqrt{C}/2, \tag{2}$$

with $r = 1$, $S = \{C\}$, which we could then call the Redner formula: probably the simplest estimator of the h-index under IIC.

– While $h_R$ is a model-free proxy for the h-index, more elaborate solutions have been attempted in the literature by assuming specific probability distributions for the citation rate. For example, a formula that follows model (1), with $r = 4$, has been recently introduced by Bertoli-Barsotti and Lando (2017): $\hat{h} = \tilde{h}_W^{(1)}$ [Eq. (3)], an explicit expression built from logarithms and the Lambert-W function, where $\tilde{m}_1 = (C - C_1)/(T_1 - 1)$ is nothing but a “trimmed” version of the simple sample mean $C/T_1$, and where $W(\cdot)$ represents the so-called Lambert-W function (Corless and Jeffrey 2015). The Lambert-W function is the function $W(z)$ satisfying $z = W(z)e^{W(z)}$, and can be computed using mathematical software, for example the Mathematica software package (Wolfram Research, Inc. 2014) or the R statistical computing environment (R Development Core Team 2012). The use of a “trimmed” version of the sample mean is a simple technique with which to make the sample mean more robust with respect to a single outlier: as is well known, a single highly-cited paper could substantially inflate the mean. Formula $\tilde{h}_W^{(1)}$ ($r = 4$; $S = \{C, C_1, T, T_1\}$) is based on the assumption that the citation rate of papers (cited at least once) follows a shifted-geometric distribution (SGD) with parameter $Q$ ($Q > 1$), with probability function $p(y) = Q^{-y}(Q - 1)^{y-1}$, $y = 1, 2, \dots$; $p(y)$ represents the probability of observing the number of citations $y$ of a paper (cited at least once), while $Q$ represents the expectation of the SGD. Then, $\hat{n}(y) = Tp(y)$ expresses the “expected”/estimated number of articles with $y$ citations.

– As an alternative approach, an important class of models is the one defined by the formula

$$\hat{h} = c_0 C^{2/3} T^{-1/3}, \tag{4}$$

where $c_0$ is a fixed and known positive constant (Schubert and Glänzel 2007). From model (4), specific ready-to-use formulas are obtained by taking, in particular: (a) $c_0 = 4^{-1/3}$ (Iglesias and Pecharroman 2007; see also Ionescu and Chopard 2013; Panaretos and Malesios 2009; Vinkler 2009, 2013), (b) $c_0 = 0.75$ (Schubert and Glänzel 2007), (c) $c_0 = 1$ (Prathap 2010a, b). Following the notation of Bertoli-Barsotti and Lando (2017), let $h_{SG}(c_0) = c_0 C^{2/3} T^{-1/3}$. Note that these formulas are functions of the data only through two out of the four basic statistics ($r = 2$, $S = \{C, T\}$), and they are based on the assumption of a continuous-type distribution. The formula $h_{SG}(1)$ is also known as the “p-index” (Prathap 2010a, b).

Another approach that deserves mention for completeness, even if it does not yield a ready-to-use formula, is that proposed by Iglesias and Pecharroman (2007). Adopting a different perspective, i.e. the rank-size formulation, and starting from the assumption that the number $c(k)$ of citations of the paper of rank $k$ approximately follows a stretched exponential type PDF

$$f(k; \eta, \beta) = C\,\eta^{1/\beta}\,\Gamma(1 + \beta^{-1})^{-1} \exp(-\eta k^{\beta}), \quad k > 0 \tag{5}$$

(not to be confused with a Weibull PDF, see below), Iglesias and Pecharroman suggest deriving a formula for the h-index as the solution of the equation that sets the number of citations of the paper of rank $h$ equal to $h$, that is, $f(h; \eta, \beta) = h$. Interestingly, the solution may be derived in closed form (even if the authors did not realize this) by means of the Lambert-W function. Unfortunately, this solution still depends on the value of an unknown free parameter, specifically $\beta$ [see their Eqs. (16) and (17)].
Hence, their formula could become a genuine estimator of the h-index, of the form $\hat{h} = \hat{h}(C, T, T_1)$, $r = 3$, only by constraining the unknown parameter $\beta$ to assume a fixed (but arbitrary) value $\beta_0$.

A new formula for the h-index under the Weibull assumption

Let $N(y)$ be the empirical citation distribution function, i.e. the function giving the number of papers that have been cited $y$ times at most. Then, in particular, $n(y) = N(y) - N(y - 1)$, for $y = 1, 2, \dots$, with $n(0) = N(0)$, is the number of papers that have been cited exactly $y$ times. We assume that the citation rate of a paper is a random variable $X$ that follows a two-parameter Weibull distribution, with CDF $F(x; \alpha, \beta) = 1 - \exp(-\alpha x^{\beta})$ for $x > 0$, and 0 otherwise, where $\alpha > 0$ and $\beta > 0$. The probability density function is then $f(x; \alpha, \beta) = \alpha\beta x^{\beta - 1} \exp(-\alpha x^{\beta})$ for $x > 0$, and 0 otherwise. The Weibull distribution is a rather flexible model: the PDF is reverse J-shaped for $\beta \le 1$ and bell-shaped otherwise.

Since our assumption involves a continuous distribution, a suitable discretization rule is needed. In particular, for every $y$, $y = 0, 1, 2, \dots$, let $T\exp(-\alpha y^{\beta})$ express the “expected” number of articles with at least $y$ citations. Hence, $\hat{n}(y) = T\int_{y}^{y+1} f(x; \alpha, \beta)\,dx = T\,(F(y + 1; \alpha, \beta) - F(y; \alpha, \beta))$ represents the expected number of articles with exactly $y$ citations, and $\hat{N}(y) = T\,F(y + 1; \alpha, \beta)$ the expected number of papers that have been cited $y$ times at most. As a special case, $F(1; \alpha, \beta) - F(0; \alpha, \beta) = 1 - e^{-\alpha}$ can be interpreted as a model for the so-called uncitedness factor, $(T - T_1)/T = n(0)/T$ (Hsu and Huang 2012; see also Egghe 2013; Burrell 2013).
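The discretization rule just described can be checked numerically. The following is a minimal sketch, not the authors' code: the parameter values are illustrative assumptions (they are not estimates from the paper), the Weibull parameters are written `alpha` and `beta`, and the helper names are hypothetical.

```python
import math


def weibull_cdf(x, alpha, beta):
    """CDF F(x; alpha, beta) = 1 - exp(-alpha * x**beta) for x > 0, else 0."""
    return 1.0 - math.exp(-alpha * x ** beta) if x > 0 else 0.0


def expected_counts(T, alpha, beta, y_max):
    """Expected number of papers with exactly y citations, for y = 0..y_max."""
    return [
        T * (weibull_cdf(y + 1, alpha, beta) - weibull_cdf(y, alpha, beta))
        for y in range(y_max + 1)
    ]


# Illustrative values only (assumed for this sketch, not taken from the paper).
T, alpha, beta = 663, 0.25, 0.75
n_hat = expected_counts(T, alpha, beta, y_max=200)

# Share of uncited papers implied by the model: F(1) - F(0) = 1 - exp(-alpha).
uncited_share = n_hat[0] / T

# Papers with at least y citations: the counts telescope to T * exp(-alpha * y**beta),
# up to the small tail mass beyond y_max.
y = 5
at_least_y = sum(n_hat[y:])
print(uncited_share, at_least_y, T * math.exp(-alpha * y ** beta))
```

The telescoping sum is the reason the rule is convenient: the tail count has a closed form, which is what makes a closed-form solution for the h-index possible.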
A Weibull model for the h-index is then yielded by the solution of the equation

$$T\exp(-\alpha x^{\beta}) = x.$$

Replacing $\alpha x^{\beta}$ with $t$ in the equation, so that $x = (t/\alpha)^{1/\beta}$, we have

$$T e^{-t} = (t/\alpha)^{1/\beta}, \quad \text{i.e.} \quad \alpha T^{\beta} = t\,e^{\beta t}.$$

Thus, replacing $\beta t$ with $s$, we obtain the equivalent equation

$$s\,e^{s} = \alpha\beta T^{\beta}.$$

Hence, by definition of the above-mentioned Lambert-W function, we find the solution $s = W(\alpha\beta T^{\beta})$ and, since $x = (s/(\alpha\beta))^{1/\beta}$, we finally arrive at the formula

$$x = \left(\frac{W(\alpha\beta T^{\beta})}{\alpha\beta}\right)^{1/\beta}. \tag{12}$$

An empirical counterpart of the above theoretical model for the h-index may now be obtained by substituting the parameters $\alpha$ and $\beta$ with estimates, $\alpha^*$ and $\beta^*$, that depend on the citation data only through the basic statistics $C$, $C_1$, $T$ and $T_1$. This can be done, firstly, by using the uncitedness factor to derive the equation $1 - e^{-\alpha} = (T - T_1)/T$, which can be solved (under the assumption $0 < T_1 < T$) for the variable $\alpha$, giving $\alpha^* = \log(T/T_1)$ as an estimate of parameter $\alpha$; and secondly, by using the trimmed sample citation rate as an estimate of the expectation of $X$, that is

$$E(X) = g(\alpha, \beta) = \alpha^{-1/\beta}\,\Gamma(1 + 1/\beta), \quad \beta > 0.$$

Note that, by construction, our approximation slightly overestimates the true average number of citations, so that a correction for continuity by one-half is needed. We then find $\beta^*$ as the solution (method of moments) of the equation

$$g(\alpha^*, \beta) = m, \tag{15}$$

where $m$ is the continuity-corrected trimmed sample citation rate; the equation can be solved numerically. It should be noted that the existence and uniqueness of the solution of Eq. (15) are not always warranted a priori. Indeed, it can be proved that the necessary and sufficient condition for existence and uniqueness of the solution is $m > 1$ (see “Appendix”). We should then consider “out of range” the cases where $m \le 1$, and exclude them from the analysis. With $\alpha$ and $\beta$ replaced by $\alpha^* = \alpha^*(T, T_1)$ and $\beta^* = \beta^*(C, C_1, T)$ in formula (12), one finally obtains ($r = 4$, $S = \{C, C_1, T, T_1\}$)

$$\hat{h} = h_{WW} = \left(\frac{W(\alpha^*\beta^* T^{\beta^*})}{\alpha^*\beta^*}\right)^{1/\beta^*}, \tag{16}$$

where the suffix WW is motivated by the fact that the formula is based on a Weibull distribution and on the Lambert-W function.
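The estimation steps of this section can be sketched end to end in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the trimmed sample citation rate is assumed here to be $(C - C_1)/(T - 1)$, and the one-half continuity correction is assumed to be added to it; neither expression survives in the text above. The Lambert-W function is computed with a plain Newton iteration, the moment equation is solved by bisection on the log scale, and all function names are hypothetical.

```python
import math


def lambert_w(z, tol=1e-12):
    """Principal branch of the Lambert-W function for z >= 0 (Newton's method)."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # starts at or above the root, where Newton is monotone
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def log_weibull_mean(alpha, beta):
    """log of g(alpha, beta) = alpha**(-1/beta) * Gamma(1 + 1/beta)."""
    return -math.log(alpha) / beta + math.lgamma(1.0 + 1.0 / beta)


def solve_beta(alpha, m, iterations=200):
    """Solve g(alpha, beta) = m for beta by bisection; requires m > 1."""
    target = math.log(m)
    lo = hi = 1.0
    while log_weibull_mean(alpha, lo) <= target:  # g -> +inf as beta -> 0
        lo /= 2.0
    while log_weibull_mean(alpha, hi) >= target:  # g -> 1 < m as beta -> +inf
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if log_weibull_mean(alpha, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weibull_params(C, C1, T, T1):
    """Estimates (alpha, beta) from the four basic statistics."""
    if not 0 < T1 < T:
        raise ValueError("requires 0 < T1 < T")
    alpha = math.log(T / T1)  # solves 1 - exp(-alpha) = (T - T1) / T
    # ASSUMPTION: trimmed mean over all T papers, plus a one-half correction.
    m = (C - C1) / (T - 1) + 0.5
    if m <= 1:
        raise ValueError("out of range: m <= 1")
    return alpha, solve_beta(alpha, m)


def h_ww(C, C1, T, T1):
    """Unrounded Weibull / Lambert-W estimate of the h-index."""
    alpha, beta = weibull_params(C, C1, T, T1)
    ab = alpha * beta
    return (lambert_w(ab * T ** beta) / ab) ** (1.0 / beta)


# Basic statistics quoted in the text for the Journal of the American
# Statistical Association.
estimate = h_ww(C=5231, C1=156, T=663, T1=519)
rounded = math.floor(estimate + 0.5)  # round to the nearest integer
print(estimate, rounded)
```

Working with the logarithm of the Weibull mean avoids overflow in the Gamma function for small `beta`. A library routine such as `scipy.special.lambertw` could replace the hand-written Newton loop where SciPy is available.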
Analysis

Two datasets

This section empirically investigates the effectiveness of formula $h_{WW}$ as an estimate of the actual value of the h-index, $h$. We compare estimates derived from $h_{WW}$ with the real values of the h-index. To facilitate comparisons with other formulas (see below), we use the same two datasets as in Bertoli-Barsotti and Lando (2017), where the authors present an empirical study based on citation data obtained from two different sets of journals belonging to two different scientific fields: (1) the S&MM list and (2) the EE&F list.

S&MM list

The former dataset includes the 231 journals selected from a former list of 568 journals identified as important (in the opinion of a group of experts) in the area “Statistics and Mathematical Methods” (S&MM). Overall, the S&MM dataset included 485,628 citations of 99,409 publications from these journals (for details see Bertoli-Barsotti and Lando 2017). For each journal, the actual value $h$ of the h-index was computed, on the basis of citations retrieved from the Scopus database in the last week of December 2015, as the largest number of papers published in the journal between 2010 and 2014 that obtained at least $h$ citations each, from the time of publication until December 2015. Thus, citation data referred to a 6-year citation window, 2010–2015, and a 5-year publication window, 2010–2014. The four basic statistics $C$, $C_1$, $T$ and $T_1$ were derived as well. The list of the 231 journals in the S&MM dataset is reported in Table 1.

EE&F list

The second dataset included the 100 journals (with a minimum number of 50 publications) top ranked according to the Scopus Impact per Publication (IPP) in 2014, within the Scopus subject area of “Economics, Econometrics and Finance” (EE&F). The IPP is defined as the ratio of citations in a year to papers published in the three previous years divided by the number of papers published in those same years.
The citation data of all 100 journals in the EE&F list were retrieved during the last week of April 2016. The dataset obtained included 19,889 publications receiving a total of 74,096 citations. In this case, differently from the above dataset, in order to obtain citation and publication windows as similar as possible to those employed for the computation of the IPP 2014 by Scopus, the citations used were those received during 2014 by papers published within the previous 3 years, 2011–2013 (for further details see Bertoli-Barsotti and Lando 2017). For each journal the actual value $h$ of the h-index was then computed as the largest number of papers published in the journal between 2011 and 2013 that obtained at least $h$ citations each in the year 2014. The list of the journals in the EE&F dataset is reported in Table 2.

Estimation of the h-index with the formula $h_{WW}$

We estimate the h-index by the rounded-off value $\langle h_{WW} \rangle = \lfloor h_{WW} + 0.5 \rfloor$, where $\lfloor \cdot \rfloor$ is the floor function (recall that the floor function of $x$ gives the greatest integer less than or equal to $x$). Note that, from an operational point of view, all estimating formulas (1) generate real numbers. However, for estimation purposes, these numbers should be rounded off to the nearest integer, not only in order to produce numbers in the same range of values as the h-index but also to avoid “false precision” (Hicks et al. 2015). To give an example illustrating the calculation of this estimate, let us consider the case of the Journal of the American Statistical Association (ISSN 0162-1459, from the S&MM list). We have $C = 5231$, $C_1 = 156$, $T = 663$ and $T_1 = 519$. Hence $\alpha^* = \log(663/519) \approx 0.245$; solving Eq. (15) numerically then yields the solution $\beta^*$, and substituting $\alpha^*$ and $\beta^*$ into formula (16) and rounding off gives the estimate of the h-index.

A comparative analysis of the accuracy

To verify the accuracy of formula $h_{WW}$ comparatively, we considered, among several possible ready-to-use formulas, the following ones among those defined above: $\tilde{h}_W^{(1)}$, $h_{SG}(0.63)$, $h_{SG}(0.75)$, $h_{SG}(1)$ and $h_R$ [see Redner (2010) for formula $h_R$].

[Figure: panels labeled $\langle h_{WW} \rangle$ and $\langle h_R \rangle$]
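For orientation, the model-free and power-law estimators used in the comparison are simple enough to evaluate by hand. The sketch below applies them to the statistics quoted above for the Journal of the American Statistical Association; the function names are hypothetical, and rounding follows the floor-of-plus-one-half rule described in the text.

```python
import math


def h_redner(C):
    """Redner formula: h_R = sqrt(C) / 2."""
    return math.sqrt(C) / 2.0


def h_sg(C, T, c0):
    """Schubert-Glanzel class of estimators: c0 * C**(2/3) * T**(-1/3)."""
    return c0 * C ** (2.0 / 3.0) * T ** (-1.0 / 3.0)


def round_half_up(x):
    """Round to the nearest integer as floor(x + 0.5)."""
    return math.floor(x + 0.5)


# Statistics quoted in the text for the Journal of the American Statistical
# Association (only C and T are needed by these estimators).
C, T = 5231, 663

estimates = {
    "h_R": h_redner(C),
    "h_SG(0.63)": h_sg(C, T, 4.0 ** (-1.0 / 3.0)),  # c0 = 4^(-1/3), about 0.63
    "h_SG(0.75)": h_sg(C, T, 0.75),
    "h_SG(1)": h_sg(C, T, 1.0),
}

for name, value in estimates.items():
    print(f"{name}: {value:.2f} -> {round_half_up(value)}")
```

Because the three $h_{SG}$ variants differ only by the constant $c_0$, their estimates are proportional to one another, so the choice of $c_0$ shifts every journal's estimate by the same factor.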
To measure the magnitude of the observed accuracy, for each of the six estimation formulas, respectively numbered as (1) $h_{WW}$, (2) $\tilde{h}_W^{(1)}$, (3) $h_{SG}(0.63)$, (4) $h_{SG}(0.75)$, (5) $h_{SG}(1)$, (6) $h_R$, we calculated the absolute relative error (ARE) of the estimator $\hat{h}_j(i)$ of the actual h-index, $h_j$, for each journal $j$, $j = 1, \dots, J$,

$$\mathrm{ARE}_j(i) = \frac{|\hat{h}_j(i) - h_j|}{h_j}, \tag{23}$$

where $\hat{h}_j(i)$ is the rounded-off version (the floor of the value plus 0.5) of the estimate given by formula $i$, $i = 1, 2, \dots, 6$. Then, as a criterion with which to assess the overall quality of the formula, we computed the mean absolute relative error (MARE),

$$\mathrm{MARE}(\hat{h}(i)) = \sum_{j=1}^{J} \mathrm{ARE}_j(i)/J. \tag{24}$$

The results are summarized in Table 3.

[Figure: panels labeled $\langle h_{SG}(0.63) \rangle$, $\langle h_{SG}(0.75) \rangle$ and $\langle h_{SG}(1) \rangle$]

Conclusion

This paper has addressed the need to gain a better understanding of how simple citation metrics are related to the h-index, or rather, to a “good” proxy representation of the h-index. This also responds to the more basic requirement of “building bridges” between different types of known and available measures of impact (impact indicators) under IIC. Differently from other studies (that consider the problem of defining a “model” of the h-index), our concern has not been to estimate the parameters (sometimes even considered at the unit level, i.e. single journal, or single scientist; see e.g. Petersen et al. 2011) of a parametric model for the h-index under the assumption of knowing the entire citation pattern; rather, we addressed the quite different and more practical problem of finding a proxy representation of $h$ through a universal formula that depends only on a few summary statistics of the data. The formula $h_{WW}$ is “universal” in the sense that it gives a proxy representation of $h$ that holds for any given journal and any dataset. The issue of determining an indicator under IIC is closely related to the search for a solution to the problem of recovering and comparing impact indicators from different databases.
As a simple but significant example of this issue, we may cite the specific problem of determining/estimating the IF for journals using the Google Scholar-based h-index as a predictor (Bertocchi et al. 2015).

As confirmed in our case study analysis, the h-index can be viewed as an almost-exact function of $C$, $C_1$, $T$ and $T_1$, through $h_{WW}$; that is, the basic statistics $C$, $C_1$, $T$ and $T_1$ provide salient information for the evaluation of the h-index with high precision. In practice, while computation of the h-index $h$ requires knowledge of the entire citation profile (or at least a large part of it, e.g. the so-called h-core), formula $h_{WW}$ requires knowledge of only a few elementary summary statistics, but reproduces the actual value of $h$ quite well. In truth, in our computations we found that the estimates yielded by $h_{WW}$ were slightly biased downwards for quite high values of the h-index but, as can be seen from Table 3, overall the formula $h_{WW}$ yields very accurate approximations to the empirical value of the h-index, with values of the MARE ranging around 5–6%, not too dissimilar from those obtained by formula $\tilde{h}_W^{(1)}$ (Bertoli-Barsotti and Lando 2017). Both formulas $\tilde{h}_W^{(1)}$ and $h_{WW}$ exhibit comparable levels of accuracy (the advantages of the formula $\tilde{h}_W^{(1)}$, as compared to formula $h_{WW}$, may be that: (i) it yields an explicit expression in terms of the basic indicators $C$, $C_1$, $T$ and $T_1$, while the latter does not, and (ii) it is based on a simpler probabilistic model).
Even though the Pearson correlation, $\rho$, is not an adequate measure of the accuracy of the estimation and should not be used to compare the effectiveness of the different estimators considered (and this is the reason why this concept has been banished from this study), for the sake of completeness we point out that: (1) for the S&MM dataset (230 journals), we found $\rho(h, h_{WW}) = 0.99$, $\rho(h, \tilde{h}_W^{(1)}) = 0.98$, $\rho(h, h_{SG}) = 0.98$ and $\rho(h, h_R) = 0.96$; (2) for the EE&F dataset we found $\rho(h, h_{WW}) = 0.97$, $\rho(h, \tilde{h}_W^{(1)}) = 0.98$, $\rho(h, h_{SG}) = 0.97$ and $\rho(h, h_R) = 0.90$. Ultimately, despite the differences between the

Acknowledgements

Funding was provided by Czech Science Foundation (Grant No. 17-23411Y).

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix

Conditions for existence and uniqueness of a solution of Eq. (15)

For every fixed $\alpha = \alpha^* > 0$, $g(\alpha^*, \beta) \to +\infty$ as $\beta \to 0$ and $g(\alpha^*, \beta) \to 1$ as $\beta \to +\infty$. Moreover, since

$$\frac{\partial}{\partial\beta}\, g(\alpha^*, \beta) = \frac{g(\alpha^*, \beta)}{\beta^2}\left[\log\alpha^* - \psi(1 + 1/\beta)\right],$$

where $\psi$ is the digamma function, i.e. the function defined by $\psi(z) = \frac{d\log\Gamma(z)}{dz} = \Gamma'(z)/\Gamma(z)$ (see Johnson et al. 2005, pp. 8–9), we find that the inequality $\frac{\partial}{\partial\beta}\, g(\alpha^*, \beta) < 0$ holds if and only if $\log\alpha^* < \psi(1 + 1/\beta)$. Since $\psi(1 + 1/\beta)$ decreases towards $\psi(1) = \Gamma'(1) \cong -0.5772$ as $\beta \to +\infty$, the inequality holds for every $\beta > 0$ if and only if $\log\alpha^* \le \psi(1)$. Two cases then arise.

(a) If $\log\alpha^* \le \psi(1)$, then $g(\alpha^*, \beta)$ is strictly decreasing for every $\beta > 0$, from $+\infty$ to 1, and Eq. (15) has a unique solution if and only if $m > 1$.

(b) Otherwise, the derivative changes sign from negative to positive at $\beta = \beta_0$, for some $\beta_0 > 0$; hence $g(\alpha^*, \beta)$ is strictly decreasing for every $0 < \beta < \beta_0$, and strictly increasing for every $\beta > \beta_0$, and the point $\beta_0$ is a global minimum for $g(\alpha^*, \beta)$. Moreover since, as seen before, $\lim_{\beta \to \infty} g(\alpha^*, \beta) = 1$, then $0 < g(\alpha^*, \beta_0) < 1$, and the limit at infinity is approached from below. We conclude that, in this case too, Eq.
(15) has a unique solution if and only if $m > 1$; conversely, if $m \le 1$, Eq. (15) may have two solutions, or no solution at all.

In both cases (a) and (b), Eq. (15) has one and only one solution if and only if $m > 1$.

References

Bertocchi, G., Gambardella, A., Jappelli, T., Nappi, C. A., & Peracchi, F. (2015). Bibliometric evaluation vs. informed peer review: Evidence from Italy. Research Policy, 44, 451-466.
Bertoli-Barsotti, L., & Lando, T. (2017). A theoretical model of the relationship between the h-index and other simple citation indicators. Scientometrics, 111(3), 1415-1448.
Burrell, Q. L. (2013). A stochastic approach to the relation between the impact factor and the uncitedness factor. Journal of Informetrics, 7, 676-682.
Corless, R. M., & Jeffrey, D. J. (2015). The Lambert W Function. In N. J. Higham, M. Dennis, P. Glendinning, P. Martin, F. Santosa, & J. Tanner (Eds.), The Princeton companion to applied mathematics (pp. 151-155). Princeton: Princeton University Press.
Egghe, L. (2013). The functional relation between the impact factor and the uncitedness factor revisited. Journal of Informetrics, 7, 183-189.
Glänzel, W. (2006). On the h-index: A mathematical approach to a new measure of publication activity and citation impact. Scientometrics, 67, 315-321.
Hicks, D., Wouters, P., Waltman, L., De Rijcke, S., & Rafols, I. (2015). The Leiden Manifesto for research metrics. Nature, 520(7548), 429.
Hirsch, J. E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences, 102, 16569-16572.
Hsu, J. W., & Huang, D. W. (2012). A scaling between impact factor and uncitedness. Physica A, 391, 2129-2134.
Iglesias, J., & Pecharroman, C. (2007). Scaling the h-index for different scientific ISI fields. Scientometrics, 73, 303-320.
Ionescu, G., & Chopard, B. (2013). An agent-based model for the bibliometric h-index. The European Physical Journal B, 86, 426.
Johnson, N. L., Kemp, A. W., & Kotz, S. (2005). Univariate discrete distributions. New York: Wiley.
Malesios, C. (2015). Some variations on the standard theoretical models for the h-index: A comparative analysis. Journal of the Association for Information Science and Technology, 66, 2384-2388.
Panaretos, J., & Malesios, C. (2009). Assessing scientific research performance and impact with single indices. Scientometrics, 81, 635-670.
Petersen, A. M., Stanley, H. E., & Succi, S. (2011). Statistical regularities in the rank-citation profile of scientists. Scientific Reports, 1, 181.
Prathap, G. (2010a). Is there a place for a mock h-index? Scientometrics, 84, 153-165.
Prathap, G. (2010b). The 100 most prolific economists using the p-index. Scientometrics, 84, 167-172.
R Development Core Team. (2012). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. http://www.R-project.org.
Redner, S. (2010). On the meaning of the h-index. Journal of Statistical Mechanics: Theory and Experiment, 2010(03), L03005.
Schreiber, M., Malesios, C. C., & Psarakis, S. (2012). Exploratory factor analysis for the Hirsch index, 17 h-type variants, and some traditional bibliometric indicators. Journal of Informetrics, 6, 347-358.
Schubert, A., & Glänzel, W. (2007). A systematic analysis of Hirsch-type indices for journals. Journal of Informetrics, 1, 179-184.
Vinkler, P. (2009). The p-index: A new indicator for assessing scientific impact. Journal of Information Science, 35, 602-612.
Vinkler, P. (2013). Quantity and impact through a single indicator. Journal of the American Society for Information Science and Technology, 64, 1084-1085.
Wolfram Research, Inc. (2014). Mathematica 10.0. Champaign, IL: Wolfram Research Inc.


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007%2Fs11192-017-2508-6.pdf

Lucio Bertoli-Barsotti, Tommaso Lando. The h-index as an almost-exact function of some basic statistics, Scientometrics, 2017, 1-20, DOI: 10.1007/s11192-017-2508-6