The heterogeneity statistic I2 can be biased in small meta-analyses
von Hippel, BMC Medical Research Methodology
Paul T von Hippel
Center for Health and Social Policy, LBJ School of Public Affairs, University of Texas, Austin, 2315 Red River, Box Y, Austin, TX 78712, USA
Background: Estimated effects vary across studies, partly because of random sampling error and partly because of heterogeneity. In meta-analysis, the fraction of variance that is due to heterogeneity is estimated by the statistic I2. We calculate the bias of I2, focusing on the situation where the number of studies in the meta-analysis is small. Small meta-analyses are common; in the Cochrane Library, the median number of studies per meta-analysis is 7 or fewer.
Methods: We use Mathematica software to calculate the expectation and bias of I2.
Results: I2 has a substantial bias when the number of studies is small. The bias is positive when the true fraction of heterogeneity is small, but the bias is typically negative when the true fraction of heterogeneity is large. For example, with 7 studies and no true heterogeneity, I2 will overestimate heterogeneity by an average of 12 percentage points, but with 7 studies and 80 percent true heterogeneity, I2 can underestimate heterogeneity by an average of 28 percentage points. Biases of 12-28 percentage points are not trivial when one considers that, in the Cochrane Library, the median I2 estimate is 21 percent.
Conclusions: The point estimate I2 should be interpreted cautiously when a meta-analysis has few studies. In small meta-analyses, confidence intervals should supplement or replace the biased point estimate I2.
Keywords: Meta-analysis; Heterogeneity; Bias
Background
When different studies estimate the effect of a treatment
or exposure, the estimates will vary from one study to
another. Some of this between-study variance comes
from random sampling error, while some may come
from heterogeneity. There are several sources of
heterogeneity, including differences in the treatment, the
treated population, the study design, or the data analysis
method. When there is no heterogeneity, estimates are
said to be homogeneous and differ only because of
random sampling error.
Heterogeneity has practical consequences. If the existing studies
of a treatment are homogeneous, or nearly
homogeneous, then there is some assurance that the treatment
will have a similar effect when applied to new subjects.
On the other hand, if the existing studies are very
heterogeneous, then unless the reasons for heterogeneity
are well understood, the effect of the treatment on new
subjects will be hard to predict [1].
Unfortunately, when studies are compared in a
meta-analysis, it is often difficult to say anything definitive
about heterogeneity. The reason for this difficulty is that
most meta-analyses are small. One summary of the
Cochrane Library reported that the median number of
studies per meta-analysis was 7 [2], another summary
reported that the median was 6 [3], and another reported
that the median was just 3 [3]. With so few studies, the
classical test for heterogeneity, Cochrans Q [4], is not
very informative because its result is as much a function
of the number of studies as it is of the amount of
heterogeneity. When the number of studies is large, Q will
often reject the null hypothesis even if the true extent of
heterogeneity is trivial, but if the number of studies is
small, Q provides little power to reject the null
hypothesis of homogeneity even if substantial heterogeneity is
present [5]. The power of Q and other homogeneity tests
is further reduced when the studies in the meta-analysis
are unbalanced in size, as when one of the studies
in the meta-analysis is much larger than the others [5].
To better describe heterogeneity, Higgins and Thompson
[6] introduced the I2 statistic, which was meant to improve
in two ways on Cochran's Q. First, I2 is more interpretable
than Q; specifically, I2 estimates the proportion of the
variance in study estimates that is due to
heterogeneity. Second, unlike Q, I2 was meant to be independent
of the number of studies; regardless of the number of
studies, I2 ranges from 0 to 1 because it estimates a
proportion. The I2 statistic is now used not just in
meta-analysis but also in other analyses where we want to know
what fraction of the variance in a set of estimates is due to
heterogeneity [7-9].
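To make the relationship between Q and I2 concrete, here is a minimal Python sketch. It assumes the usual fixed-effect (inverse-variance) form of Cochran's Q with known study standard errors and the Higgins and Thompson definition I2 = max(0, (Q - (K - 1))/Q); the function name `cochran_q_and_i2` and the example inputs are hypothetical illustrations, not part of this paper.

```python
def cochran_q_and_i2(estimates, std_errors):
    """Return (Q, df, I2) for study estimates with known standard errors.

    Hypothetical helper: fixed-effect inverse-variance weighting is assumed.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]          # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1                                  # K - 1 degrees of freedom
    # I2 = (Q - df) / Q, truncated at 0 so that it stays a proportion
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


q, df, i2 = cochran_q_and_i2([0.1, 0.3, 0.5], [0.1, 0.1, 0.1])
print(round(q, 6), df, round(i2, 6))  # prints 8.0 2 0.75
```

Because Q is divided by itself, I2 stays between 0 and 1 whatever the number of studies, which is the sense in which it was meant to be independent of K. Q, by contrast, is compared with a chi-squared reference whose degrees of freedom grow with K.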
I2 does not eliminate the uncertainty that comes from
having a small number of studies. No statistic can. In
small meta-analyses, for the same reason that Q has low
power, I2 is very imprecise. For example, if Q fails to
reject the null hypothesis of homogeneity, then the
confidence interval around I2 will usually include 0. In
meta-analyses from the Cochrane Library, the 95% confidence
interval around I2 typically runs from approximately 0 to
0.60, implying that up to 60% of the between-study variance
could be due to heterogeneity, or there could be no
heterogeneity at all [2]. This is not a very informative
conclusion. Unfortunately, the uncertainty of the I2 estimate is
not obvious to the typical reader of a meta-analysis
published in, for example, Epidemiology [10,11], the American
Journal of Epidemiology [12,13], or the Cochrane Library
[14]. These outlets do not report the confidence interval
around I2; they only report the point estimate I2, which
may give a false impression of precision.
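The imprecision of the point estimate in a small meta-analysis can be illustrated with a short Monte Carlo sketch. It assumes 7 homogeneous studies whose estimates are standard normal draws with known, equal standard errors of 1, so that every study receives the same weight; the function name `simulate_i2`, the seed, and the number of replications are hypothetical choices for illustration.

```python
import random


def simulate_i2(num_studies, num_reps, seed=1):
    """Simulate I2 across homogeneous meta-analyses (hypothetical helper)."""
    rng = random.Random(seed)
    df = num_studies - 1
    draws = []
    for _ in range(num_reps):
        # No heterogeneity: every study has true effect 0 and standard error 1.
        estimates = [rng.gauss(0.0, 1.0) for _ in range(num_studies)]
        mean = sum(estimates) / num_studies
        # With equal weights of 1, Cochran's Q is the sum of squared deviations.
        q = sum((y - mean) ** 2 for y in estimates)
        draws.append(max(0.0, (q - df) / q) if q > 0 else 0.0)
    return draws


draws = simulate_i2(num_studies=7, num_reps=20000)
share_above_30 = sum(d > 0.30 for d in draws) / len(draws)
print(f"share of homogeneous meta-analyses with I2 > 0.30: {share_above_30:.3f}")
```

Under these assumptions the printed share should come out near 0.20: roughly one homogeneous 7-study meta-analysis in five reports that more than 30% of the variance is due to heterogeneity, although the true fraction is zero.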
In this note, we show that I2 is not just imprecise; it is
also biased. Depending on the circumstances, the bias of
I2 can be small or large, positive or negative, but the bias
is largest when the number of studies is small and the
true fraction of variance that is due to heterogeneity is
either very large or very small. For example, in
meta-analyses with 7 studies and no true heterogeneity, the I2
statistic will on average lead us to believe that
heterogeneity accounts for about 12% of the between-study
variance. At the other extreme, with 7 studies and 80%
of the variance due to heterogeneity, the I2 statistic can on
average lead us to believe that just 52% of the variance is
due to heterogeneity. These biases of 12 to 28 percentage
points are not trivial when one considers that, in the
Cochrane Library, the median I2 value is just 21% [2].
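The 12-point figure for 7 homogeneous studies can be checked numerically. The sketch below is not the Mathematica calculation used in this paper. It assumes that the within-study variances are known and equal, so that Q equals X/(1 - true I2), where X is chi-squared with K - 1 degrees of freedom. The expected estimate E[max(0, 1 - (K - 1)(1 - true I2)/X)] is then integrated with Simpson's rule. The names `chi2_pdf` and `expected_i2` are hypothetical.

```python
import math


def chi2_pdf(x, df):
    """Density of a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    k = df / 2.0
    return math.exp((k - 1) * math.log(x) - x / 2 - k * math.log(2) - math.lgamma(k))


def expected_i2(num_studies, true_i2=0.0, steps=20000):
    """E[I2 estimate] assuming equal, known within-study variances (hypothetical helper)."""
    if num_studies < 2 or not 0.0 <= true_i2 < 1.0:
        raise ValueError("need num_studies >= 2 and 0 <= true_i2 < 1")
    df = num_studies - 1
    # Q = X / (1 - true_i2) with X ~ chi2(df); the estimate is positive only when X > cutoff.
    cutoff = df * (1.0 - true_i2)
    upper = cutoff + df + 40.0 * math.sqrt(2.0 * df) + 50.0  # far into the upper tail
    n = steps if steps % 2 == 0 else steps + 1                # Simpson's rule needs an even count
    h = (upper - cutoff) / n
    total = 0.0
    for i in range(n + 1):
        x = cutoff + i * h
        coef = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += coef * (1.0 - cutoff / x) * chi2_pdf(x, df)
    return total * h / 3.0


print(f"expected I2 with 7 homogeneous studies: {expected_i2(7):.3f}")
```

For 7 studies with no heterogeneity the result is about 0.12, in line with the 12-point overestimate quoted above. With true_i2 greater than 0 the sketch covers only the balanced case of equal study sizes, so it need not reproduce every figure in the text.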
In the following sections, we calculate and illustrate
the bias of I2 and discuss implications for the statistics
reported in meta-analyses.
The Methods section introduces notation, assumptions, and statistical
properties, and describes the calculations that we
submitted to Mathematica. The Results section reports the
outcomes of those calculations.
Meta-analysis
Meta-analysis summarizes the results of $K$ studies, each
of which has sample size $n_k$, $k = 1, \ldots, K$. In each study,
there is a true effect $\theta_k$ estimated by $\hat{\theta}_k$, with a true
standard error $\sigma_k$ estimated by $\hat{\sigma}_k$, or, equivalently, a true
variance $\sigma_k^2$ estimated by $\hat{\sigma}_k^2$. With large $n_k$, the quantity
$(\hat{\theta}_k - \theta_k)/\hat{\sigma}_k$ approaches a standard normal distribution
according to the central limit theorem.
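The large-sample normal approximation can be checked by simulation. This sketch assumes, purely for illustration, that each study's estimate is the sample mean of skewed exponential data with true mean 1 and that its standard error is estimated in the usual way, as the sample standard deviation divided by the square root of the sample size. The function name `normal_coverage` and its settings are hypothetical. If the standardized quantity is close to standard normal, about 95% of replications should fall within ±1.96.

```python
import math
import random


def normal_coverage(n, num_reps=4000, seed=2):
    """Share of replications with |(estimate - truth) / estimated SE| < 1.96 (hypothetical helper)."""
    rng = random.Random(seed)
    theta = 1.0  # true mean of an Exponential(rate=1) distribution
    hits = 0
    for _ in range(num_reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        theta_hat = sum(sample) / n
        variance = sum((x - theta_hat) ** 2 for x in sample) / (n - 1)
        sigma_hat = math.sqrt(variance / n)  # estimated standard error of the mean
        z = (theta_hat - theta) / sigma_hat
        if abs(z) < 1.96:
            hits += 1
    return hits / num_reps


for n in (5, 100):
    print(n, round(normal_coverage(n), 3))
```

Coverage should be noticeably below 0.95 for a sample size of 5 and close to 0.95 for a sample size of 100, which is why the approximation is qualified as holding for large sample sizes.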
(...truncated)