The heterogeneity statistic I² can be biased in small meta-analyses

http://www.biomedcentral.com/content/pdf/s12874-015-0024-z.pdf

Paul T von Hippel
Center for Health and Social Policy, LBJ School of Public Affairs, University of Texas, Austin, 2315 Red River, Box Y, Austin, TX 78712, USA
BMC Medical Research Methodology

Background: Estimated effects vary across studies, partly because of random sampling error and partly because of heterogeneity. In meta-analysis, the fraction of variance that is due to heterogeneity is estimated by the statistic I². We calculate the bias of I², focusing on the situation where the number of studies in the meta-analysis is small. Small meta-analyses are common; in the Cochrane Library, the median number of studies per meta-analysis is 7 or fewer.

Methods: We use Mathematica software to calculate the expectation and bias of I².

Results: I² has a substantial bias when the number of studies is small. The bias is positive when the true fraction of heterogeneity is small, but the bias is typically negative when the true fraction of heterogeneity is large. For example, with 7 studies and no true heterogeneity, I² will overestimate heterogeneity by an average of 12 percentage points, but with 7 studies and 80 percent true heterogeneity, I² can underestimate heterogeneity by an average of 28 percentage points. Biases of 12-28 percentage points are not trivial when one considers that, in the Cochrane Library, the median I² estimate is 21 percent.

Conclusions: The point estimate I² should be interpreted cautiously when a meta-analysis has few studies. In small meta-analyses, confidence intervals should supplement or replace the biased point estimate I².

Keywords: Meta-analysis; Heterogeneity; Bias

Background

When different studies estimate the effect of a treatment or exposure, the estimates will vary from one study to another. Some of this between-study variance comes from random sampling error, while some may come from heterogeneity.
There are several sources of heterogeneity, including differences in the treatment, the treated population, the study design, or the data analysis method. When there is no heterogeneity, estimates are said to be homogeneous and differ only because of random sampling error.

Heterogeneity is very important. If the existing studies of a treatment are homogeneous, or nearly homogeneous, then there is some assurance that the treatment will have a similar effect when applied to new subjects. On the other hand, if the existing studies are very heterogeneous, then unless the reasons for heterogeneity are well understood, the effect of the treatment on new subjects will be hard to predict [1].

Unfortunately, when studies are compared in a meta-analysis, it is often difficult to say anything definitive about heterogeneity. The reason for this difficulty is that most meta-analyses are small. One summary of the Cochrane Library reported that the median number of studies per meta-analysis was 7 [2], another summary reported that the median was 6 [3], and another reported that the median was just 3 [3]. With so few studies, the classical test for heterogeneity, Cochran's Q [4], is not very informative because its result is as much a function of the number of studies as it is of the amount of heterogeneity. When the number of studies is large, Q will often reject the null hypothesis of homogeneity even if the true extent of heterogeneity is trivial, but when the number of studies is small, Q has little power to reject the null hypothesis even if substantial heterogeneity is present [5]. The power of Q and other homogeneity tests is further reduced when the studies in the meta-analysis are unbalanced in size; for example, when one of the studies in the meta-analysis is much larger than the others [5].

To better describe heterogeneity, Higgins and Thompson [6] introduced the I² statistic, which was meant to improve in two ways on Cochran's Q.
First, I² is more interpretable than Q; specifically, I² estimates the proportion of the variance in study estimates that is due to heterogeneity. Second, unlike Q, I² was meant to be independent of the number of studies; regardless of the number of studies, I² ranges from 0 to 1 because it estimates a proportion. The I² statistic is now used not just in meta-analysis but also in other analyses where we want to know what fraction of the variance in a set of estimates is due to heterogeneity [7-9].

I² does not eliminate the uncertainty that comes from having a small number of studies. No statistic can. In small meta-analyses, for the same reason that Q has low power, I² is very imprecise. For example, if Q fails to reject the null hypothesis of homogeneity, then the confidence interval around I² will usually include 0. In meta-analyses from the Cochrane Library, the 95% confidence interval around I² typically runs approximately from 0 to 0.60, implying that up to 60% of the between-study variance could be due to heterogeneity, or there could be no heterogeneity at all [2]. This is not a very informative conclusion.

Unfortunately, the uncertainty of the I² estimate is not obvious to the typical reader of a meta-analysis published in, for example, Epidemiology [10,11], the American Journal of Epidemiology [12,13], or the Cochrane Library [14]. These outlets do not report the confidence interval around I²; they report only the point estimate I², which may give a false impression of precision.

In this note, we show that I² is not just imprecise; it is also biased. Depending on the circumstances, the bias of I² can be small or large, positive or negative, but the bias is largest when the number of studies is small and the true fraction of variance that is due to heterogeneity is either very large or very small.
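
The positive bias under homogeneity can be approximated by simulation. The following is a minimal Python sketch, not the Mathematica calculation used in this article. It assumes normally distributed study estimates with equal, known within-study standard errors, and it uses the usual definitions: Q is the inverse-variance-weighted sum of squared deviations from the pooled estimate, and I² = (Q − (K − 1))/Q, set to 0 when negative. The function names and parameters (`cochran_q`, `i_squared`, `mean_i_squared`, `true_fraction`) are illustrative choices, not part of the original text.

```python
import random


def cochran_q(estimates, std_errors):
    """Cochran's Q: weighted squared deviations from the pooled estimate."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))


def i_squared(q, n_studies):
    """I^2 = (Q - df) / Q with df = K - 1, truncated at zero."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


def mean_i_squared(n_studies=7, true_fraction=0.0, reps=100_000, seed=1):
    """Average I^2 over simulated meta-analyses.

    Assumption: every study has the same known standard error, so the true
    fraction of variance due to heterogeneity is tau2 / (tau2 + se**2).
    """
    rng = random.Random(seed)
    se = 1.0
    tau2 = true_fraction / (1.0 - true_fraction) * se ** 2
    total_sd = (se ** 2 + tau2) ** 0.5
    std_errors = [se] * n_studies
    total = 0.0
    for _ in range(reps):
        # Each estimate varies because of sampling error plus heterogeneity.
        estimates = [rng.gauss(0.0, total_sd) for _ in range(n_studies)]
        total += i_squared(cochran_q(estimates, std_errors), n_studies)
    return total / reps


if __name__ == "__main__":
    print(mean_i_squared(n_studies=7, true_fraction=0.0))
```

With `n_studies=7` and `true_fraction=0.0`, the simulated mean of I² comes out near 0.12, in line with the 12 percentage point overestimate quoted in the abstract. Raising `true_fraction` toward 1 lets one explore the negative bias; its size in this sketch depends on the equal-standard-error assumption and need not match the article's figures.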
For example, in meta-analyses with 7 studies and no true heterogeneity, the I² statistic will on average lead us to believe that heterogeneity accounts for about 12% of the between-study variance. At the other extreme, with 7 studies and 80% of the variance due to heterogeneity, the I² statistic can on average lead us to believe that just 52% of the variance is due to heterogeneity. These biases of 12 to 28 percentage points are not trivial when one considers that, in the Cochrane Library, the median I² value is just 21% [2].

In the following sections, we calculate and illustrate the bias of I² and discuss implications for the statistics reported in meta-analyses. The Methods section introduces notation, assumptions, and statistical properties, and describes the calculations that we submitted to Mathematica. The Results section will give the results of those calculations.

Meta-analysis

Meta-analysis summarizes the results of $K$ studies, each of which has sample size $n_k$, $k = 1, \ldots, K$. In each study, there is a true effect $\theta_k$ estimated by $\hat{\theta}_k$, with a true standard error $\sigma_k$ estimated by $\hat{\sigma}_k$, or, equivalently, a true variance $\sigma_k^2$ estimated by $\hat{\sigma}_k^2$. With large $n_k$, the quantity $(\hat{\theta}_k - \theta_k)/\hat{\sigma}_k$ approaches a standard normal distribution according to the central limit theorem. (...truncated)