Effect of pooling samples on the efficiency of comparative studies using microarrays

Bioinformatics, Dec 2005

Motivation: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence concerning the data significance. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for the power and sample size calculations. It is desirable to have exact formulas for these calculations and have the approximate results checked against the exact ones. We show that the difference between the approximate and the exact results can be large. Results: In this study, we have characterized quantitatively the effect of pooling samples on the efficiency of microarray experiments for the detection of differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replications. The formulas can be used to determine the total number of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which pooled design becomes preferable to non-pooled design can then be derived given the unit cost associated with a microarray and that with a biological subject. This paper thus serves to provide guidance on sample pooling and cost-effectiveness. The formulation in this paper is outlined in the context of performing microarray comparative studies, but its applicability is not limited to microarray experiments. It is also applicable to a wide range of biomedical comparative studies where sample pooling may be involved. Availability: A Java Webstart application can be accessed at http://wads.le.ac.uk/htox/WadsSoftware/MrcStats/SCal4Poolings.jnlp Contact: sdz1{at}le.ac.uk; twg1{at}le.ac.uk


Shu-Dong Zhang, Timothy W. Gant
MRC Toxicology Unit, Hodgkin Building, Lancaster Road, University of Leicester, Leicester, UK
© The Author 2005. Published by Oxford University Press. All rights reserved.

1 INTRODUCTION

Pooling samples in biomedical studies has become a frequent practice among researchers. For example, >15% of the datasets deposited in the Gene Expression Omnibus Database involve pooled RNA samples (Kendziorski et al., 2005). The practice of pooling biological samples is not new, however: it can be traced back at least to the 1940s (Dorfman, 1943) and has been used in different application areas (Gastwirth, 2000), e.g. for the detection of certain medical conditions and the estimation of prevalence in a population. In the context of detecting differential gene expression using microarrays, divergent views on the wisdom of pooling samples can be found in the literature (Agrawal et al., 2002; Affymetrix, 2004; Shih et al., 2004; Churchill and Oliver, 2001; Peng et al., 2003; Jolly et al., 2005). One of the arguments supporting the practice of pooling biological samples is that biological variation can be reduced by pooling RNA samples in microarray experiments (Churchill and Oliver, 2001). As more carefully described by Kendziorski et al. (2005), pooling can reduce the effects of biological variation, but not the biological variation itself. Another argument in support of pooling samples in microarray experiments is that it reduces financial cost. However, cost reduction is meaningful only if statistical equivalence between the pooled and the non-pooled experimental setups is maintained. Here we address this issue and present formulas to determine the conditions under which pooled and non-pooled designs are statistically equivalent.
To compare experimental designs with and without sample pooling, the two designs must have something in common that can be measured, e.g. using the same or an equivalent amount of resources, or yielding the same level of detection power. Kendziorski et al. (2003) used the width of the 95% confidence interval for gene expression to compare different experimental designs with and without sample pooling. The criterion was that the narrower the confidence interval, the more accurate the results from the experimental design. In a comparative study where two groups of biological subjects are compared, the common goal of the different experimental designs is to detect a change between the two groups with a given power at a given false positive rate, as adopted in Shih et al. (2004). We shall use the latter method to compare different designs. So in this work statistical equivalence means that the designs have the same statistical power at the same level of significance. The more appropriate experimental design is therefore the one that uses fewer resources to achieve this statistical equivalence.

The basic assumption underlying sample pooling is biological averaging: the measure of interest taken on the pool of samples is equal to the average of the same measure taken on each of the individual samples that contributed to the pool. For example, in a microarray experiment, if $r$ individual samples contribute equally to a pool, and the concentrations of a gene's mRNA transcripts for the $r$ samples are denoted by $T_i$, with $i = 1, 2, \ldots, r$ indexing the individual samples, the assumption of biological averaging says that the concentration of this gene's mRNA transcripts in the pool is $T = \frac{1}{r}\sum_{i=1}^{r} T_i$. However, for microarray experiments there is some debate on whether the basic assumption of pooling holds. Kendziorski et al. (2003, 2005) argue that there is limited support for this assumption.
Here we do not seek to enter into this debate but rather take the assumption of biological averaging as valid, or at least approximately so, so that we are in a position to determine whether pooling samples is financially beneficial or not. The validity of biological averaging makes it possible (or easier) to derive a neat theoretical formulation. On a practical level, though, the requirement for the validity of this assumption may not be as stringent as a theoretical formulation demands. For instance, Kendziorski et al. (2005) show that even when biological averaging does not hold, pooling can be useful and inferences regarding differential gene expression are not adversely affected by pooling.

One situation where there is little alternative but to pool biological samples is where there is an insufficient amount of RNA from each individual biological subject to perform a single microarray hybridization. RNA amplification may be a possible way of obtaining more RNA, but may not be practically feasible when many individual biological subjects are involved, as in the case of Jin et al. (2001). In such a circumstance, pooling samples is justified by the lack of an alternative and will not be considered further here. Similarly, we will not consider here the case where all the biological samples of the same group are pooled together and multiple technical replicate measurements are carried out on the sample pool. This is sometimes seen in the literature (Muckenthaler et al., 2003), but such an experimental design leaves no degrees of freedom to estimate the biological variance. Thus valid inferences about the differences between the two populations of biological subjects under study cannot be made. Here we only consider situations other than the above two, where pooling may reduce the overall costs of the experiments.

A GENERAL FORMALISM

For every comparative study, there is at least one measurable quantity which is the quantity of interest.
The goal of the study is to deduce from the data collected whether there is any difference between the means of the two populations. As measuring all the biological subjects in two populations is rarely possible, representatives from each population are randomly selected and measurements are made on these. The measurements are then used to infer the properties of the population. Let $X$ be the measurable quantity that is being determined in the experiment, e.g. the expression level of a gene. In the case of a one-channel microarray, $X$ could denote the logarithm (most commonly base 2) of the fluorescence intensity, or the logarithm of the fluorescence ratio in the case of a two-channel microarray. Let $x_i^c$ denote the value of $X$ for an individual subject $i$ in the control population (c), and $x_j^t$ that of the individual subject $j$ in the treatment population (t). We assume that the $x_i^c$ for all individuals in the control population are independent and normally distributed with mean $\mu_c$ and variance $\sigma_c^2$, denoted by $x_i^c \sim N(\mu_c, \sigma_c^2)$ for all $i$. Similarly, $x_j^t \sim N(\mu_t, \sigma_t^2)$ for all $j$.

A general experimental setup

For a general experimental setup, individual subjects from both populations are randomly selected and tissue samples are collected from each. Tissue sample pools are made by pooling together a given number $r$ of randomly selected tissue samples (of the same population). Note that to make $n$ pools we need to have selected $nr$ individual subjects from the population. Then $m$ measurements are made on each pool of tissue samples, so $m$ is the number of technical replications of measurement on each pool. Notice that introducing the two parameters $r$ and $m$ creates a general and flexible experimental setup. For instance, if we set $r = 1$, the experiment is equivalent to no pooling of tissue samples, and if we set $m = 1$ there is no technical replication.
Under the basic assumption of biological averaging, the result of pooling $r$ tissue samples together in equal proportions is that the value of $X$ for the pool is the average of the values for the subjects that formed the pool,

$$\tilde{x} = \frac{1}{r}\sum_{i=1}^{r} x_i. \tag{1}$$

It follows that $\tilde{x} \sim N(\mu_c, \sigma_c^2/r)$ for a pool from the control population, or $\tilde{x} \sim N(\mu_t, \sigma_t^2/r)$ for a pool from the treated population. Note that in this paper we shall only discuss pooling samples with equal individual contributions. While pools formed by unequal contributions from individual samples are possible, such a pooled experimental design is generally less effective than equal pooling, as already shown by Peng et al. (2003) with their simulated results. When we take a measurement on a pool $p$, the measured value is

$$y_{pk} = \tilde{x}_p + \epsilon_k, \tag{2}$$

where $p$ indexes pools, $k$ indexes measurements and $\epsilon_k$ is a random error term assumed to be independently and normally distributed as $\epsilon_k \sim N(0, \sigma_\epsilon^2)$. Hereafter $\sigma_\epsilon^2$ will be referred to as the technical variance, $\sigma_c^2$ as the biological variance for the control population and $\sigma_t^2$ as the biological variance for the treatment population.

The output of the experiment is the set of measurements on the two sets of pools. For the control group, we have $y^c_{pk}$ for $p = 1, \ldots, n_c$ and $k = 1, \ldots, m$. For the treatment group, we have $y^t_{pk}$ for $p = 1, \ldots, n_t$ and $k = 1, \ldots, m$. Here $n_c$ and $n_t$ are the numbers of pools prepared for the control and treatment populations, respectively. Our task is to infer population properties from these measured data. In particular, we want to know whether there is any difference between the two population means $\mu_c$ and $\mu_t$. It can be shown that

$$\bar{Y}_c = \frac{1}{n_c m}\sum_{p=1}^{n_c}\sum_{k=1}^{m} y^c_{pk} \tag{3}$$

is an unbiased estimator of $\mu_c$, with a variance

$$\frac{1}{n_c}\left(\frac{\sigma_c^2}{r} + \frac{\sigma_\epsilon^2}{m}\right), \tag{4}$$

and similarly,

$$\bar{Y}_t = \frac{1}{n_t m}\sum_{p=1}^{n_t}\sum_{k=1}^{m} y^t_{pk} \tag{5}$$

is an unbiased estimator of $\mu_t$, with a variance

$$\frac{1}{n_t}\left(\frac{\sigma_t^2}{r} + \frac{\sigma_\epsilon^2}{m}\right). \tag{6}$$

If we make the additional assumption that the variances for the two populations of biological subjects are the same, i.e. $\sigma_t^2 = \sigma_c^2 = \sigma^2$, then the difference between Equations (5) and (3), $D = \bar{Y}_t - \bar{Y}_c$, is an unbiased estimator of $\mu = \mu_t - \mu_c$ with a variance

$$\sigma_D^2 = \left(\frac{1}{n_c} + \frac{1}{n_t}\right)\left(\frac{\sigma^2}{r} + \frac{\sigma_\epsilon^2}{m}\right). \tag{7}$$

The factor $\sigma^2/r + \sigma_\epsilon^2/m$ in Equation (7) can be estimated without bias by

$$s_p^2 = \frac{\sum_{p=1}^{n_c}\left(\bar{y}^c_p - \bar{Y}_c\right)^2 + \sum_{p=1}^{n_t}\left(\bar{y}^t_p - \bar{Y}_t\right)^2}{n_c + n_t - 2}, \tag{8}$$

where $\bar{y}_p = \frac{1}{m}\sum_{k=1}^{m} y_{pk}$ is the mean of the $m$ measurements on pool $p$. It is then clear that

$$\frac{\bar{Y}_t - \bar{Y}_c - (\mu_t - \mu_c)}{s_p\sqrt{1/n_c + 1/n_t}} \tag{9}$$

follows Student's t-distribution with $n_c + n_t - 2$ degrees of freedom. In detecting differential gene expression, we want to test the null hypothesis $\mu_c = \mu_t$ against the alternative hypothesis $\mu_c \neq \mu_t$. So our test statistic is

$$t_0 = \frac{\bar{Y}_t - \bar{Y}_c}{s_p\sqrt{1/n_c + 1/n_t}}, \tag{10}$$

and there are no unknowns in Equation (10). Note that $t_0$ can be seen as a generalized two-sample t-test statistic, which reduces to the statistic of the traditional two-sample t-test with equal variance when we set the parameters $r = 1$ (no pooling of tissue samples) and $m = 1$ (no technical replication of measurements). Shih et al. (2004) arrived at two separate statistics, one for the non-pooled design and the other for the pooled design. The $t_0$ defined by Equation (10) is more general: setting $r = 1$ and $m = 1$ in Equation (10) recovers Shih et al.'s statistic for the non-pooled design, while setting $r > 1$ and $m = 1$ recovers Shih et al.'s statistic for the pooled design. Note that $m$ does not need to equal 1. By incorporating the two additional parameters $r$ and $m$, the statistic $t_0$ can deal with situations where there are pooled tissue samples and multiple technical replications.

Criteria of significance

As with any statistical test, we need to specify a threshold P-value $P_{th}$ to claim significant results in the test. When all the other parameters are given, setting $P_{th}$ is equivalent to setting a threshold, say $|\xi|$, for the statistic $t_0$ defined in Equation (10). With this threshold t-value, our criteria for claiming a significant test are as follows: if $t_0 > |\xi|$, we declare that $\mu_t - \mu_c > 0$; if $t_0 < -|\xi|$, we declare that $\mu_t - \mu_c < 0$.
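The statistic $t_0$ of Equation (10) can be computed directly from the per-pool measurements. The following Python sketch is an illustration under stated assumptions rather than the authors' implementation: the function names `simulate_group` and `generalized_t` are hypothetical, every pool is assumed to have the same number $m$ of technical replicates, and the pooled variance is estimated from the pool means, which estimates $\sigma^2/r + \sigma_\epsilon^2/m$ without bias under the model above.

```python
import math
import random
from statistics import mean


def simulate_group(mu, sigma2, sigma_e2, n_pools, r, m, rng):
    """Simulate one group: n_pools pools of r subjects, m measurements per pool.

    Assumes biological averaging: the pool value is the mean of its r subjects.
    """
    data = []
    for _ in range(n_pools):
        pool_value = mean(rng.gauss(mu, math.sqrt(sigma2)) for _ in range(r))
        # Each technical replicate adds independent measurement error.
        data.append([pool_value + rng.gauss(0.0, math.sqrt(sigma_e2))
                     for _ in range(m)])
    return data


def generalized_t(control, treatment):
    """Return (t0, s_p2) from two lists of pools (each pool: m measurements)."""
    means_c = [mean(pool) for pool in control]
    means_t = [mean(pool) for pool in treatment]
    n_c, n_t = len(means_c), len(means_t)
    y_c, y_t = mean(means_c), mean(means_t)
    # Pooled estimate of sigma^2/r + sigma_e^2/m from the spread of pool means.
    ss = (sum((v - y_c) ** 2 for v in means_c)
          + sum((v - y_t) ** 2 for v in means_t))
    s_p2 = ss / (n_c + n_t - 2)
    t0 = (y_t - y_c) / math.sqrt(s_p2 * (1 / n_c + 1 / n_t))
    return t0, s_p2


# Illustrative use: 5 pools per group, 2 subjects per pool, 1 array per pool.
rng = random.Random(0)
control = simulate_group(0.0, 0.05, 0.0125, n_pools=5, r=2, m=1, rng=rng)
treatment = simulate_group(1.0, 0.05, 0.0125, n_pools=5, r=2, m=1, rng=rng)
t0, s_p2 = generalized_t(control, treatment)
```

With $r = 1$ and $m = 1$ each pool is a single subject measured once, and `generalized_t` reduces to the ordinary equal-variance two-sample t statistic.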
So the rate at which false positive claims are made is

$$P_{th} = \int_{|t_0| > |\xi|} \rho_{n_c+n_t-2}(t_0)\, dt_0 = 2\,T_{n_c+n_t-2}(-|\xi|), \tag{11}$$

where $\rho_{n_c+n_t-2}(\cdot)$ is the probability density function (PDF) of Student's t-distribution with $n_c + n_t - 2$ degrees of freedom, and $T_{n_c+n_t-2}(\cdot)$ is the corresponding cumulative probability distribution function (CDF). It is therefore apparent that the threshold t-value $|\xi|$ can be obtained by solving the equation $2\,T_{n_c+n_t-2}(-|\xi|) = P_{th}$ for a given false positive rate $P_{th}$.

POWER FUNCTION

In Zhang and Gant (2004) we presented a power function for a new statistical t-test (hereafter referred to as the two-labelling t-test) in the context of using two-colour microarrays to detect differential gene expression. Following similar steps, we can derive the power function for the generalized two-sample t-test presented in this paper, which reads

$$S = \int_0^{\infty} \pi_{n_c+n_t-2}(Y)\, \Phi\!\left(\frac{|\mu|}{\sigma_D} - |\xi|\sqrt{\frac{Y}{n_c+n_t-2}}\right) dY, \tag{12}$$

where $\pi_{n_c+n_t-2}(Y)$ is the PDF of the $\chi^2$-distribution with $n_c + n_t - 2$ degrees of freedom and $\Phi(\cdot)$ is the CDF of the standard normal distribution. The rate $S$ at which a true difference between $\mu_t$ and $\mu_c$ can be successfully detected is a function of $n_c$, $n_t$, $|\mu|/\sigma_D$ and $|\xi|$. With $\sigma_D$ given by the square root of Equation (7), and $|\xi|$ determined by solving Equation (11) at a given false positive rate $P_{th}$, $S$ is ultimately a function of $P_{th}$, $n_c$, $n_t$ and $|\mu|/\sigma_D$.

A few points are worth noting here. (1) The two-labelling t-test presented in Zhang and Gant (2004) was designed to deal with systematic labelling biases generated during microarray experimentation. The t-test presented in this paper, however, assumes no systematic data biases. In the case of two-colour microarrays this requires a common reference design, in which the labelling biases cancel out in the calculation of the test statistic. (2) In Zhang and Gant (2004), the biological variances of the two populations under comparison do not have to be the same, i.e. we did not assume $\sigma_c^2 = \sigma_t^2$. For the t-test in this paper, we have made the additional assumption that $\sigma_c^2 = \sigma_t^2$.
Relaxing this requirement was possible, as in the case of the traditional two-sample t-test with unequal variance (Brownlee, 1965), but an exact power function could not be readily obtained. (3) The exact power function obtained in this paper allows evaluation of the effects of pooling biological samples and of taking multiple technical measurements, thus giving researchers quantitative guidance on the practice of pooling samples. (4) By setting the parameters $r = 1$ and $m = 1$, an exact power function is provided for the traditional two-sample t-test with equal variance.

We have implemented the computation of the power function $S$ of Equation (12) as a Java application, which can be accessed at the URL given in the abstract. Here we apply this to microarray comparative studies for finding differentially expressed genes and investigate the effect of pooling RNA samples in the experiments. We also compare our exact results with some approximate results presented by other authors (Shih et al., 2004) to demonstrate why an exact formula is desirable.

Table 1. The $r = 1$ panel represents the non-pooled design. Other parameter values are $\sigma^2 = 0.05$, $\sigma_\epsilon^2 = 0.0125$, $\lambda = \sigma^2/\sigma_\epsilon^2 = 4$ and $m = 1$. $N_s = r(n_c + n_t)$ is the total number of biological subjects required, and $N_m = m(n_c + n_t)$ is the total number of measurements (microarrays) needed, counting both the control and the treatment populations. The preset targets are a false positive rate controlled at $P_{th} = 0.001$ and detection of a 2-fold differential expression ($\mu = 1$) with power no less than 0.95. The minimum numbers of biological subjects ($N_s$) and microarrays ($N_m$) that meet the preset targets are highlighted in bold.

Comparison with approximate results

Based on their approximate formulas, Shih et al. (2004) considered two scenarios to compare the number of biological subjects and the number of microarrays in the non-pooled and pooled designs. Here we give exact results for the two scenarios to show the difference from the approximate results.

Fig. 1. The power $S$ as a function of the total number of pools $n_c + n_t$. The parameters used are for the second scenario: $\sigma^2 = 0.2$, $\sigma_\epsilon^2 = 0.05$, $\lambda = \sigma^2/\sigma_\epsilon^2 = 4$, $P_{th} = 0.001$, $\mu = 1$ and $m = 1$. The five solid curves correspond to different levels of pooling, from right to left, $r = 1$, $r = 2$, $r = 4$, $r = 6$ and $r = 15$, respectively. The dashed line indicates 95% power; its intersections with the power curves specify the total numbers of pools (assuming $n_c = n_t$) needed to achieve the target power. The total number of biological subjects and the total number of arrays can then be calculated simply as $N_s = r(n_c + n_t)$ and $N_m = m(n_c + n_t)$, respectively.

In the first scenario, we consider that the common biological variance of the two populations is $\sigma^2 = 0.05$ and the technical variance is $\sigma_\epsilon^2 = 0.0125$, which gives the biological-to-technical variance ratio $\lambda = \sigma^2/\sigma_\epsilon^2 = 4$. The preset target of the experiment in this scenario is that the false positive rate is controlled at $P_{th} = 0.001$ and the power is no less than $S = 0.95$ to detect a 2-fold differential gene expression, which corresponds to $\mu = 1$ with base 2 logarithm (Shih et al., 2004). In Table 1, we present results for different values of the pooling parameter $r$. It can be seen from the first panel of this table that in order to hit the preset target, the non-pooled design ($r = 1$) requires at least 12 biological subjects divided evenly between the two populations, i.e. 6 from each of the two populations. Having seven subjects from one population and five subjects from the other is insufficient to achieve the target of 95% detection power. The effects of other levels of pooling on the detection power are also shown in Table 1. The minimum numbers of biological subjects ($N_s$) and microarrays ($N_m$) that meet the preset targets are highlighted in bold.
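The power calculation described above can be sketched numerically. The Python code below is a minimal illustration using only the standard library, not the authors' Java application. It assumes the power is the probability that $t_0$ exceeds the threshold $|\xi|$ in the direction of the true difference, written as an integral of a normal CDF over the distribution of the pooled variance estimate; the function names, the Simpson-rule integration, the bisection solver and the change of variable $u = \sqrt{Y/\nu}$ are all choices made for this sketch.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def t_upper_tail(x, nu):
    """P(T > x), x >= 0, for Student's t with nu degrees of freedom.

    The substitution t = sqrt(nu) * tan(theta) turns the tail integral into
    a smooth integral of cos(theta)**(nu - 1) over a finite interval.
    """
    k = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2))
    k /= math.sqrt(math.pi)
    theta = math.atan(x / math.sqrt(nu))
    return k * simpson(lambda a: math.cos(a) ** (nu - 1), theta, math.pi / 2)


def t_threshold(nu, p_th):
    """Solve 2 * P(T > xi) = p_th for xi >= 0 by bisection."""
    lo, hi = 0.0, 1.0
    while 2 * t_upper_tail(hi, nu) > p_th:  # expand until the root is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * t_upper_tail(mid, nu) > p_th:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def scaled_chi_pdf(u, nu):
    """Density of u = sqrt(Y / nu), where Y is chi-square with nu d.o.f."""
    if u <= 0.0:
        return math.sqrt(2 / math.pi) if nu == 1 else 0.0
    log_f = (math.log(2) + (nu / 2) * math.log(nu / 2) - math.lgamma(nu / 2)
             + (nu - 1) * math.log(u) - nu * u * u / 2)
    return math.exp(log_f)


def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def power(n_c, n_t, r, m, mu, sigma2, sigma_e2, p_th):
    """Power S for n_c and n_t pools, r subjects per pool, m arrays per pool."""
    nu = n_c + n_t - 2
    sigma_d = math.sqrt((1 / n_c + 1 / n_t) * (sigma2 / r + sigma_e2 / m))
    delta = abs(mu) / sigma_d
    xi = t_threshold(nu, p_th)
    upper = 1 + 10 / math.sqrt(nu)  # the density is negligible beyond this
    return simpson(lambda u: scaled_chi_pdf(u, nu) * normal_cdf(delta - xi * u),
                   0.0, upper)


# First scenario: sigma^2 = 0.05, sigma_e^2 = 0.0125, mu = 1, P_th = 0.001.
for n in (5, 6):
    print(n, round(power(n, n, 1, 1, 1.0, 0.05, 0.0125, 0.001), 3))
```

Scanning `power` over $n_c$, $n_t$ and $r$ and keeping the smallest designs that reach the target is one way to produce numbers to set against Table 1. A useful sanity check on the sketch is that with `mu = 0` the returned value equals half the false positive rate.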
It is clear that as the level of pooling is increased (with increasing $r$), the number of microarrays $N_m$ can be reduced, but the number of biological subjects $N_s$ has to be increased. For example, in order to reduce the number of arrays from 12 (Table 1, first panel) to 8 (Table 1, fourth panel), the number of biological subjects used to form the pools must be increased from 12 to 40.

For the second scenario we consider the case $\sigma^2 = 0.2$, $\sigma_\epsilon^2 = 0.05$, which gives $\lambda = \sigma^2/\sigma_\epsilon^2 = 4$. Again the preset targets are to detect a true differential expression $\mu = 1$ with no less than 95% power while the false positive rate is set at $P_{th} = 0.001$. Using these parameters, the power $S$ as a function of $n_c + n_t$ is plotted in Figure 1 for different levels of sample pooling. For the non-pooled design ($r = 1$), $N_s = 30$ total biological subjects and $N_m = 30$ arrays are required to hit the preset targets. As in the first scenario, when the level of pooling is increased, the number of arrays $N_m$ is reduced while the number of subjects must be increased to meet the preset targets.

Table 2. Comparison of our exact results and the approximate results of Shih et al. (2004). Columns: $N_s$ (Exact), $N_s$ (Approx), Conditions; first $N_s$ (Exact) entries: 12, 20, 27. The upper panel of the table is for the first scenario, where $\sigma^2 = 0.05$, $\sigma_\epsilon^2 = 0.0125$, $\lambda = \sigma^2/\sigma_\epsilon^2 = 4$. The lower panel is for the second scenario, where $\sigma^2 = 0.2$, $\sigma_\epsilon^2 = 0.05$ and $\lambda = \sigma^2/\sigma_\epsilon^2 = 4$. The targets of both scenarios are a false positive rate $P_{th} = 0.001$ and a power no less than $S = 0.95$. The last column in each panel gives the cost conditions under which pooling samples becomes beneficial relative to the lower level of pooling shown in this table.

In Table 2, we summarize our exact results and the approximate results of Shih et al. (2004). It can be seen that the difference between the two can be very large, indicating the need for exact results. For example, in the first scenario when $N_m = 8$ the approximate result of Shih et al. (2004) predicts that a minimum of 21 biological subjects are required.
In practice 24 subjects are required, as 24 is the minimum number >21 that is divisible by 8. However, this experimental setup (24 subjects forming 8 pools, 8 microarrays) will only give a detection power of 90%. To meet the target power of 95%, 40 biological subjects are actually required according to our exact result. If an experiment with $N_m = 7$ microarrays is planned, Shih et al. predict that 37 subjects are required, but in fact 126 subjects must be used to achieve the target. Generally, the approximate formulas of Shih et al. (2004) are too optimistic in assessing the benefits of pooling samples and reducing the number of microarrays, because they underestimate the number of biological subjects required.

Cost analysis

Depending on the costs associated with the biological subjects and microarrays, the conditions under which pooling samples becomes beneficial may differ from laboratory to laboratory. Here we show examples of how to determine these conditions. Denoting the cost associated with each biological subject as $C_s$ (including materials, labour, etc.) and the cost associated with a microarray as $C_m$, the total cost of an experiment in a microarray comparative study is $C_T = N_s C_s + N_m C_m$. Taking the first scenario as an example, the total cost of a non-pooled design that achieves our preset targets is

$$C_T(r = 1) = 12C_s + 12C_m,$$

and the total cost for the pooled design with $r = 2$ is

$$C_T(r = 2) = 20C_s + 10C_m.$$

Therefore, for the pooled design with $r = 2$ to be beneficial we must have $C_T(r = 1) > C_T(r = 2)$, which requires that $C_m > 4C_s$. Put another way, only when the cost associated with one microarray, $C_m$, is more than four times the cost of a subject, $C_s$, does the pooled design with $r = 2$ become preferable to the non-pooled design. Similarly, a higher level of pooling with $r = 3$ becomes preferable to $r = 2$ only when $C_m > 7C_s$. Furthermore, the condition for increasing the level of pooling from $r = 3$ to $r = 5$ is $C_m > 13C_s$, and so on. Table 2 gives these conditions for further levels of pooling.
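The break-even conditions above follow from comparing two total costs of the form $N_s C_s + N_m C_m$. The short Python sketch below reproduces that arithmetic for the first scenario. The function name `break_even_ratio` is hypothetical, and the $r = 3$ design (27 subjects, 9 arrays) is an assumption inferred from the stated thresholds of 7 and 13 together with the other designs given in the text.

```python
def break_even_ratio(less_pooled, more_pooled):
    """Cost ratio C_m / C_s above which the more pooled design is cheaper.

    Each design is an (N_s, N_m) pair; total cost is N_s * C_s + N_m * C_m.
    """
    ns_a, nm_a = less_pooled
    ns_b, nm_b = more_pooled
    if nm_b >= nm_a:
        raise ValueError("the more pooled design should use fewer arrays")
    # N_s_a*C_s + N_m_a*C_m > N_s_b*C_s + N_m_b*C_m, rearranged for C_m/C_s.
    return (ns_b - ns_a) / (nm_a - nm_b)


# (N_s, N_m) per pooling level r for the first scenario; r = 3 is inferred.
designs = {1: (12, 12), 2: (20, 10), 3: (27, 9), 5: (40, 8)}
levels = sorted(designs)
for lo, hi in zip(levels, levels[1:]):
    ratio = break_even_ratio(designs[lo], designs[hi])
    print(f"r={lo} -> r={hi}: beneficial when C_m > {ratio:g} C_s")
```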
For the first scenario, using the actual cost figures given in Shih et al. (2004), where $C_s = \$230$ and $C_m = \$300$, it can be seen that none of the pooling conditions is met. Therefore, for this laboratory pooling samples is not recommended. However, if we use the cost figures of Kendziorski et al. (2003), where $C_s = \$50$ and $C_m = \$700$, the optimal design is a pooled design with $r = 5$. For the second scenario, it is a similar story. The cost figures of Shih et al. (2004) ($C_s = \$230$ and $C_m = \$300$) give $C_m = 1.30C_s$, which does not satisfy any of the pooling conditions. So again the non-pooled design with $N_m = 30$ and $N_s = 30$ is recommended. On the other hand, the cost figures of Kendziorski et al. (2003) ($C_s = \$50$ and $C_m = \$700$) give $C_m = 14C_s$, which satisfies all the pooling conditions in the lower panel of Table 2 except the last row. So in Kendziorski et al.'s laboratory the pooled design with $N_m = 14$ and $N_s = 84$ would be recommended.

DISCUSSION

We have in this paper presented exact formulas for calculating the power of microarray experimental designs with different levels of pooling. These formulas can be used to determine the conditions of statistical equivalence between different pooling setups. As in Kendziorski et al. (2003) and Shih et al. (2004), the calculations presented in this paper are for an individual gene, so the statistical equivalence of different pooling designs can be determined with regard to one particular gene. However, a microarray monitors thousands of genes simultaneously, and the biological and technical variances vary from gene to gene, so no single result of statistical equivalence between pooled and non-pooled designs applies equally to all genes on the array. So in practice how would the formulations in this work be used? One possible way, as suggested by Kendziorski et al. (2003), is to specify the distributions of $\sigma^2$ and $\sigma_\epsilon^2$ and calculate the total number of subjects and arrays that maximize the average power across the array.
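The recommendations for the first scenario can be checked by evaluating the total cost of each candidate design under both sets of cost figures. The Python sketch below does this for the designs quoted in the text. The names `total_cost` and `cheapest` are hypothetical, the $r = 3$ design (27 subjects, 9 arrays) is inferred from the stated break-even thresholds, and the pooling level $r = 18$ for the 7-array design is inferred from 126 subjects divided by 7 arrays.

```python
def total_cost(n_subjects, n_arrays, cost_subject, cost_array):
    """Total experiment cost C_T = N_s * C_s + N_m * C_m."""
    return n_subjects * cost_subject + n_arrays * cost_array


def cheapest(designs, cost_subject, cost_array):
    """Return the pooling level r whose design has the lowest total cost."""
    return min(designs,
               key=lambda r: total_cost(*designs[r], cost_subject, cost_array))


# r -> (N_s, N_m) for the first scenario; all designs meet the same targets.
SCENARIO_1 = {1: (12, 12), 2: (20, 10), 3: (27, 9), 5: (40, 8), 18: (126, 7)}

for label, c_s, c_m in [("Shih et al. (2004)", 230, 300),
                        ("Kendziorski et al. (2003)", 50, 700)]:
    best = cheapest(SCENARIO_1, c_s, c_m)
    n_s, n_m = SCENARIO_1[best]
    print(f"{label}: r = {best}, cost = {total_cost(n_s, n_m, c_s, c_m)}")
```

Because every design in the dictionary achieves the same power at the same significance level, comparing total costs alone is enough to choose between them.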
In theory, if the biological variances and technical variances were known for all genes on the array, an equivalence condition between pooled and non-pooled designs could be determined for each gene individually. The overall (or average) equivalence condition between pooled and non-pooled designs could then be obtained, e.g. by some form of averaging over all genes. An alternative and probably more practical way is to use representative values of $\sigma^2$ and $\sigma_\epsilon^2$. We therefore propose that the parameters for a typical gene be used as inputs for the power and sample size calculations. A typical gene is a gene whose biological and technical variances have the most probable values among the genes, i.e. the mode of the distribution of the biological and technical variances of genes. Alternatively, the median or mean variances across genes could be used as representative values (Shih et al., 2004).

An issue associated with microarray experiments is the problem of multiple inferences, where a separate null hypothesis is tested for each gene. Given thousands of null hypotheses being tested simultaneously, the customary significance level $\alpha = 0.05$ for declaring positive tests will surely give too many false positives. For example, if among a total number $N = 10\,000$ of genes being tested, $N_0 = 4000$ are truly null genes (genes that are non-differentially expressed between the two classes), the expected number of false positive results would be $4000 \times 0.05 = 200$, which may be too many to be acceptable. Thus a smaller threshold P-value for declaring differentially expressed genes should be used. Effectively controlling false positives in a multiple-testing situation such as microarray experiments is an area which has drawn much attention in recent years due to the wider application of microarray technology.
As discussed in our previous work (Zhang and Gant, 2004), all the different multiple-testing adjustment methods eventually amount to setting a threshold P-value and then rejecting all the null hypotheses with a P-value below this threshold. The classical Bonferroni multiple-testing procedure, which controls the family-wise error rate at $\alpha$ by setting the threshold $P_{th} = \alpha/N$, is generally regarded as being too conservative in the microarray context. The FDR (false discovery rate) idea, initially due to Benjamini and Hochberg (1995) in dealing with the multiple-testing problem, has now been widely accepted as appropriate to the microarray situation. Recently, Efron (2004) extended the FDR idea by defining fdr, a local version of the FDR. When planning microarray experiments in terms of power and sample size calculation, the FDR of Benjamini and Hochberg (1995) is more appropriate and convenient to use. There are now in the literature a few slightly different variants of the definition of FDR (Benjamini and Hochberg, 1995; Storey and Tibshirani, 2003; Grant et al., 2005), but in essence it is defined as the proportion of false positives among all positive tests declared.

To provide an interface between the FDR and the formulation in the previous sections, here we show that there is a simple correspondence between controlling the FDR and specifying the traditional type I error rate and power. Suppose that a total number $N$ of genes are monitored by the microarray, so that $N$ hypotheses are tested, one for each gene. Suppose that a fraction $\pi_0$ of the $N$ genes are true null genes, i.e. genes that are non-differentially expressed between the two classes. Given the type I error rate $P_{th}$, the expected number of false positive tests is $P_{th} N \pi_0$. Given the power $S$, the expected number of non-null genes (truly differentially expressed genes) that are declared positive is $S N (1 - \pi_0)$.
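The two expected counts above determine the expected proportion of false positives among all declared positives. The Python sketch below computes that proportion. The function name `expected_fdr` is hypothetical, and the example values ($\pi_0 = 0.4$ from the 4000-in-10 000 example, combined with $P_{th} = 0.001$ and $S = 0.95$ from the design targets) are put together here only for illustration.

```python
def expected_fdr(p_th, power, pi0):
    """Expected FDR from the type I error rate, the power and the null fraction.

    The total number of genes N cancels, so it is not needed as an input.
    """
    if not 0.0 <= pi0 <= 1.0:
        raise ValueError("pi0 must be a fraction between 0 and 1")
    false_positives = p_th * pi0          # per gene: P_th * pi0
    true_positives = power * (1.0 - pi0)  # per gene: S * (1 - pi0)
    return false_positives / (false_positives + true_positives)


# Illustrative values: 40% null genes, P_th = 0.001, power 0.95.
fdr_strict = expected_fdr(0.001, 0.95, 0.4)
# The same setting at the customary threshold 0.05 gives a much larger FDR.
fdr_loose = expected_fdr(0.05, 0.95, 0.4)
print(f"{fdr_strict:.5f} {fdr_loose:.5f}")
```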
The FDR achieved by this setting is therefore

$$\mathrm{FDR} = \frac{P_{th} N \pi_0}{P_{th} N \pi_0 + S N (1 - \pi_0)} = \frac{P_{th}\,\pi_0}{P_{th}\,\pi_0 + S\,(1 - \pi_0)}. \tag{14}$$

Here $\pi_0$ is an important parameter in controlling the FDR, and several different methods of estimating it have been proposed (Pounds and Morris, 2003; Storey and Tibshirani, 2003; Zhang and Gant, 2004). In particular, the method we presented in Zhang and Gant (2004) is an accurate yet computationally much simpler algorithm than the one proposed by Storey and Tibshirani (2003). With the interface Equation (14), the FDR can be readily presented and incorporated into the calculations.

ACKNOWLEDGEMENTS

We wish to acknowledge the support of the microarray team of the MRC Toxicology Unit, particularly Reginald Davies, JinLi Luo and Joan Riley. We also thank the two anonymous reviewers for their helpful and constructive comments.

Conflict of Interest: none declared.



Shu-Dong Zhang, Timothy W. Gant. Effect of pooling samples on the efficiency of comparative studies using microarrays, Bioinformatics, 2005, 4378-4383, DOI: 10.1093/bioinformatics/bti717