Optimal sample sizes for Welch’s test under various allocation and cost considerations

Behavior Research Methods, Dec 2011

The issue of the sample size necessary to ensure adequate statistical power has been the focus of considerable attention in scientific research. Conventional presentations of sample size determination do not consider budgetary and participant allocation scheme constraints, although there is some discussion in the literature. The introduction of additional allocation and cost concerns complicates study design, although the resulting procedure permits a practical treatment of sample size planning. This article presents exact techniques for optimizing sample size determinations in the context of Welch's (Biometrika, 29, 350–362, 1938) test of the difference between two means under various design and cost considerations. The allocation schemes include cases in which (1) the ratio of group sizes is given and (2) one sample size is specified. The cost implications suggest optimally assigning subjects (1) to attain maximum power performance for a fixed cost and (2) to meet a designated power level for the least cost. The proposed methods provide useful alternatives to the conventional procedures and can be readily implemented with the developed R and SAS programs that are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.


Show-Li Jan (Department of Applied Mathematics, Chung Yuan Christian University, Chungli, Taiwan 32023, Republic of China) and Gwowen Shieh (Department of Management Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan 30050, Republic of China)

The a priori determination of a proper sample size necessary to achieve some specified power is an important problem frequently encountered in practical studies. To make inferences about differences between two normal population means, the hypothesis-testing procedure and corresponding sample size formula are well known and easy to apply. For important guidance, see the comprehensive treatments in Cohen (1988) and Murphy and Myors (2004). In the statistical literature, comparison of the means of two normal populations with unknown and possibly unequal variances has been the subject of much discussion and is well recognized as the Behrens-Fisher problem (Kim & Cohen, 1998). The existence and importance of violation of the assumption of homogeneity of variance in clinical research settings are also addressed in Grissom (2000). The practical importance and methodological complexity of the problem have occasioned numerous attempts to develop various procedures and algorithms for resolving the issue. Notably, several studies have shown that Welch's (1938) approximate degrees of freedom approach offers a reasonably accurate solution to the Behrens-Fisher problem. Therefore, Welch's procedure is routinely introduced in elementary statistics courses and textbooks. Moreover, some popular statistical computer packages, such as SAS and SPSS, have implemented the method for quite some time. In practice, power analyses and sample size calculations are often critical for investigators to credibly address specific research hypotheses and confirm differences. Thus, the planning of sample size should be included as an integral part of the study design. Accordingly, it is of practical interest and fundamental importance to be able to perform these tasks in the context of the Behrens-Fisher problem.
The essential question is how to determine sample sizes optimally under different allocation and cost considerations that call for independent random samples from two normal populations with possibly unequal variances. Conventional studies of power and sample size have not addressed matters of allocation restriction and cost efficiency, although researchers have been exploring design strategies that take into account the impact of different constraints of the sampling scheme and project funding while maintaining adequate power. Specifically, the allocation ratio of group sizes was fixed in the calculation of sample size for comparing independent proportions in Fleiss, Tytun, and Ury (1980), while Heilbrun and McGee (1985) considered sample size determination for the comparison of normal means with a known ratio of variances and one sample size being specified in advance. In an actual experiment, however, the available resources are generally limited, and it may require different amounts of effort and cost to recruit subjects for the treatment and the control groups. Assuming homogeneous variances, Nam (1973) presented optimal sample sizes to maximize power for the comparison of treatment and control under budget constraints. Conversely, Allison, Allison, Faith, Paultre, and Sunyer (1997) advocated designing statistically powerful studies while minimizing costs. Interested readers are referred to recent articles by Bacchetti, McCulloch, and Segal (2008) and Bacchetti (2010) for alternative viewpoints and related discussions. Within the framework of the Behrens-Fisher problem, assuming a desired sample size ratio, Schouten (1999) derived an approximate formula for computing sufficient sample size for a selected power. In addition, in Schouten (1999), a simplified sample size formula was proposed to minimize the total cost when the cost of treating a subject varies with experimental groups. Also, Lee (1992) determined the optimal sample sizes for a designated power so that the total sample size is minimized. It is important to note that the setting in Lee can be viewed as a special case of Schouten. However, unlike the exact approach of Lee, the presentation of Schouten involved several approximations, including the use of a normal distribution, which does not conform to the notion of a t distribution with approximate degrees of freedom proposed in Welch (1938). Alternatively, Singer (2001) modified the simple formula of Schouten by replacing the percentiles of the standard normal distribution with those of a t distribution with approximate degrees of freedom. Unfortunately, the resulting formulation is questionable on account of its absence of theoretical justification. Detailed analytical and empirical examinations are presented later to demonstrate the underlying drawbacks associated with the approximate procedures of Schouten and Singer. Moreover, Luh and Guo (2007), Guo and Luh (2009), and Luh and Guo (2010) extended the approximations of Schouten and Singer to the two-sample trimmed mean test with unequal variances under allocation and cost considerations. Basically, when the trimming proportion is 0, the procedures of Guo and Luh are applicable to the Behrens-Fisher problem. However, their procedures are still approximate in nature and possess the same disadvantages as Schouten's and Singer's. More important, the algorithms employed by Guo and Luh fail to take into account the underlying metric of integer sample sizes and often lead to suboptimal results.
From a methodological standpoint, the results in Schouten, Singer, Luh and Guo (2007), Guo and Luh, and Luh and Guo (2010) should be reexamined with technical clarifications and exact computations. Nonetheless, our calculations not only show that the prescribed approximate methods do not guarantee correct optimal sample sizes, but also reveal that some of the optimal sample sizes reported in the empirical illustrations of Lee are actually suboptimal. Due to the discrete character of sample size, a detailed inspection of sample size combinations is required to find the optimal allocation that attains the desired power while giving the least total sample size. This extra step and its resulting merit in sample size determination are not considered by Lee. The theoretical and numerical examinations conducted here provide a comprehensive comparison of the various procedures available to date. In short, the accuracy of the existing sample size procedures for the Behrens-Fisher problem can be further improved by adopting an exact and refined approach. As was described above, there are important and useful considerations or strategies for study design other than the minimization of total sample size or total cost. Since Welch's (1938) approach to the Behrens-Fisher problem is so entrenched, it is prudent to present a comprehensive exposition of design configurations in terms of diverse allocation schemes and budget constraints. Here, exact methods are presented to give proper sample sizes either when the ratio of group sizes is fixed in advance or when one sample size is fixed. In addition, detailed procedures are provided to determine the optimal sample sizes that maximize the power for a given total cost and that minimize the cost for a specified power. More important, the corresponding computer algorithms are developed to facilitate computation of the exact necessary sample sizes in actual applications. Due to the prospective nature of advance research planning, it is difficult to assess the adequacy of selected configurations for model parameters in sample size calculations. The general guideline suggests that typical sources such as previously published research and successful pilot studies can offer plausible and reasonable planning values for the vital model characteristics (Thabane et al., 2010). However, the potential deficiency of using a pilot sample variance to compute the sample size needed to achieve the planned power for one- and two-sample t tests has been examined by, among others, Browne (1995) and Kieser and Wassmer (1996). They showed that the sample sizes provided by the traditional formulas are too small, since they neglect the imprecise nature of a variance estimate. Note that all standard sample size procedures share the same fundamental weakness when sample variance estimates are used for the underlying population parameters. However, the issue is more involved, and a detailed discussion of this topic is beyond the scope of the present study. The interested reader is referred to Browne, Kieser and Wassmer, and the references therein for further details.

The Welch test

As part of a continuing effort to improve the quality of research findings, this research contributes to the derivation and evaluation of sample size methodology for Welch's (1938) approximate t test for the Behrens-Fisher problem. Consider independent random samples from two normal populations with the following formulations: $X_{ij} \sim N(\mu_i, \sigma_i^2)$, where $\mu_1$, $\mu_2$, $\sigma_1^2$, and $\sigma_2^2$ are unknown parameters, $j = 1, \ldots, N_i$, and $i = 1$ and 2.
For detecting the group effect in terms of the hypothesis $H_0$: $\mu_1 = \mu_2$ versus $H_1$: $\mu_1 \neq \mu_2$, the well-known Welch's t statistic has the form

$V = \dfrac{\bar{X}_1 - \bar{X}_2}{(S_1^2/N_1 + S_2^2/N_2)^{1/2}},$

where $\bar{X}_1 = \sum_{j=1}^{N_1} X_{1j}/N_1$, $\bar{X}_2 = \sum_{j=1}^{N_2} X_{2j}/N_2$, $S_1^2 = \sum_{j=1}^{N_1} (X_{1j} - \bar{X}_1)^2/(N_1 - 1)$, and $S_2^2 = \sum_{j=1}^{N_2} (X_{2j} - \bar{X}_2)^2/(N_2 - 1)$. Under the null hypothesis $H_0$: $\mu_1 = \mu_2$, Welch (1938) proposed approximating the distribution of V by the t distribution $t_{\hat{\nu}}$ with estimated degrees of freedom

$\hat{\nu} = \dfrac{(S_1^2/N_1 + S_2^2/N_2)^2}{(S_1^2/N_1)^2/(N_1 - 1) + (S_2^2/N_2)^2/(N_2 - 1)}. \quad (1)$

The same notion was independently suggested by Smith (1936) and Satterthwaite (1946), and the test is sometimes referred to as the Smith-Welch-Satterthwaite test. It is important to emphasize that the degrees of freedom $\hat{\nu}$ is bounded by the smaller of $N_1 - 1$ and $N_2 - 1$ at one end and by $N_1 + N_2 - 2$ at the other; that is, $\min(N_1 - 1, N_2 - 1) \le \hat{\nu} \le N_1 + N_2 - 2$. Because the critical value $t_{df, \alpha/2}$ decreases as df increases, the approximate critical value $t_{\hat{\nu}, \alpha/2}$ is slightly larger than that of the two-sample t test, $t_{N_1 + N_2 - 2, \alpha/2}$, under homogeneity of variance assumptions. Although the differences between the two critical values are small with moderate to large sample sizes, they reflect the conceptual distinction between the corresponding Welch's t test and the regular two-sample t test. Note that a standard normal distribution can be viewed as a t distribution with an infinite number of degrees of freedom. However, the close resemblance between a standard normal distribution and a t distribution never causes the introductory courses or textbooks to omit the coverage of Student's t distribution. Therefore, the theoretical distinction and implication between the critical value $t_{N_1 + N_2 - 2, \alpha/2}$ and a standard normal critical value $z_{\alpha/2}$ is highly analogous to that between $t_{\hat{\nu}, \alpha/2}$ and $t_{N_1 + N_2 - 2, \alpha/2}$. Ultimately, the t approximation with the approximate degrees of freedom given in Eq. 1 serves as the prime solution to the Behrens-Fisher problem.

Although the underlying normality assumption in the above-mentioned two-sample location problem provides a convenient and useful setup, the exact distribution of Welch's test statistic V is comparatively complicated and may be expressed in different forms (see Wang, 1971, Lee & Gurland, 1975, and Nel, van der Merwe, & Moser, 1990, for technical derivations and related details). For ease of presentation, we need to develop some notation. It follows from the fundamental assumption that $Z = (\bar{X}_1 - \bar{X}_2)/\sigma \sim N(\delta, 1)$ with $\delta = \mu_d/\sigma$, $\mu_d = \mu_1 - \mu_2$, and $\sigma^2 = \sigma_1^2/N_1 + \sigma_2^2/N_2$; $W = (N_1 - 1)S_1^2/\sigma_1^2 + (N_2 - 1)S_2^2/\sigma_2^2 \sim \chi^2_{N_1 + N_2 - 2}$; and $B = \{(N_1 - 1)S_1^2/\sigma_1^2\}/W \sim \mathrm{Beta}\{(N_1 - 1)/2, (N_2 - 1)/2\}$. Thus, we consider the following alternative expression of V for its ease of numerical investigation:

$V = T/H^{1/2}, \quad (2)$

where $T = Z/\{W/(N_1 + N_2 - 2)\}^{1/2} \sim t(N_1 + N_2 - 2, \delta)$, $t(N_1 + N_2 - 2, \delta)$ is the noncentral t distribution with degrees of freedom $N_1 + N_2 - 2$ and noncentrality parameter $\delta$, and $H = [\sigma_1^2/N_1\{B/p\} + \sigma_2^2/N_2\{(1 - B)/(1 - p)\}]/\sigma^2$ with $p = (N_1 - 1)/(N_1 + N_2 - 2)$. Note that the random variables Z, W, and B are mutually independent. Hence, T and B are independent. Also, it is important to note that $1/\hat{\nu} = B_1^2/(N_1 - 1) + B_2^2/(N_2 - 1)$, where $B_2 = 1 - B_1$ and $B_1 = [\sigma_1^2/N_1\{B/p\}]/[\sigma_1^2/N_1\{B/p\} + \sigma_2^2/N_2\{(1 - B)/(1 - p)\}]$. Hence, both H and $\hat{\nu}$ are functions of the random variable B. With the prescribed distributional properties in Eq. 2, the associated power function of V is denoted by

$\pi(\mu_d, \sigma_1^2, \sigma_2^2, N_1, N_2) = P\{|T| > t_{\hat{\nu}, \alpha/2} \cdot H^{1/2}\}. \quad (3)$

The numerical computation of exact power requires the evaluation of the cumulative distribution function of a noncentral t variable and one-dimensional integration with respect to a beta probability density function.
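For illustration, the following is a minimal R sketch of the exact power in Eq. 3, integrating the noncentral t tail probability over the beta distribution of B. The function name welch_power and its interface are ours; they are not taken from the authors' supplemental R/SAS programs.

```r
# Exact power of Welch's test via Eqs. 2-3: integrate the noncentral t
# tail probability over the Beta distribution of B.
welch_power <- function(mu_d, var1, var2, n1, n2, alpha = 0.05) {
  sigma2 <- var1 / n1 + var2 / n2      # sigma^2 = sigma1^2/N1 + sigma2^2/N2
  delta  <- mu_d / sqrt(sigma2)        # noncentrality parameter
  df     <- n1 + n2 - 2
  p      <- (n1 - 1) / df
  integrand <- function(b) {           # b is a realization of B ~ Beta
    h1 <- (var1 / n1) * (b / p)
    h2 <- (var2 / n2) * ((1 - b) / (1 - p))
    H  <- (h1 + h2) / sigma2
    b1 <- h1 / (h1 + h2)
    nu <- 1 / (b1^2 / (n1 - 1) + (1 - b1)^2 / (n2 - 1))  # approximate df
    crit <- qt(1 - alpha / 2, nu) * sqrt(H)
    # P(|T| > crit) for T ~ noncentral t(N1 + N2 - 2, delta)
    tail <- pt(-crit, df, ncp = delta) + 1 - pt(crit, df, ncp = delta)
    tail * dbeta(b, (n1 - 1) / 2, (n2 - 1) / 2)
  }
  integrate(integrand, 0, 1)$value
}

# Example: equal unit variances and mu_d = 1; (N1, N2) = (23, 22) should
# give roughly the attained power .9057 quoted in the Lee (1992) comparison.
welch_power(mu_d = 1, var1 = 1, var2 = 1, n1 = 23, n2 = 22)
```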
Since all related functions are readily embedded in major statistical packages, the exact computations can be conducted with current computing capabilities. To determine sample size, the power function can be employed to calculate the sample sizes (N1, N2) needed to attain the specified power $1 - \beta$ for the chosen significance level $\alpha$ and parameter values $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$. Clearly, the power function is rather complex, and it usually involves an iterative process to find the solution, because both the random variable V and the critical value $t_{\hat{\nu}, \alpha/2}$ are functions of the sample sizes (N1, N2). In order to enhance the applicability of sample size methodology and the fundamental usefulness of Welch's (1938) procedure, in subsequent sections this study considers design configurations allowing for different allocation constraints and cost considerations. The R (R Development Core Team, 2010) and SAS/IML (SAS Institute, 2008a) programs employed to perform the corresponding sample size calculations are available in the supplementary files.

Allocation constraints

Since there may be several possible choices of sample sizes N1 and N2 that satisfy the chosen power level in the process of sample size calculations, it is prudent to consider an appropriate design that permits a unique and optimal result. The following two allocation constraints are considered because of their potential usefulness. First, the ratio r = N2/N1 between the two group sizes may be fixed in advance, so the task is to decide the minimum sample size N1 (with N2 = rN1) required to achieve the specified power level. Second, one of the two sample sizes, say N2, may be preassigned, and so the smallest size N1 required to satisfy the designated power should be found.

Sample size ratio is fixed

Assume that the sample size ratio r = N2/N1 is fixed in advance. To facilitate computation, without loss of generality, the ratio can be taken as $r \ge 1$. Then the power function $\pi(\mu_d, \sigma_1^2, \sigma_2^2, N_1, N_2)$ of V becomes a strictly monotone function of N1 when all other factors are treated as constants. A simple incremental search can be conducted to find the minimum sample size N1 needed to attain the specified power $1 - \beta$ for the chosen significance level $\alpha$ and parameter values $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, as sketched below. To simplify the computation, the large-sample normal approximation $V \sim N(\delta, 1)$ can be used to provide initial values to start the iteration. Specifically, the starting sample size $N_{1Z}$ computed by the normal approximation would be the smallest integer that satisfies the inequality

$N_1 \ge (\sigma_1^2 + \sigma_2^2/r)(z_{\alpha/2} + z_\beta)^2/\mu_d^2, \quad (4)$

where $z_{\alpha/2}$ and $z_\beta$ are the upper $100(\alpha/2)$th and $100\beta$th percentiles of the standard normal distribution, respectively. For illustration, when $\mu_d = 1$, $\alpha = .05$, and $1 - \beta = .90$, the sample sizes N1 and N2 = rN1 are presented in Table 1 for selected values of r = 1, 2, and 3, $\sigma_1$ = 1/3, 1/2, 1, 2, and 3, and $\sigma_2$ = 1. The actual power is also listed, and the values are marginally larger than the nominal level .90. Note that the SAS procedure PROC POWER (SAS Institute, 2008b) provides the same feature to find the optimal sample sizes N1 and N2 with a given sample size ratio. However, it does not accommodate the extended settings in which one of the sample sizes is fixed, nor the more involved cost concerns, that we consider next.
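The following is a sketch of this incremental search in R, assuming the welch_power() function from the earlier block; the starting value follows the normal approximation of Eq. 4, and the name welch_n_ratio is ours. An analogous search with N2 held constant covers the second allocation scheme.

```r
# Minimum N1 (with N2 = r * N1) attaining the target power under a fixed
# sample size ratio r. Note that the normal-approximation start may
# occasionally overshoot; a production search would also probe smaller N1.
welch_n_ratio <- function(mu_d, var1, var2, r, power = 0.90, alpha = 0.05) {
  zsum <- qnorm(1 - alpha / 2) + qnorm(power)
  n1 <- max(2, ceiling((var1 + var2 / r) * zsum^2 / mu_d^2))  # Eq. 4
  repeat {
    n2 <- ceiling(r * n1)
    if (welch_power(mu_d, var1, var2, n1, n2, alpha) >= power) break
    n1 <- n1 + 1
  }
  c(N1 = n1, N2 = n2)
}

# Balanced design (r = 1) with equal unit variances and mu_d = 1
welch_n_ratio(mu_d = 1, var1 = 1, var2 = 1, r = 1)
```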
One sample size is fixed

For ease of exposition, the sample size N2 of the second group is held constant. Just as in the previous case, the minimum sample size N1 needed to ensure the specified power $1 - \beta$ can be found by a simple iterative search for the chosen significance level $\alpha$ and parameter values $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$. In this case, the starting sample size $N_{1Z}$, based on the normal approximation, is the smallest integer that satisfies the inequality

$N_1 \ge \sigma_1^2/\{\mu_d^2/(z_{\alpha/2} + z_\beta)^2 - \sigma_2^2/N_2\}. \quad (5)$

Cost considerations

With limited research funding, it is desirable to consider cost and effectiveness issues during the planning stage. In addition, the costs of obtaining subjects for the treatment and control groups are not necessarily the same. Suppose that c1 and c2 are the costs per subject in the first and second groups, respectively; then the total cost of the experiment is C = c1N1 + c2N2. The following two questions arise with considerable frequency in sample size determinations. First, given a fixed amount of money, what is the maximum power that the design can achieve? Second, assuming a preferred degree of power, what is the design that costs the least? In both cases, equal sample sizes for the two groups do not necessarily yield the optimal solution (Allison et al., 1997). Consequently, optimally unbalanced designs are more efficient, and a detailed and systematic approach to sample size allocation is required. With the simplified asymptotic approximation of Welch's test, $V \sim N(\delta, 1)$, the optimal allocation is obtained for the prescribed two scenarios when the ratio of the sample sizes assumes the equality

$N_2/N_1 = q, \quad (6)$

where $q = \sigma_2 c_1^{1/2}/(\sigma_1 c_2^{1/2})$. However, the exact distribution of V given in Eq. 2 involves a beta mixture of noncentral t distributions. Thus, the associated properties can be notably different from a normal distribution for finite sample sizes. It is understandable that the particular identity of Eq. 6 will give a suboptimal result when the sample sizes are small. Such a phenomenon is demonstrated in the following illustration.

Total cost is fixed and actual power needs to be maximized

To develop a systematic search for the optimal solution, the aforementioned normal approximation is utilized as the benchmark in the exploration. It can be shown, under a fixed value of total cost C, that the maximum power is obtained with the sample size combination

$N_{1Z} = \dfrac{C\sigma_1}{c_1^{1/2}(\sigma_1 c_1^{1/2} + \sigma_2 c_2^{1/2})}, \quad N_{2Z} = \dfrac{C\sigma_2}{c_2^{1/2}(\sigma_1 c_1^{1/2} + \sigma_2 c_2^{1/2})}. \quad (7)$

It is easy to see that $c_1 N_{1Z} + c_2 N_{2Z} = C$ and $N_{2Z} = N_{1Z} \cdot q$, as in Eq. 6. But in practice, the sample sizes need to be integer values, so the use of discrete numbers introduces some inexactness into the cost analysis. To find the proper result, a detailed power calculation and comparison are performed for the sample size combinations with N1 from $N_{1\min}$ to $N_{1\max}$ and $N_2 = \mathrm{Floor}\{(C - c_1 N_1)/c_2\}$, where $N_{1\min} = \mathrm{Floor}(N_{1Z}) - 1$, $N_{1\max} = \mathrm{Floor}[\{C - c_2(\mathrm{Floor}(N_{2Z}) - 1)\}/c_1]$, and the function Floor(a) returns the largest integer that is less than or equal to a. Thus, the optimal sample size allocation is the one giving the largest power; a sketch of this enumeration follows below. Numerical results are given in Table 3 for (c1, c2) = (1, 1), (1, 2), and (1, 3) and fixed total cost C = 25, 30, 50, 100, and 180, in accordance with the five standard deviation settings of $\sigma_1$ and $\sigma_2$ reported in the previous two tables. Examination of the results in Table 3 reveals that the actual power for a given total cost decreases drastically as the unit cost c2 increases from 1 to 3. Regarding the optimal allocation, the general formula for the sample size ratio presented in Eq. 6 does not hold in several cases. For example, the ratio $N_2/N_1 = 11/17 \approx 0.6471$ for $(\sigma_1, \sigma_2) = (1, 1)$ and (c1, c2) = (1, 3) is slightly greater than the ratio computed with Eq. 6: $q = (1 \cdot 1^{1/2})/(1 \cdot 3^{1/2}) \approx 0.5774$.
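As a minimal sketch of the fixed-budget search, assuming welch_power() from the earlier block: enumerate N1 over the range given above, spend the remaining budget on N2, and keep the combination with the largest exact power. The function name welch_max_power is ours.

```r
# Maximize exact power under a fixed budget C = c1*N1 + c2*N2; Eq. 7
# supplies the normal-approximation benchmark (N1Z, N2Z) for the bounds.
welch_max_power <- function(mu_d, var1, var2, C, c1, c2, alpha = 0.05) {
  s1 <- sqrt(var1); s2 <- sqrt(var2)
  denom <- s1 * sqrt(c1) + s2 * sqrt(c2)
  n1z <- C * s1 / (sqrt(c1) * denom)                 # Eq. 7
  n2z <- C * s2 / (sqrt(c2) * denom)
  n1_lo <- max(2, floor(n1z) - 1)
  n1_hi <- floor((C - c2 * max(2, floor(n2z) - 1)) / c1)
  best <- c(N1 = NA, N2 = NA, power = 0)
  for (n1 in n1_lo:n1_hi) {
    n2 <- floor((C - c1 * n1) / c2)                  # remaining budget
    if (n2 < 2) next
    pw <- welch_power(mu_d, var1, var2, n1, n2, alpha)
    if (pw > best["power"]) best <- c(N1 = n1, N2 = n2, power = pw)
  }
  best
}

# Equal unit costs and variances with budget C = 50
welch_max_power(mu_d = 1, var1 = 1, var2 = 1, C = 50, c1 = 1, c2 = 1)
```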
It should be noted that Guo and Luh (2009, Eq. 20) give the same approximate sample size formulas as in Eq. 7. However, they did not discuss how to utilize the particular result to find the ideal sample sizes for a fixed cost. Also, the numerical demonstration of Guo and Luh (p. 291) did not provide a systematic search for the optimal solution, and the sample sizes reported in their exposition are not integers. Ultimately, the inexactness issue incurred by integer sample sizes in cost analysis is not addressed by Guo and Luh.

Target power is fixed and total cost needs to be minimized

In contrast to the previous situation, where costs were fixed, the strategy to accommodate both power performance and cost appraisal can be conducted by finding the optimal allocation for minimizing cost when the target power is prechosen. In this case, the large-sample theory shows that, in order to ensure the nominal power while minimizing the total cost $C = c_1 N_{1Z} + c_2 N_{2Z}$, the best sample size combination is

$N_{1Z} = (\sigma_1^2 + \sigma_1\sigma_2\sqrt{c_2/c_1})(z_{\alpha/2} + z_\beta)^2/\mu_d^2, \quad N_{2Z} = qN_{1Z}, \quad (8)$

where q is the optimal ratio in Eq. 6. It can be readily seen that $N_{2Z}/N_{1Z} = q$ and $\sigma_1^2/N_{1Z} + \sigma_2^2/N_{2Z} = \mu_d^2/(z_{\alpha/2} + z_\beta)^2$. Due to the discrete character of sample size, the optimal allocation is found through a screening of sample size combinations that attain the desired power while giving the least cost. The exact power computation and cost evaluation are conducted for the sample size combinations with N1 from $N_{1\min}$ to $N_{1\max}$, pairing each N1 with the smallest N2 providing the required power, where $N_{1\min} = \mathrm{Floor}(N_{1Z})$, $N_{1\max} = \mathrm{Ceil}[\sigma_1^2/\{\mu_d^2/(z_{\alpha/2} + z_\beta)^2 - \sigma_2^2/(\mathrm{Floor}(N_{2Z}) + 1)\}]$, and the function Ceil(a) returns the smallest integer that is greater than or equal to a. Thus, the optimal sample size allocation is the one giving the smallest cost while maintaining the specified power level. In cases where there is more than one combination yielding the same magnitude of least cost, the one producing the larger power is reported. Table 4 provides the corresponding optimal sample size allocation, cost, and actual power for the configurations of (c1, c2) = (1, 1), (1, 2), and (1, 3) and the five standard deviation settings of $\sigma_1$ and $\sigma_2$ in the preceding tables. It is clear that the total cost for a required power and fixed standard deviations increases substantially as the unit cost c2 changes from 1 to 3. Again, the sample size ratios are close to, but different from, the approximate ratio q. The largest discrepancy occurs with the case $N_2/N_1 = 16/6 \approx 2.6667$ for $(\sigma_1, \sigma_2) = (1/3, 1)$ and (c1, c2) = (1, 1), whereas the counterpart ratio is $q = (1 \cdot 1^{1/2})/\{(1/3) \cdot 1^{1/2}\} = 3$.

To demonstrate the advantage and importance of the exact technique, we also examine the theoretical and empirical properties of the approximate methods of Schouten (1999) and Singer (2001). Accordingly, Schouten's (p. 90) formulas are based on the normal approximation and give the identical approximate estimates $N_{1Z}$ and $N_{2Z}$ as defined in Eq. 8. In view of the approximate t distribution of Welch's test statistic V defined in Eq. 1, Singer (Eq. 2) suggested a modification of Eq. 4 by replacing the percentiles of the standard normal distribution with those of a t distribution with degrees of freedom $\hat{\nu}$. Specifically, it requires an iterative process to find the smallest integer that satisfies the inequality

$N_{1S} \ge (\sigma_1^2 + \sigma_2^2/r_S)(t_{\hat{\nu}, \alpha/2} + t_{\hat{\nu}, \beta})^2/\mu_d^2, \quad (9)$

where $r_S = N_{2S}/N_{1S}$. However, Singer did not provide any analytical justification for this alternative expression. Essentially, the naive formulation of Eq. 9 is questionable for lack of theoretical explanation. It is well known that if Z ~ N(0, 1), then $X = Z + \mu \sim N(\mu, 1)$, where $\mu$ is a constant. This particular result and related properties yield the approximate formulas in Eq. 8.
On the other hand, the linear transformation of the normal distribution does not generalize to the case of the t distribution; that is, if $t \sim t(df)$, then $Y = t + \mu$ does not follow a noncentral t distribution $t(df, \mu)$ with noncentrality parameter $\mu$ and degrees of freedom df. Actually, a random variable Y is said to have a noncentral t distribution $t(df, \mu)$ if and only if $Y = (Z + \mu)/(W/df)^{1/2}$, where Z ~ N(0, 1), $W \sim \chi^2(df)$, and Z and W are independent (Rencher, 2000, pp. 102-103). This may explain the fact that direct substitution of standard normal percentiles with those of a t distribution has rarely been described in the sample size literature. Instead, an iterative search is required to resolve the issue for statistical reasoning and exactness. Nevertheless, Guo and Luh (2009) applied Eq. 9, with $r_S$ set to the optimal ratio of Eq. 6, to determine optimal sample sizes when the target power is fixed and the total cost needs to be minimized. For the purpose of comparison, we performed an extensive numerical examination of sample size calculations for the model settings in Table 4 of Guo and Luh (2009). To our knowledge, no research to date has compared the performance of the available approximate procedures with the exact method. All the sample sizes, costs, and corresponding actual power of the two approximate methods of Schouten (1999) and Singer (2001) and the exact approach are presented in Table 5. For target power $1 - \beta = .80$, $\mu_d = 1$, and $\alpha = .05$, a total of 24 model settings are examined according to the combined configurations of standard deviation ratio ($\sigma_1$:$\sigma_2$ = 1:1 and 1:2) and unit cost ratio (c1:c2 = 1:2, 1:1, and 2:3) for $\sigma_1^2$ = 1.00, 2.15, 1.46, and 4.18. The sample sizes computed by Schouten's method are denoted by $N_{1Z}$ and $N_{2Z}$, whereas the sample sizes $N_{1S}$ and $N_{2S}$ listed in Table 5 for the procedure of Singer are exact replicates of those presented for the untrimmed case in Table 4 of Guo and Luh. The corresponding exact sample sizes computed with the suggested approach are expressed as $N_{1E}$ and $N_{2E}$. It can be readily seen from Table 5 that there are discrepancies between the approximate and exact procedures. First, the normal approximation, or Schouten's (1999) method, is misleading, because only 4 out of 24 cases attain the target power level of .80 (cases 4, 6, 12, and 24). Thus, the sample sizes $N_{1Z}$ and $N_{2Z}$ are generally inadequate. For the four occasions that meet the minimum power requirement, the resulting costs of cases 6, 12, and 24 are larger than those of the exact approach. Again, the reported sample sizes $N_{1Z}$ and $N_{2Z}$ are not optimal. Accordingly, case 4 is the single instance that agrees with the exact result. On the other hand, all the sample sizes $N_{1S}$ and $N_{2S}$ associated with Singer's (2001) method satisfy the necessary minimum power of .80. While there are seven occurrences (cases 2, 4, 8, 14, 15, 19, and 20) that match the exact results, the other 17 sample size combinations $N_{1S}$ and $N_{2S}$ suffer the disadvantage of incurring higher cost than the optimal selections $N_{1E}$ and $N_{2E}$. In view of this empirical evidence, it is clear that the existing approximate procedures of Schouten and Singer are not accurate enough to guarantee optimal sample sizes, and therefore the procedures presented in Eqs. 8 and 9 are not recommended. Furthermore, Lee (1992) examined the same problem without considering the differential unit cost per subject in the two groups, and this can be viewed as a special case of the presentation here with c1 = c2 = 1. Accordingly, his algorithm for determining the optimal sample sizes is questionable.
For example, when $\sigma_1 = \sigma_2 = 1$, the reported sample sizes are N1 = N2 = 23, with total cost = total sample size = 46 and actual power .9121. In contrast, our computation gives N1 = 23 and N2 = 22, with total cost = total sample size = 45 and attained power .9057. Therefore, to maintain the target power level of .90, a total sample size of only 45, rather than the 46 reported by Lee, is required. Consequently, it is worthwhile to conduct the suggested exact sample size computations.

[Table 5: Computed sample sizes (N1, N2), cost, and actual power for the Schouten, Singer, and exact methods when the total cost needs to be minimized with target power $1 - \beta = .80$, $\mu_d = 1$, and $\alpha = .05$.]

Numerical example

To demonstrate the features crossing different allocation constraints and cost considerations in sample size planning, the comparison of ability tests administered online and in the laboratory by Ihme et al. (2009) is used as an example. The test scores collected online and offline are assumed to have normal distributions with different variances, because the demographic structure of online samples can differ from that of offline samples acquired in conventional laboratory settings. To illustrate sample size determination for design planning, the results of Ihme et al. are modified to have the underlying population parameter values $\mu_{Lab} = 11$, $\mu_{Online} = 10$, $\sigma_{Lab} = 2.3$, and $\sigma_{Online} = 2.7$. It is clear that online testing has the advantages of ease of obtaining a large sample and low cost. Thus, it may be desirable to set the sample size ratio as $N_{Online}/N_{Lab} = 4/1$, which would imply that the sample sizes required to attain power .90 at the significance level .05 are $N_{Lab} = 76$ and $N_{Online} = 304$. In the case in which the sample size $N_{Online}$ is selected as 400, the offline group needs sample size $N_{Lab} = 71$ to meet the same power and significance requirements. However, it is important to take budget issues into account. Assume that the available total cost is set as C = 100 and the respective unit costs per subject are $c_{Lab} = 1$ and $c_{Online} = 0.2$. The optimal sample size solution is $N_{Lab} = 65$ and $N_{Online} = 175$, which has an actual power of .8079. On the other hand, to attain the preassigned power of .90, the design must have the sample size allocation $N_{Lab} = 86$ and $N_{Online} = 224$, which amounts to a budget of C = 130.8. Such information may be useful for investigators to justify the design strategy and financial support. Although they did not address the sample size calculation, the reader is referred to Ihme et al. for further details about online achievement tests.

The problem of testing the equality of the means of two independent and normally distributed populations with unknown and unequal variances has been widely considered in the literature. The distinctive usefulness of Welch's (1938) test in applications further occasions methodological and practical concerns about the corresponding procedures for sample size determination. Computationally, the use of computers and the general availability of statistical software permit the inherent requirements for exact analysis. In view of the importance of sample size calculations in actual practice and the limited features of available computer packages, the corresponding programs were developed to facilitate the use of the suggested approaches. Intensive numerical integration and incremental search are incorporated in the presented computer algorithms for finding the optimal solutions for different design requirements.
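As an illustration of such an incremental search, the following is a sketch of the minimum-cost screening from the section "Target power is fixed and total cost needs to be minimized", assuming welch_power() from the earlier block; $N_{1Z}$ follows the normal-approximation formula of Eq. 8, and the name welch_min_cost is ours. Ties in cost are broken here by first hit, whereas the paper reports the tied combination with the larger power.

```r
# Minimize total cost C = c1*N1 + c2*N2 subject to a target exact power.
welch_min_cost <- function(mu_d, var1, var2, c1, c2,
                           power = 0.90, alpha = 0.05) {
  s1 <- sqrt(var1); s2 <- sqrt(var2)
  zsq <- (qnorm(1 - alpha / 2) + qnorm(power))^2
  n1z <- (var1 + s1 * s2 * sqrt(c2 / c1)) * zsq / mu_d^2   # Eq. 8
  n2cap <- ceiling(10 * n1z)                # crude cap for the sketch
  best <- NULL
  for (n1 in max(2, floor(n1z)):ceiling(3 * n1z)) {
    # smallest N2 attaining the target exact power for this N1, if any
    n2 <- 2
    while (n2 <= n2cap &&
           welch_power(mu_d, var1, var2, n1, n2, alpha) < power) n2 <- n2 + 1
    if (n2 > n2cap) next                    # power not attainable here
    cost <- c1 * n1 + c2 * n2
    if (is.null(best) || cost < best["cost"])
      best <- c(N1 = n1, N2 = n2, cost = cost)
  }
  best
}

# Lee (1992) setting: equal variances and unit costs, target power .90.
# The text reports the optimum (23, 22) with total cost 45; with equal
# variances, (22, 23) ties at the same cost and power by symmetry.
welch_min_cost(mu_d = 1, var1 = 1, var2 = 1, c1 = 1, c2 = 1)
```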
Furthermore, various sample size tables are provided to help researchers better understand the inherent relationship between the planned sample sizes and the model configurations. The proposed sample size procedures enhance and expand the current methods and should be useful for planning research in two-group situations where both the variances and the costs per subject differ across groups.

Author Note: The authors thank the editor, Gregory Francis, and the three anonymous reviewers for their helpful comments. This research was partially supported by National Science Council Grant NSC 99-2118-M-033-002.



Show-Li Jan, Gwowen Shieh. Optimal sample sizes for Welch’s test under various allocation and cost considerations, Behavior Research Methods, 2011, 1014-1022, DOI: 10.3758/s13428-011-0095-7