Optimal sample sizes for precise interval estimation of Welch’s procedure under various allocation and cost considerations

Behavior Research Methods, Mar 2012

Welch’s (Biometrika 29: 350–362, 1938) procedure has emerged as a robust alternative to the Student’s t test for comparing the means of two normal populations with unknown and possibly unequal variances. To facilitate the advocated statistical practice of confidence intervals and further improve the potential applicability of Welch’s procedure, in the present article, we consider exact approaches to optimize sample size determinations for precise interval estimation of the difference between two means under various allocation and cost considerations. The desired precision of a confidence interval is assessed with respect to the control of expected half-width, and to the assurance probability of interval half-width within a designated value. Furthermore, the design schemes in terms of participant allocation and cost constraints include (a) giving the ratio of group sizes, (b) specifying one sample size, (c) attaining maximum precision performance for a fixed cost, and (d) meeting a specified precision level for the least cost. The proposed methods provide useful alternatives to the conventional sample size procedures. Also, the developed programs expand the degree of generality for the existing statistical software packages and can be accessed at brm.psychonomic-journals.org/content/supplemental.

http://link.springer.com/content/pdf/10.3758%2Fs13428-011-0139-z.pdf


Gwowen Shieh, Department of Management Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan 30050
Show-Li Jan, Department of Applied Mathematics, Chung Yuan Christian University, Chungli, Taiwan 32023, Republic of China

The fundamental results and associated usages of standard parametric procedures, such as Student's t, ANOVA F, and ordinary least squares regression, are well documented in the literature. One important assumption underlying the prescribed traditional methods is that of equal population variances.
Although the homogeneity of variance formulation provides a convenient and useful setup, it is not unusual for the homoscedasticity assumption to be violated in actual applications. For example, Grissom (2000) emphasized that there are theoretical reasons to expect and empirical results to document the existence of heteroscedasticity in clinical data. Moreover, Grissom and Kim (2005, pp. 10–14) provided additional explanations for the intrinsic causes of variance heterogeneity in real data. Notably, Grissom recommended employing suitable techniques that are superior to the traditional inferential methods under various conditions of heteroscedasticity. For comparing two normal means that may have unequal population variances, the scenario is the well-known Behrens–Fisher problem (Kim & Cohen, 1998). Accordingly, Welch's (1938) approximate t procedure has been recognized as a satisfactory and robust solution to the Behrens–Fisher problem, preferable to the two-sample t. The same notion was independently suggested by Smith (1936) and Satterthwaite (1946); hence, the technique is sometimes referred to as the Smith–Welch–Satterthwaite procedure. The method not only is covered in introductory textbooks of statistics and quantitative methods but also is available in several commonly used statistics packages, for example, Excel, Minitab, SAS, and SPSS. However, most research in this area is concerned with null hypothesis significance tests for detecting mean differences; see, for example, Best and Rayner (1987) and Wang (1971). This dominance of hypothesis testing for making statistical inferences does not occur exclusively in the Behrens–Fisher problem. It more broadly reflects the longstanding and prevalent practice of significance tests in applied research across many scientific fields.
As a compelling alternative, there has been a growing awareness of the use of confidence intervals instead of hypothesis tests for inference-making purposes; see, for example, Hahn and Meeker (1991), Harlow, Mulaik, and Steiger (1997), Kline (2004), and Smithson (2003). From both practical and scientific standpoints, it may be more informative to provide a reliable estimate of the magnitude of the examined effect, rather than simply to decide whether or not a finding is statistically significant. Accordingly, Wilkinson and the American Psychological Association's Task Force on Statistical Inference (1999) and the sixth edition of the Publication Manual of the American Psychological Association (APA, 2010) called for the greater use of confidence intervals. However, interval estimation procedures are intrinsically stochastic in nature. From a study-planning point of view, researchers may wish to credibly address specific research questions and confirm meaningful treatment differences, so that the resulting confidence interval will meet the designated precision requirements. Hence, it is of practical interest and methodological importance to develop sample size procedures for precise interval estimation in the context of the Behrens–Fisher problem. To ensure precision of the resulting confidence intervals, the notion of expected half-width for sample size calculations is frequently introduced in standard texts. However, considerable attention has also focused on the criterion of tolerance probability of interval half-width within a given value. For example, see Beal (1989), Kelley, Maxwell, and Rausch (2003), Kupper and Hafner (1989), and Liu (2009) for related discussion in the context of estimating the mean difference between two normal populations with homoscedasticity.
The empirical illustration in Kupper and Hafner shows that it typically requires a larger sample size to meet the necessary assurance of tolerance probability than to control a designated expected half-width. Therefore, the sample sizes computed by the expected half-width approach tend to be inadequate to guarantee the desired tolerance level of interval half-width. Consequently, the assurance probability approach is recommended over the expected width criterion for sample size determination. However, it is noteworthy that the two principles of expected width and assurance probability are closely related to the two standard criteria of unbiasedness and consistency in statistical point estimation, respectively. In other words, these two measures impose unique and distinct aspects of precision characteristics on the resulting confidence intervals, and each principle has conceptual and empirical implications in its own right. Within the framework of the Behrens–Fisher problem, Wang and Kupper (1997) derived a formula to compute the necessary sample size for a selected tolerance probability when the sample size ratio is given. Although the suggested sample size technique accommodates the more realistic situation of variance heterogeneity, three essential caveats of the results in Wang and Kupper should be pointed out. First, their theoretical presentations and algebraic expressions are noticeably awkward. The formulation is complicated in form, and its evaluation is computationally cumbersome. Furthermore, to our knowledge, there is no computer algorithm available for performing the necessary computation. Therefore, their result is of less practical value in application. Second, they suggest fixing the ratio of standard deviations as the allocation ratio to determine the optimal sample sizes for a designated tolerance level so that the total sample size is minimized.
But the simplified algorithm employed by Wang and Kupper fails to take into account the fact that sample sizes must be integers and often leads to suboptimal results. It is shown below in our numerical investigation that their procedure is not guaranteed to give the correct optimal sample sizes. Third, although there are mixed opinions on the effectiveness of expected width, they did not address the issue of how to perform the sample size calculations so that the expected confidence interval half-width will attain the planned precision. Thus, the results in Wang and Kupper should be clarified and extended with more transparent explications and exact computations. Note that the assurance probability for achieving a desired interval width can be further modified as a conditional probability, given that the confidence interval includes the true parameter. As was reported in Beal (1989), corresponding sample sizes computed with the conditional consideration are almost identical to or at most only slightly larger than those calculated with the aforementioned unconditional or tolerance probability approach. Our calculations confirm that this phenomenon continues to exist in the Behrens–Fisher problem. Hence, the conditional criterion presented in Wang and Kupper will not be considered further in this article. In view of the potential variance heterogeneity one might encounter in applied work, the present article contributes to the applications of Welch's (1938) procedure by providing feasible sample size methodology for constructing precise confidence intervals under two distinct perspectives. One method gives the minimum sample size such that the expected confidence interval half-width is within the designated bound. The other approach provides the sample size needed to guarantee, with a given tolerance probability, that the half-width of a confidence interval will not exceed the planned value.
Furthermore, conventional sample size calculations do not consider allocation schemes with participant constraints or cost implications. However, researchers have explored design strategies that take into account the impact of different constraints of the sample scheme and project funding while maintaining adequate power (Allison, Allison, Faith, Paultre, & Pi-Sunyer, 1997, and references therein). Jan and Shieh (2011) considered the problem of determining optimal sample sizes to meet a designated power for Welch's test under various allocation and cost considerations that call for independent random samples from two normal populations with possibly unequal variances. The same principles would apply for a study seeking a precise estimate of the mean difference between two treatments. It is well known that there exists a direct connection between hypothesis testing and interval estimation, although the two procedures are philosophically different in the power and precision viewpoints. Not surprisingly, the sample size required to test a hypothesis regarding the specific value of a parameter with desired power can be markedly different from the sample size needed to obtain adequate precision of interval estimation in the same study. Since there are crucial and useful tactics for study design other than the minimization of total sample size, it is prudent to present a comprehensive account of design configurations in terms of various participant and budget constraints. In this article, exact methods are presented to give proper sample sizes when either the ratio of group sizes is fixed in advance or one sample size is fixed. In addition, detailed procedures are provided to determine the optimal sample sizes to maximize the precision for a given total cost and to minimize the cost for a specified precision. Finally, corresponding SAS computer codes are developed to facilitate computations of the exact necessary sample size in actual applications.
Precise interval estimation

In line with the advocated practice of greater use of confidence intervals, we attempt to develop the sample size methodology under precision consideration for Welch's (1938) approximate t procedure in the context of the Behrens–Fisher problem. Consider independent random samples from two normal populations with the following formulations:

$$X_{ij} \sim N(\mu_i, \sigma_i^2),$$

where $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$ are unknown parameters, $j = 1, \ldots, N_i$, and $i = 1$ and 2. To detect the difference between two group means, the well-known Welch's pivotal quantity is of the form

$$V = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{(S_1^2/N_1 + S_2^2/N_2)^{1/2}},$$

where $\bar{X}_1 = \sum_{j=1}^{N_1} X_{1j}/N_1$, $\bar{X}_2 = \sum_{j=1}^{N_2} X_{2j}/N_2$, $S_1^2 = \sum_{j=1}^{N_1} (X_{1j} - \bar{X}_1)^2/(N_1 - 1)$, and $S_2^2 = \sum_{j=1}^{N_2} (X_{2j} - \bar{X}_2)^2/(N_2 - 1)$. Accordingly, Welch proposed the approximate distribution for $V$:

$$V \mathrel{\dot{\sim}} t(\hat{\nu}),$$

where $t(\hat{\nu})$ is the t distribution with degrees of freedom $\hat{\nu}$ and $\hat{\nu} = \hat{\nu}(N_1, N_2, S_1^2, S_2^2)$ with

$$\frac{1}{\hat{\nu}} = \frac{1}{N_1 - 1}\left\{\frac{S_1^2/N_1}{S_1^2/N_1 + S_2^2/N_2}\right\}^2 + \frac{1}{N_2 - 1}\left\{\frac{S_2^2/N_2}{S_1^2/N_1 + S_2^2/N_2}\right\}^2.$$

Thus, an approximate $100(1 - \alpha)\%$ two-sided confidence interval of the mean difference $(\mu_1 - \mu_2)$ is of the form $(L, U)$, where $L = (\bar{X}_1 - \bar{X}_2) - t_{\hat{\nu},\alpha/2}(S_1^2/N_1 + S_2^2/N_2)^{1/2}$, $U = (\bar{X}_1 - \bar{X}_2) + t_{\hat{\nu},\alpha/2}(S_1^2/N_1 + S_2^2/N_2)^{1/2}$, and $t_{\hat{\nu},\alpha/2}$ is the $100(1 - \alpha/2)$ percentile of the t distribution $t(\hat{\nu})$ with degrees of freedom $\hat{\nu}$. For ease of presentation, the half-width of the $100(1 - \alpha)\%$ two-sided confidence interval is denoted by

$$H = t_{\hat{\nu},\alpha/2}(S_1^2/N_1 + S_2^2/N_2)^{1/2}. \tag{2}$$

It is clear that the actual half-width $H$ depends on the sample sizes $N_1$ and $N_2$, the confidence coefficient $1 - \alpha$, as well as on the variance estimates $S_1^2$ and $S_2^2$. More importantly, both $S_1^2$ and $S_2^2$ are scaled chi-square random variables with degrees of freedom $(N_1 - 1)$ and $(N_2 - 1)$, respectively, and thus jointly determine the distributional feature of the half-width $H$ of a confidence interval. When planning a study for ensuring that the confidence interval is narrow enough to produce meaningful findings, researchers must consider the stochastic nature of sample variances.
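To make the interval construction concrete, the following minimal sketch computes Welch's approximate degrees of freedom, the half-width H, and the limits (L, U) from summary statistics. The function name `welch_interval` and the input values are illustrative assumptions, not part of the article or its SAS/IML programs, and the sketch assumes SciPy is available for the t quantile.

```python
import math

from scipy import stats


def welch_interval(mean1, mean2, var1, var2, n1, n2, alpha=0.05):
    """Return (L, U, H, df) for Welch's approximate two-sided interval.

    var1 and var2 are the sample variances S1^2 and S2^2.
    """
    se2_1 = var1 / n1
    se2_2 = var2 / n2
    total = se2_1 + se2_2
    # Welch's approximate degrees of freedom (reciprocal form).
    inv_df = (se2_1 / total) ** 2 / (n1 - 1) + (se2_2 / total) ** 2 / (n2 - 1)
    df = 1.0 / inv_df
    # Half-width: upper alpha/2 t quantile times the estimated standard error.
    half_width = stats.t.ppf(1 - alpha / 2, df) * math.sqrt(total)
    diff = mean1 - mean2
    return diff - half_width, diff + half_width, half_width, df


# Hypothetical summary statistics: equal variances and equal group sizes.
lower, upper, h, df = welch_interval(11.0, 10.0, 1.0, 1.0, 10, 10)
print(round(df, 2), round(h, 4), round(lower, 4), round(upper, 4))
```

With equal sample variances and equal group sizes the approximate degrees of freedom reduce to N1 + N2 - 2, which gives a simple sanity check on the formula.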
For the purpose of advanced research design, it is desirable to determine the sample sizes required to achieve the designated precision properties of a confidence interval. Two useful principles concern the control of the expected half-width and the tolerance probability of the half-width within a preassigned value. Specifically, it is necessary to determine the required sample size such that the expected half-width of a $100(1 - \alpha)\%$ confidence interval is within the given bound

$$E[H] \le \delta, \tag{3}$$

where the expectation $E[H]$ is taken with respect to the joint distribution of $S_1^2$ and $S_2^2$, and $\delta$ $(> 0)$ is a constant. On the other hand, one may compute the sample size needed to guarantee, with a given tolerance probability, that the half-width of a $100(1 - \alpha)\%$ confidence interval will not exceed the planned value

$$P\{H < \omega\} \ge 1 - \gamma, \tag{4}$$

where $(1 - \gamma)$ is the specified tolerance level, and $\omega$ $(> 0)$ is a constant. To simplify presentation and computation, the following alternative formulation for $H$ is derived:

$$H = t_{\hat{\nu},\alpha/2}(K G/\kappa)^{1/2}, \tag{5}$$

where $\kappa = N_1 + N_2 - 2$; $K = (N_1 - 1)S_1^2/\sigma_1^2 + (N_2 - 1)S_2^2/\sigma_2^2 \sim \chi^2(\kappa)$; $G = (\sigma_1^2/N_1)\{B/p\} + (\sigma_2^2/N_2)\{(1 - B)/(1 - p)\}$; $p = (N_1 - 1)/\kappa$; and $B = \{(N_1 - 1)S_1^2/\sigma_1^2\}/K \sim \mathrm{Beta}\{(N_1 - 1)/2, (N_2 - 1)/2\}$. Note that the random variables $K$ and $B$ are independent. Also, it can be shown that

$$\frac{1}{\hat{\nu}} = \frac{B_1^2}{N_1 - 1} + \frac{B_2^2}{N_2 - 1}, \tag{6}$$

where $B_2 = 1 - B_1$ and $B_1 = (\sigma_1^2/N_1)\{B/p\}/[(\sigma_1^2/N_1)\{B/p\} + (\sigma_2^2/N_2)\{(1 - B)/(1 - p)\}]$. Hence, both $G$ and $\hat{\nu}$ are functions of the random variable $B$. It is clear from the distinct formulations in Eqs. 2 and 5 that the underlying core distribution of $H$ transforms from the joint distribution of two independent chi-square random variables to the joint distribution of a chi-square random variable $K$ and a beta random variable $B$. The suggested transformation appears at first sight to be of not much use, but actually it greatly simplifies our analytical and computational illustrations. Note that the product form of a chi-square random variable $K$ and other terms associated with a beta random variable $B$ in Eq. 5 permits more transparent representations than those presented in Wang and Kupper (1997).
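The equivalence of the two expressions for H can be verified in a few lines. The following worked step is a sketch that uses only the definitions $\kappa = N_1 + N_2 - 2$, $p = (N_1 - 1)/\kappa$, $K = (N_1 - 1)S_1^2/\sigma_1^2 + (N_2 - 1)S_2^2/\sigma_2^2$, and $B = \{(N_1 - 1)S_1^2/\sigma_1^2\}/K$, so that $N_1 - 1 = \kappa p$ and $N_2 - 1 = \kappa(1 - p)$:

```latex
\begin{align*}
S_1^2 &= \frac{\sigma_1^2 K B}{N_1 - 1} = \frac{\sigma_1^2}{\kappa}\, K\, \frac{B}{p},
\qquad
S_2^2 = \frac{\sigma_2^2 K (1 - B)}{N_2 - 1} = \frac{\sigma_2^2}{\kappa}\, K\, \frac{1 - B}{1 - p},\\[4pt]
\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}
  &= \frac{K}{\kappa}\left\{ \frac{\sigma_1^2}{N_1}\,\frac{B}{p}
     + \frac{\sigma_2^2}{N_2}\,\frac{1 - B}{1 - p} \right\}
   = \frac{K G}{\kappa},\\[4pt]
H &= t_{\hat{\nu},\alpha/2}\left(\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}\right)^{1/2}
   = t_{\hat{\nu},\alpha/2}\left(\frac{K G}{\kappa}\right)^{1/2}.
\end{align*}
```

The common factor $K/\kappa$ cancels in the ratio that defines $\hat{\nu}$, which is why the degrees of freedom depend on $B$ alone.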
Moreover, a beta distribution is bounded by 0 and 1, and requires less computational effort than a chi-square distribution. Therefore, the numerical computation of exact values of $E[H]$ and $P\{H < \omega\}$ can be conducted with the evaluations of both the one-dimensional integration with respect to a beta probability distribution function, and the cumulative distribution function of a chi-square random variable. Since all related functions are readily available in major statistical packages, the exact computations can be performed with current computing capabilities. In order to permit a practical treatment of sample size planning, additional concerns are considered to accommodate the participant and cost constraints in practical situations. In the next two sections, we will synthesize the ideas of Jan and Shieh (2011) and Kupper and Hafner (1989) to develop exact procedures of precise interval estimation with four different design and budget settings under the expected width and tolerance probability considerations, respectively. All calculations are performed using programs written with SAS/IML (SAS Institute, 2008a), and they are available in the supplementary files.

Expected width consideration

With the distributional properties described in Eqs. 5 and 6, the assessment of expected half-width $E[H]$ in Eq. 3 can be simplified as

$$E[H] = E_K[K^{1/2}]\, E_B[t_{\hat{\nu},\alpha/2}\, G^{1/2}]/\kappa^{1/2}. \tag{7}$$

It follows from the standard result of a chi-square distribution with $\kappa$ degrees of freedom that $E_K[K^{1/2}] = 2^{1/2}\,\Gamma\{(\kappa + 1)/2\}/\Gamma\{\kappa/2\}$. Moreover, the expectation $E_B[t_{\hat{\nu},\alpha/2}\, G^{1/2}]$ is taken with respect to the distribution of $B$ and does not permit a closed-form expression. Although the expected width can still be numerically evaluated for all proper model configurations, it is prudent to focus on those with significant implications. To simplify the exposition, the following two allocation constraints are considered because of their potential usefulness.
First, the ratio $r = N_2/N_1$ between the two group sizes may be fixed in advance, so the goal is to find the minimum sample size $N_1$ ($N_2 = rN_1$) required to achieve the selected precision level. Second, one of the two sample sizes, say, $N_2$, may be determined in advance, so the smallest size $N_1$ required to satisfy the specified precision should be determined.

Sample size ratio is fixed

Consider that the sample size ratio $r = N_2/N_1$ is preassigned, and without loss of generality, the ratio is assumed as $r \ge 1$. Thus, for a specified precision $\delta$, a simple incremental search can be conducted to find the minimum sample size $N_1$ such that $E[H] \le \delta$ for the chosen confidence level $(1 - \alpha)$ and error variances $(\sigma_1^2, \sigma_2^2)$. Note that the expected half-width is asymptotically equivalent to $E[H] \approx z_{\alpha/2}(\sigma_1^2/N_1 + \sigma_2^2/N_2)^{1/2}$, where $z_{\alpha/2}$ is the upper $100(\alpha/2)$th percentile of the standard normal distribution. The particular result provides a convenient initial value for $N_1$. Accordingly, it is more efficient to start the computation process with the sample size $N_{1Z}$, which is the smallest integer that satisfies the inequality

$$N_1 \ge \frac{\sigma_1^2 + \sigma_2^2/r}{(\delta/z_{\alpha/2})^2}. \tag{8}$$

For demonstration, when $\delta = 0.5$ and $1 - \alpha = 0.95$, the sample sizes $N_1$ and $N_2 = rN_1$ are presented in Table 1 for selected values of $r = 1$, 2, and 3; $\sigma_1 = 1/3$, 1/2, 1, 2, and 3; and $\sigma_2 = 1$. The actual expected half-width $E[H]$ is also listed, and the values are slightly less than the nominal value of 0.5.

One sample size is fixed

Assume the sample size $N_2$ of the second group is held constant, and that it is desirable to find the proper sample size $N_1$ to achieve the selected precision in terms of expected half-width. Just as in the previous case, the minimum sample size $N_1$ needed to ensure confidence intervals with the specified expected half-width can be found by a simple iterative search for the chosen confidence level $(1 - \alpha)$ and parameter values $(\sigma_1^2, \sigma_2^2)$.
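The incremental search used for both allocation constraints can be sketched as follows. This is only an illustration of the search pattern: the large-sample half-width $z_{\alpha/2}(\sigma_1^2/N_1 + \sigma_2^2/N_2)^{1/2}$ stands in for the exact $E[H]$, so the numbers it returns are asymptotic values rather than the exact solutions reported in the tables. The function names, the fixed value N2 = 50, and the search limit are assumptions made for the example.

```python
import math
from statistics import NormalDist


def asymptotic_half_width(n1, n2, var1, var2, alpha=0.05):
    """Large-sample stand-in for E[H]: z_{alpha/2} * sqrt(var1/n1 + var2/n2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * math.sqrt(var1 / n1 + var2 / n2)


def start_n1_fixed_ratio(delta, r, var1, var2, alpha=0.05):
    """Smallest integer N1 with N1 >= (var1 + var2/r) / (delta/z)^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((var1 + var2 / r) / (delta / z) ** 2)


def incremental_search(half_width, delta, n2_of_n1, start=2, n_max=1000):
    """Increase N1 from `start` until half_width(N1, N2) <= delta."""
    for n1 in range(start, n_max + 1):
        if half_width(n1, n2_of_n1(n1)) <= delta:
            return n1
    return None  # no solution within the (assumed) search limit


def hw(n1, n2):
    # Stand-in criterion with sigma1^2 = sigma2^2 = 1 and 1 - alpha = 0.95.
    return asymptotic_half_width(n1, n2, 1.0, 1.0)


# Fixed ratio r = 1: start from the asymptotic value and move upward.
n1_ratio = incremental_search(
    hw, 0.5, lambda n1: 1 * n1, start=start_n1_fixed_ratio(0.5, 1, 1.0, 1.0)
)
# Fixed N2 = 50 (hypothetical): search over N1 only.
n1_fixed = incremental_search(hw, 0.5, lambda n1: 50)
print(n1_ratio, n1_fixed)
```

Replacing `hw` with a function that evaluates the exact expected half-width turns the same loop into the exact procedure; the asymptotic starting value then only shortens the search.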
When $N_2$ is fixed, the starting sample size $N_{1Z}$, based on the asymptotic approximation, is the smallest integer that satisfies the inequality

$$N_1 \ge \frac{\sigma_1^2}{(\delta/z_{\alpha/2})^2 - \sigma_2^2/N_2}. \tag{9}$$

Note that the chosen sample size $N_2$ should not be too small because it is problematic to consider a small $N_2 < \sigma_2^2/(\delta/z_{\alpha/2})^2$ since the initial value $N_{1Z}$ and resulting $N_1$ may be negative. In addition, it should be noted that the resulting $N_{1Z}$ and $N_1$ values are unbounded and impractical if one considers a value of $N_2 \approx \sigma_2^2/(\delta/z_{\alpha/2})^2$. Accordingly, Table 2 presents the computed sample size $N_1$ and the actual expected half-width with chosen value $N_2$ for the same settings with $\delta = 0.5$, $1 - \alpha = 0.95$, and the five standard deviation settings of $\sigma_1$ and $\sigma_2$ in Table 1.

In addition to the prescribed allocation constraints of participants, it is often sensible to consider cost and effectiveness issues when research funding is limited. Moreover, the costs of obtaining subjects may differ across the two groups. Suppose $c_1$ and $c_2$ are the costs per subject in the first and second groups, respectively; then, the total cost of the study is $C = c_1N_1 + c_2N_2$. Thus, the following two questions arise naturally in choosing the optimal sample sizes. First, how can the maximum precision be achieved in a study with a limited budget? Second, what is the least cost for an investigation to maintain its desired level of precision? In general, balanced group sizes do not necessarily yield the optimal solution in the aforementioned two scenarios. This assertion can be easily justified from the simplified asymptotic approximation $E[H] \approx z_{\alpha/2}(\sigma_1^2/N_1 + \sigma_2^2/N_2)^{1/2}$, under which the optimal sample size allocation ratio for the appraisals of cost and precision is

$$\frac{N_2}{N_1} = \rho, \tag{10}$$

where $\rho = \sigma_2 c_1^{1/2}/(\sigma_1 c_2^{1/2})$. Although this identity reveals the obvious disadvantage of a naive, balanced design, it has its own weakness as a rule of thumb. It is readily seen from Eq. 7 that the exact properties of the expected half-width depend on the joint distribution of a chi-square random variable $K$ and a beta random variable $B$. The resulting behavior of $E[H]$ for finite sample sizes can be notably different from that of asymptotic theory. Hence, the simple guideline of Eq. 10 does not guarantee an optimal result when the sample sizes are small. Instead, the identity is employed as a benchmark in the following detailed and systematic presentation of optimal sample size allocation.

Total cost is fixed and expected width needs to be minimized

It can be shown under a fixed value of total cost $C = c_1N_{1Z} + c_2N_{2Z}$ and $N_{2Z}/N_{1Z} = \rho$ that the resulting sample sizes are

$$N_{1Z} = \frac{C\,\sigma_1 c_2^{1/2}}{c_1\sigma_1 c_2^{1/2} + c_2\sigma_2 c_1^{1/2}} \quad\text{and}\quad N_{2Z} = \frac{C\,\sigma_2 c_1^{1/2}}{c_1\sigma_1 c_2^{1/2} + c_2\sigma_2 c_1^{1/2}}. \tag{11}$$

As was described previously, although this sample size combination minimizes the magnitude $(\sigma_1^2/N_1 + \sigma_2^2/N_2)^{1/2}$ or asymptotic expected half-width $z_{\alpha/2}(\sigma_1^2/N_1 + \sigma_2^2/N_2)^{1/2}$, it may be suboptimal with respect to the actual precision level $E[H]$. In practice, the sample sizes need to be integers, and it is unlikely that the values of $N_{1Z}$ and $N_{2Z}$ in Eq. 11 are actually whole numbers. Consequently, any adjustment or rounding of $N_{1Z}$ and $N_{2Z}$ will introduce further inexactness into the optimization analysis. To find the exact solution, a detailed precision calculation and comparison is performed for the sample size combinations with $N_1$ from $N_{1\min}$ to $N_{1\max}$ and $N_2 = \mathrm{Floor}\{(C - c_1N_1)/c_2\}$, where $N_{1\min} = \mathrm{Max}\{\mathrm{Floor}(N_{1Z}) - 10, 5\}$, $N_{1\max} = \mathrm{Ceil}\{(C - c_2N_{2\min})/c_1\}$, $N_{2\min} = \mathrm{Max}\{\mathrm{Floor}(N_{2Z}) - 10, 5\}$, the function $\mathrm{Floor}(a)$ returns the largest integer that is less than or equal to $a$, and $\mathrm{Ceil}(a)$ returns the smallest integer that is greater than or equal to $a$. Note that the constants of 10 and 5 are chosen to prevent computation error and to ensure that an optimal solution is covered. Thus, the optimal sample size allocation is the one giving the maximum precision or minimum expected half-width.
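The large-sample allocation for a fixed budget and the grid of integer candidates that the exact search then screens can be sketched as follows. The function names are illustrative assumptions; the configuration C = 60, (σ1, σ2) = (1, 1), and (c1, c2) = (1, 2) is chosen only as an example, and the exact comparison of E[H] over the grid is not performed here.

```python
import math


def optimal_ratio(sd1, sd2, c1, c2):
    """Large-sample allocation ratio rho = N2/N1."""
    return (sd2 * math.sqrt(c1)) / (sd1 * math.sqrt(c2))


def fixed_cost_allocation(total_cost, sd1, sd2, c1, c2):
    """Real-valued (N1Z, N2Z) that spend the whole budget with N2Z/N1Z = rho."""
    denom = c1 * sd1 * math.sqrt(c2) + c2 * sd2 * math.sqrt(c1)
    n1z = total_cost * sd1 * math.sqrt(c2) / denom
    n2z = total_cost * sd2 * math.sqrt(c1) / denom
    return n1z, n2z


def candidate_grid(total_cost, n1z, n2z, c1, c2):
    """Integer pairs (N1, N2) screened around the real-valued solution."""
    n2_min = max(math.floor(n2z) - 10, 5)
    n1_min = max(math.floor(n1z) - 10, 5)
    n1_max = math.ceil((total_cost - c2 * n2_min) / c1)
    # For each N1, N2 is the largest group size the remaining budget allows.
    return [
        (n1, math.floor((total_cost - c1 * n1) / c2))
        for n1 in range(n1_min, n1_max + 1)
    ]


rho = optimal_ratio(1, 1, 1, 2)
n1z, n2z = fixed_cost_allocation(60, 1, 1, 1, 2)
grid = candidate_grid(60, n1z, n2z, 1, 2)
print(round(rho, 4), round(n1z, 4), round(n2z, 4))  # 0.7071 24.8528 17.5736
print(grid[0], grid[-1], len(grid))
```

Because the real-valued solution is not an integer pair, the final choice has to come from evaluating the exact precision criterion at each grid point rather than from rounding.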
For illustration, numerical results are presented in Table 3 for $(c_1, c_2) = (1, 1)$, $(1, 2)$, and $(1, 3)$, and fixed total cost $C = 30$, 40, 60, 150, and 240 in accordance with the standard deviation combinations reported in the previous two tables. The results in Table 3 reveal that the actual expected half-width for a given total cost increases considerably as the unit cost $c_2$ increases from 1 to 3. Furthermore, the simplified allocation scheme does not yield the optimal sample sizes in several cases. For example, the optimal sample sizes are $N_1 = 24$ and $N_2 = 18$ for $(\sigma_1, \sigma_2) = (1, 1)$ and $(c_1, c_2) = (1, 2)$, in contrast with the result of $N_{1Z} = 24.8528$ and $N_{2Z} = 17.5736$ computed by Eq. 11. Correspondingly, the optimal ratio $N_2/N_1 = 18/24 = 0.7500$ is slightly greater than the ratio computed with the simple formula presented in Eq. 10: $\rho = (1 \cdot 1^{1/2})/(1 \cdot 2^{1/2}) = 0.7071$.

Target expected width is fixed and total cost needs to be minimized

In this case, the large sample approximation shows that in order to ensure the nominal expected half-width $\delta = z_{\alpha/2}(\sigma_1^2/N_{1Z} + \sigma_2^2/N_{2Z})^{1/2}$ while minimizing total cost $C = c_1N_{1Z} + c_2N_{2Z}$, the best sample size combination is

$$N_{1Z} = \frac{\sigma_1^2 + \sigma_2^2/\rho}{(\delta/z_{\alpha/2})^2} \quad\text{and}\quad N_{2Z} = \frac{\rho\,\sigma_1^2 + \sigma_2^2}{(\delta/z_{\alpha/2})^2}, \tag{12}$$

where $\rho$ is the optimal ratio defined in Eq. 10. Similar to the usage of sample sizes in Eq. 11, the computed values of $N_{1Z}$ and $N_{2Z}$ in Eq. 12 are modified to expedite a screening of sample size combinations in order to find the optimal allocation that maintains the desired expected half-width with the least cost. Specifically, the exact precision computation and cost evaluation are conducted for sample size combinations with $N_1$ from $N_{1\min}$ to $N_{1\max}$ satisfying the required precision, where $N_{1\min} = \mathrm{Max}[\mathrm{Floor}(N_{1Z}) - 10, \mathrm{Ceil}\{\sigma_1^2/(\delta/z_{\alpha/2})^2\}, 6]$, $N_{1\max} = \mathrm{Ceil}[\sigma_1^2/\{(\delta/z_{\alpha/2})^2 - \sigma_2^2/N_{2\min}\}] + 20$, and $N_{2\min} = \mathrm{Max}[\mathrm{Floor}(N_{2Z}) - 10, \mathrm{Ceil}\{\sigma_2^2/(\delta/z_{\alpha/2})^2\}, 6]$. The constants of 6, 20, and 10 are chosen to prevent computation error and to enhance the optimal search.
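The large-sample allocation that seeds this least-cost search follows from imposing $N_2 = \rho N_1$ on the asymptotic half-width constraint. The sketch below computes it with the standard library only; the function name and the example configuration (σ1, σ2) = (1/3, 1) with equal unit costs are assumptions for illustration, and the returned values are real-valued starting points rather than the exact integer solutions.

```python
import math
from statistics import NormalDist


def min_cost_allocation(delta, sd1, sd2, c1, c2, alpha=0.05):
    """Real-valued (N1Z, N2Z, cost) meeting the asymptotic half-width delta."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    rho = (sd2 * math.sqrt(c1)) / (sd1 * math.sqrt(c2))  # large-sample ratio
    scale = (delta / z) ** 2
    n1z = (sd1 ** 2 + sd2 ** 2 / rho) / scale
    n2z = (rho * sd1 ** 2 + sd2 ** 2) / scale
    return n1z, n2z, c1 * n1z + c2 * n2z


n1z, n2z, cost = min_cost_allocation(0.5, 1 / 3, 1.0, 1, 1)
# Check: the asymptotic half-width at (N1Z, N2Z) equals delta by construction.
z = NormalDist().inv_cdf(0.975)
achieved = z * math.sqrt((1 / 3) ** 2 / n1z + 1.0 / n2z)
print(round(n1z, 3), round(n2z, 3), round(cost, 3), round(achieved, 6))
```

Since the t quantile exceeds the normal quantile for finite samples, the exact integer solution should generally be expected to lie somewhat above these values, which is why the search window is extended beyond them.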
For each fixed value of $N_1$, the matching sample size $N_2$ is calculated to satisfy the required expected half-width. Thus, the optimal sample size allocation is the one giving the smallest cost while maintaining the specified expected half-width value. In cases in which there is more than one combination yielding the same least cost, the one producing the maximum precision is reported. Table 4 provides the corresponding optimal sample size allocation, cost, and actual expected half-width for the configurations of $(c_1, c_2) = (1, 1)$, $(1, 2)$, and $(1, 3)$, and the five standard deviation settings of $\sigma_1$ and $\sigma_2$. It is clear that the total cost for a required precision and for fixed standard deviations increases substantially as the unit cost $c_2$ changes from 1 to 3. The optimal allocations have the simple ratio $\rho$ for the three cases of $(\sigma_1, \sigma_2) = (1, 1)$, $(2, 1)$, and $(3, 1)$ when $(c_1, c_2) = (1, 1)$. However, most of the sample size ratios are close to, but different from, the ratio $\rho$. The largest discrepancy occurs with the case $N_2/N_1 = 22/8 = 2.7500$ for $(c_1, c_2) = (1, 1)$ and $(\sigma_1, \sigma_2) = (1/3, 1)$, whereas the approximate ratio is $\rho = (1 \cdot 1^{1/2})/\{(1/3) \cdot 1^{1/2}\} = 3$.

Tolerance probability consideration

Instead of the expected half-width criterion, a useful alternative approach for sample size determination is to ensure that the actual confidence interval half-width will not exceed the planned bound with a given tolerance probability. For analytic clarity and computational ease, the probability $P\{H < \omega\}$ given in Eq. 4 is expressed as

$$P\{H < \omega\} = E_B\left[F_K\left\{(\kappa/G)\,(\omega/t_{\hat{\nu},\alpha/2})^2\right\}\right], \tag{13}$$

where $F_K(\cdot)$ is the cumulative distribution function of $K \sim \chi^2(\kappa)$. Note that the expression in Eq. 13 provides a clearer and more concise exposition of the assurance probability of precision than does Eq. 14 in Wang and Kupper (1997). The formulation also expedites the subsequent computational task for various participant and cost constraints.
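Both precision criteria reduce to a one-dimensional integral over the beta variable B, which the following sketch evaluates numerically. It is not the authors' SAS/IML program: it assumes SciPy is available, the function names are hypothetical, `var1` and `var2` denote the population variances σ1² and σ2², and the numerical settings (integration break point at the beta mean, subdivision limit) are choices made for this illustration.

```python
import math

from scipy import integrate, stats


def _g_and_df(b, n1, n2, var1, var2):
    """Return G and Welch's degrees of freedom as functions of B = b."""
    kappa = n1 + n2 - 2
    p = (n1 - 1) / kappa
    a1 = (var1 / n1) * (b / p)
    a2 = (var2 / n2) * ((1 - b) / (1 - p))
    g = a1 + a2
    b1 = a1 / g
    df = 1.0 / (b1 ** 2 / (n1 - 1) + (1 - b1) ** 2 / (n2 - 1))
    return g, df


def _expect_over_b(func, n1, n2):
    """E_B[func(B)] for B ~ Beta((n1 - 1)/2, (n2 - 1)/2) by quadrature."""
    a, b = (n1 - 1) / 2, (n2 - 1) / 2
    dist = stats.beta(a, b)
    value, _ = integrate.quad(
        lambda x: func(x) * dist.pdf(x), 0, 1, points=[a / (a + b)], limit=200
    )
    return value


def expected_half_width(n1, n2, var1, var2, alpha=0.05):
    """E[H] = E_K[K^(1/2)] * E_B[t * G^(1/2)] / kappa^(1/2)."""
    kappa = n1 + n2 - 2

    def inner(b):
        g, df = _g_and_df(b, n1, n2, var1, var2)
        return stats.t.ppf(1 - alpha / 2, df) * math.sqrt(g)

    # E[K^(1/2)] for K ~ chi-square(kappa), computed on the log scale.
    e_sqrt_k = math.sqrt(2) * math.exp(
        math.lgamma((kappa + 1) / 2) - math.lgamma(kappa / 2)
    )
    return e_sqrt_k * _expect_over_b(inner, n1, n2) / math.sqrt(kappa)


def tolerance_probability(n1, n2, var1, var2, omega, alpha=0.05):
    """P{H < omega} = E_B[F_K((kappa / G) * (omega / t)^2)]."""
    kappa = n1 + n2 - 2

    def inner(b):
        g, df = _g_and_df(b, n1, n2, var1, var2)
        t = stats.t.ppf(1 - alpha / 2, df)
        return stats.chi2.cdf(kappa / g * (omega / t) ** 2, kappa)

    return _expect_over_b(inner, n1, n2)


eh = expected_half_width(115, 400, 2.3 ** 2, 2.7 ** 2)
p_small = tolerance_probability(100, 100, 1.0, 2.0, omega=0.3)
p_large = tolerance_probability(200, 200, 1.0, 2.0, omega=0.3)
print(eh, p_small, p_large)
```

Wrapping either function in an incremental search over N1, with N2 tied to N1 by the design constraint, gives the kind of exact sample size search described in the text.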
Since there may be several possible sample sizes $N_1$ and $N_2$ that meet the required tolerance level, it is worthwhile to consider the same practical circumstances as in the case of expected interval half-width. Accordingly, the examinations presented here simplify and expand the existing and limited results in Wang and Kupper.

Sample size ratio is fixed

With the allocation ratio $r = N_2/N_1 \ge 1$, specified width $\omega$, tolerance probability $(1 - \gamma)$, confidence coefficient $(1 - \alpha)$, and error variances $(\sigma_1^2, \sigma_2^2)$, a straightforward iterative process is performed to find the minimum sample size $N_1$, such that $P\{H < \omega\} \ge 1 - \gamma$. To simplify the incremental search, the initial value of $N_1$ in the algorithm is based on Eq. 8 with $\delta = \omega$, because the optimal solutions here for a large level of $(1 - \gamma)$ are greater than those of the expected interval width approach with the same interval bound. This situation is similar to those noted in Kupper and Hafner (1989) for the traditional two-sample problem. More concrete examples are presented in Table 5 for $(1 - \gamma) = 0.90$ and $\omega = 0.5$. For ease of comparison, the other parameter configurations of $(1 - \alpha)$, $(\sigma_1^2, \sigma_2^2)$, and $r$ are identical to those in Table 1. In addition to its complex formulation, the numerical calculation of Wang and Kupper (1997) is also questionable. Specifically, for the settings of $\omega = 0.3$, $(1 - \alpha) = 0.95$, $(\sigma_1^2, \sigma_2^2) = (1, 2)$, and $r = 1$, our computations yield the optimal sample sizes $N_1 = N_2 = 139$ and $N_1 = N_2 = 149$ for $(1 - \gamma) = 0.80$ and 0.95, respectively. The corresponding results reported in Table 1 of Wang and Kupper are $N_1 = N_2 = 138$ and $N_1 = N_2 = 144$. Note that SAS procedure PROC POWER (SAS Institute, 2008b) provides the useful feature of finding the optimal sample sizes $N_1 = N_2$ ($r = 1$) for the desired tolerance probability with confidence intervals of mean difference under the homogeneous variances assumption.
However, it does not consider the corresponding sample size calculations for the Behrens–Fisher problem with arbitrary sample size ratio $r \ge 1$, as is illustrated here.

Table 4 Computed sample sizes ($N_1$, $N_2$), cost, and expected half-width $E[H]$ when the total cost needs to be minimized with target expected half-width $\delta = 0.5$ and $1 - \alpha = 0.95$

One sample size is fixed

A different restriction of the design setting is to find the minimum sample size, say, $N_1$, that ensures a required tolerance probability when the other sample size, $N_2$, is fixed in advance. With the substitution of $\delta = \omega$ in Eq. 9, the resulting sample size is utilized as the starting value for the incremental search of the optimal solution. The corresponding results with $(1 - \gamma) = 0.90$ and $\omega = 0.5$ are listed in Table 6 for the same configurations of $(1 - \alpha)$, $(\sigma_1^2, \sigma_2^2)$, and $N_2$ as in Table 2. It is clear that the computed sample size $N_1$ in Table 6 is larger than that for the same setting in Table 2. Since there is no explicit lower bound of $N_2$, it is possible that the specified $N_2$ is too small, and the matching $N_1$ may be unbounded. Thus, the iterative search of optimal $N_1$ is programmed to terminate when $N_1$ reaches the value 1,001, because the resulting sample size combination appears to be impractical or unusual. In the following section, we will turn our attention to the budget issue with varying unit cost per subject in each group.

Total cost is fixed and tolerance probability needs to be maximized

The notion of maximizing the tolerance level with a fixed value of total cost $C = c_1N_1 + c_2N_2$ is considered, where $c_1$ and $c_2$ are the known costs for each participant of the two groups. To find the best sample size allocation, the prescribed logic and algorithm under the expected width criterion is applied to the optimization of cost and tolerance probability with the substitution of the precision criterion $P\{H < \omega\}$ for $E[H]$.
With a selected set of designated total costs C = 50, 60, 80, 180, and 300, and heterogeneity levels, the optimal sample sizes are summarized in Table 7 for ω = 0.5, 1 − α = 0.95, and three unit cost settings. As was described earlier for the expected half-width consideration in Table 3, the results in Table 7 show the same behavior, in that the actual tolerance probability for a given total cost decreases substantially as the unit cost c2 increases from 1 to 3. Therefore, researchers should be cautious about the prominent impact of heterogeneity on precision performance when resources are limited.

Target tolerance probability is fixed and total cost needs to be minimized

In contrast with the previous case in which the total cost was fixed, the cost and precision assessment can be conversely performed by finding the optimal sample sizes to minimize cost when the target tolerance level is given. The utility of this procedure for the evaluation of expected half-width is extended to accommodate the precision criterion of assurance probability that the interval half-width is enclosed in the desirable range. To demonstrate the interrelation of the parameter configurations, numerical results are presented in Table 8 for the target tolerance probability 1 − γ = 0.90, ω = 0.5, and 1 − α = 0.95, along with several combinations of unit costs (c1, c2) and standard deviations (σ1, σ2). Similar to the expected width situation, the resulting total cost for fixed values of tolerance probability and standard deviations increases drastically as the unit cost c2 changes from 1 to 3.

It is suggested in Wang and Kupper (1997, p. 735) that the optimal sample size ratio is N2/N1 = σ2/σ1 for the problem of minimizing the total sample size. However, none of the optimal allocation ratios in their Table 5 agrees with this guideline.
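The ratio guideline quoted above matches the familiar large-sample allocation argument, which may help explain why exact results can depart from it. The following is a sketch under simplifying assumptions that the exact procedure of this article does not make: N1 and N2 are treated as continuous, and the variance of the mean difference is held at a fixed value v (the symbols v and λ are introduced here for illustration only). It is not necessarily the derivation used by Wang and Kupper.

```latex
\begin{aligned}
&\min_{N_1,\,N_2}\; N_1 + N_2
  \quad \text{subject to} \quad
  \frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2} = v, \\[4pt]
&\mathcal{L} = N_1 + N_2
  + \lambda\left(\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2} - v\right), \\[4pt]
&\frac{\partial \mathcal{L}}{\partial N_j}
  = 1 - \lambda\,\frac{\sigma_j^2}{N_j^2} = 0
  \;\Longrightarrow\; N_j = \sqrt{\lambda}\,\sigma_j \quad (j = 1, 2), \\[4pt]
&\frac{N_2}{N_1} = \frac{\sigma_2}{\sigma_1}.
\end{aligned}
```

Replacing the objective by the total cost c1N1 + c2N2 in the same argument gives N2/N1 = (σ2/σ1)·sqrt(c1/c2). Because the actual half-width also involves a t quantile whose degrees of freedom depend on the allocation, and because sample sizes are integers, the exact optimum need not fall on either ratio.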
Essentially, a systematic search and detailed inspection of sample size combinations are required to find the optimal allocation that attains the desired precision while giving the least total sample size. This extra procedure and the resulting merit in sample size determination are not addressed in Wang and Kupper (1997). In contrast, all of these issues are considered in our suggested procedure and the developed program.

Numerical example

To illustrate the usefulness of, and the differences among, the proposed sample size procedures under various precision criteria and design schemes, we extend the numerical demonstration in Jan and Shieh (2011) from hypothesis testing to interval estimation for the difference of ability tests administered online and in the laboratory. Since the demographic structure of online samples can differ from that of offline samples acquired in traditional laboratory settings (Ihme, Lemke, Lieder, Martin, Müller, & Schmidt, 2009), the planning parameter values are chosen as μLab = 11, μOnline = 10, σLab = 2.3, and σOnline = 2.7 to reflect the underlying treatment effect and heteroscedasticity. Moreover, online testing has the advantages of ease of obtaining a large sample and low cost. It would seem sensible that more participants could be obtained online than offline. The determination of actual sample sizes depends on the precision properties that the researcher wants to ensure for the resulting confidence intervals, as well as on other essential design features.

First, it is intuitively reasonable to consider the expected width criterion. Suppose that the sample ratio is NOnline/NLab = 4. It follows that the sample sizes NLab = 110 and NOnline = 440 are required for the 95% confidence interval of the mean difference to have the expected interval half-width 0.5. On the other hand, if the size of the online sample is fixed at NOnline = 400, then NLab = 115 is needed to meet the same precision.
To account for a budgetary concern where the total cost is C = 200 and the respective unit costs per subject are cLab = 1 and cOnline = 0.2, the optimal allocation of sample sizes is NLab = 132 and NOnline = 340, thus producing the maximum precision within the cost constraint. Conversely, the sample size combination NLab = 125 and NOnline = 328 induces the lowest cost C = 190.6, while ensuring the expected interval half-width E[H] ≤ 0.5. The computed sample sizes and the corresponding actual values of expected interval half-width are summarized in Table 9 for ease of discussion.

Table 8 Computed sample sizes (N1, N2), cost, and tolerance probability P{H < ω} when the total cost needs to be minimized with target tolerance probability 1 − γ = 0.90, ω = 0.5, and 1 − α = 0.95

Table 9 Computed sample sizes (N1, N2) for precise interval estimation under various participant and cost constraints when σ1 = 2.3, σ2 = 2.7, 1 − α = 0.95, expected half-width bound 0.5, 1 − γ = 0.90, ω = 0.5, c1 = 1, and c2 = 0.2
I. Fixed allocation ratio: r = N2/N1 = 4
II. One sample size is fixed: N2 = 400
III. Fixed cost: C = 200
IV. Fixed target precision: expected half-width bound 0.5, ω = 0.5, and 1 − γ = 0.90

Alternatively, it may be necessary for the confidence interval half-width to be enclosed by a designated bound with a given assurance level. Assume that the tolerance probability is 1 − γ = 0.90 and that the 95% confidence interval half-width bound is ω = 0.5. A study with the sample ratio r = NOnline/NLab = 4 must have the sample sizes NLab = 125 and NOnline = 500 to meet the precision specification. When the online sample is predetermined at NOnline = 400, the computation shows that the laboratory group must have at least the sample size NLab = 134 in order to satisfy the designated precision. In the case of limited total cost C = 200, with cLab = 1 and cOnline = 0.2, the best set of sample sizes is NLab = 133 and NOnline = 335, and the resulting tolerance level is the highest among all sample sizes NLab and NOnline with NLab + 0.2NOnline ≤ 200.
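The fixed-cost allocation just described can be illustrated with a small grid search over the budget line. This is a sketch under stated assumptions, not the authors' program: the tolerance probability is estimated by Monte Carlo simulation rather than computed exactly, H is taken to be the usual Welch half-width with Welch-Satterthwaite degrees of freedom, and for each NLab the largest affordable NOnline is used. The function names `tolerance_prob` and `best_allocation` are hypothetical.

```python
import math

import numpy as np
from scipy import stats


def tolerance_prob(n1, n2, sd1, sd2, omega, alpha=0.05, draws=5_000, seed=1):
    """Monte Carlo estimate of P{H < omega} for Welch's interval half-width H."""
    rng = np.random.default_rng(seed)
    # a and b are S_j^2 / N_j, with S_j^2 drawn from its scaled chi-square law.
    a = sd1 ** 2 * rng.chisquare(n1 - 1, draws) / (n1 - 1) / n1
    b = sd2 ** 2 * rng.chisquare(n2 - 1, draws) / (n2 - 1) / n2
    # Welch-Satterthwaite approximate degrees of freedom, one value per draw.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    half_width = stats.t.ppf(1 - alpha / 2, df) * np.sqrt(a + b)
    return float(np.mean(half_width < omega))


def best_allocation(total_cost, c1, c2, sd1, sd2, omega, alpha=0.05):
    """Return (n1, n2, p) maximizing the estimated P{H < omega} within the budget."""
    best = None
    n1 = 2
    # Keep at least two observations in each group.
    while c1 * n1 + c2 * 2 <= total_cost + 1e-9:
        # Largest n2 affordable after paying for n1 (epsilon guards rounding).
        n2 = math.floor((total_cost - c1 * n1) / c2 + 1e-9)
        p = tolerance_prob(n1, n2, sd1, sd2, omega, alpha)
        if best is None or p > best[2]:
            best = (n1, n2, p)
        n1 += 1
    return best


if __name__ == "__main__":
    # Planning values of the example: C = 200, cLab = 1, cOnline = 0.2,
    # sigma_Lab = 2.3, sigma_Online = 2.7, omega = 0.5, 95% confidence.
    print(best_allocation(200, 1.0, 0.2, 2.3, 2.7, 0.5))
```

With the planning values of this example, the output can be compared with the reported allocation NLab = 133 and NOnline = 335. Because the estimated probability is fairly flat near its maximum, the simulated optimum may differ from the exact one by a few units. A larger `draws` value gives a more stable answer.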
By contrast, for the target tolerance probability 1 − γ = 0.90 and the 95% confidence interval half-width bound ω = 0.5, the minimum cost is C = 211 for the optimal sample sizes NLab = 143 and NOnline = 340. These results and the associated tolerance probabilities are also presented in Table 9.

It is noteworthy that the computed sample sizes under the expected width consideration are smaller than those of the tolerance probability criterion. The only exception is the third case, with fixed total cost C = 200. In that case, the optimal sample sizes NLab = 132 and NOnline = 340 yield the expected half-width 0.4878, whereas the best sample size combination NLab = 133 and NOnline = 335 gives a tolerance level of merely 0.7253 < 1 − γ = 0.90. These contrasting behaviors may be useful for researchers to justify their design strategy and financial support. The reader is referred to Ihme et al. (2009) for further details about the comparison of ability tests administered online and in the laboratory.

In order to enhance the applicability of confidence intervals and the fundamental usefulness of Welch's (1938) procedure, in the present article, we present the corresponding sample size techniques under various precision principles and design schemes. The precision criteria consist of the control of the expected width and the assurance of tolerance probability of confidence intervals. The design perspective includes four different allocation constraints and cost considerations. Detailed sample size tables are provided to help researchers better understand the intrinsic relationships between the optimal sample sizes and the associated model, precision, and design configurations. Since existing software packages do not accommodate sample size calculations with the same degree of generality as is illustrated in this article, computer programs are developed to facilitate the use of the suggested procedures.
The proposed sample size methodology should be useful in behavioral and other areas of the social sciences for planning two-group comparison studies in which variances differ across groups.

Author Note

The authors thank the editor, Gregory Francis, for enhancing the clarity of the article's presentation, and Professor Chao-Ying Joanne Peng of Indiana University and an anonymous referee, whose suggestions extended and strengthened its content immensely.



Gwowen Shieh, Show-Li Jan. Optimal sample sizes for precise interval estimation of Welch’s procedure under various allocation and cost considerations, Behavior Research Methods, 2012, 202-212, DOI: 10.3758/s13428-011-0139-z