Optimal sample sizes for precise interval estimation of Welch’s procedure under various allocation and cost considerations
Gwowen Shieh
Show-Li Jan
S.-L. Jan: Department of Applied Mathematics, Chung Yuan Christian University, Chungli, Taiwan 32023, Republic of China
G. Shieh: Department of Management Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan 30050
Welch's (Biometrika 29: 350-362, 1938) procedure has emerged as a robust alternative to the Student's t test for comparing the means of two normal populations with unknown and possibly unequal variances. To facilitate the advocated statistical practice of confidence intervals and further improve the potential applicability of Welch's procedure, in the present article, we consider exact approaches to optimize sample size determinations for precise interval estimation of the difference between two means under various allocation and cost considerations. The desired precision of a confidence interval is assessed with respect to the control of expected half-width, and to the assurance probability of interval half-width within a designated value. Furthermore, the design schemes in terms of participant allocation and cost constraints include (a) giving the ratio of group sizes, (b) specifying one sample size, (c) attaining maximum precision performance for a fixed cost, and (d) meeting a specified precision level for the least cost. The proposed methods provide useful alternatives to the conventional sample size procedures. Also, the developed programs expand the degree of generality for the existing statistical software packages and can be accessed at brm.psychonomic-journals.org/content/supplemental.

The fundamental results and associated usages of standard
parametric procedures, such as Student's t, ANOVA F, and
ordinary least squares regression, are well documented in
the literature. One important assumption underlying the
prescribed traditional methods is that of equal population
variances. Although the homogeneity of variance
formulation provides a convenient and useful setup, it is not
unusual for the homoscedasticity assumption to be violated
in actual applications. For example, Grissom (2000)
emphasized that there are theoretical reasons to expect
and empirical results to document the existence of
heteroscedasticity in clinical data. Moreover, Grissom and Kim
(2005, pp. 10-14) provided additional explanations for the
intrinsic causes of variance heterogeneity in real data.
Notably, Grissom recommended employing suitable
techniques that are superior to the traditional inferential methods
under various conditions of heteroscedasticity.
For comparing the difference between two normal means
that may have unequal population variances, the scenario is
the well-known Behrens-Fisher problem (Kim & Cohen,
1998). Accordingly, Welch's (1938) approximate t
procedure has been recognized as a satisfactory and robust
alternative to the two-sample t for the Behrens-Fisher
problem. The same notion was independently suggested
by Smith (1936) and Satterthwaite (1946); hence, the
technique is sometimes referred to as the Smith-Welch-Satterthwaite
procedure. The method not only is covered in
introductory textbooks of statistics and quantitative
methods but also is available in several commonly used statistics
packages, for example, Excel, Minitab, SAS, and SPSS.
However, most research in this area is concerned with the
null hypothesis significance tests for detecting mean
differences, for example, Best and Rayner (1987) and Wang
(1971). This dominance of hypothesis testing for making
statistical inferences does not occur exclusively in the
Behrens-Fisher problem. It more broadly reflects the
long-standing and prevalent practice of significance tests in
applied research across many scientific fields. As a
compelling alternative, there has been a growing awareness
of the use of confidence intervals instead of hypothesis tests
for inference-making purposes (e.g., Hahn & Meeker,
1991; Harlow, Mulaik, & Steiger, 1997; Kline, 2004;
Smithson, 2003). From both practical and scientific
standpoints, it may be more informative to provide a
reliable estimate of the magnitude of the examined effect,
rather than simply to decide whether or not a finding is
statistically significant. Accordingly, Wilkinson and the
American Psychological Association's Task Force on
Statistical Inference (1999) and the sixth edition of the
Publication Manual of the American Psychological
Association (APA, 2010) called for the greater use of confidence
intervals. However, the interval estimation procedures are
intrinsically stochastic in nature. From a study-planning
point of view, researchers may wish to credibly address
specific research questions and confirm meaningful
treatment differences, so that the resulting confidence interval
will meet the designated precision requirements. Hence, it
is of practical interest and methodological importance to
develop sample size procedures for precise interval
estimation in the context of the Behrens-Fisher problem.
To ensure precision of the resulting confidence intervals,
the notion of expected half-width for sample size
calculations is frequently introduced in standard texts. However,
considerable attention has focused on the criterion of
tolerance probability of interval half-width within a given
value. For example, see Beal (1989), Kelley, Maxwell, and
Rausch (2003), Kupper and Hafner (1989), and Liu (2009)
for related discussion in the context of estimating the mean
difference between two normal populations with
homoscedasticity. The empirical illustration in Kupper and Hafner
shows that it typically requires a larger sample size to meet
the necessary assurance of tolerance probability than to
control a designated expected half-width. Therefore, the
sample sizes computed by the expected half-width
approach tend to be inadequate to guarantee the desired
tolerance level of interval half-width. Consequently, the
assurance probability approach is recommended over the
expected width criterion for sample size determination.
However, it is noteworthy that the two principles of
expected width and assurance probability are closely related
to the two standard criteria of unbiasedness and consistency
in statistical point estimation, respectively. In other words,
these two measures impose unique and distinct aspects of
precision characteristics on the resulting confidence
intervals, and each principle has conceptual and empirical
implications in its own right.
Within the framework of the Behrens-Fisher problem,
Wang and Kupper (1997) derived a formula to compute the
necessary sample size for a selected tolerance probability
when the sample size ratio is given. Although the suggested
sample size technique accommodates the more realistic
situation of variance heterogeneity, three essential caveats
of the results in Wang and Kupper should be pointed out.
First, their theoretical presentations and algebraic
expressions are noticeably awkward. The formulation is
complicated in form, and the complexity requires intensive
cumbersome evaluations. Furthermore, to our knowledge,
there is no computer algorithm available for performing the
necessary computation. Therefore, their result is of less
practical value in application. Second, they suggest fixing
the proportion of standard deviations as the allocation ratio
to determine the optimal sample sizes for a designated
tolerance level so that the total sample size is minimized.
But the simplified algorithm employed by Wang and
Kupper fails to take into account the underlying metric of
integer sample sizes and often leads to suboptimal results. It
is shown below in our numerical investigation that their
procedure is not guaranteed to give the correct optimal
sample sizes. Third, although there are mixed opinions on
the effectiveness of expected width, they did not address the
issue of how to perform the sample size calculations so that
the expected confidence interval half-width will attain the
planned precision. Thus, the results in Wang and Kupper
should be clarified and extended with more transparent
explications and exact computations. Note that the
assurance probability for achieving a desired interval width can
be further modified as a conditional probability that the
confidence interval includes the true parameter. As was
reported in Beal (1989), corresponding sample sizes
computed with the conditional consideration are almost
identical to or at most only slightly larger than those
calculated with the aforementioned unconditional or
tolerance probability approach. Our calculations
also confirm that this phenomenon continues to exist in the
Behrens-Fisher problem. Hence, the conditional criterion
presented in Wang and Kupper will not be considered
further in this article.
In view of the potential variance heterogeneity one might
encounter in applied work, the present article contributes to
the applications of Welch's (1938) procedure by providing
feasible sample size methodology for constructing precise
confidence intervals under two distinct perspectives. One
method gives the minimum sample size such that the
expected confidence interval half-width is within the
designated bound. The other approach provides the sample
size needed to guarantee, with a given tolerance probability,
that the half-width of a confidence interval will not exceed
the planned value. Furthermore, conventional sample size
calculations do not consider allocation schemes with
participant constraints or cost implications. However,
researchers have explored design strategies that take into
account the impact of different constraints of the sample
scheme and project funding while maintaining adequate
power (Allison, Allison, Faith, Paultre, & Pi-Sunyer, 1997,
and references therein). Jan and Shieh (2011) considered
the problem of determining optimal sample sizes to meet a
designated power for Welch's test under various allocation
and cost considerations that call for independent random
samples from two normal populations with possibly
unequal variances. The same principles would apply for
a study seeking a precise estimate of the mean difference
between two treatments. It is well known that there exists
a direct connection between hypothesis testing and
interval estimation, although the two procedures are
philosophically different in the power and precision
viewpoints. Not surprisingly, the sample size required to
test a hypothesis regarding the specific value of a
parameter with desired power can be markedly different
from the sample size needed to obtain adequate precision
of interval estimation in the same study. Since there are
crucial and useful tactics for study design other than the
minimization of total sample size, it is prudent to present a
comprehensive account of design configurations in terms
of various participant and budget constraints. In this
article, exact methods are presented to give proper sample
sizes when either the ratio of group sizes is fixed in
advance or one sample size is fixed. In addition, detailed
procedures are provided to determine the optimal sample
sizes to maximize the precision for a given total cost and
to minimize the cost for a specified precision. Finally,
corresponding SAS computer codes are developed to
facilitate computations of the exact necessary sample size
in actual applications.
Precise interval estimation
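In line with the advocated practice of greater use of
confidence intervals, we attempt to develop the sample
size methodology under precision consideration for Welch's
(1938) approximate t procedure in the context of the
Behrens-Fisher problem. Consider independent random
samples from two normal populations with the following
formulations:

$$X_{ij} \sim N(\mu_i, \sigma_i^2),$$

where $\mu_1$, $\mu_2$, $\sigma_1^2$, and $\sigma_2^2$ are unknown parameters, $j = 1, \ldots, N_i$,
and $i = 1$ and 2. To detect the difference between two group
means, the well-known Welch's pivotal quantity is of the
form

$$V = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{(S_1^2/N_1 + S_2^2/N_2)^{1/2}},$$

where $\bar{X}_1 = \sum_{j=1}^{N_1} X_{1j}/N_1$, $\bar{X}_2 = \sum_{j=1}^{N_2} X_{2j}/N_2$,
$S_1^2 = \sum_{j=1}^{N_1} (X_{1j} - \bar{X}_1)^2/(N_1 - 1)$, and
$S_2^2 = \sum_{j=1}^{N_2} (X_{2j} - \bar{X}_2)^2/(N_2 - 1)$.
Accordingly, Welch proposed the approximate distribution for $V$:

$$V \mathrel{\dot{\sim}} t(\hat{v}),$$

where $t(\hat{v})$ is the t distribution with degrees of freedom $\hat{v}$
and $\hat{v} = \hat{v}(N_1, N_2, S_1^2, S_2^2)$ with

$$\frac{1}{\hat{v}} = \frac{1}{N_1 - 1}\left\{\frac{S_1^2/N_1}{S_1^2/N_1 + S_2^2/N_2}\right\}^2 + \frac{1}{N_2 - 1}\left\{\frac{S_2^2/N_2}{S_1^2/N_1 + S_2^2/N_2}\right\}^2.$$

Thus, an approximate $100(1 - \alpha)\%$ two-sided
confidence interval of mean difference $(\mu_1 - \mu_2)$ is of the form
$(L, U)$, where $L = \bar{X}_1 - \bar{X}_2 - t_{\hat{v},\alpha/2}(S_1^2/N_1 + S_2^2/N_2)^{1/2}$,
$U = \bar{X}_1 - \bar{X}_2 + t_{\hat{v},\alpha/2}(S_1^2/N_1 + S_2^2/N_2)^{1/2}$, and $t_{\hat{v},\alpha/2}$ is
the $100(1 - \alpha/2)$th percentile of the t distribution $t(\hat{v})$ with
degrees of freedom $\hat{v}$. For ease of presentation, the
half-width of the $100(1 - \alpha)\%$ two-sided confidence interval is
denoted by

$$H = t_{\hat{v},\alpha/2}(S_1^2/N_1 + S_2^2/N_2)^{1/2}. \tag{2}$$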
It is clear that the actual half-width $H$ depends on the
sample sizes $N_1$ and $N_2$, the confidence coefficient $1 - \alpha$,
as well as on the variance estimates $S_1^2$ and $S_2^2$. More
importantly, both $S_1^2$ and $S_2^2$ are scaled chi-square random
variables with degrees of freedom $(N_1 - 1)$ and $(N_2 - 1)$,
respectively, and thus jointly determine the distributional
feature of the half-width $H$ of a confidence interval. When
planning a study for ensuring that the confidence interval
is narrow enough to produce meaningful findings,
researchers must consider the stochastic nature of sample
variances.
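As a small illustration of the quantities just defined, the following Python sketch computes the approximate degrees of freedom $\hat{v}$ and the half-width $H$ from two samples. The function names and the data are hypothetical and are not part of the article's SAS/IML programs. The critical value is passed in by the caller, because the Python standard library has no t quantile function; a statistics package would supply $t_{\hat{v},\alpha/2}$.

```python
import math
from statistics import variance

def welch_df(s1_sq, s2_sq, n1, n2):
    """Approximate degrees of freedom of Welch's procedure."""
    a = s1_sq / n1
    b = s2_sq / n2
    b1 = a / (a + b)          # share of the first group in the variance estimate
    b2 = b / (a + b)          # share of the second group
    return 1.0 / (b1 ** 2 / (n1 - 1) + b2 ** 2 / (n2 - 1))

def half_width(s1_sq, s2_sq, n1, n2, t_crit):
    """Half-width H for a critical value t_crit supplied by the caller."""
    return t_crit * math.sqrt(s1_sq / n1 + s2_sq / n2)

# Made-up scores for two groups, for illustration only
lab = [11.2, 9.8, 12.5, 10.1, 13.0, 8.9, 11.7, 10.4]
online = [9.1, 12.3, 7.8, 10.9, 13.4, 6.5, 11.0, 9.6, 8.2, 12.8]

s1_sq, s2_sq = variance(lab), variance(online)   # unbiased sample variances
df = welch_df(s1_sq, s2_sq, len(lab), len(online))
# 2.1 is a placeholder critical value, not the exact t quantile for df
print(df, half_width(s1_sq, s2_sq, len(lab), len(online), t_crit=2.1))
```

Because $S_1^2$ and $S_2^2$ change from sample to sample, both printed numbers are random quantities at the planning stage, which is the point made in the paragraph above.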
For the purpose of advanced research design, it is
desirable to determine the sample sizes required to achieve
the designated precision properties of a confidence interval.
Two useful principles concern the control of the expected
half-width and the tolerance probability of the half-width
within a preassigned value. Specifically, it is necessary to
determine the required sample size such that the expected
half-width of a $100(1 - \alpha)\%$ confidence interval is within
the given bound

$$E[H] \le \delta, \tag{3}$$

where the expectation $E[H]$ is taken with respect to the joint
distribution of $S_1^2$ and $S_2^2$, and $\delta$ (> 0) is a constant. On the
other hand, one may compute the sample size needed to
guarantee, with a given tolerance probability, that the
half-width of a $100(1 - \alpha)\%$ confidence interval will not exceed
the planned value

$$P\{H < \omega\} \ge 1 - \gamma, \tag{4}$$

where $(1 - \gamma)$ is the specified tolerance level, and $\omega$ (> 0) is
a constant.
To simplify presentation and computation, the following
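alternative formulation for $H$ is derived:

$$H = t_{\hat{v},\alpha/2}(K G/\kappa)^{1/2}, \tag{5}$$

where $\kappa = N_1 + N_2 - 2$; $K = (N_1 - 1)S_1^2/\sigma_1^2 + (N_2 - 1)S_2^2/\sigma_2^2 \sim \chi^2(\kappa)$;
$G = (\sigma_1^2/N_1)\{B/p\} + (\sigma_2^2/N_2)\{(1 - B)/(1 - p)\}$; $p = (N_1 - 1)/\kappa$; and
$B = \{(N_1 - 1)S_1^2/\sigma_1^2\}/K \sim \mathrm{Beta}\{(N_1 - 1)/2, (N_2 - 1)/2\}$. Note that the random
variables $K$ and $B$ are independent. Also, it can be shown that

$$\frac{1}{\hat{v}} = \frac{B_1^2}{N_1 - 1} + \frac{B_2^2}{N_2 - 1}, \tag{6}$$

where $B_2 = 1 - B_1$ and $B_1 = (\sigma_1^2/N_1)\{B/p\}/[(\sigma_1^2/N_1)\{B/p\} + (\sigma_2^2/N_2)\{(1 - B)/(1 - p)\}]$.
Hence, both $G$ and $\hat{v}$ are functions of the random variable $B$.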
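The reparametrization can be checked numerically. The sketch below, with hypothetical function names and arbitrary input values, maps a pair of sample variances to $(K, B, G)$ and confirms that $K G/\kappa$ reproduces $S_1^2/N_1 + S_2^2/N_2$ and that the degrees of freedom written in terms of $B_1$ agree with Welch's original expression. It verifies the algebra only; it says nothing about the distributions of $K$ and $B$.

```python
import math

def reparametrize(s1_sq, s2_sq, n1, n2, sigma1_sq, sigma2_sq):
    """Map (S1^2, S2^2) to (kappa, K, B, G, df) following the definitions above."""
    kappa = n1 + n2 - 2
    k_val = (n1 - 1) * s1_sq / sigma1_sq + (n2 - 1) * s2_sq / sigma2_sq
    b_val = ((n1 - 1) * s1_sq / sigma1_sq) / k_val
    p = (n1 - 1) / kappa
    term1 = (sigma1_sq / n1) * (b_val / p)
    term2 = (sigma2_sq / n2) * ((1 - b_val) / (1 - p))
    g_val = term1 + term2
    b1 = term1 / g_val
    df = 1.0 / (b1 ** 2 / (n1 - 1) + (1 - b1) ** 2 / (n2 - 1))
    return kappa, k_val, b_val, g_val, df

def welch_df_direct(s1_sq, s2_sq, n1, n2):
    """Welch's degrees of freedom computed directly from the sample variances."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Arbitrary illustrative values (not taken from the article)
s1_sq, s2_sq, n1, n2 = 1.7, 0.6, 12, 20
sigma1_sq, sigma2_sq = 2.0, 0.5

kappa, k_val, b_val, g_val, df = reparametrize(s1_sq, s2_sq, n1, n2, sigma1_sq, sigma2_sq)
same_variance_term = math.isclose(k_val * g_val / kappa, s1_sq / n1 + s2_sq / n2)
same_df = math.isclose(df, welch_df_direct(s1_sq, s2_sq, n1, n2))
print(same_variance_term, same_df)
```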
It is clear from the distinct formulations in Eqs. 2 and 5 that
the underlying core distribution of $H$ transforms from the
joint distribution of two independent chi-square random
variables to the joint distribution of a chi-square random
variable $K$ and a beta random variable $B$. The suggested
transformation may appear at first sight to be of little use,
but it actually greatly simplifies our analytical and
computational illustrations. Note that the product form of a
chi-square random variable $K$ and other terms associated with a
beta random variable $B$ in Eq. 5 permits more transparent
representations than those presented in Wang and Kupper
(1997). Moreover, a beta distribution is bounded by 0 and 1,
and requires less computational effort than a chi-square
distribution. Therefore, the numerical computation of exact
values of $E[H]$ and $P\{H < \omega\}$ can be conducted with the
evaluations of both the one-dimensional integration with
respect to a beta probability distribution function, and the
cumulative distribution function of a chi-square random
variable. Since all related functions are readily available in
major statistical packages, the exact computations can be
performed with current computing capabilities.
In order to permit a practical treatment of sample size
planning, additional concerns are considered to
accommodate the participant and cost constraints in practical
situations. In the next two sections, we will synthesize the
ideas of Jan and Shieh (2011) and Kupper and Hafner
(1989) to develop exact procedures of precise interval
estimation with four different design and budget settings
under the expected width and tolerance probability
considerations, respectively. All calculations are performed using
programs written with SAS/IML (SAS Institute, 2008a),
and they are available in the supplementary files.
Expected width consideration
With the distributional properties described in Eqs. 5 and 6,
the assessment of expected half-width $E[H]$ in Eq. 3 can be
simplified as

$$E[H] = E_K[K^{1/2}] \cdot E_B[t_{\hat{v},\alpha/2} G^{1/2}]/\kappa^{1/2}. \tag{7}$$

It follows from the standard result of a chi-square
distribution with $\kappa$ degrees of freedom that
$E_K[K^{1/2}] = 2^{1/2}\,\Gamma\{(\kappa + 1)/2\}/\Gamma\{\kappa/2\}$. Moreover, the expectation
$E_B[t_{\hat{v},\alpha/2} G^{1/2}]$ is taken with respect to the distribution of
$B$ and does not permit a closed-form expression. Although
the expected width can still be numerically evaluated for all
proper model configurations, it is prudent to focus on those
with significant implications. To simplify the exposition,
the following two allocation constraints are considered
because of their potential usefulness. First, the ratio $r = N_2/N_1$
between the two group sizes may be fixed in advance,
so the goal is to find the minimum sample size $N_1$ ($N_2 = rN_1$)
required to achieve the selected precision level.
Second, one of the two sample sizes, say, $N_2$, may be
determined in advance, so the smallest size $N_1$ required to
satisfy the specified precision should be determined.
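The chi-square factor of the expected half-width has the closed form $2^{1/2}\,\Gamma\{(\kappa + 1)/2\}/\Gamma\{\kappa/2\}$ and is easy to evaluate. A minimal Python sketch follows; the function name is hypothetical, and the log-gamma function is used only to avoid overflow for large degrees of freedom. The remaining factor, the expectation over the beta variable, needs a t quantile function and numerical integration and is not attempted here.

```python
import math

def expected_sqrt_chi2(kappa):
    """E[K^(1/2)] for K ~ chi-square(kappa), computed on the log scale."""
    log_ratio = math.lgamma((kappa + 1) / 2) - math.lgamma(kappa / 2)
    return math.sqrt(2.0) * math.exp(log_ratio)

# The ratio E[K^(1/2)] / kappa^(1/2) stays below 1 and approaches 1 as kappa grows
for kappa in (10, 60, 200):
    print(kappa, expected_sqrt_chi2(kappa) / math.sqrt(kappa))
```

The ratio printed in the loop is the amount by which the chi-square factor alone shrinks the half-width relative to its large-sample value.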
Sample size ratio is fixed
Consider that the sample size ratio $r = N_2/N_1$ is
preassigned, and without loss of generality, the ratio is
assumed to be $r \ge 1$. Thus, for a specified precision $\delta$, a
simple incremental search can be conducted to find the
minimum sample size $N_1$ such that $E[H] \le \delta$ for the chosen
confidence level $(1 - \alpha)$ and error variances $(\sigma_1^2, \sigma_2^2)$. Note
that the expected half-width is asymptotically equivalent
to $E[H] \approx z_{\alpha/2}(\sigma_1^2/N_1 + \sigma_2^2/N_2)^{1/2}$, where $z_{\alpha/2}$ is the upper
$100(\alpha/2)$th percentile of the standard normal distribution.
The particular result provides a convenient initial value for
$N_1$. Accordingly, it is more efficient to start the
computation process with the sample size $N_{1Z}$, which is the
smallest integer that satisfies the inequality

$$N_1 \ge (\sigma_1^2 + \sigma_2^2/r)/(\delta/z_{\alpha/2})^2. \tag{8}$$
For demonstration, when $\delta = 0.5$ and $1 - \alpha = 0.95$, the
sample sizes $N_1$ and $N_2 = rN_1$ are presented in Table 1 for
selected values of $r$ = 1, 2, and 3; $\sigma_1$ = 1/3, 1/2, 1, 2, and 3;
and $\sigma_2 = 1$. The actual expected half-width $E[H]$ is also
listed, and the values are slightly less than the nominal
value of 0.5.
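The large-sample starting value for the fixed-ratio design follows from setting $N_2 = rN_1$ in $z_{\alpha/2}(\sigma_1^2/N_1 + \sigma_2^2/N_2)^{1/2} \le \delta$, which gives $N_1 \ge (\sigma_1^2 + \sigma_2^2/r)/(\delta/z_{\alpha/2})^2$. The Python sketch below computes that starting value only; the function name is hypothetical, and the exact search over $E[H]$ described in the text is not reproduced. The second call uses the planning values of the article's later numerical example ($\sigma_{Lab} = 2.3$, $\sigma_{Online} = 2.7$, ratio 4).

```python
import math
from statistics import NormalDist

def start_n1_fixed_ratio(delta, alpha, sigma1, sigma2, r):
    """Smallest integer N1 meeting the large-sample bound when N2 = r * N1."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 normal percentile
    return math.ceil((sigma1 ** 2 + sigma2 ** 2 / r) / (delta / z) ** 2)

print(start_n1_fixed_ratio(0.5, 0.05, 1.0, 1.0, 1))
print(start_n1_fixed_ratio(0.5, 0.05, 2.3, 2.7, 4))
```

An exact procedure would start from this value and increase $N_1$ until the exact expected half-width meets the bound.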
One sample size is fixed
Assume the sample size $N_2$ of the second group is held
constant, and that it is desirable to find the proper sample
size $N_1$ to achieve the selected precision in terms of
expected half-width. Just as in the previous case, the
minimum sample size $N_1$ needed to ensure confidence
intervals with the specified expected half-width can be
found by a simple iterative search for the chosen confidence
level $(1 - \alpha)$ and parameter values $(\sigma_1^2, \sigma_2^2)$. In this case, the
starting sample size $N_{1Z}$, based on the asymptotic
approximation, is the smallest integer that satisfies the inequality

$$N_1 \ge \sigma_1^2/\{(\delta/z_{\alpha/2})^2 - \sigma_2^2/N_2\}. \tag{9}$$

Note that the chosen sample size $N_2$ should not be too small:
when $N_2 < \sigma_2^2/(\delta/z_{\alpha/2})^2$, the initial value $N_{1Z}$ and the resulting $N_1$
may be negative. In addition, it should be noted that the
resulting $N_{1Z}$ and $N_1$ values are unbounded and impractical
if one considers a value of $N_2 \approx \sigma_2^2/(\delta/z_{\alpha/2})^2$. Accordingly,
Table 2 presents the computed sample size $N_1$ and the actual
expected half-width with chosen value $N_2$ for the same
settings with $\delta = 0.5$, $1 - \alpha = 0.95$, and the five standard
deviation settings of $\sigma_1$ and $\sigma_2$ in Table 1.
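When $N_2$ is fixed, the large-sample bound is $N_1 \ge \sigma_1^2/\{(\delta/z_{\alpha/2})^2 - \sigma_2^2/N_2\}$, which is meaningful only if the denominator is positive. The sketch below, with a hypothetical function name, returns `None` when $N_2$ is too small and otherwise returns the starting value. The inputs reuse the planning values of the article's later numerical example, combined with three assumed values of $N_2$.

```python
import math
from statistics import NormalDist

def start_n1_fixed_n2(delta, alpha, sigma1, sigma2, n2):
    """Large-sample starting value for N1 when N2 is fixed, or None if N2 is too small."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    slack = (delta / z) ** 2 - sigma2 ** 2 / n2   # room left for the first group
    if slack <= 0:
        return None
    return math.ceil(sigma1 ** 2 / slack)

print(start_n1_fixed_n2(0.5, 0.05, 2.3, 2.7, 440))  # comfortable N2
print(start_n1_fixed_n2(0.5, 0.05, 2.3, 2.7, 113))  # N2 barely above its lower limit
print(start_n1_fixed_n2(0.5, 0.05, 2.3, 2.7, 100))  # N2 below its lower limit
```

The second call shows how quickly the required $N_1$ grows as $N_2$ approaches $\sigma_2^2/(\delta/z_{\alpha/2})^2$.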
In addition to the prescribed allocation constraints of
participants, it is often sensible to consider cost and
effectiveness issues when research funding is limited.
Moreover, the costs of obtaining subjects may differ across
the two groups. Suppose c1 and c2 are the costs per subject
in the first and second groups, respectively; then, the total
cost of the study is $C = c_1N_1 + c_2N_2$. Thus, the following
two questions arise naturally in choosing the optimal
sample sizes. First, how can the maximum precision be
achieved in a study with a limited budget? Second,
what is the least cost for an investigation to maintain its
desired level of precision? In general, balanced group
sizes do not necessarily yield the optimal solution in the
aforementioned two scenarios. This assertion can be
easily justified from the simplified asymptotic
approximation $E[H] \approx z_{\alpha/2}(\sigma_1^2/N_1 + \sigma_2^2/N_2)^{1/2}$, under which the
optimal sample size allocation ratio for the appraisals of cost
and precision is

$$N_2/N_1 = \theta, \tag{10}$$

where $\theta = \sigma_2 c_1^{1/2}/(\sigma_1 c_2^{1/2})$. Although this identity reveals
the obvious disadvantage of a naive, balanced design, it
has its own weakness as a rule of thumb. It is readily seen
from Eq. 7 that the exact properties of the expected
half-width depend on the joint distribution of a chi-square
random variable $K$ and a beta random variable $B$. The
resulting behavior of E[H] for finite sample sizes can be
notably different from that of asymptotic theory. Hence,
the simple guideline of Eq. 10 does not guarantee an
optimal result when the sample sizes are small. Instead,
the identity is employed as a benchmark in the following
detailed and systematic presentation of optimal sample
size allocation.
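The benchmark ratio $\theta = \sigma_2 c_1^{1/2}/(\sigma_1 c_2^{1/2})$ can be checked against the large-sample criterion $\sigma_1^2/N_1 + \sigma_2^2/N_2$. In the Python sketch below (hypothetical function names, real-valued sample sizes, budget spent exactly), the criterion at $\theta$ is compared with nearby ratios and with a balanced design. This illustrates the asymptotic argument only, not the exact finite-sample behavior of $E[H]$.

```python
import math

def optimal_ratio(sigma1, sigma2, c1, c2):
    """Large-sample optimal allocation ratio N2/N1."""
    return (sigma2 * math.sqrt(c1)) / (sigma1 * math.sqrt(c2))

def asymptotic_spread(ratio, total_cost, sigma1, sigma2, c1, c2):
    """sigma1^2/N1 + sigma2^2/N2 when N2 = ratio * N1 and c1*N1 + c2*N2 = total_cost."""
    n1 = total_cost / (c1 + c2 * ratio)
    n2 = ratio * n1
    return sigma1 ** 2 / n1 + sigma2 ** 2 / n2

theta = optimal_ratio(1.0, 1.0, 1.0, 2.0)
for ratio in (0.8 * theta, theta, 1.2 * theta, 1.0):
    print(round(ratio, 4), asymptotic_spread(ratio, 60, 1.0, 1.0, 1.0, 2.0))
```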
Total cost is fixed and expected width needs
to be minimized
It can be shown under a fixed value of total cost $C = c_1N_{1Z} + c_2N_{2Z}$
and $N_{2Z}/N_{1Z} = \theta$ that the resulting sample
sizes are

$$N_{1Z} = \frac{C\sigma_1 c_2^{1/2}}{c_1\sigma_1 c_2^{1/2} + c_2\sigma_2 c_1^{1/2}} \quad \text{and} \quad N_{2Z} = \frac{C\sigma_2 c_1^{1/2}}{c_1\sigma_1 c_2^{1/2} + c_2\sigma_2 c_1^{1/2}}. \tag{11}$$

As was described previously, although this sample size
combination minimizes the magnitude $(\sigma_1^2/N_1 + \sigma_2^2/N_2)^{1/2}$
or asymptotic expected half-width $z_{\alpha/2}(\sigma_1^2/N_1 + \sigma_2^2/N_2)^{1/2}$,
it may be suboptimal with respect to the actual precision
level $E[H]$. In practice, the sample sizes need to be integers,
and it is unlikely that the values of $N_{1Z}$ and $N_{2Z}$ in Eq. 11 are
actually whole numbers. Consequently, any sample size
adjustment or rounding of $N_{1Z}$ and $N_{2Z}$ will
introduce further inexactness into the optimization analysis.
To find the exact solution, a detailed precision calculation
and comparison is performed for the sample size
combinations with $N_1$ from $N_{1min}$ to $N_{1max}$ and
$N_2 = \mathrm{Floor}\{(C - c_1N_1)/c_2\}$, where $N_{1min} = \mathrm{Max}\{\mathrm{Floor}(N_{1Z}) - 10, 5\}$,
$N_{1max} = \mathrm{Ceil}\{(C - c_2N_{2min})/c_1\}$, $N_{2min} = \mathrm{Max}\{\mathrm{Floor}(N_{2Z}) - 10, 5\}$,
the function Floor(a) returns the
largest integer that is less than or equal to a, and Ceil(a)
returns the smallest integer that is greater than or equal to a.
Note that the constants of 10 and 5 are chosen to prevent
computation error and to ensure that an optimal solution is
covered. Thus, the optimal sample size allocation is the one
giving the maximum precision or minimum expected
half-width. For illustration, numerical results are presented in
Table 3 for $(c_1, c_2)$ = (1, 1), (1, 2), and (1, 3), and fixed total
cost $C$ = 30, 40, 60, 150, and 240 in accordance with the
standard deviation combinations reported in the previous two
tables. The results in Table 3 reveal that the actual expected
half-width for a given total cost increases considerably as the
unit cost $c_2$ increases from 1 to 3. Furthermore, the
simplified allocation scheme does not yield the optimal
sample sizes in several cases. For example, the optimal
sample sizes are $N_1 = 24$ and $N_2 = 18$ for $(\sigma_1, \sigma_2) = (1, 1)$
and $(c_1, c_2) = (1, 2)$, in contrast with the result of $N_{1Z}$ =
24.8528 and $N_{2Z}$ = 17.5736 computed by Eq. 11.
Correspondingly, the optimal ratio $N_2/N_1$ = 18/24 = 0.7500 is
slightly greater than the ratio computed with the simple
formula presented in Eq. 10: $\theta = (1 \cdot 1^{1/2})/(1 \cdot 2^{1/2}) = 0.7071$.
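The fixed-cost procedure can be sketched in Python as follows. The large-sample sizes are $N_{1Z} = C\sigma_1 c_2^{1/2}/D$ and $N_{2Z} = C\sigma_2 c_1^{1/2}/D$ with $D = c_1\sigma_1 c_2^{1/2} + c_2\sigma_2 c_1^{1/2}$, and the candidate grid uses the Floor, Ceil, and Max bounds described above. One loud assumption: the article ranks candidates by the exact $E[H]$, whereas this sketch ranks them by the large-sample stand-in $\sigma_1^2/N_1 + \sigma_2^2/N_2$, so its answer need not match Table 3 in general. All function names are hypothetical, and the budget $C = 60$ is inferred from the reported values $N_{1Z} = 24.8528$ and $N_{2Z} = 17.5736$.

```python
import math

def asymptotic_sizes_fixed_cost(total_cost, sigma1, sigma2, c1, c2):
    """Real-valued large-sample sizes (N1Z, N2Z) that spend the budget exactly."""
    denom = c1 * sigma1 * math.sqrt(c2) + c2 * sigma2 * math.sqrt(c1)
    n1z = total_cost * sigma1 * math.sqrt(c2) / denom
    n2z = total_cost * sigma2 * math.sqrt(c1) / denom
    return n1z, n2z

def candidate_allocations(total_cost, sigma1, sigma2, c1, c2):
    """Integer pairs (N1, N2) screened around the large-sample solution."""
    n1z, n2z = asymptotic_sizes_fixed_cost(total_cost, sigma1, sigma2, c1, c2)
    n1_min = max(math.floor(n1z) - 10, 5)
    n2_min = max(math.floor(n2z) - 10, 5)
    n1_max = math.ceil((total_cost - c2 * n2_min) / c1)
    pairs = []
    for n1 in range(n1_min, n1_max + 1):
        n2 = math.floor((total_cost - c1 * n1) / c2)   # spend what is left on group 2
        if n2 >= 2:
            pairs.append((n1, n2))
    return pairs

def best_by_stand_in(pairs, sigma1, sigma2):
    """Pick the pair minimizing sigma1^2/N1 + sigma2^2/N2 (stand-in for exact E[H])."""
    return min(pairs, key=lambda nn: sigma1 ** 2 / nn[0] + sigma2 ** 2 / nn[1])

n1z, n2z = asymptotic_sizes_fixed_cost(60, 1.0, 1.0, 1, 2)
pairs = candidate_allocations(60, 1.0, 1.0, 1, 2)
print(round(n1z, 4), round(n2z, 4), best_by_stand_in(pairs, 1.0, 1.0))
```

For this configuration the integer search selects a pair that differs from simply rounding $N_{1Z}$ and $N_{2Z}$, which is the point the paragraph makes about integer sample sizes.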
Target expected width is fixed and total cost needs
to be minimized
In this case, the large sample approximation shows that in
order to ensure the nominal expected half-width
$\delta = z_{\alpha/2}(\sigma_1^2/N_{1Z} + \sigma_2^2/N_{2Z})^{1/2}$ while minimizing total cost
$C = c_1N_{1Z} + c_2N_{2Z}$, the best sample size combination is

$$N_{1Z} = \frac{\sigma_1^2 + \sigma_2^2/\theta}{(\delta/z_{\alpha/2})^2} \quad \text{and} \quad N_{2Z} = \frac{\theta\sigma_1^2 + \sigma_2^2}{(\delta/z_{\alpha/2})^2}, \tag{12}$$

where $\theta$ is the optimal ratio defined in Eq. 10. Similar to
the usage of sample sizes in Eq. 11, the computed values of
$N_{1Z}$ and $N_{2Z}$ in Eq. 12 are modified to expedite a screening
of sample size combinations in order to find the optimal
allocation that maintains the desired expected half-width
with the least cost. Specifically, the exact precision
computation and cost evaluation are conducted for sample
size combinations with $N_1$ from $N_{1min}$ to $N_{1max}$ satisfying
the required precision, where
$N_{1min} = \mathrm{Max}[\mathrm{Floor}(N_{1Z}) - 10, \mathrm{Ceil}\{\sigma_1^2/(\delta/z_{\alpha/2})^2\}, 6]$,
$N_{1max} = \mathrm{Ceil}[\sigma_1^2/\{(\delta/z_{\alpha/2})^2 - \sigma_2^2/N_{2min}\}] + 20$, and
$N_{2min} = \mathrm{Max}[\mathrm{Floor}(N_{2Z}) - 10, \mathrm{Ceil}\{\sigma_2^2/(\delta/z_{\alpha/2})^2\}, 6]$.
The constants of 6, 20, and 10 are chosen to
prevent computation error and to enhance the optimal
search. For each fixed value of $N_1$, the matching sample
size $N_2$ is calculated to satisfy the required expected
half-width. Thus, the optimal sample size allocation is the one
giving the smallest cost while maintaining the specified
expected half-width value. In cases in which there is more
than one combination yielding the same least cost, the one
producing the maximum precision is reported. Table 4
provides the corresponding optimal sample size allocation,
cost, and actual expected half-width for the configurations
of $(c_1, c_2)$ = (1, 1), (1, 2), and (1, 3), and the five standard
deviation settings of $\sigma_1$ and $\sigma_2$. It is clear that the total cost
for a required precision and for fixed standard deviations
increases substantially as the unit cost $c_2$ changes from 1 to
3. The optimal allocations have the simple ratio $\theta$ for the
three cases of $(\sigma_1, \sigma_2)$ = (1, 1), (2, 1), and (3, 1) when
$(c_1, c_2) = (1, 1)$. However, most of the sample size ratios are
close to, but different from, the ratio $\theta$. The largest
discrepancy occurs with the case $N_2/N_1$ = 22/8 = 2.7500
for $(c_1, c_2) = (1, 1)$ and $(\sigma_1, \sigma_2) = (1/3, 1)$, whereas the
approximate ratio is $\theta = (1 \cdot 1^{1/2})/\{(1/3) \cdot 1^{1/2}\} = 3$.
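The least-cost search can be sketched in the same way. The large-sample solution is $N_{1Z} = (\sigma_1^2 + \sigma_2^2/\theta)/(\delta/z_{\alpha/2})^2$ and $N_{2Z} = \theta N_{1Z}$, and the screening bounds follow the Max, Floor, and Ceil rules described above. As a clearly labeled assumption, the precision requirement is checked with the large-sample half-width $z_{\alpha/2}(\sigma_1^2/N_1 + \sigma_2^2/N_2)^{1/2}$ instead of the exact $E[H]$, so the allocation found can be smaller and cheaper than the exact one (for $(\sigma_1, \sigma_2) = (1/3, 1)$ and unit costs (1, 1) the article reports $N_2/N_1 = 22/8$). Function names are hypothetical.

```python
import math
from statistics import NormalDist

def asymptotic_sizes_fixed_width(delta, alpha, sigma1, sigma2, c1, c2):
    """Real-valued large-sample sizes (N1Z, N2Z) attaining half-width delta at least cost."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    theta = (sigma2 * math.sqrt(c1)) / (sigma1 * math.sqrt(c2))
    target = (delta / z) ** 2
    return ((sigma1 ** 2 + sigma2 ** 2 / theta) / target,
            (theta * sigma1 ** 2 + sigma2 ** 2) / target)

def least_cost_allocation(delta, alpha, sigma1, sigma2, c1, c2):
    """Integer (N1, N2) of least cost under the large-sample stand-in criterion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    target = (delta / z) ** 2
    n1z, n2z = asymptotic_sizes_fixed_width(delta, alpha, sigma1, sigma2, c1, c2)
    n1_min = max(math.floor(n1z) - 10, math.ceil(sigma1 ** 2 / target), 6)
    n2_min = max(math.floor(n2z) - 10, math.ceil(sigma2 ** 2 / target), 6)
    room = target - sigma2 ** 2 / n2_min
    if room <= 0:
        raise ValueError("N2min leaves no room for the first group")
    n1_max = math.ceil(sigma1 ** 2 / room) + 20
    best_key, best_pair = None, None
    for n1 in range(n1_min, n1_max + 1):
        slack = target - sigma1 ** 2 / n1
        if slack <= 0:
            continue                                   # no N2 can meet the bound
        n2 = max(n2_min, math.ceil(sigma2 ** 2 / slack))
        cost = c1 * n1 + c2 * n2
        spread = sigma1 ** 2 / n1 + sigma2 ** 2 / n2   # smaller means more precise
        key = (cost, spread)                           # ties in cost go to higher precision
        if best_key is None or key < best_key:
            best_key, best_pair = key, (n1, n2)
    return best_pair, best_key[0]

print(least_cost_allocation(0.5, 0.05, 1.0 / 3.0, 1.0, 1, 1))
```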
Tolerance probability consideration
Instead of the expected half-width criterion, a useful
alternative approach for sample size determination is to
ensure that the actual confidence interval half-width will
not exceed the planned bound with a given tolerance
probability. For analytic clarity and computational ease, the
probability $P\{H < \omega\}$ given in Eq. 4 is expressed as

$$P\{H < \omega\} = E_B[F_K\{(\kappa/G)(\omega/t_{\hat{v},\alpha/2})^2\}], \tag{13}$$

where $F_K(\cdot)$ is the cumulative distribution function of $K \sim \chi^2(\kappa)$.
Note that the expression in Eq. 13 provides a clearer
and more concise exposition of the assurance probability of
precision than does Eq. 14 in Wang and Kupper (1997). The
formulation also expedites the subsequent computational
task for various participant and cost constraints. Since there
may be several possible sample sizes $N_1$ and $N_2$ that meet
the required tolerance level, it is worthwhile to consider the
same practical circumstances as in the case of expected
interval half-width. Accordingly, the examinations
presented here simplify and expand the existing and limited
results in Wang and Kupper.
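To see why the tolerance probability criterion asks for more than the expected width criterion, a rough Monte Carlo sketch can help. It simulates the two sample variances as scaled chi-square variables and counts how often the half-width stays below $\omega$. A strong simplification is assumed and should be kept in mind: the normal percentile $z_{\alpha/2}$ replaces $t_{\hat{v},\alpha/2}$ (the Python standard library has no t quantile), so the estimates are somewhat optimistic and are not the exact values obtained from the article's formulation. The function name, seed, and replication count are arbitrary.

```python
import math
import random
from statistics import NormalDist

def simulate_tolerance(n1, n2, sigma1, sigma2, omega, alpha, reps=20000, seed=1):
    """Monte Carlo estimate of P{H < omega} with z in place of the t percentile."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        # (N - 1) S^2 / sigma^2 ~ chi-square(N - 1), i.e. Gamma(shape=(N - 1)/2, scale=2)
        s1_sq = sigma1 ** 2 * rng.gammavariate((n1 - 1) / 2, 2.0) / (n1 - 1)
        s2_sq = sigma2 ** 2 * rng.gammavariate((n2 - 1) / 2, 2.0) / (n2 - 1)
        half_width = z * math.sqrt(s1_sq / n1 + s2_sq / n2)
        hits += half_width < omega
    return hits / reps

# N = 31 per group is the large-sample size for an expected half-width of 0.5
p_31 = simulate_tolerance(31, 31, 1.0, 1.0, 0.5, 0.05)
p_40 = simulate_tolerance(40, 40, 1.0, 1.0, 0.5, 0.05)
print(p_31, p_40)
```

Under these assumptions, a sample size chosen to control the average half-width meets the bound only about half of the time, while a moderately larger sample raises that probability considerably.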
Sample size ratio is fixed
With the allocation ratio $r = N_2/N_1 \ge 1$, specified width $\omega$,
tolerance probability $(1 - \gamma)$, confidence coefficient $(1 - \alpha)$,
and error variances $(\sigma_1^2, \sigma_2^2)$, a straightforward iterative
process is performed to find the minimum sample size $N_1$,
such that $P\{H < \omega\} \ge 1 - \gamma$. To simplify the incremental
search, the initial value of $N_1$ in the algorithm is based on
Eq. 8 with $\delta = \omega$, because the optimal solutions here for
a large level of $(1 - \gamma)$ are greater than those of the expected
interval width approach with the same interval bound. This
situation is similar to those noted in Kupper and Hafner
(1989) for the traditional two-sample problem. More
concrete examples are presented in Table 5 for $(1 - \gamma)$ =
0.90 and $\omega = 0.5$. For ease of comparison, the other
parameter configurations of $(1 - \alpha)$, $(\sigma_1^2, \sigma_2^2)$, and $r$ are
identical to those in Table 1. In addition to its complex
formulation, the numerical calculation of Wang and Kupper
(1997) is also questionable. Specifically, for the settings of
$\omega = 0.3$, $(1 - \alpha) = 0.95$, $(\sigma_1^2, \sigma_2^2) = (1, 2)$, and $r = 1$, our
computations yield the optimal sample sizes $N_1 = N_2 = 139$
and $N_1 = N_2 = 149$ for $(1 - \gamma)$ = 0.80 and 0.95, respectively.
The corresponding results reported in Table 1 of Wang and
Kupper are $N_1 = N_2 = 138$ and $N_1 = N_2 = 144$. Note that the SAS
procedure PROC POWER (SAS Institute, 2008b) provides
the useful feature of finding the optimal sample sizes $N_1 = N_2$
($r = 1$) for the desired tolerance probability with confidence
intervals of mean difference under the homogeneous variances
assumption. However, it does not consider the corresponding
sample size calculations for the Behrens-Fisher problem
with an arbitrary sample size ratio $r$, as is illustrated here.
Table 4. Computed sample sizes ($N_1$, $N_2$), cost, and expected half-width $E[H]$ when the total cost needs to be minimized with target expected half-width $\delta = 0.5$ and $1 - \alpha = 0.95$
One sample size is fixed
A different restriction of the design setting is to find the
minimum sample size, say, $N_1$, that ensures a required
tolerance probability when the other sample size, $N_2$, is
fixed in advance. With the substitution of $\delta = \omega$ in Eq. 9,
the resulting sample size is utilized as the starting value for
the incremental search of the optimal solution. The
corresponding results with $(1 - \gamma) = 0.90$ and $\omega = 0.5$
are listed in Table 6 for the same configurations of $(1 - \alpha)$,
$(\sigma_1^2, \sigma_2^2)$, and $N_2$ as in Table 2. It is clear that the
computed sample size $N_1$ in Table 6 is larger than that for
the same setting in Table 2. Since there is no explicit lower
bound of $N_2$, it is possible that the specified $N_2$ is too small,
and the matching $N_1$ may be unbounded. Thus, the iterative
search of optimal $N_1$ is programmed to terminate when $N_1$
reaches the value 1,001, because the resulting sample size
combination appears to be impractical or unusual.
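The stopping rule can be sketched as a generic incremental search in Python. The precision check is passed in as a function, so the exact tolerance probability could be plugged in where available. Here a large-sample half-width check is used purely as a stand-in (an assumption, not the article's criterion), and the names `incremental_search` and `asymptotic_criterion` are hypothetical.

```python
import math
from statistics import NormalDist

def incremental_search(start, meets_precision, cap=1001):
    """Smallest N1 >= start passing the check, or None once N1 reaches cap."""
    n1 = max(int(start), 2)
    while n1 < cap:
        if meets_precision(n1):
            return n1
        n1 += 1
    return None   # search stopped: the combination is treated as impractical

def asymptotic_criterion(omega, alpha, sigma1, sigma2, n2):
    """Stand-in check: large-sample half-width not exceeding omega for fixed N2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    def meets(n1):
        return z * math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2) <= omega
    return meets

print(incremental_search(2, asymptotic_criterion(0.5, 0.05, 2.3, 2.7, 440)))
print(incremental_search(2, asymptotic_criterion(0.5, 0.05, 2.3, 2.7, 100)))
```

The second call returns `None` because, with $N_2 = 100$, the second group alone already contributes too much variance for any $N_1$ to meet the bound.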
In the following section, we will turn our attention to the
budget issue with varying unit cost per subject in each
group.
Total cost is fixed and tolerance probability needs
to be maximized
The notion of maximizing the tolerance level with a
fixed value of total cost $C = c_1N_1 + c_2N_2$ is considered,
where $c_1$ and $c_2$ are the known costs for each participant of
the two groups. To find the best sample size allocation, the
prescribed logic and algorithm under the expected width
criterion is applied to the optimization of cost and
tolerance probability with the substitution of precision
criterion $P\{H < \omega\}$ for $E[H]$. With a selected set of
designated total costs $C$ = 50, 60, 80, 180, and 300, and
heterogeneity levels, the optimal sample sizes are
summarized in Table 7 for $\omega = 0.5$, $1 - \alpha = 0.95$, and three unit
cost settings. As was described earlier for the expected
half-width consideration in Table 3, the results in Table 7
show the same behavior, in that the actual tolerance
probability for a given total cost decreases substantially as
the unit cost $c_2$ increases from 1 to 3. Therefore,
researchers should be cautious about the prominent impact
of heterogeneity on precision performance when the
resources are limited.
Target tolerance probability is fixed and total cost needs
to be minimized
In contrast with the previous case in which the total costs
were fixed, the cost and precision assessment can be
conversely performed by finding the optimal sample sizes
to minimize cost when the target tolerance level is given.
The utility of this procedure for the evaluation of expected
halfwidth is extended to accommodate the precision
criterion of assurance probability that the interval
halfwidth is enclosed in the desirable range. To demonstrate the
interrelation of the parameter configurations, numerical
results are presented in Table 8 for the target tolerance
probability 1 = 0.90, = 0.5, and 1 = 0.95, along
with several combinations of unit costs (c1, c2) and standard
deviations (1, 2). Similar to the expected width situation,
the resulting total cost for fixed values of tolerance
probability and standard deviations is drastically increasing
as the unit cost c2 changes from 1 to 3. It is suggested in
Wang and Kupper (1997, p. 735) that the optimal sample
size ratio is N2/N1 = σ2/σ1 for the problem of minimizing
the total sample size. However, none of the
optimal allocation ratios in their Table 5 agrees with this
guideline. Essentially, a systematic search and detailed
inspection of sample size combinations are required to find
the optimal allocation that attains the desired precision
while giving the least total sample size. This extra
procedure and its resulting merit in sample size determination
are not addressed in Wang and Kupper (1997). In contrast, all
of the issues are considered in our suggested procedure and
the developed program.
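For context, a ratio guideline of the form N2/N1 = σ2/σ1 follows from a large-sample argument. The derivation below is a sketch under the assumptions that the variances are known and the sample sizes are continuous; the symbols V (the target variance of the mean difference) and λ (a Lagrange multiplier) are introduced here for illustration and do not appear in the article.

```latex
\begin{aligned}
&\min_{N_1,\,N_2}\; c_1 N_1 + c_2 N_2
  \quad\text{subject to}\quad
  \frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2} = V, \\[4pt]
&\frac{\partial}{\partial N_i}
  \left[ c_1 N_1 + c_2 N_2
  + \lambda\left( \frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2} - V \right) \right]
  = c_i - \lambda\,\frac{\sigma_i^2}{N_i^2} = 0
  \;\Longrightarrow\;
  N_i = \sigma_i \sqrt{\lambda / c_i}, \\[4pt]
&\frac{N_2}{N_1} = \frac{\sigma_2}{\sigma_1}\sqrt{\frac{c_1}{c_2}},
  \qquad\text{which reduces to}\quad
  \frac{N_2}{N_1} = \frac{\sigma_2}{\sigma_1}
  \quad\text{when } c_1 = c_2 .
\end{aligned}
```

With integer sample sizes, estimated variances, and degrees of freedom that depend on the allocation, the exact optimum need not coincide with this continuous solution, which is consistent with the need for a systematic search noted above.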
Numerical example
To illustrate the usefulness of the proposed sample size
procedures, and the differences among them, under various
precision criteria and design schemes, we extend the
numerical demonstration in Jan and Shieh (2011) from
hypothesis testing to interval estimation for the difference
of ability tests administered online and in the laboratory.
Since the demographic structure of online samples can
differ from that of offline samples acquired in traditional
laboratory settings (Ihme, Lemke, Lieder, Martin, Muller &
Schmidt, 2009), the planning parameter values are chosen
as μLab = 11, μOnline = 10, σLab = 2.3, and σOnline = 2.7 to
reflect the underlying treatment effect and
heteroscedasticity. Moreover, online testing has the advantages of ease of
obtaining a large sample and low cost. It would seem
sensible that more participants could be obtained online
than offline. The determination of actual sample sizes
depends on the precision properties that the researcher wants
to ensure for the resulting confidence intervals as well as
other essential design features. First, it is intuitively
reasonable to consider the expected width criterion.
Suppose that the sample ratio is NOnline/NLab = 4. It follows
that the sample sizes NLab = 110 and NOnline = 440 are
required for the 95% confidence interval of the mean
difference to have an expected halfwidth of 0.5. On
the other hand, if the size of the online sample is
fixed at NOnline = 400, then NLab = 115 is needed to meet
the same precision. To account for a budgetary concern
where the total cost is C = 200 and the respective unit costs
per subject are cLab = 1 and cOnline = 0.2, the optimal
allocation of sample sizes is NLab = 132 and NOnline = 340,
thus producing the maximum precision within the cost
constraint. Conversely, the sample size combination NLab =
125 and NOnline = 328 induces the lowest cost C = 190.6,
while ensuring the expected interval halfwidth E[H] ≤ 0.5.
The computed sample sizes and the corresponding actual
values of expected interval halfwidth are summarized in
Table 9 for ease of discussion.
Table 8 Computed sample sizes (N1, N2), cost, and tolerance probability when the total cost needs to be minimized with a target tolerance probability of 0.90, a halfwidth bound of 0.5, and confidence level 1 − α = 0.95
Table 9 Computed sample sizes (N1, N2) for precise interval estimation under various participant and cost constraints when σ1 = 2.3, σ2 = 2.7, 1 − α = 0.95, expected halfwidth bound 0.5, target tolerance probability 0.90 with halfwidth bound 0.5, c1 = 1, and c2 = 0.2
I. Fixed allocation ratio: r = N2/N1 = 4
II. One sample size is fixed: N2 = 400
III. Fixed cost: C = 200
IV. Fixed target precision: expected halfwidth bound 0.5, or halfwidth bound 0.5 with tolerance probability 0.90
Tolerance probability
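The expected halfwidths quoted in this example can be checked approximately by simulation. The sketch below is not the authors' exact algorithm: it draws the two sample variances from their scaled chi-square distributions under normality, forms Welch's halfwidth with the usual approximate degrees of freedom, and averages. The function names are hypothetical, and the t quantile uses a Cornish-Fisher approximation (an assumption made to keep the code dependency-free; a library routine such as `scipy.stats.t.ppf` would be preferable in practice).

```python
import math
import random
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student t quantile (Cornish-Fisher expansion in 1/df)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4.0
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96.0
    return z + g1 / df + g2 / df**2


def welch_halfwidth(s1_sq, s2_sq, n1, n2, conf=0.95):
    """Halfwidth of Welch's interval given sample variances and sizes."""
    v1, v2 = s1_sq / n1, s2_sq / n2
    # Welch's approximate degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_quantile(1 - (1 - conf) / 2, df) * math.sqrt(v1 + v2)


def simulate_halfwidths(n1, n2, sigma1, sigma2, reps=20000, seed=1):
    """Monte Carlo draws of the halfwidth H under normal populations."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # (n - 1) S^2 / sigma^2 is chi-square(n - 1), i.e. a gamma variate
        # with shape (n - 1) / 2 and scale 2.
        s1_sq = sigma1**2 * rng.gammavariate((n1 - 1) / 2, 2.0) / (n1 - 1)
        s2_sq = sigma2**2 * rng.gammavariate((n2 - 1) / 2, 2.0) / (n2 - 1)
        draws.append(welch_halfwidth(s1_sq, s2_sq, n1, n2))
    return draws


# Fixed-cost allocation from the example: NLab = 132, NOnline = 340.
halfwidths = simulate_halfwidths(132, 340, sigma1=2.3, sigma2=2.7)
print("estimated E[H]:", round(sum(halfwidths) / len(halfwidths), 4))
print("total cost:", 132 * 1.0 + 340 * 0.2)
```

The estimate can be compared with the expected halfwidth of 0.4878 reported for this allocation; agreement is only up to simulation error and the accuracy of the quantile approximation.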
Alternatively, it may be necessary to assure, with a given
probability, that the confidence interval halfwidth is enclosed
by a designated bound. Assume that the target tolerance
probability is 0.90 and that the halfwidth of the 95%
confidence interval is to be bounded by 0.5. A study with the
sample ratio r = NOnline/NLab = 4 must have the sample
sizes NLab = 125 and NOnline = 500 to meet the precision
specification. When the online sample is predetermined at
NOnline = 400, the computation shows that the laboratory
group must have a sample size of at least NLab = 134 in order
to satisfy the designated precision. In the case of limited total
cost C = 200, with cLab = 1 and cOnline = 0.2, the best set of
sample sizes is NLab = 133 and NOnline = 335, and the
resulting tolerance level is the highest among all sample sizes
NLab and NOnline with NLab + 0.2NOnline ≤ 200. However,
to attain the target tolerance probability of 0.90 with a 95%
confidence interval halfwidth bound of 0.5, the minimum cost
is C = 211 for the optimal sample sizes NLab = 143 and
NOnline = 340. These results and associated tolerance
probabilities are also presented in Table 9. It is noteworthy
that the computed sample sizes under the expected width
consideration are smaller than those of the tolerance
probability criterion. The only exception is the third case,
with fixed total cost C = 200. Accordingly, the optimal
sample sizes NLab = 132 and NOnline = 340 yield the expected
halfwidth 0.4878, whereas the best sample size combination
NLab = 133 and NOnline = 335 gives a tolerance level of
merely 0.7253, below the target of 0.90. These contrasting behaviors
may help researchers to justify their design strategy
and financial support. The reader is referred to Ihme et al.
(2009) for further details about the comparison of ability
tests administered online and in the laboratory.
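The tolerance probabilities discussed in this example can likewise be approximated by simulation. The following sketch is an assumption-laden illustration rather than the exact computation used in the article: it counts how often a simulated Welch halfwidth stays within the bound, with sample variances drawn from scaled chi-square distributions and the t quantile replaced by a Cornish-Fisher approximation. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def welch_halfwidth(s1_sq, s2_sq, n1, n2, conf=0.95):
    """Welch halfwidth; the t quantile is a Cornish-Fisher approximation."""
    v1, v2 = s1_sq / n1, s2_sq / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    t = z + (z**3 + z) / (4 * df) + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2)
    return t * math.sqrt(v1 + v2)


def tolerance_probability(n1, n2, sigma1, sigma2, bound, reps=20000, seed=1):
    """Monte Carlo estimate of the probability that H is at most `bound`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Sample variances under normality: sigma^2 * chi-square(n-1) / (n-1).
        s1_sq = sigma1**2 * rng.gammavariate((n1 - 1) / 2, 2.0) / (n1 - 1)
        s2_sq = sigma2**2 * rng.gammavariate((n2 - 1) / 2, 2.0) / (n2 - 1)
        if welch_halfwidth(s1_sq, s2_sq, n1, n2) <= bound:
            hits += 1
    return hits / reps


# Fixed-cost allocation (C = 200) versus the minimum-cost allocation (C = 211).
p_budget = tolerance_probability(133, 335, 2.3, 2.7, bound=0.5)
p_min_cost = tolerance_probability(143, 340, 2.3, 2.7, bound=0.5)
print("NLab = 133, NOnline = 335:", round(p_budget, 3))
print("NLab = 143, NOnline = 340:", round(p_min_cost, 3))
```

The first estimate can be set against the tolerance level of 0.7253 reported for the fixed-cost allocation, and the second against the 0.90 target; both carry simulation error, so values near a threshold should not be read as confirming or refuting it.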
In order to enhance the applicability of confidence intervals
and the fundamental usefulness of Welch's (1938)
procedure, in the present article, we present the corresponding
sample size techniques under various precision principles
and design schemes. The precision criteria consist of the
control of the expected width and the assurance of tolerance
probability of confidence intervals. The design perspective
includes four different allocation constraints and cost
considerations. Detailed sample size tables are provided to
help researchers have a better understanding of the intrinsic
relationships that exist between the optimal sample sizes
and the associated model, precision, and design
configurations. Since existing software packages do not
accommodate sample size calculations with the same degree of
generality as is illustrated in this article, computer programs
are developed to facilitate the use of the suggested
procedures. The proposed sample size methodology should
be useful in behavioral and other areas of the social sciences
for planning two-group comparison studies in which variances
differ across groups.
Author Note The authors thank the editor, Gregory Francis, for
enhancing the clarity of the article's presentation, Professor
Chao-Ying Joanne Peng of Indiana University, and an anonymous
referee, whose suggestions extended and strengthened its content
immensely.