Sample size determination for confidence intervals of interaction effects in moderated multiple regression with continuous predictor and moderator variables
GWOWEN SHIEH
National Chiao Tung University, Hsinchu, Taiwan
Moderated multiple regression (MMR) has been widely employed to analyze the interaction or moderating effects in behavior and related disciplines of social science. Much of the methodological literature in the context of MMR concerns statistical power and sample size calculations of hypothesis tests for detecting moderator variables. Notably, interval estimation is a distinct and more informative alternative to significance testing for inference purposes. To facilitate the practice of reporting confidence intervals in MMR analyses, the present article presents two approaches to sample size determinations for precise interval estimation of interaction effects between continuous moderator and predictor variables. One approach provides the necessary sample size so that the designated interval for the least squares estimator of moderating effects attains the specified coverage probability. The other gives the sample size required to ensure, with a given tolerance probability, that a confidence interval of moderating effects with a desired confidence coefficient will be within a specified range. Numerical examples and simulation results are presented to illustrate the usefulness and advantages of the proposed methods that account for the embedded randomness and distributional characteristic of the moderator and predictor variables.
In view of the widespread recognition and increased
use of moderated multiple regression (MMR) in
behavioral and related disciplines, various attempts have been
devoted to address methodological and computational
issues in the detection of interaction effects. It is evident
from the comprehensive review of Aguinis, Beaty, Boik,
and Pierce (2005) that MMR studies focus mainly on null
hypothesis significance testing for drawing conclusions
about moderating effects. This dominance of hypothesis
testing for making statistical inferences does not occur
exclusively in MMR analysis. It more broadly reflects the
longstanding and prevalent reliance of applied research on
significance tests across many scientific fields. However,
the dichotomous accept-reject decision of null
hypothesis significance testing ignores other useful information
in the data. As an alternative, confidence intervals are
more informative about location and precision of the
statistic, and they are the best reporting strategy according
to the recommendations of Wilkinson and the American
Psychological Association Task Force on Statistical
Inference (1999), as well as of the Publication Manual of the
American Psychological Association (APA, 2001).
Consequently, the notion of interval estimation has been stressed
repeatedly in the literature on education, psychology, and
social sciences. For example, see Algina and Olejnik
(2000), Kelly and Maxwell (2003), Smithson (2001), and
Steiger and Fouladi (1997) for in-depth discussions on
constructing confidence intervals for the squared multiple
correlation coefficient, regression coefficient, and related
parameters within the multiple regression framework.
The most common application of MMR is in the context of simple interaction models with criterion variable Y, predictor variable X, moderator variable Z, their cross product term XZ, and an error term $\varepsilon$ in the formulation of $Y = \beta_I + X\beta_X + Z\beta_Z + XZ\beta_{XZ} + \varepsilon$. The moderator Z is essentially the second predictor variable hypothesized to moderate the X-Y relationship. In the present article,
we consider the situation in which both the predictor X
and the moderator Z are continuous variables since it is
applicable to a wide range of problems encountered in
applied research. Because of the nature of continuous
measurements, it is conceivable that not only are the values of
the response variables for each participant available only
after the observations are made, but the levels of predictor
and moderator variables are also outcomes of the study.
In order to take account of this stochastic feature of
explanatory variables, the appropriate strategy is to consider
a random regression formation rather than a fixed or
conditional setting. Similar emphasis and related implications
can be found in Dunlap, Xin, and Myers (2004),
Gatsonis and Sampson (1989), Mendoza and Stafford (2001),
Shieh (2006), and Shieh and Kung (2007). In practice, the
inferential procedures of hypothesis testing and interval
estimation are the same under both fixed and random
formulations. However, the distinction between the two
modeling approaches becomes crucial when power, coverage
probability, and corresponding sample size calculations
are to be made. See Cramer and Appelbaum (1978) and
Sampson (1974) for clear and succinct presentations on
the intrinsic appropriateness and theoretical properties of
fixed and random models.
For the simple interaction model described above, in
most illustrative and theoretical treatments of MMR, it is
generally assumed that the two continuous predictor and
moderator variables have a joint bivariate normal
distribution (see, e.g., O'Connor, 2006). Obviously, the
product of two normally distributed variables does not have a
normal distribution. Moreover, there are also many
situations in which the predictor and moderator variables are
continuous but the assumption of normality is completely
untenable. These results are concerned with fixed- or
multinormal-regressors settings and are thus not
applicable to the great diversity of random frameworks.
Recently, Shieh (2007) considered using a unified approach
to accommodate arbitrary distributional formulations of
the stochastic explanatory variables and demonstrated
power calculation as well as sample size determination for
hypothesis tests of coefficient parameters within the
random regression framework. The general results of Shieh
(2007) were utilized in Shieh (2009) to perform power and
sample size computations in MMR to detect interaction
effects between continuous predictor and moderator
variables, regardless of whether they follow a jointly bivariate
normal distribution. It is well known that there exists a
direct connection between hypothesis testing and interval
estimation, although the two procedures are
philosophically different in the power and precision viewpoints. In
particular, the necessary sample size required for
significance testing is a function of coefficient parameters. On
the other hand, it will be shown later that the sample size
needed for precise interval estimation is affected by the
interval width and does not depend on the magnitude of
coefficient parameters. Related discussions and examples
can be found in Algina and Olejnik (2000) and in Kelly
and Maxwell (2003). Not surprisingly, the sample size
required to test a hypothesis regarding the specific value of
a parameter with desired power can be markedly different
from the sample size needed to obtain adequate precision
of interval estimation in the same study. The planning of
sample size should be included as an integral part in the
design of MMR studies, and it is of both
methodological and practical importance to develop feasible methods
for sample size determination considering precise interval
estimation.
To elucidate the key concepts in the present article,
consider a study on the self-assurance of managers that
examines how the impact of length of time in the position on
self-assurance is moderated by managerial ability (Aiken
& West, 1991, chap. 2). A sample of managers is randomly
selected from the participating source corporation, and
various measurements for each manager are recorded. The
MMR model $Y = \beta_I + X\beta_X + Z\beta_Z + XZ\beta_{XZ} + \varepsilon$ is specifically constructed to relate managers' self-assurance (Y) with length of time in the position (X), managerial ability (Z), and their interaction. Note that both explanatory
variables (time in the position and managerial ability) are
not typically fixed in advance and that they are available
after collecting the data. Therefore, there is no problem
in regarding them as random, provided that the managers
are drawn randomly from the relevant population. Hence,
the appropriate approach is random regression modeling.
The purpose of the present investigation was to find out to
what extent the relation between self-assurance and time
in position varies with managerial ability. Essentially, it is
constructive to assess the systematic change in the strength
of the relationship between the managers' self-assurance
and length of time in the managerial position
that results from a one-unit change in managerial ability.
In a continual effort to support analytical development
and to improve the practical use of research findings in
MMR, the present article contributes to the derivation and
evaluation of sample size methodology in two important
aspects. On one hand, it provides the necessary sample
size so that the designated interval for the least squares
estimator of moderating effects attains the specified
coverage probability. The following discussion shows that
this problem is identical to the computation of the
minimum sample size so that the prescribed confidence
interval formula of moderating effects attains a desired level
of confidence. On the other hand, the present study gives
the sample size required to ensure, with a given tolerance
probability, that a confidence interval of moderating
effects with a desired confidence coefficient will be within
a specified range. Notably, the sample size formulas of
Guenther and Thomas (1965), Hahn and Meeker (1991),
Kupper and Hafner (1989), and Nelson (1994) are
concerned exclusively with the length of confidence intervals.
Since the actual values of the resulting confidence interval
depend not only on the estimated width but also on the
realized value of the location estimator, their procedures
do not consider the stochastic nature of the point estimator
for central tendency. Nonetheless, these previous studies
focused on the interval estimation procedures in one- and
two-sample problems; hence, they did not address the
associated issues in an MMR application. Sample size tables
are provided for a variety of situations to demonstrate the
individual impact of deterministic factors and how they
pertain to the two aforementioned precision considerations
of confidence intervals. Furthermore, numerical examples
and simulation results are presented to illustrate the
usefulness and advantage of the proposed methods that account
for the embedded randomness and distributional
characteristic of the moderator and predictor variables.
Interval Estimation Procedures of Moderated Effects
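Consider the simple interaction model or MMR model within the fixed modeling framework

$$Y_i = \beta_I + X_i\beta_X + Z_i\beta_Z + X_iZ_i\beta_{XZ} + \varepsilon_i, \tag{1}$$

where $Y_i$ is the value of the response variable Y; $X_i$ and $Z_i$ are the known constants of the predictor X and moderator Z; the $\varepsilon_i$ are independent and identically distributed $N(0, \sigma^2)$ random errors for $i = 1, \ldots, N$; and $\beta_I$, $\beta_X$, $\beta_Z$, and $\beta_{XZ}$ are unknown parameters. To examine the moderator effect, we are concerned with the distributional property associated with the least squares estimator $\hat{\beta}_{XZ}$ of $\beta_{XZ}$. According to the standard results (Rencher, 2000, Section 8.6), a $100(1 - \alpha)\%$ confidence interval of $\beta_{XZ}$ is

$$\left(\hat{\beta}_{XZ} - t_{N-4,\,\alpha_1}\{\hat{\sigma}^2 M\}^{1/2},\ \hat{\beta}_{XZ} + t_{N-4,\,\alpha_2}\{\hat{\sigma}^2 M\}^{1/2}\right), \tag{2}$$

where $\hat{\sigma}^2$ is the usual unbiased estimator of $\sigma^2$; $M$ is the (3, 3) element of $A^{-1}$, $A = \sum_{i=1}^{N} (\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})^T$; $\bar{\mathbf{X}} = \sum_{i=1}^{N} \mathbf{X}_i / N$, and $\mathbf{X}_i = (X_i, Z_i, X_iZ_i)^T$ is the $3 \times 1$ column vector for values of predictor $X_i$, moderator $Z_i$, and their cross product $X_iZ_i$ for $i = 1, \ldots, N$. In addition, $t_{N-4,\,\alpha_1}$ and $t_{N-4,\,\alpha_2}$ are the $100(1 - \alpha_1)$th and $100(1 - \alpha_2)$th percentiles of the t distribution with $N - 4$ degrees of freedom, respectively, and $\alpha_1 + \alpha_2 = \alpha$. See Rencher (2000, chap. 7 and 8) for general treatments and further details on linear models and their analysis. The most common practice is to assume $\alpha_1 = \alpha_2 = \alpha/2$, and this leads to the shortest $100(1 - \alpha)\%$ two-sided confidence interval for $\beta_{XZ}$:

$$\left(\hat{\beta}_{XZ} - t_{N-4,\,\alpha/2}\{\hat{\sigma}^2 M\}^{1/2},\ \hat{\beta}_{XZ} + t_{N-4,\,\alpha/2}\{\hat{\sigma}^2 M\}^{1/2}\right). \tag{3}$$

Furthermore, the $100(1 - \alpha)\%$ one-sided confidence intervals can be readily obtained from Equation 2 by setting either $\alpha_1$ or $\alpha_2$, respectively, to zero, as follows:

$$\left(-\infty,\ \hat{\beta}_{XZ} + t_{N-4,\,\alpha}\{\hat{\sigma}^2 M\}^{1/2}\right) \quad \text{and} \quad \left(\hat{\beta}_{XZ} - t_{N-4,\,\alpha}\{\hat{\sigma}^2 M\}^{1/2},\ \infty\right). \tag{4}$$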
Here, we concentrate exclusively on the specific
circumstance that both the predictor X and the moderator Z
are continuous variables. Due to the nature of
continuous measurements encountered in practical research, the
explanatory variables typically cannot be controlled and
are available only after observation. Hence, in order to
extend the concept and applicability to MMR, the
continuous predictor and moderator variables $\{(X_i, Z_i), i = 1, \ldots, N\}$ in Equation 1 are assumed to have a joint probability function $g(X_i, Z_i)$ with finite moments. Moreover, the form of $g(X_i, Z_i)$ does not depend on any of the unknown parameters $(\beta_I, \beta_X, \beta_Z, \beta_{XZ})$ or $\sigma^2$. From the
investigations of Shieh (2007, 2009), it is conceivable that
the extended consideration of random features associated
with the predictor X and the moderator Z complicates the
fundamental statistical properties of the inferential
procedures. As was noted above, however, the inferential
procedures of hypothesis testing and interval estimation
are the same under both fixed and random formulations.
Hence, the two- and one-sided confidence limits given in
Equations 3 and 4 are still valid under random predictor
and moderator settings. The follow-up analyses can be
performed without any alteration or extra effort. In view
of the practical value of interval estimation, it is
important to determine the necessary sample size so that the
resulting interval estimate is not only precise enough to
identify meaningful findings but also sufficiently
accurate in achieving the desired reliability. In the following
sections, two interval estimation approaches to sample
size determination are developed.
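As a concrete illustration of the two-sided interval in Equation 3, the following minimal Python sketch computes the interaction estimate, the unbiased error variance estimate, the quantity M, and the resulting limits from raw data. The function names `invert` and `interaction_ci` and the toy data are hypothetical and introduced only for this illustration; the t percentile is assumed to be supplied by the user (e.g., from a t table) rather than computed.

```python
import math

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def interaction_ci(x, z, y, t_crit):
    """Return (estimate, lower, upper) for the interaction coefficient.

    t_crit is the percentile t_{N-4, alpha/2}; it is an input here
    (an assumption of this sketch), not something the function computes.
    """
    n = len(y)
    rows = [(xi, zi, xi * zi) for xi, zi in zip(x, z)]
    means = [sum(col) / n for col in zip(*rows)]
    y_bar = sum(y) / n
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    # A: sum of outer products of the centered (X, Z, XZ) vectors
    a = [[sum(r[j] * r[k] for r in centered) for k in range(3)] for j in range(3)]
    c = [sum(r[j] * (yi - y_bar) for r, yi in zip(centered, y)) for j in range(3)]
    a_inv = invert(a)
    beta = [sum(a_inv[j][k] * c[k] for k in range(3)) for j in range(3)]
    rss = sum((yi - y_bar - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(centered, y))
    sigma2_hat = rss / (n - 4)      # unbiased estimator with N - 4 degrees of freedom
    m_33 = a_inv[2][2]              # M: the (3, 3) element of the inverse of A
    half = t_crit * math.sqrt(sigma2_hat * m_33)
    return beta[2], beta[2] - half, beta[2] + half

# Hypothetical noise-free toy data with a true interaction coefficient of 4.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
z = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 2.0, 4.0]
y = [1 + 2 * xi + 3 * zi + 4 * xi * zi for xi, zi in zip(x, z)]
est, lower, upper = interaction_ci(x, z, y, t_crit=2.776)  # t_{4, .025} from a t table
print(est, lower, upper)
```

Working with the centered cross-product matrix keeps the computation to a 3 x 3 inverse and exposes M directly, which is the quantity whose randomness drives the sample size results that follow.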
Sample Size Methodology for Designated
Interval Estimation of Moderating Effects
When the focus is on the inferential procedure of
interval estimation, it is prudent for one to ensure that the
resulting estimate is in the neighborhood of the actual
or possible parameter value with sufficiently high
probability. In the context of MMR analysis, therefore, it is of
interest to calculate the sample size required for a
designated interval so that the least squares estimator of
moderating effects simultaneously satisfies the desired levels of
precision and probability. Ultimately, the corresponding
method for sample size determination requires
considering the sampling distribution of the least squares estimator $\hat{\beta}_{XZ}$ of $\beta_{XZ}$. Analogous to the practical standpoint of Shieh (2009) for providing a generally useful and versatile solution without being specifically confined to any particular joint probability function $g(X_i, Z_i)$, the large-sample distribution of

$$T_{XZ}(\theta) = \frac{\hat{\beta}_{XZ} - \theta}{\{\hat{\sigma}^2 M\}^{1/2}} \tag{5}$$

is presented in Equation A6 of Appendix A, where $\theta$ is a constant and is not necessarily equivalent to $\beta_{XZ}$. The asymptotic property of $T_{XZ}(\theta)$ will be later employed to implement varieties of probability calculations and sample size determinations.
With the specified quantities of population configurations for a moderating effect $\beta_{XZ}$, error variance $\sigma^2$, joint distribution $g(X, Z)$ of $(X, Z)$, probability level $1 - \alpha$, and the designated interval $(\beta_{XZ} - b_L, \beta_{XZ} + b_U)$ with proper bounds $b_L > 0$ and $b_U > 0$, the smallest sample size N needed for the interval $(\beta_{XZ} - b_L, \beta_{XZ} + b_U)$ of $\hat{\beta}_{XZ}$ with coverage probability of at least $1 - \alpha$ can be computed from

$$P\{\beta_{XZ} - b_L < \hat{\beta}_{XZ} < \beta_{XZ} + b_U\} \ge 1 - \alpha. \tag{6}$$

Alternatively, Equation 6 can be expressed as

$$P\{\hat{\beta}_{XZ} - b_U < \beta_{XZ} < \hat{\beta}_{XZ} + b_L\} \ge 1 - \alpha. \tag{7}$$

Therefore, the sample size problem just described is equivalent to finding the minimum sample size N needed for the designated confidence interval $(\hat{\beta}_{XZ} - b_U, \hat{\beta}_{XZ} + b_L)$ of $\beta_{XZ}$ to attain the desired level of confidence $1 - \alpha$. However, unlike the fixed predictor and moderator setting, the computation of $P\{\beta_{XZ} - b_L < \hat{\beta}_{XZ} < \beta_{XZ} + b_U\}$ in Equation 6 is fairly complicated due to the arbitrary and stochastic characteristics of $(X, Z)$. The theoretical properties of the proposed procedure are presented in Appendix B.
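The equivalence between the coverage statement for the estimator and the confidence statement for the parameter follows from rearranging the two inequalities. As a short check, written with the same bounds $b_L$ and $b_U$:

```latex
\begin{align*}
\beta_{XZ} - b_L < \hat{\beta}_{XZ} < \beta_{XZ} + b_U
 &\iff \beta_{XZ} - b_L < \hat{\beta}_{XZ}
       \ \text{and}\ \hat{\beta}_{XZ} < \beta_{XZ} + b_U \\
 &\iff \beta_{XZ} < \hat{\beta}_{XZ} + b_L
       \ \text{and}\ \hat{\beta}_{XZ} - b_U < \beta_{XZ} \\
 &\iff \hat{\beta}_{XZ} - b_U < \beta_{XZ} < \hat{\beta}_{XZ} + b_L .
\end{align*}
```

Note that the roles of the two bounds swap: the lower bound of the designated interval for the estimator becomes the upper margin of the interval for the parameter, and vice versa.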
To illustrate sample size planning for precise interval estimation of moderating effects, selected computations are performed. For analytical tractability, and because it is the primary focus in the literature, the MMR model with bivariate normal predictor and moderator variables is used as the base for numerical exposition. Specifically, the coefficient parameters and variance of the MMR model are set as $\beta_I = \beta_X = \beta_Z = \beta_{XZ} = 1$ and $\sigma^2 = 1$, respectively. For the joint distribution of the predictor and moderator, the $(X, Z)$ variables are jointly normally distributed with mean (0, 0), variance (1, 1), and correlation $\rho$. The minimum sample sizes that are needed to control the designated two-sided intervals $(\beta_{XZ} - b, \beta_{XZ} + b)$ of $\hat{\beta}_{XZ}$ with coverage probability of at least .90 and .95 are presented in Table 1 for values of $\rho$ ranging from 0 to .8 in increments of .2, and $b$ = 0.1, 0.125, and 0.15. Similarly, the corresponding sample size calculations for the one-sided intervals $(-\infty, \beta_{XZ} + b)$ and $(\beta_{XZ} - b, \infty)$ of $\hat{\beta}_{XZ}$ are listed in Table 2. Note that the sample sizes presented in Table 2 are applicable to both of these one-sided intervals under the chosen model configurations. An inspection of both tables reveals the expected general relations: Sample sizes increase with an increasing level of confidence $1 - \alpha$, and they increase with decreasing value of bound $b$ when all other factors are fixed. Moreover, the sample size reported in Table 1 for a two-sided confidence interval is greater than the corresponding value of a one-sided confidence interval in Table 2 for fixed values of $\rho$, $b$, and $1 - \alpha$.
Sample Size Methodology for Confidence Intervals of Moderating Effects With Specified Ranges and Tolerances
It is well known that confidence intervals are superior
to hypothesis tests not only in that they reveal what
parameter values would be rejected if they were used in a null
hypothesis, but in that the determined value of the point
estimate and the width of the interval also give ideas of the
inherent location and precision of the estimation.
However, the interval estimation procedures are intrinsically
stochastic in nature. From a study-planning point of view,
researchers may wish to obtain meaningful research
findings so that the resulting confidence interval will meet
the prespecified assurance and precision requirements.
The corresponding approach to determining the required
sample size is presented next.
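With the specified quantities of population configurations for moderating effect $\beta_{XZ}$, error variance $\sigma^2$, joint distribution $g(X, Z)$ of $(X, Z)$, tolerance probability $1 - \gamma$, and the prescribed range $(\beta_{XZ} - w_L, \beta_{XZ} + w_U)$ with proper bounds $w_L > 0$ and $w_U > 0$, the minimum sample size N required to ensure that the $100(1 - \alpha)\%$ two-sided confidence interval given in Equation 3 is within the range of $(\beta_{XZ} - w_L, \beta_{XZ} + w_U)$ with a tolerance probability of at least $1 - \gamma$ can be determined by

$$P\left\{\beta_{XZ} - w_L < \hat{\beta}_{XZ} - t_{N-4,\,\alpha/2}\{\hat{\sigma}^2 M\}^{1/2} \ \text{and}\ \hat{\beta}_{XZ} + t_{N-4,\,\alpha/2}\{\hat{\sigma}^2 M\}^{1/2} < \beta_{XZ} + w_U\right\} \ge 1 - \gamma. \tag{8}$$

This procedure is complex since it must consider the stochastic nature of the confidence limits $\hat{\beta}_{XZ} - t_{N-4,\,\alpha/2}\{\hat{\sigma}^2 M\}^{1/2}$ and $\hat{\beta}_{XZ} + t_{N-4,\,\alpha/2}\{\hat{\sigma}^2 M\}^{1/2}$ within the unconditional framework that the predictor and moderator are random variables. The corresponding analytical presentation is summarized in Appendix C.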
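For the symmetric case in which both bounds of the prescribed range equal a common value $w$, the tolerance event can be rewritten in a form that makes the role of the interval half-width explicit. The following short derivation is a sketch under that symmetry assumption; the shorthand $h$ for the half-width is introduced here for illustration and is not notation from the article.

```latex
\begin{align*}
h &= t_{N-4,\,\alpha/2}\{\hat{\sigma}^2 M\}^{1/2}, \\
\{\beta_{XZ} - w < \hat{\beta}_{XZ} - h \ \text{and}\ \hat{\beta}_{XZ} + h < \beta_{XZ} + w\}
 &\iff \{-(w - h) < \hat{\beta}_{XZ} - \beta_{XZ} < w - h\} \\
 &\iff \{|\hat{\beta}_{XZ} - \beta_{XZ}| < w - h\}.
\end{align*}
```

The event is empty unless $h < w$, so the random half-width must fall below $w$ and, in addition, the estimator must land within $w - h$ of the true value. Both requirements tighten as $w$ shrinks, which is one way to see why this criterion tends to demand more observations than a coverage requirement with a bound of similar size.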
As additional illustrations, we continue to exemplify the sample size procedures for the preceding MMR model with bivariate normal predictor and moderator variables. In this case, Table 3 presents the minimum sample sizes required to ensure that the 95% two-sided confidence interval $(\hat{\beta}_{XZ} - t_{N-4,\,.025}\{\hat{\sigma}^2 M\}^{1/2}, \hat{\beta}_{XZ} + t_{N-4,\,.025}\{\hat{\sigma}^2 M\}^{1/2})$ is within the range of $(\beta_{XZ} - w, \beta_{XZ} + w)$ with a tolerance probability of at least .90 and .95 for values of $\rho$ ranging from 0 to .8 in increments of .2, and $w$ = 0.2, 0.225, and 0.25. In addition, Table 4 shows the corresponding sample sizes that ensure that the 95% one-sided confidence intervals $(-\infty, \hat{\beta}_{XZ} + t_{N-4,\,.05}\{\hat{\sigma}^2 M\}^{1/2})$ and $(\hat{\beta}_{XZ} - t_{N-4,\,.05}\{\hat{\sigma}^2 M\}^{1/2}, \infty)$ are within the ranges of $(-\infty, \beta_{XZ} + w)$ and $(\beta_{XZ} - w, \infty)$, respectively, with tolerance probabilities of at least .90 and .95. As in the numerical evaluations associated with Table 2, the sample sizes given in Table 4 are applicable to both cases of one-sided intervals because of the special feature of the noncentral t distribution. It can be seen from Tables 3 and 4 that required sample sizes increase with an increasing level of tolerance probability $1 - \gamma$, and with a decreasing value of bound $w$ when all other factors are fixed. As before, the sample size reported in Table 3 for the two-sided confidence interval is greater than the corresponding value of the one-sided confidence interval reported in Table 4 for fixed values of $\rho$, $w$, and $1 - \gamma$. Furthermore, although the results are not completely comparable, the sample sizes in Tables 3 and 4 are larger than those in Tables 1 and 2.

Table 3
Minimum Sample Sizes Required to Ensure That the 95% Two-Sided Confidence Interval $(\hat{\beta}_{XZ} - t_{N-4,\,.025}\{\hat{\sigma}^2 M\}^{1/2}, \hat{\beta}_{XZ} + t_{N-4,\,.025}\{\hat{\sigma}^2 M\}^{1/2})$ Is Within the Range of $(\beta_{XZ} - w, \beta_{XZ} + w)$ With a Tolerance Probability of at Least .90 and .95 for Bivariate Normal Predictor and Moderator Variables ($\sigma^2 = 1$, $\mu_X = \mu_Z = 0$, $\sigma_X^2 = \sigma_Z^2 = 1$)
Numerical Examples
The following numerical assessment represents a
typical research situation frequently encountered in the
planning stage of a study in order to assess interaction effects
in the context of MMR. The ultimate aim is to demonstrate
the sample size calculations for precise interval estimation
of moderating effects based on a pilot sample and to show
the potential consequence of failing to account for the
underlying stochastic property of the explanatory variables.
As a continued exposition of the illustration of Aiken
and West (1991), it is important to remember that the aim
of their numerical study was to determine whether the
relationship between the self-assurance of managers (Y ) and
the length of time in the position (X ) changes as a
function of managerial ability (Z ). To facilitate the following
illustration in the context of MMR research, suppose that
there are 60 pairs of observations for predictor variable X
and moderator variable Z obtained from a pilot study. The
values of (X, Z ) that are presented in Table 5 represent
random samples generated from a bivariate normal population with $\mu_X = \mu_Z = 0$, $\sigma_X^2 = \sigma_Z^2 = 1$, and correlation $\rho = .4$. In view of the continuous characteristics of
measurements X and Z, it is clear that the sample values in the
subsequent study vary from one application to another.
However, the observed configurations from the pilot study
can be employed as an empirical approximation for the
underlying joint distribution of X and Z. Moreover, it is
shown next that the suggested approach and a simplified
method utilize the empirical features that are associated
with the predictor and moderator variables in distinctive
ways and, thus, the two formulas lead to substantially
different results in sample size calculations and in accuracy
in achieving satisfactory levels of precision for interval
estimation.
We follow the analysis results in Aiken and West (1991, p. 10) that the parameter estimates of the MMR model are chosen as $\beta_I = 2.54$, $\beta_X = 1.14$, $\beta_Z = 3.58$, $\beta_{XZ} = 2.58$, and $\sigma^2 = 1$. On the basis of the 60 observed configurations of pilot data in Table 5 with $\mathbf{X}_i = (X_i, Z_i, X_iZ_i)^T$ and the empirical probability 1/60 for $i = 1, \ldots, 60$, the estimated moment matrices for the quantities in Equation A4 can be obtained. Thus, the approximate normal distribution of $W^*$ in Equation A5 has the estimated mean $\mu_{W^*} = 1.2348$ and estimated variance $\sigma^2_{W^*} = 22.6511$. In planning a research study according to the present information, the minimum sample sizes needed to control the designated two-sided interval $(\beta_{XZ} - b, \beta_{XZ} + b) = (2.58 - 0.15, 2.58 + 0.15) = (2.43, 2.73)$ of $\hat{\beta}_{XZ}$ with a desired coverage probability can be determined by the approximate coverage probability function defined in Equation B2. The resulting sample sizes are 74, 116, and 162 for coverage probabilities of .80, .90, and .95, respectively. On the other hand, the researcher may presume that the identical empirical structure of predictor and moderator variables in the pilot data will continue to occur in the investigation. Therefore, the inference of moderating effects can be conducted with the simplified or conditional distribution of $T_{XZ}(\theta)$ in Equation A1. With the fixed modeling formulation of Equation A3, the minimum sample sizes needed to control the designated two-sided interval $(\beta_{XZ} - b, \beta_{XZ} + b) = (2.43, 2.73)$ of $\hat{\beta}_{XZ}$, with coverage probability of at least .80, .90, and .95, are 60, 98, and 139, respectively. These sample sizes are smaller than those reported earlier, which were obtained from the more involved normal mixture of noncentral t distributions in Equation B1. The sizable discrepancy between these two procedures indicates the need to assess their adequacy for interval estimation in achieving the nominal coverage probability.

Table 5
Observed Values of Predictor Variable X and Moderator Variable Z of the Pilot Study
Z X Z X Z X Z
0.7970 0.3581 0.1677 0.4875 0.0481 0.2312 2.6297
0.4406 1.7096 0.0614 1.3712 0.2643 0.1967 0.6026
0.2503 1.2201 1.0737 0.3063 0.4640 0.7609 0.1105
0.7871 1.9457 0.4328 1.2158 0.8524 1.3095 0.1378
0.7407 0.0119 0.4386 1.1241 0.5519 2.0270 0.3233
0.5837 0.1606 0.2365 1.3135 1.5577 1.4949 0.7624
2.2212 0.1174 1.1017 0.1751 0.1340 0.5943 0.3610
0.9145 0.2718 1.0854 0.2313 0.3495 0.2982 0.2510
0.6172 0.8000 0.2615 0.4457 0.9176 1.3263 0.1808
0.5732 1.2381 0.1725 2.8890 1.2777 1.2771 1.4634
0.3238 0.8302 1.1981 0.3750 0.2207 0.8958 0.4195
0.5248 0.6407 0.6331 0.7223 1.2787 1.6284 0.5142
0.8816 0.3646 0.9514 0.8073 1.2787 0.4745 1.2441
1.3834 2.9043 2.2853 0.9276 1.5124 0.7966 0.5477
0.1387 0.1980 0.1679 0.5019 0.4255 0.5386 0.9979
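The simplified (conditional) calculation described above can be sketched in a few lines. The sketch below assumes, purely for illustration, that the estimator is normally distributed around the true interaction coefficient and that 1/M can be replaced by the constant N times the estimated mean of W* (1.2348), so that the coverage of the interval with half-width b is 2Φ(b(N·1.2348/σ²)^{1/2}) − 1. The function name `fixed_design_n` is hypothetical, and the closed form is a stand-in for, not a transcription of, the article's Equation A3. Under these assumptions the computed sizes coincide with the conditional values of 60, 98, and 139 quoted in the text.

```python
import math
from statistics import NormalDist

def fixed_design_n(b, sigma2, mu_w, coverage):
    """Smallest N with 2*Phi(b*sqrt(N*mu_w/sigma2)) - 1 >= coverage.

    Assumption of this sketch: 1/M is treated as the fixed value N*mu_w,
    which ignores the sampling variability of the predictor and moderator.
    """
    z = NormalDist().inv_cdf((1 + coverage) / 2)   # two-sided normal quantile
    return math.ceil(z * z * sigma2 / (b * b * mu_w))

sizes = [fixed_design_n(0.15, 1.0, 1.2348, p) for p in (0.80, 0.90, 0.95)]
print(sizes)  # [60, 98, 139]
```

Because only the mean of W* enters this calculation, any spread in the predictor and moderator configuration from one sample to the next is ignored, which is the source of the smaller sample sizes.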
Furthermore, so that the resulting confidence interval of a desired confidence coefficient will fall into a scientifically credible range with a specified level of tolerance probability, the numerical study is extended to illustrate the advantage of the suggested procedure and the deficiency of the alternative simplified method for sample size calculations. For the MMR model with the bivariate normal predictor and moderator variables examined above, the minimum sample sizes required for the suggested formula in Equation C2 to ensure that the 95% two-sided confidence interval $(\hat{\beta}_{XZ} - t_{N-4,\,.025}\{\hat{\sigma}^2 M\}^{1/2}, \hat{\beta}_{XZ} + t_{N-4,\,.025}\{\hat{\sigma}^2 M\}^{1/2})$ is within the range of $(\beta_{XZ} - w, \beta_{XZ} + w) = (2.58 - 0.225, 2.58 + 0.225) = (2.355, 2.805)$, with tolerance probabilities of at least .80, .90, and .95, are 192, 239, and 285, respectively. Correspondingly, the minimum sample sizes required for the conditional formulation in Equation C4 to ensure that the same 95% two-sided confidence interval is within the range of $(\beta_{XZ} - w_L, \beta_{XZ} + w_U) = (2.355, 2.805)$, with tolerance probabilities of at least .80, .90, and .95, are 169, 208, and 246, respectively. Obviously, the calculated sample sizes of the two procedures differ considerably for the setting considered presently. The differences between the two approaches are further examined in the following simulation study. The SAS/IML (SAS Institute, 2008) programs employed to perform the sample size calculations of the proposed approaches are presented in Appendixes D and E.
Simulation Study
In order to compare the performance and to reinforce
the fundamental distinction of the two competing
approaches, further simulation studies are conducted. For
demonstration, the MMR model with the bivariate normal
predictor and moderator variables described above is
exploited as the basis for a Monte Carlo examination. The
numerical study is conducted in two steps. First, under
the selected values of coefficient parameters, error
variance, and joint distribution of the predictor and
moderator variables, the approximate coverage
probabilities of the two methods are calculated with the
reported sample sizes of the proposed approach. The
corresponding results are presented in Table 6. It follows that
the approximate coverage probabilities of .8033, .9005,
and .9507 for the proposed method are almost identical to
the desired values of .80, .90, and .95 for sample sizes 74,
116, and 162, respectively, whereas the computed
coverage probabilities of .8484, .9274, and .9661 associated
with the simplified method are somewhat greater than the
desired values of .80, .90, and .95, respectively.
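The gap between the two sets of approximate coverage probabilities can be explored numerically. The sketch below is illustrative only and rests on stated assumptions: the simplified coverage is taken as 2Φ(b(N·1.2348/σ²)^{1/2}) − 1, and the effect of random predictors is mimicked by averaging that expression over a normal variable W with mean 1.2348 and variance 22.6511/N, clipped at zero. This averaging is a simplified stand-in and not the article's Equation B2, so its output should not be expected to reproduce the proposed method's values exactly. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def fixed_coverage(n, b, sigma2, mu_w):
    """Coverage of (beta - b, beta + b) when 1/M is treated as the constant n*mu_w."""
    return 2 * STD.cdf(b * math.sqrt(n * mu_w / sigma2)) - 1

def mixture_coverage(n, b, sigma2, mu_w, var_w, steps=4000, span=8.0):
    """Average the fixed-design coverage over W ~ N(mu_w, var_w/n) by the midpoint rule.

    Illustrative assumption only: W stands in for 1/(n*M) and negative values
    are clipped at zero; this is not the article's Equation B2.
    """
    sd = math.sqrt(var_w / n)
    h = 2 * span / steps
    total = 0.0
    for k in range(steps):
        u = -span + (k + 0.5) * h                 # standard normal grid point
        w = max(mu_w + sd * u, 0.0)
        total += STD.pdf(u) * h * (2 * STD.cdf(b * math.sqrt(n * w / sigma2)) - 1)
    return total

for n in (74, 116, 162):
    print(n,
          round(fixed_coverage(n, 0.15, 1.0, 1.2348), 4),
          round(mixture_coverage(n, 0.15, 1.0, 1.2348, 22.6511), 4))
```

Under these assumptions the fixed-design values round to .8484, .9274, and .9661, in agreement with the simplified-method figures quoted above, while the averaged values are lower. Since the coverage expression is concave in W, averaging over a random W can only pull the coverage down, which is consistent with the simplified method overstating the coverage attained at a given sample size.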
In the second step, the sample size N calculated by the
proposed approach is utilized as a benchmark to assess
the simulated coverage probability. Estimates of the true
coverage probability associated with given sample size
and parameter configurations are then computed through
a Monte Carlo simulation of 10,000 independent data sets.
For each replicate, N sets of predictor and moderator
values are generated from the designated bivariate normal
distribution. These values of predictor and moderator, in
turn, determine the mean responses for generating N
normal outcomes with the MMR model. Next, the estimate $\hat{\beta}_{XZ}$ is computed, and the simulated coverage probability is the proportion of the 10,000 replicates whose values of $\hat{\beta}_{XZ}$ fall between 2.43 and 2.73. The adequacy of the examined procedure for coverage probability and sample size calculation is determined by the formula error = simulated coverage probability − approximate coverage probability, comparing the simulated coverage probability and approximate coverage probability that were computed earlier. All of the calculations are performed using
programs written with SAS/IML (SAS Institute, 2008). The
simulated coverage probability and error for the proposed
and simplified methods are also summarized in Table 6.
As seen from the results, the performance of the proposed
method appears to be remarkably good for the range of
model specifications considered in the present article. In
contrast, the simplified method yielded much larger errors
and, in particular, the error is as large as 0.0501 for the
sample size of 74 with coverage probability around .80.
Comparatively, these errors associated with the simplified
method may be too large to be satisfactory.
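The Monte Carlo procedure described in this step can be sketched as follows. This is a minimal standard-library Python illustration, not the SAS/IML program used for the reported results; the function names `solve3` and `simulate_coverage`, the seed, and the default replicate count in the usage line are assumptions made for the example. The coefficient values, error variance, correlation of .4, and the interval (2.43, 2.73) are taken from the text.

```python
import math
import random

def solve3(a, c):
    """Solve the 3x3 linear system a * beta = c by Gaussian elimination with pivoting."""
    m = [list(row) + [ci] for row, ci in zip(a, c)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    beta = [0.0] * 3
    for r in (2, 1, 0):                            # back substitution
        beta[r] = (m[r][3] - sum(m[r][k] * beta[k] for k in range(r + 1, 3))) / m[r][r]
    return beta

def simulate_coverage(n, reps, rho=0.4, coef=(2.54, 1.14, 3.58, 2.58),
                      sigma=1.0, b=0.15, seed=1):
    """Proportion of replicates whose interaction estimate lies within b of the truth."""
    rng = random.Random(seed)
    b_i, b_x, b_z, b_xz = coef
    hits = 0
    for _ in range(reps):
        rows, ys = [], []
        for _ in range(n):
            x = rng.gauss(0, 1)
            # z has unit variance and correlation rho with x
            z = rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0, 1)
            rows.append((x, z, x * z))
            ys.append(b_i + b_x * x + b_z * z + b_xz * x * z + rng.gauss(0, sigma))
        means = [sum(col) / n for col in zip(*rows)]
        y_bar = sum(ys) / n
        cen = [[v - mu for v, mu in zip(r, means)] for r in rows]
        a = [[sum(r[j] * r[k] for r in cen) for k in range(3)] for j in range(3)]
        c = [sum(r[j] * (y - y_bar) for r, y in zip(cen, ys)) for j in range(3)]
        est = solve3(a, c)[2]                      # least squares interaction estimate
        hits += (b_xz - b) < est < (b_xz + b)
    return hits / reps

print(simulate_coverage(n=74, reps=2000))
```

New predictor and moderator values are drawn in every replicate, so the simulated proportion reflects the unconditional coverage; holding the (X, Z) configuration fixed across replicates would instead estimate the conditional coverage assumed by the simplified method.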
As in the previous case, we first evaluate the approximate tolerance probabilities with sample sizes of 192, 239, and 285 for the two distinct procedures, and the resulting values are summarized in Table 7. Then, the simulated tolerance probabilities for the prescribed parameter setting and sample size are computed as the proportion of 10,000 replicates of 95% two-sided confidence intervals $(\hat{\beta}_{XZ} - t_{N-4,\,.025}\{\hat{\sigma}^2 M\}^{1/2}, \hat{\beta}_{XZ} + t_{N-4,\,.025}\{\hat{\sigma}^2 M\}^{1/2})$ that are within the range of (2.355, 2.805). The differences between the simulated tolerance probability and approximate tolerance probability, or error = simulated tolerance probability − approximate tolerance probability, are also presented in Table 7. Clearly, the errors of the simplified method are substantially larger than those associated with the suggested approach. Hence, it can be concluded that the sample sizes calculated with the conditional formula in Equation C4 are too small to ensure sufficient tolerance probability, and the phenomenon is expected to persist in other settings of random explanatory variables.
Conclusions
Due to the prevalence of MMR applications in various
disciplines of social sciences, it seems prudent to ensure
and to extend the understanding of fundamental
properties of related inference procedures. When assessing the
extent of interaction effects between continuous
predictor and moderator variables, the underlying stochastic
configurations of the predictor and moderator vary from
one research study to another and inevitably necessitate
random modeling instead of the commonly used fixed or
conditional model setting. It is important that the
corresponding theoretical implications of power and precision
appraisal be well understood when MMR analyses are
adopted by researchers. As presented above, random MMR
modeling is comparatively more complex so that more
research is needed before it can be accepted in place of the
commonly used fixed linear regression model. The present
article aimed to demonstrate the technical development of
precise interval estimation and related sample size
methodology with sufficient clarity so that MMR practitioners
can perceive the applicability and usefulness of the
information. Specifically, the proposed approach fully
accommodates the arbitrary distributional formulations of the
stochastic explanatory variables. The differences and
impacts of failing to account for the randomness of predictor
and moderator variables are elucidated through rigorous
analytical presentations and numerical assessments. It is
shown that the existing fixed modeling formulation may
distort the precision analysis and lead to a poor choice of
sample sizes. More importantly, although the suggested
general procedures for sample size determinations are
derived from large-sample theory, the simulation study
demonstrates their accuracy in achieving desired levels of
coverage and tolerance for interval estimation over a wide
range of model settings. The generality and accuracy of
the proposed methodology not only facilitate the widely
advocated practice of reporting confidence intervals but also
further strengthen the potential applicability of MMR analysis.
Accordingly, the results provide the basis for probing related
considerations in more complicated situations, such as
the three-way interactions discussed in Aiken and West
(1991) and Dawson and Richter (2006).
The author thanks the editor, Gregory Francis, and the two anonymous
reviewers for their valuable comments on earlier drafts of the article.
This research was partially supported by National Science Council Grant
NSC-97-2410-H-009-011-MY2. Correspondence concerning this article
should be addressed to G. Shieh, Department of Management Science,
National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan
30050 (e-mail: ).
APPENDIX A
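The Distribution of $T_{XZ}$
It follows from the standard assumption in Equation 1 under a fixed modeling framework that the variable
$$T_{XZ}(\theta) = \frac{\hat\beta_{XZ} - \theta}{\{\hat\sigma^2 M\}^{1/2}} \tag{A1}$$
has a noncentral t distribution $t(N-4, \Delta)$ with $N-4$ degrees of freedom and noncentrality parameter $\Delta$, where
$\hat\sigma^2$ is the usual unbiased estimator of $\sigma^2$; $M$ is the (3, 3) element of $\mathbf{A}^{-1}$, $\mathbf{A} = \sum_{i=1}^{N} (\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})^T$, $\bar{\mathbf{X}} = \sum_{i=1}^{N} \mathbf{X}_i/N$,
and $\mathbf{X}_i = (X_i, Z_i, X_iZ_i)^T$ is the 3 × 1 column vector for values of predictor $X_i$, moderator $Z_i$, and their
cross product $X_iZ_i$ for $i = 1, \ldots, N$; $\theta$ is a constant; and the noncentrality parameter $\Delta = (\beta_{XZ} - \theta)/\{\sigma^2 M\}^{1/2}$.
Accordingly, a particular formulation can be obtained by substituting $\theta$ with $\beta_{XZ}$ into $T_{XZ}$ as follows:
$$T_{XZ}(\beta_{XZ}) = \frac{\hat\beta_{XZ} - \beta_{XZ}}{\{\hat\sigma^2 M\}^{1/2}}, \tag{A2}$$
and $T_{XZ}(\beta_{XZ})$ is distributed as $t(N-4)$, a t distribution with $N-4$ degrees of freedom. Note that $T_{XZ}(\beta_{XZ})$ in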
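Equation A2 provides a useful tool for conducting statistical inferences of hypothesis testing and interval
estimation about the magnitude of the moderating effect $\beta_{XZ}$. In this case, the coverage probability for a designated interval
$(\hat\beta_{XZ} - b_L,\ \hat\beta_{XZ} + b_U)$ of $\beta_{XZ}$ can be computed using the simple expression of
$$P\{t(N-4, -\tau_L) \le 0\} - P\{t(N-4, \tau_U) \le 0\}, \tag{A3}$$
where $b_L > 0$, $b_U > 0$, $\tau_U = b_U/\{\sigma^2 M\}^{1/2}$, and $\tau_L = b_L/\{\sigma^2 M\}^{1/2}$.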
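Probabilities of the form $P\{t(\nu, \Delta) \le 0\}$, which arise when the noncentral t variable of Equation A1 is compared with zero, reduce to a standard normal probability. The short derivation below is a sketch based on the usual representation of the noncentral t distribution; the symbols $Z$, $V$, and $\nu$ are auxiliary and are not part of the original notation.

```latex
% Assumption of this sketch: Z ~ N(0,1) and V ~ chi-square(nu) are independent.
t(\nu, \Delta) \overset{d}{=} \frac{Z + \Delta}{\sqrt{V/\nu}}
\quad\Longrightarrow\quad
P\{t(\nu, \Delta) \le 0\} = P\{Z + \Delta \le 0\} = \Phi(-\Delta).
% With nu = N - 4 and Delta = (beta_XZ - theta)/{sigma^2 M}^{1/2}:
P\{\hat\beta_{XZ} \le \theta\} = \Phi\!\left(\frac{\theta - \beta_{XZ}}{\{\sigma^2 M\}^{1/2}}\right).
```

The last line agrees with the normality of the least squares estimator under the fixed model, so for given $M$ the degrees of freedom do not affect these particular probabilities.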
Instead of a mere fixed or conditional formulation, we focus on the particular random regression situation in
which both the predictor X and the moderator Z are continuous random variables within the context of MMR.
Specifically, the continuous predictor and moderator variables $\{(X_i, Z_i),\ i = 1, \ldots, N\}$ are assumed to have a joint
probability function g(Xi, Zi) with finite moments. Moreover, the form of g(Xi, Zi) does not depend on any of the
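unknown parameters $(\beta_I, \beta_X, \beta_Z, \beta_{XZ})$ and $\sigma^2$. The moments of the explanatory vectors $\mathbf{X}_i = (X_i, Z_i, X_iZ_i)^T$ are
defined as
$$\boldsymbol\mu = E[\mathbf{X}_i], \quad \boldsymbol\Sigma = E[(\mathbf{X}_i - \boldsymbol\mu)(\mathbf{X}_i - \boldsymbol\mu)^T], \quad \boldsymbol\Psi = E[\{(\mathbf{X}_i - \boldsymbol\mu)(\mathbf{X}_i - \boldsymbol\mu)^T\} \otimes \{(\mathbf{X}_i - \boldsymbol\mu)(\mathbf{X}_i - \boldsymbol\mu)^T\}], \tag{A4}$$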
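where $E[\cdot]$ denotes the expectation taken with respect to the joint probability density function $g(X_i, Z_i)$ of $(X_i, Z_i)$,
and $\otimes$ represents the Kronecker product. According to the formulations of $\mathbf{A}$ and $M$ presented in Equation A1
for $T_{XZ}$, both $\mathbf{A}$ and $M$ are functions of random variables $(X_i, Z_i)$, $i = 1, \ldots, N$, within the random regression
framework and, therefore, $T_{XZ}$ has a noncentral t distribution with random noncentrality $\Delta$. It follows from Shieh
(2009) that $W^* = 1/\{(N-1)M\}$ has an asymptotic normal distribution:
$$W^* \sim N(\mu_{W^*}, \sigma^2_{W^*}), \tag{A5}$$
where $\mu_{W^*} = 1/(\mathbf{c}^T \boldsymbol\Sigma^{-1} \mathbf{c})$, $\sigma^2_{W^*} = \mu_{W^*}^{4}\{(\mathbf{c}^T \boldsymbol\Sigma^{-1} \otimes \mathbf{c}^T \boldsymbol\Sigma^{-1})\boldsymbol\Psi(\boldsymbol\Sigma^{-1}\mathbf{c} \otimes \boldsymbol\Sigma^{-1}\mathbf{c}) - \mu_{W^*}^{-2}\}/(N-1)$, $\mathbf{c} = (0, 0, 1)^T$ is a 3 × 1
column vector, and $\boldsymbol\Sigma$ and $\boldsymbol\Psi$ are defined in Equation A4. Therefore, the distribution of $T_{XZ}(\theta)$ under the random
regression setting can be well approximated by the following two-stage distribution:
$$T_{XZ}(\theta) \mid W^* \sim t\{N-4,\ (\beta_{XZ} - \theta)[(N-1)W^*/\sigma^2]^{1/2}\} \quad \text{and} \quad W^* \sim N(\mu_{W^*}, \sigma^2_{W^*}). \tag{A6}$$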
The approximate distribution of $T_{XZ}(\theta)$ is particularly useful to evaluate the cumulative probability function for $\hat\beta_{XZ}$
in terms of $F_{XZ}(c) = P\{\hat\beta_{XZ} \le c\}$, where $c$ is a constant. It can be readily shown from the definition of $T_{XZ}$ that
$$F_{XZ}(c) = P\{T_{XZ}(c) \le 0\}.$$
Accordingly, the cumulative distribution function $F_{XZ}(c)$ can be approximated by
$$E_{W^*}[P(t\{N-4,\ [(N-1)W^*/\sigma^2]^{1/2}(\beta_{XZ} - c)\} \le 0)], \tag{A7}$$
where the expectation $E_{W^*}[\cdot]$ is taken with respect to the approximate normal distribution of $W^*$ presented in
Equation A5.
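The mean and variance of the normal approximation in Equation A5 can be estimated from a large sample drawn from the assumed joint distribution of the predictor and moderator. The following Python sketch is illustrative only and is not the author's SAS program. The function names `solve3` and `w_star_moments` are hypothetical, population moments are replaced by sample averages, and the quadratic form involving the Kronecker product is evaluated through the identity $(\mathbf{a} \otimes \mathbf{a})^T \boldsymbol\Psi (\mathbf{a} \otimes \mathbf{a}) = E[\{\mathbf{a}^T(\mathbf{X}_i - \boldsymbol\mu)\}^4]$ with $\mathbf{a} = \boldsymbol\Sigma^{-1}\mathbf{c}$.

```python
def solve3(S, b):
    """Solve the 3x3 linear system S a = b by Gauss-Jordan elimination."""
    m = [list(S[i]) + [b[i]] for i in range(3)]
    for col in range(3):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def w_star_moments(pairs, n):
    """Return (mu_w, sigma2_w) for planned sample size n.

    pairs: iterable of (x, z) draws standing in for the joint distribution
    (an assumption of this sketch; moments are sample averages).
    """
    rows = [(x, z, x * z) for x, z in pairs]
    k = len(rows)
    mean = [sum(r[j] for r in rows) / k for j in range(3)]
    dev = [[r[j] - mean[j] for j in range(3)] for r in rows]
    cov = [[sum(d[i] * d[j] for d in dev) / k for j in range(3)]
           for i in range(3)]
    a = solve3(cov, [0.0, 0.0, 1.0])      # a = Sigma^{-1} c with c = (0, 0, 1)
    mu_w = 1.0 / a[2]                     # 1 / (c' Sigma^{-1} c)
    # Fourth moment of a'(X - mu) equals the Kronecker quadratic form.
    fourth = sum(sum(ai * di for ai, di in zip(a, d)) ** 4 for d in dev) / k
    sigma2_w = mu_w ** 4 * (fourth - mu_w ** -2) / (n - 1)
    return mu_w, sigma2_w
```

As a check on the formulas, independent standard normal $X$ and $Z$ give $\boldsymbol\Sigma = \mathbf{I}$ and $E[(XZ)^4] = 9$, so that $\mu_{W^*} = 1$ and $\sigma^2_{W^*} = 8/(N-1)$; a simulated sample passed to `w_star_moments` should approach these values as the number of draws grows.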
APPENDIX B
Sample Size Calculations for Designated Interval Estimation of Moderating Effects
It follows from the definition of $T_{XZ}(\theta)$ given in Equation 5 and the associated asymptotic approximation
of the cumulative distribution function $F_{XZ}$ of $\hat\beta_{XZ}$ presented in Equation A7 that the probability $P\{\hat\beta_{XZ} - b_L < \beta_{XZ} < \hat\beta_{XZ} + b_U\}$
in Equation 6 can be approximated by
$$E_{W^*}[P\{t(N-4, -\tau_L) \le 0\}] - E_{W^*}[P\{t(N-4, \tau_U) \le 0\}], \tag{B1}$$
where $\tau_U = b_U\{(N-1)W^*/\sigma^2\}^{1/2}$, $\tau_L = b_L\{(N-1)W^*/\sigma^2\}^{1/2}$, and the expectation $E_{W^*}[\cdot]$ is taken with respect
to the approximate normal distribution of $W^*$ presented in Equation A5. Hence, the suggested computation of the
smallest sample size $N$ needed for the prescribed interval $(\hat\beta_{XZ} - b_L,\ \hat\beta_{XZ} + b_U)$ of $\beta_{XZ}$ with coverage probability
of at least $1 - \alpha$ is performed with the approximate coverage probability formula
$$E_{W^*}[P\{t(N-4, -\tau_L) \le 0\} - P\{t(N-4, \tau_U) \le 0\}] \ge 1 - \alpha. \tag{B2}$$
It should be noted that numerical computations of the expected value in Equation B2 require the evaluation of
a noncentral t cumulative distribution function and the one-dimensional integration with respect to a normal
distribution. This procedure is not as simple as using a z or t table, but it is not unreasonable in light of modern
computing capabilities. Moreover, two important aspects of the proposed procedure should be pointed out.
First, both probability functions $P\{t(N-4, \tau_U) \le 0\}$ and $P\{t(N-4, -\tau_L) \le 0\}$ do not involve the regression
coefficient $\beta_{XZ}$, which corresponds to the extent of the moderating effect. However, there is a direct functional
relation between the magnitudes of cumulative probability and the bounds $b_L$ and $b_U$. Second, the mean values
of the predictor, moderator, and their product are not included in the asymptotic distribution of $W^*$ defined in
Equation A5. Hence, the mean vector (first moments) associated with the joint distribution of explanatory
variables does not have any influence on the resulting probability levels and required sample sizes.
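The sample size search described in this appendix can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the author's SAS program. It uses the fact that a noncentral t probability evaluated at zero equals a normal probability, $P\{t(\nu, \Delta) \le 0\} = \Phi(-\Delta)$, so that the coverage conditional on $W^*$ is $\Phi(\tau_L) + \Phi(\tau_U) - 1$. The expectation over $W^*$ is approximated by averaging over equal-probability normal quantiles, which is a simple substitute for a formal numerical integration. The names `coverage`, `smallest_n`, and `kappa` (standing for $(N-1)\sigma^2_{W^*}$) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()
GRID = 2000
# Midpoint quantiles of the standard normal, reused for every sample size.
Z_GRID = [STD.inv_cdf((k + 0.5) / GRID) for k in range(GRID)]


def coverage(n, b_l, b_u, sigma2, mu_w, kappa):
    """Approximate E_W*[Phi(tau_L) + Phi(tau_U) - 1] for sample size n.

    kappa is (N - 1) times the variance of W*, so Var(W*) = kappa / (n - 1).
    """
    sd_w = sqrt(kappa / (n - 1))
    total = 0.0
    for z in Z_GRID:
        # W* is positive by definition; clamp the normal approximation at 0.
        w = max(mu_w + sd_w * z, 0.0)
        scale = sqrt((n - 1) * w / sigma2)
        total += STD.cdf(b_l * scale) + STD.cdf(b_u * scale) - 1.0
    return total / GRID


def smallest_n(b_l, b_u, sigma2, mu_w, kappa, level=0.95, n_max=100000):
    """Return the first n >= 5 whose approximate coverage reaches level."""
    for n in range(5, n_max + 1):
        if coverage(n, b_l, b_u, sigma2, mu_w, kappa) >= level:
            return n
    raise ValueError("no sample size up to n_max attains the requested coverage")
```

For example, `smallest_n(0.5, 0.5, 1.0, 1.0, 8.0)` would search for the sample size when $b_L = b_U = 0.5$, $\sigma^2 = 1$, and the predictor and moderator are independent standard normal variables, for which $\mu_{W^*} = 1$ and $(N-1)\sigma^2_{W^*} = 8$. Setting `kappa` to zero suppresses the randomness of $W^*$ and shows how much smaller the sample size becomes when that variability is ignored.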
In a similar fashion, the corresponding sample size calculations for the prescribed lower and upper one-sided
intervals in the form of $(-\infty,\ \hat\beta_{XZ} + b_U)$ and $(\hat\beta_{XZ} - b_L,\ \infty)$ for $\beta_{XZ}$ with coverage probability of at least $1 - \alpha$
can be conducted with the modified or approximate probability functions in terms of a normal mixture of a
noncentral t cumulative distribution function given by
$$E_{W^*}[P\{t(N-4, \tau_U) \ge 0\}] \ge 1 - \alpha \quad \text{and} \quad E_{W^*}[P\{t(N-4, -\tau_L) \le 0\}] \ge 1 - \alpha, \tag{B3}$$
respectively, where $\tau_U$, $\tau_L$, and $W^*$ are given above in Equation B1. It can be readily shown from Equation B3,
with $b_U = b_L = b$ and $\tau_U = \tau_L = \tau = b\{(N-1)W^*/\sigma^2\}^{1/2}$, that
$$E_{W^*}[P\{t(N-4, \tau) \ge 0\}] = E_{W^*}[P\{t(N-4, -\tau) \le 0\}].$$
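The equality of the two one-sided expressions rests on the symmetry of the noncentral t family under a sign change of the noncentrality parameter. A brief derivation is sketched here; $Z$, $V$, $\nu$, and $a$ are auxiliary symbols introduced for this illustration and do not appear in the original text.

```latex
% Assumption of this sketch: Z ~ N(0,1) and V ~ chi-square(nu) are independent.
t(\nu, \tau) \overset{d}{=} \frac{Z + \tau}{\sqrt{V/\nu}}
\quad\Longrightarrow\quad
-\,t(\nu, \tau) \overset{d}{=} \frac{-Z - \tau}{\sqrt{V/\nu}}
\overset{d}{=} \frac{Z - \tau}{\sqrt{V/\nu}} \overset{d}{=} t(\nu, -\tau),
% because -Z has the same distribution as Z. Hence, for any constant a,
P\{t(\nu, \tau) \ge a\} = P\{t(\nu, -\tau) \le -a\}.
```

Taking $a = 0$ and $\nu = N - 4$ gives the identity for every value of $W^*$, so it continues to hold after the expectation with respect to $W^*$ is taken. The same argument with a nonzero $a$ applies to one-sided probabilities that involve a t critical value.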
APPENDIX C
Sample Size Calculations for Confidence Intervals of Moderating Effects
With Specified Ranges and Tolerances
According to the asymptotic results for $\hat\beta_{XZ}$ presented in Appendix A, we propose to consider the following
alternative formula for computing the probability described in Equation 7:
EW*[P{t(N 4,
where $\tau_U = w_U\{(N-1)W^*/\sigma^2\}^{1/2}$, $\tau_L = w_L\{(N-1)W^*/\sigma^2\}^{1/2}$, and the expectation $E_{W^*}[\cdot]$ is taken with
respect to the approximate normal distribution of $W^*$ presented in Equation A5. Thus, the minimum sample
size $N$ required to ensure that the $100(1-\alpha)\%$ two-sided confidence interval $(\hat\beta_{XZ} - t_{N-4,\alpha/2}\{\hat\sigma^2 M\}^{1/2},\ \hat\beta_{XZ} + t_{N-4,\alpha/2}\{\hat\sigma^2 M\}^{1/2})$
is within the range of $(\beta_{XZ} - w_L,\ \beta_{XZ} + w_U)$ with a tolerance probability of at least $1 - \gamma$ can
be determined by
EW*[P{t(N 4,
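Moreover, the sample size calculations for the lower and upper one-sided confidence intervals in the form of
$(-\infty,\ \hat\beta_{XZ} + t_{N-4,\alpha}\{\hat\sigma^2 M\}^{1/2})$ and $(\hat\beta_{XZ} - t_{N-4,\alpha}\{\hat\sigma^2 M\}^{1/2},\ \infty)$ that fall within the ranges of $(-\infty,\ \beta_{XZ} + w_U)$ and
$(\beta_{XZ} - w_L,\ \infty)$ with a tolerance probability of at least $1 - \gamma$ can be performed with
$$E_{W^*}[P\{t(N-4, -\tau_U) \le -t_{N-4,\alpha}\}] \ge 1 - \gamma \quad \text{and} \quad E_{W^*}[P\{t(N-4, \tau_L) \ge t_{N-4,\alpha}\}] \ge 1 - \gamma, \tag{C3}$$
respectively, where $\tau_U$ and $\tau_L$ are given above for Equation C1. It can be readily shown from Equation C3, with
$w_U = w_L = w$ and $\tau_U = \tau_L = \tau = w\{(N-1)W^*/\sigma^2\}^{1/2}$, that
$$E_{W^*}[P\{t(N-4, -\tau) \le -t_{N-4,\alpha}\}] = E_{W^*}[P\{t(N-4, \tau) \ge t_{N-4,\alpha}\}].$$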
In contrast, the tolerance probability with respect to the conditional distribution in Equation A1 is
XZ
U)
where $\tau_U = w_U/\{\sigma^2 M\}^{1/2}$ and $\tau_L = w_L/\{\sigma^2 M\}^{1/2}$.
APPENDIX D
SAS Program to Perform Sample Size Calculations for Designated
Interval Estimation of Moderating Effects
APPENDIX E
SAS Program to Perform Sample Size Calculations for Confidence Intervals
of Moderating Effects With Specified Ranges and Tolerances