Sample size determination for confidence intervals of interaction effects in moderated multiple regression with continuous predictor and moderator variables

Behavior Research Methods, Aug 2010

Moderated multiple regression (MMR) has been widely employed to analyze interaction or moderating effects in the behavioral and related social sciences. Much of the methodological literature on MMR concerns statistical power and sample size calculations for hypothesis tests that detect moderator variables. Notably, interval estimation is a distinct and more informative alternative to significance testing for inference purposes. To facilitate the practice of reporting confidence intervals in MMR analyses, the present article describes two approaches to sample size determination for precise interval estimation of interaction effects between continuous moderator and predictor variables. One approach provides the sample size necessary for the designated interval for the least squares estimator of moderating effects to attain a specified coverage probability. The other gives the sample size required to ensure, with a given tolerance probability, that a confidence interval of moderating effects with a desired confidence coefficient will lie within a specified range. Numerical examples and simulation results illustrate the usefulness and advantages of the proposed methods, which account for the embedded randomness and distributional characteristics of the moderator and predictor variables.


http://link.springer.com/content/pdf/10.3758%2FBRM.42.3.824.pdf


Gwowen Shieh, National Chiao Tung University, Hsinchu, Taiwan

In view of the widespread recognition and increased use of moderated multiple regression (MMR) in behavioral and related disciplines, considerable effort has been devoted to addressing methodological and computational issues in the detection of interaction effects. The comprehensive review of Aguinis, Beaty, Boik, and Pierce (2005) shows that MMR studies focus mainly on null hypothesis significance testing for drawing conclusions about moderating effects. This dominance of hypothesis testing is not confined to MMR analysis.
It more broadly reflects the long-standing reliance of applied research on significance tests across many scientific fields. However, the dichotomous accept-reject decision of null hypothesis significance testing discards other useful information. Confidence intervals are more informative about the location and precision of an estimate, and, according to the recommendations of Wilkinson and the American Psychological Association Task Force on Statistical Inference (1999) and of the Publication Manual of the American Psychological Association (APA, 2001), they are the best reporting strategy. Consequently, interval estimation has been stressed repeatedly in the literature on education, psychology, and the social sciences. For example, see Algina and Olejnik (2000), Kelley and Maxwell (2003), Smithson (2001), and Steiger and Fouladi (1997) for in-depth discussions of confidence intervals for the squared multiple correlation coefficient, regression coefficients, and related parameters within the multiple regression framework. The most common application of MMR is the simple interaction model with criterion variable Y, predictor variable X, moderator variable Z, their cross-product term XZ, and an error term ε: $Y = \beta_I + \beta_X X + \beta_Z Z + \beta_{XZ} XZ + \varepsilon$. The moderator Z is essentially a second predictor variable hypothesized to moderate the X-Y relationship. In the present article, we consider the situation in which both the predictor X and the moderator Z are continuous, since this covers a wide range of problems encountered in applied research. Because of the nature of continuous measurements, not only are the values of the response variable for each participant available only after the observations are made, but the levels of the predictor and moderator variables are also outcomes of the study.
To take account of this stochastic feature of the explanatory variables, the appropriate strategy is to adopt a random regression formulation rather than a fixed or conditional setting. Similar emphasis and related implications can be found in Dunlap, Xin, and Myers (2004), Gatsonis and Sampson (1989), Mendoza and Stafford (2001), Shieh (2006), and Shieh and Kung (2007). In practice, the inferential procedures of hypothesis testing and interval estimation are the same under the fixed and random formulations. However, the distinction between the two modeling approaches becomes crucial when power, coverage probability, and the corresponding sample sizes are to be calculated. See Cramer and Appelbaum (1978) and Sampson (1974) for clear and succinct presentations of the intrinsic appropriateness and theoretical properties of fixed and random models. For the simple interaction model described above, most illustrative and theoretical treatments of MMR assume that the continuous predictor and moderator variables have a joint bivariate normal distribution (see, e.g., O'Connor, 2006). Note, however, that the product of two normally distributed variables does not have a normal distribution. Moreover, there are many situations in which the predictor and moderator variables are continuous but the assumption of normality is untenable. These results are concerned with fixed- or multinormal-regressor settings and are thus not applicable to the great diversity of random frameworks. Recently, Shieh (2007) proposed a unified approach that accommodates arbitrary distributional formulations of the stochastic explanatory variables and demonstrated power calculation and sample size determination for hypothesis tests of coefficient parameters within the random regression framework.
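The non-normality of the product term is easy to see numerically. The Python sketch below (the function name and settings are illustrative assumptions, not part of the article) draws standardized bivariate normal (X, Z) with correlation ρ and returns the sample kurtosis of W = XZ. A normal variable has kurtosis 3, whereas for ρ = 0 the population kurtosis of XZ is E(X⁴)E(Z⁴)/{E(X²)E(Z²)}² = 9.

```python
import random
import statistics


def product_kurtosis(n, rho, seed=0):
    """Sample kurtosis (not excess kurtosis) of W = X * Z for standardized
    bivariate normal (X, Z) with correlation rho."""
    rng = random.Random(seed)
    s = (1.0 - rho * rho) ** 0.5
    w = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Z = rho * X + sqrt(1 - rho^2) * E gives corr(X, Z) = rho
        z = rho * x + s * rng.gauss(0.0, 1.0)
        w.append(x * z)
    mean = statistics.fmean(w)
    m2 = statistics.fmean([(v - mean) ** 2 for v in w])
    m4 = statistics.fmean([(v - mean) ** 4 for v in w])
    return m4 / m2 ** 2
```

With 200,000 draws and ρ = 0, the estimate should land near 9, far from the normal value of 3.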
Shieh (2009) used the general results of Shieh (2007) to perform power and sample size computations in MMR for detecting interaction effects between continuous predictor and moderator variables, whether or not they follow a bivariate normal distribution. It is well known that there is a direct connection between hypothesis testing and interval estimation, although the two procedures are philosophically different: one emphasizes power, the other precision. In particular, the sample size required for significance testing is a function of the coefficient parameters. In contrast, as shown later, the sample size needed for precise interval estimation is driven by the interval width and does not depend on the magnitude of the coefficient parameters. Related discussions and examples can be found in Algina and Olejnik (2000) and in Kelley and Maxwell (2003). Not surprisingly, the sample size required to test a hypothesis about a specific parameter value with the desired power can differ markedly from the sample size needed to obtain adequate precision of interval estimation in the same study. Sample size planning should be an integral part of the design of MMR studies, and it is of both methodological and practical importance to develop feasible methods of sample size determination for precise interval estimation. To elucidate the key concepts of the present article, consider a study of the self-assurance of managers that examines how the effect of length of time in the position on self-assurance is moderated by managerial ability (Aiken & West, 1991, chap. 2). A sample of managers is randomly selected from the participating corporation, and various measurements are recorded for each manager. The MMR model $Y = \beta_I + \beta_X X + \beta_Z Z + \beta_{XZ} XZ + \varepsilon$ relates managers' self-assurance (Y) to length of time in the position (X), managerial ability (Z), and their interaction.
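In this model the slope of Y on X is itself a linear function of Z, which is why $\beta_{XZ}$ is the quantity of interest. Differentiating the mean function makes this explicit:

```latex
E(Y \mid X, Z) = \beta_I + \beta_X X + \beta_Z Z + \beta_{XZ} XZ
\;\Longrightarrow\;
\frac{\partial E(Y \mid X, Z)}{\partial X} = \beta_X + \beta_{XZ} Z .
```

A one-unit increase in managerial ability Z therefore changes the slope of self-assurance on time in the position by $\beta_{XZ}$.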
Note that neither explanatory variable (time in the position or managerial ability) is fixed in advance; both become available only after the data are collected. Provided that the managers are drawn randomly from the relevant population, it is therefore natural to regard them as random, and random regression modeling is the appropriate approach. The purpose of the investigation was to find out to what extent the relation between self-assurance and time in the position varies with managerial ability, that is, to assess the systematic change in the strength of the relationship between managers' self-assurance and length of time in the position that results from a one-unit change in managerial ability. To support analytical development and improve the practical use of research findings in MMR, the present article contributes to the derivation and evaluation of sample size methodology in two important respects. On the one hand, it provides the sample size needed for the designated interval for the least squares estimator of moderating effects to attain a specified coverage probability. As the following discussion shows, this problem is identical to computing the minimum sample size for which a prescribed confidence interval formula for moderating effects attains a desired level of confidence. On the other hand, it gives the sample size required to ensure, with a given tolerance probability, that a confidence interval of moderating effects with a desired confidence coefficient will lie within a specified range. Notably, the sample size formulas of Guenther and Thomas (1965), Hahn and Meeker (1991), Kupper and Hafner (1989), and Nelson (1994) are concerned exclusively with the length of confidence intervals.
Since the actual limits of the resulting confidence interval depend not only on its estimated width but also on the realized value of the point estimator, those procedures neglect the stochastic nature of the estimator of location. Moreover, these previous studies focused on interval estimation in one- and two-sample problems and hence did not address the associated issues in MMR applications. Sample size tables are provided for a variety of situations to demonstrate the individual impact of the determining factors and how they bear on the two precision considerations just described. Furthermore, numerical examples and simulation results are presented to illustrate the usefulness and advantages of the proposed methods, which account for the embedded randomness and distributional characteristics of the moderator and predictor variables.

Interval Estimation Procedures of Moderating Effects

Consider the simple interaction model, or MMR model, within the fixed modeling framework:

$$Y_i = \beta_I + \beta_X X_i + \beta_Z Z_i + \beta_{XZ} X_i Z_i + \varepsilon_i, \qquad (1)$$

where $Y_i$ is the value of the response variable Y; $X_i$ and $Z_i$ are known constants of the predictor X and moderator Z; the $\varepsilon_i$ are independent and identically distributed $N(0, \sigma^2)$ random errors for $i = 1, \ldots, N$; and $\beta_I$, $\beta_X$, $\beta_Z$, and $\beta_{XZ}$ are unknown parameters. To examine the moderating effect, we are concerned with the distributional properties of the least squares estimator $\hat\beta_{XZ}$ of $\beta_{XZ}$. According to standard results (Rencher, 2000, Section 8.6), a $100(1 - \alpha)\%$ confidence interval for $\beta_{XZ}$ is

$$\left(\hat\beta_{XZ} - t_{N-4,\alpha_1}\{\hat\sigma^2 M\}^{1/2},\ \hat\beta_{XZ} + t_{N-4,\alpha_2}\{\hat\sigma^2 M\}^{1/2}\right), \qquad (2)$$

where $\hat\sigma^2$ is the usual unbiased estimator of $\sigma^2$; M is the (3, 3) element of $\mathbf{A}^{-1}$, $\mathbf{A} = \sum_{i=1}^{N} (\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})^{\mathsf T}$, $\bar{\mathbf{X}} = \sum_{i=1}^{N} \mathbf{X}_i / N$, and $\mathbf{X}_i = (X_i, Z_i, X_i Z_i)^{\mathsf T}$ is the 3 × 1 column vector of values of predictor $X_i$, moderator $Z_i$, and their cross product $X_i Z_i$ for $i = 1, \ldots, N$. In addition, $t_{N-4,\alpha_1}$ and $t_{N-4,\alpha_2}$ are the $100(1 - \alpha_1)$th and $100(1 - \alpha_2)$th percentiles of the t distribution with N − 4 degrees of freedom, respectively, and $\alpha_1 + \alpha_2 = \alpha$. See Rencher (2000, chaps. 7 and 8) for general treatments and further details on linear models and their analysis. The most common practice is to set $\alpha_1 = \alpha_2 = \alpha/2$, which leads to the shortest $100(1 - \alpha)\%$ two-sided confidence interval for $\beta_{XZ}$:

$$\left(\hat\beta_{XZ} - t_{N-4,\alpha/2}\{\hat\sigma^2 M\}^{1/2},\ \hat\beta_{XZ} + t_{N-4,\alpha/2}\{\hat\sigma^2 M\}^{1/2}\right). \qquad (3)$$

Furthermore, the $100(1 - \alpha)\%$ one-sided lower and upper confidence intervals follow from Equation 2 by setting $\alpha_1$ or $\alpha_2$, respectively, to zero:

$$\left(-\infty,\ \hat\beta_{XZ} + t_{N-4,\alpha}\{\hat\sigma^2 M\}^{1/2}\right) \quad\text{and}\quad \left(\hat\beta_{XZ} - t_{N-4,\alpha}\{\hat\sigma^2 M\}^{1/2},\ \infty\right). \qquad (4)$$

Here, we concentrate exclusively on the case in which both the predictor X and the moderator Z are continuous. Because of the nature of continuous measurements encountered in practical research, the explanatory variables typically cannot be controlled and are available only after observation. Hence, to extend the concept and applicability to MMR, the continuous predictor and moderator variables $\{(X_i, Z_i), i = 1, \ldots, N\}$ in Equation 1 are assumed to have a joint probability function $g(X_i, Z_i)$ with finite moments. Moreover, the form of $g(X_i, Z_i)$ does not depend on any of the unknown parameters $(\beta_I, \beta_X, \beta_Z, \beta_{XZ})$ or $\sigma^2$. The investigations of Shieh (2007, 2009) show that accounting for the random features of the predictor X and the moderator Z complicates the fundamental statistical properties of the inferential procedures. As was noted above, however, the inferential procedures of hypothesis testing and interval estimation are the same under the fixed and random formulations. Hence, the two- and one-sided confidence limits given in Equations 3 and 4 remain valid under random predictor and moderator settings, and follow-up analyses can be performed without any alteration or extra effort. In view of the practical value of interval estimation, it is important to determine the sample size needed for the resulting interval estimate to be not only precise enough to identify meaningful findings but also sufficiently accurate in achieving the desired reliability.
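The two-sided interval in Equation 3 can be computed directly from data. The pure-Python sketch below, with hypothetical function names, fits Equation 1 by least squares, forms A from the centered rows (X_i, Z_i, X_iZ_i), takes M as the (3, 3) element of A⁻¹, and returns $\hat\beta_{XZ}$ with its interval. The critical value $t_{N-4,\alpha/2}$ is assumed to be supplied by the caller (for example, from `scipy.stats.t.ppf(1 - alpha / 2, N - 4)`) so that the block needs only the standard library.

```python
import math


def inv3(a):
    """Inverse of a 3 x 3 matrix (list of rows) via the adjugate."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    cof = [
        [a22 * a33 - a23 * a32, -(a21 * a33 - a23 * a31), a21 * a32 - a22 * a31],
        [-(a12 * a33 - a13 * a32), a11 * a33 - a13 * a31, -(a11 * a32 - a12 * a31)],
        [a12 * a23 - a13 * a22, -(a11 * a23 - a13 * a21), a11 * a22 - a12 * a21],
    ]
    det = a11 * cof[0][0] + a12 * cof[0][1] + a13 * cof[0][2]
    # (A^-1)[i][j] = cofactor[j][i] / det
    return [[cof[j][i] / det for j in range(3)] for i in range(3)]


def interaction_ci(x, z, y, t_crit):
    """Least squares estimate of beta_XZ and the equal-tailed interval
    beta_hat -/+ t_crit * sqrt(sigma2_hat * M)."""
    n = len(y)
    rows = [(xi, zi, xi * zi) for xi, zi in zip(x, z)]
    means = [sum(r[k] for r in rows) / n for k in range(3)]
    dev = [[r[k] - means[k] for k in range(3)] for r in rows]
    ybar = sum(y) / n
    ydev = [yi - ybar for yi in y]
    # A = sum over i of (X_i - Xbar)(X_i - Xbar)^T
    a = [[sum(d[i] * d[j] for d in dev) for j in range(3)] for i in range(3)]
    a_inv = inv3(a)
    sxy = [sum(d[k] * yd for d, yd in zip(dev, ydev)) for k in range(3)]
    slopes = [sum(a_inv[i][k] * sxy[k] for k in range(3)) for i in range(3)]
    # Residuals of the centered fit equal those of the fit with an intercept
    sse = sum((yd - sum(s * dk for s, dk in zip(slopes, d))) ** 2
              for d, yd in zip(dev, ydev))
    sigma2_hat = sse / (n - 4)  # four coefficients are estimated
    m = a_inv[2][2]
    half_width = t_crit * math.sqrt(sigma2_hat * m)
    beta_xz = slopes[2]
    return beta_xz, (beta_xz - half_width, beta_xz + half_width)
```

With noise-free data such as y = 1 + 2x + 3z + 4xz, the estimate recovers 4 and the interval collapses to a point; with noisy data the interval brackets the estimate.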
In the following sections, two interval estimation approaches to sample size determination are developed.

Sample Size Methodology for Designated Interval Estimation of Moderating Effects

When the focus is on interval estimation, it is prudent to ensure that the resulting estimate lies in the neighborhood of the actual or plausible parameter value with sufficiently high probability. In the context of MMR analysis, it is therefore of interest to calculate the sample size required for a designated interval so that the least squares estimator of moderating effects simultaneously satisfies the desired levels of precision and probability. The corresponding method for sample size determination requires the sampling distribution of the least squares estimator $\hat\beta_{XZ}$ of $\beta_{XZ}$. In keeping with the practical standpoint of Shieh (2009), which provides a generally useful and versatile solution not confined to any particular joint probability function $g(X_i, Z_i)$, the large-sample distribution of

$$T_{XZ}(\theta) = \frac{\hat\beta_{XZ} - \theta}{\{\hat\sigma^2 M\}^{1/2}} \qquad (5)$$

is presented in Equation A6 of Appendix A, where θ is a constant that is not necessarily equal to $\beta_{XZ}$. The asymptotic properties of $T_{XZ}(\theta)$ are employed below for the various probability calculations and sample size determinations. Given the moderating effect $\beta_{XZ}$, the error variance $\sigma^2$, the joint distribution $g(X, Z)$ of (X, Z), the probability level $1 - \alpha$, and the designated interval $(\beta_{XZ} - b_L, \beta_{XZ} + b_U)$ with bounds $b_L > 0$ and $b_U > 0$, the smallest sample size N for which this interval contains $\hat\beta_{XZ}$ with probability at least $1 - \alpha$ can be computed from

$$P\{\beta_{XZ} - b_L < \hat\beta_{XZ} < \beta_{XZ} + b_U\} \ge 1 - \alpha. \qquad (6)$$

Alternatively, Equation 6 can be expressed as

$$P\{\hat\beta_{XZ} - b_U < \beta_{XZ} < \hat\beta_{XZ} + b_L\} \ge 1 - \alpha. \qquad (7)$$

Therefore, the sample size problem just described is equivalent to finding the minimum sample size N needed for the designated confidence interval $(\hat\beta_{XZ} - b_U, \hat\beta_{XZ} + b_L)$ of $\beta_{XZ}$ to attain the desired level of confidence $1 - \alpha$.
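The probability in Equation 6 can be approximated by simulation. Under Equation 1 and conditional on an observed design, $\hat\beta_{XZ}$ is normal with mean $\beta_{XZ}$ and variance $\sigma^2 M$, so the conditional coverage is Φ(b_U/√(σ²M)) − Φ(−b_L/√(σ²M)); averaging this quantity over random draws of (X, Z) approximates the unconditional probability. The sketch below assumes standardized bivariate normal (X, Z) with correlation ρ, as in the numerical illustrations that follow. It is a Monte Carlo stand-in for, not a reproduction of, the large-sample approximation developed in the appendixes, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

PHI = NormalDist().cdf  # standard normal CDF


def m_element(x, z):
    """M: the (3, 3) element of A^{-1}, with A built from centered (X, Z, XZ)."""
    n = len(x)
    cols = [x, z, [xi * zi for xi, zi in zip(x, z)]]
    means = [sum(c) / n for c in cols]
    dev = [[v - mu for v in c] for c, mu in zip(cols, means)]
    a = [[sum(p * q for p, q in zip(dev[i], dev[j])) for j in range(3)]
         for i in range(3)]
    det = (a[0][0] * (a[1][1] * a[2][2] - a[1][2] ** 2)
           - a[0][1] * (a[0][1] * a[2][2] - a[1][2] * a[0][2])
           + a[0][2] * (a[0][1] * a[1][2] - a[1][1] * a[0][2]))
    return (a[0][0] * a[1][1] - a[0][1] ** 2) / det  # cofactor C33 / det


def conditional_coverage(m, sigma2, b_lower, b_upper):
    """P{beta - b_L < beta_hat < beta + b_U} for one fixed design."""
    se = math.sqrt(sigma2 * m)
    return PHI(b_upper / se) - PHI(-b_lower / se)


def random_design_coverage(n, rho, sigma2, b_lower, b_upper, reps=400, seed=0):
    """Average the conditional coverage over simulated bivariate normal designs."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    probs = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [rho * xi + s * rng.gauss(0.0, 1.0) for xi in x]
        probs.append(conditional_coverage(m_element(x, z), sigma2, b_lower, b_upper))
    return fmean(probs)


def min_n_designated(target, rho, sigma2, b_lower, b_upper, lo=5, hi=2000, **kw):
    """Smallest N whose simulated coverage reaches `target`, found by bisection
    (assumes coverage increases with N and that `hi` is large enough)."""
    while lo < hi:
        mid = (lo + hi) // 2
        if random_design_coverage(mid, rho, sigma2, b_lower, b_upper, **kw) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For example, `min_n_designated(0.95, 0.0, 1.0, 0.15, 0.15)` searches for the N at which the interval β_XZ ± 0.15 contains the estimate with probability .95 when ρ = 0. Passing the M of a single observed design to `conditional_coverage` gives the fixed-design counterpart.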
However, unlike the fixed predictor and moderator setting, the computation of $P\{\beta_{XZ} - b_L < \hat\beta_{XZ} < \beta_{XZ} + b_U\}$ in Equation 6 is fairly complicated because of the arbitrary and stochastic characteristics of (X, Z). The theoretical properties of the proposed procedure are presented in Appendix B. To illustrate its application, selected sample size computations for precise interval estimation of moderating effects are performed. For analytical tractability, and because it is the primary focus in the literature, the MMR model with bivariate normal predictor and moderator variables serves as the basis for the numerical exposition. Specifically, the coefficient parameters and error variance of the MMR model are set as $\beta_I = \beta_X = \beta_Z = \beta_{XZ} = 1$ and $\sigma^2 = 1$, respectively. The predictor and moderator (X, Z) are jointly normally distributed with means (0, 0), variances (1, 1), and correlation ρ. The minimum sample sizes needed for the designated two-sided intervals $(\beta_{XZ} - b, \beta_{XZ} + b)$ to contain $\hat\beta_{XZ}$ with coverage probability of at least .90 and .95 are presented in Table 1 for values of ρ ranging from 0 to .8 in increments of .2, and b = 0.1, 0.125, and 0.15. Similarly, the corresponding sample sizes for the one-sided intervals $(-\infty, \beta_{XZ} + b)$ and $(\beta_{XZ} - b, \infty)$ are listed in Table 2. Note that, under the chosen model configurations, the sample sizes in Table 2 apply to both one-sided intervals. An inspection of both tables reveals the expected general relations: sample sizes increase with the level of confidence $1 - \alpha$ and with decreasing values of the bound b when all other factors are fixed. Moreover, the sample size reported in Table 1 for a two-sided interval is greater than the corresponding value for a one-sided interval in Table 2 for fixed values of ρ, b, and $1 - \alpha$.
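For orientation only, a rough closed-form counterpart of these tables can be written down if the randomness of the design is ignored. For standardized zero-mean bivariate normal (X, Z), XZ is uncorrelated with X and Z and has variance 1 + ρ², so M ≈ 1/{N(1 + ρ²)} and $\hat\beta_{XZ} - \beta_{XZ}$ is approximately N(0, σ²M). Requiring β_XZ ± b (or a one-sided bound b) to hold with probability 1 − α then gives N ≈ σ²(z/b)²/(1 + ρ²), where z is the upper α/2 (two-sided) or α (one-sided) standard normal quantile. This plug-in approximation and the helper name are assumptions of this sketch, not the method behind Tables 1 and 2, and its values need not match the tabled ones; it does, however, display the qualitative patterns just noted.

```python
import math
from statistics import NormalDist


def rough_n_designated(b, conf, rho, sigma2=1.0, two_sided=True):
    """Hypothetical plug-in approximation N ~ sigma2 * (z / b)^2 / (1 + rho^2).
    Treats M as 1 / {N (1 + rho^2)} and ignores the randomness of (X, Z)."""
    tail = (1.0 - conf) / 2.0 if two_sided else 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - tail)
    return math.ceil(sigma2 * (z / b) ** 2 / (1.0 + rho ** 2))


for b in (0.1, 0.125, 0.15):
    row = [rough_n_designated(b, 0.95, rho) for rho in (0.0, 0.2, 0.4, 0.6, 0.8)]
    print(f"b = {b}: {row}")
```

Smaller b, higher confidence, and two-sided rather than one-sided requirements all raise the approximate N, mirroring the relations seen in Tables 1 and 2.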
Sample Size Methodology for Confidence Intervals of Moderating Effects With Specified Ranges and Tolerances

It is well known that confidence intervals are superior to hypothesis tests, not only because they reveal which parameter values would be rejected if used as a null hypothesis, but also because the point estimate and the interval width indicate the location and precision of the estimation. However, interval estimation procedures are intrinsically stochastic. From a study-planning point of view, researchers may wish to ensure that the resulting confidence interval will meet prespecified assurance and precision requirements. The corresponding approach to determining the required sample size is presented next. Given the moderating effect $\beta_{XZ}$, the error variance $\sigma^2$, the joint distribution $g(X, Z)$ of (X, Z), the tolerance probability $1 - \gamma$, and the prescribed range $(\beta_{XZ} - w_L, \beta_{XZ} + w_U)$ with bounds $w_L > 0$ and $w_U > 0$, the minimum sample size N required to ensure that the $100(1 - \alpha)\%$ two-sided confidence interval given in Equation 3 lies within $(\beta_{XZ} - w_L, \beta_{XZ} + w_U)$ with a tolerance probability of at least $1 - \gamma$ can be determined from

$$P\left\{\beta_{XZ} - w_L < \hat\beta_{XZ} - t_{N-4,\alpha/2}\{\hat\sigma^2 M\}^{1/2}\ \text{and}\ \hat\beta_{XZ} + t_{N-4,\alpha/2}\{\hat\sigma^2 M\}^{1/2} < \beta_{XZ} + w_U\right\} \ge 1 - \gamma. \qquad (8)$$

This procedure is complex because it must account for the stochastic nature of the confidence limits $\hat\beta_{XZ} - t_{N-4,\alpha/2}\{\hat\sigma^2 M\}^{1/2}$ and $\hat\beta_{XZ} + t_{N-4,\alpha/2}\{\hat\sigma^2 M\}^{1/2}$ within the unconditional framework in which the predictor and moderator are random variables. The corresponding analytical presentation is summarized in Appendix C. As further illustrations, we continue with the preceding MMR model with bivariate normal predictor and moderator variables.
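The tolerance probability just defined can likewise be approximated by simulation. Conditional on a design, $\hat\beta_{XZ} \sim N(\beta_{XZ}, \sigma^2 M)$ and $(N - 4)\hat\sigma^2/\sigma^2 \sim \chi^2_{N-4}$ independently, so both confidence limits can be drawn without generating responses; averaging the indicator over random bivariate normal designs approximates the unconditional probability. This is a hedged Monte Carlo sketch under those assumptions, not the analytical treatment of Appendix C. The function names are hypothetical, and the critical value $t_{N-4,\alpha/2}$ is supplied by the caller.

```python
import math
import random


def m_element(x, z):
    """M: the (3, 3) element of A^{-1}, with A built from centered (X, Z, XZ)."""
    n = len(x)
    cols = [x, z, [xi * zi for xi, zi in zip(x, z)]]
    means = [sum(c) / n for c in cols]
    dev = [[v - mu for v in c] for c, mu in zip(cols, means)]
    a = [[sum(p * q for p, q in zip(dev[i], dev[j])) for j in range(3)]
         for i in range(3)]
    det = (a[0][0] * (a[1][1] * a[2][2] - a[1][2] ** 2)
           - a[0][1] * (a[0][1] * a[2][2] - a[1][2] * a[0][2])
           + a[0][2] * (a[0][1] * a[1][2] - a[1][1] * a[0][2]))
    return (a[0][0] * a[1][1] - a[0][1] ** 2) / det


def tolerance_probability(n, rho, sigma2, w_lower, w_upper, t_crit,
                          reps=2000, seed=0):
    """Monte Carlo estimate of P{beta - w_L < lower limit and
    upper limit < beta + w_U} for the equal-tailed interval."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    df = n - 4
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [rho * xi + s * rng.gauss(0.0, 1.0) for xi in x]
        m = m_element(x, z)
        # Work with err = beta_hat - beta, so beta itself drops out
        err = math.sqrt(sigma2 * m) * rng.gauss(0.0, 1.0)
        # (N - 4) sigma2_hat / sigma2 ~ chi-square(df) = Gamma(shape df/2, scale 2)
        sigma2_hat = sigma2 * rng.gammavariate(df / 2.0, 2.0) / df
        half = t_crit * math.sqrt(sigma2_hat * m)
        if err - half > -w_lower and err + half < w_upper:
            hits += 1
    return hits / reps
```

For instance, `tolerance_probability(192, 0.4, 1.0, 0.225, 0.225, 1.973)` targets a ρ = .4, w = 0.225 setting, where 1.973 approximates $t_{188,.025}$. Because the random draws do not depend on the bounds, a fixed seed makes the estimate monotone in w.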
For the bivariate normal model, Table 3 presents the minimum sample sizes required to ensure that the 95% two-sided confidence interval $(\hat\beta_{XZ} - t_{N-4,.025}\{\hat\sigma^2 M\}^{1/2}, \hat\beta_{XZ} + t_{N-4,.025}\{\hat\sigma^2 M\}^{1/2})$ lies within $(\beta_{XZ} - w, \beta_{XZ} + w)$ with a tolerance probability of at least .90 and .95 for values of ρ ranging from 0 to .8 in increments of .2, and w = 0.2, 0.225, and 0.25. In addition, Table 4 shows the corresponding sample sizes that ensure that the 95% one-sided confidence intervals $(-\infty, \hat\beta_{XZ} + t_{N-4,.05}\{\hat\sigma^2 M\}^{1/2})$ and $(\hat\beta_{XZ} - t_{N-4,.05}\{\hat\sigma^2 M\}^{1/2}, \infty)$ lie within $(-\infty, \beta_{XZ} + w)$ and $(\beta_{XZ} - w, \infty)$, respectively, with tolerance probabilities of at least .90 and .95.

Table 3. Minimum Sample Sizes Required to Ensure That the 95% Two-Sided Confidence Interval $(\hat\beta_{XZ} - t_{N-4,.025}\{\hat\sigma^2 M\}^{1/2}, \hat\beta_{XZ} + t_{N-4,.025}\{\hat\sigma^2 M\}^{1/2})$ Is Within the Range $(\beta_{XZ} - w, \beta_{XZ} + w)$ With a Tolerance Probability of at Least .90 and .95 for Bivariate Normal Predictor and Moderator Variables ($\sigma^2 = 1$, $\mu_X = \mu_Z = 0$, $\sigma_X^2 = \sigma_Z^2 = 1$)

As in the numerical evaluations associated with Table 2, the sample sizes given in Table 4 apply to both one-sided intervals because of the special feature of the noncentral t distribution. Tables 3 and 4 show that the required sample sizes increase with the tolerance probability $1 - \gamma$ and with decreasing values of the bound w when all other factors are fixed. As before, the sample size reported in Table 3 for the two-sided confidence interval is greater than the corresponding value for the one-sided confidence interval in Table 4 for fixed values of ρ, w, and $1 - \gamma$. Furthermore, although the results are not completely comparable, the sample sizes in Tables 3 and 4 are larger than those in Tables 1 and 2.

Numerical Examples

The following numerical assessment represents a typical research situation encountered when planning a study to assess interaction effects in the context of MMR.
The aim is to demonstrate sample size calculations for precise interval estimation of moderating effects based on a pilot sample and to show the potential consequences of failing to account for the stochastic nature of the explanatory variables. Continuing the illustration of Aiken and West (1991), recall that the aim of their numerical study was to determine whether the relationship between managers' self-assurance (Y) and length of time in the position (X) changes as a function of managerial ability (Z). Suppose that 60 pairs of observations of the predictor X and the moderator Z have been obtained from a pilot study. The values of (X, Z) presented in Table 5 are a random sample generated from a bivariate normal population with $\mu_X = \mu_Z = 0$, $\sigma_X^2 = \sigma_Z^2 = 1$, and correlation ρ = .4. Because X and Z are continuous measurements, the sample values in the subsequent study will vary from one application to another. However, the observed configurations from the pilot study can serve as an empirical approximation of the underlying joint distribution of X and Z. Moreover, as shown next, the suggested approach and a simplified method use the empirical features of the predictor and moderator variables in distinct ways; consequently, the two formulas lead to substantially different sample sizes and differ in how accurately they achieve the desired precision of interval estimation. Following the analysis in Aiken and West (1991, p. 10), the parameter values of the MMR model are set to $\beta_I = 2.54$, $\beta_X = 1.14$, $\beta_Z = 3.58$, $\beta_{XZ} = 2.58$, and $\sigma^2 = 1$. On the basis of the 60 observed pilot configurations in Table 5, with $\mathbf{X}_i = (X_i, Z_i, X_i Z_i)^{\mathsf T}$ and empirical probability 1/60 for i = 1, . . . , 60, the estimated moment matrices for the quantities in Equation A4 can be obtained. Thus, the approximate normal distribution of W* in Equation A5 has estimated mean $\mu_{W^*} = 1.2348$ and estimated variance $\sigma^2_{W^*} = 22.6511$. In planning a research study according to the present information, the minimum sample sizes needed for the designated two-sided interval $(\beta_{XZ} - b, \beta_{XZ} + b) = (2.58 - 0.15, 2.58 + 0.15) = (2.43, 2.73)$ to contain $\hat\beta_{XZ}$ with a desired coverage probability can be determined from the approximate coverage probability function defined in Equation B2. The resulting sample sizes are 74, 116, and 162 for coverage probabilities of .80, .90, and .95, respectively.

Table 5. Observed Values of Predictor Variable X and Moderator Variable Z of the Pilot Study

On the other hand, the researcher may presume that the empirical structure of the predictor and moderator variables in the pilot data will recur exactly in the investigation. The inference about moderating effects can then be conducted with the simplified, conditional distribution of $T_{XZ}(\theta)$ in Equation A1.
With the fixed modeling formulation of Equation A3, the minimum sample sizes needed for the designated two-sided interval $(\beta_{XZ} - b, \beta_{XZ} + b) = (2.43, 2.73)$ to contain $\hat\beta_{XZ}$ with coverage probability of at least .80, .90, and .95 are 60, 98, and 139, respectively. These are smaller than the sample sizes reported above, which are based on the more involved normal mixture of noncentral t distributions in Equation B1. The sizable discrepancy between the two procedures calls for an assessment of how well each achieves the nominal coverage probability. The numerical study is then extended to the second criterion, under which the resulting confidence interval with a desired confidence coefficient must fall within a scientifically credible range with a specified tolerance probability, to illustrate the advantage of the suggested procedure and the deficiency of the simplified method. For the MMR model with bivariate normal predictor and moderator variables examined above, the minimum sample sizes required by the suggested formula in Equation C2 to ensure that the 95% two-sided confidence interval $(\hat\beta_{XZ} - t_{N-4,.025}\{\hat\sigma^2 M\}^{1/2}, \hat\beta_{XZ} + t_{N-4,.025}\{\hat\sigma^2 M\}^{1/2})$ lies within $(\beta_{XZ} - w, \beta_{XZ} + w) = (2.58 - 0.225, 2.58 + 0.225) = (2.355, 2.805)$, with tolerance probabilities of at least .80, .90, and .95, are 192, 239, and 285, respectively. By comparison, the minimum sample sizes required by the conditional formulation in Equation C4 to ensure that the same 95% two-sided confidence interval lies within $(\beta_{XZ} - w_L, \beta_{XZ} + w_U) = (2.355, 2.805)$, with tolerance probabilities of at least .80, .90, and .95, are 169, 208, and 246, respectively. Clearly, the sample sizes of the two procedures differ considerably in this setting. The differences between the two approaches are examined further in the following simulation study.
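The two ways of using the pilot data can be contrasted in a short sketch. The random-design view treats the pilot pairs as an empirical distribution with probability 1/60 each and draws future designs of size N from it with replacement; the conditional view assumes that the pilot structure simply repeats, so that A scales by N/60 and M_N = 60·M_pilot/N. Both then use the normal conditional coverage Φ(b/√(σ²M)) − Φ(−b/√(σ²M)). Resampling here is an assumed Monte Carlo stand-in for the moment-based computations behind Equations B1 and B2, the scaling rule is this sketch's reading of the simplified method, and the helper names are hypothetical; `pilot_x` and `pilot_z` stand for the observed pilot values.

```python
import math
import random
from statistics import NormalDist, fmean

PHI = NormalDist().cdf


def m_element(x, z):
    """M: the (3, 3) element of A^{-1}, with A built from centered (X, Z, XZ)."""
    n = len(x)
    cols = [x, z, [xi * zi for xi, zi in zip(x, z)]]
    means = [sum(c) / n for c in cols]
    dev = [[v - mu for v in c] for c, mu in zip(cols, means)]
    a = [[sum(p * q for p, q in zip(dev[i], dev[j])) for j in range(3)]
         for i in range(3)]
    det = (a[0][0] * (a[1][1] * a[2][2] - a[1][2] ** 2)
           - a[0][1] * (a[0][1] * a[2][2] - a[1][2] * a[0][2])
           + a[0][2] * (a[0][1] * a[1][2] - a[1][1] * a[0][2]))
    return (a[0][0] * a[1][1] - a[0][1] ** 2) / det


def coverage(m, sigma2, b):
    """Normal conditional probability that beta_hat lies in beta +/- b."""
    se = math.sqrt(sigma2 * m)
    return PHI(b / se) - PHI(-b / se)


def pilot_based_coverages(pilot_x, pilot_z, n, sigma2, b, reps=400, seed=0):
    """Return (resampled random-design coverage, fixed-structure coverage)."""
    k = len(pilot_x)
    # Fixed structure: A_n = (n / k) A_pilot, hence M_n = k * M_pilot / n
    fixed = coverage(k * m_element(pilot_x, pilot_z) / n, sigma2, b)
    rng = random.Random(seed)
    probs = []
    for _ in range(reps):
        # Each pilot pair carries empirical probability 1/k
        idx = [rng.randrange(k) for _ in range(n)]
        x = [pilot_x[i] for i in idx]
        z = [pilot_z[i] for i in idx]
        probs.append(coverage(m_element(x, z), sigma2, b))
    return fmean(probs), fixed
```

With the 60 pilot pairs of Table 5, σ² = 1, and b = 0.15, the two returned values can then be compared at N = 74, 116, and 162.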
The SAS/IML (SAS Institute, 2008) programs used to perform the sample size calculations of the proposed approaches are presented in Appendixes D and E.

Simulation Study

To compare the performance of the two competing approaches and to reinforce the fundamental distinction between them, further simulation studies are conducted. The MMR model with bivariate normal predictor and moderator variables described above serves as the basis for a Monte Carlo examination, which is conducted in two steps. First, under the selected values of the coefficient parameters, error variance, and bivariate distribution of the predictor and moderator, the approximate coverage probabilities of the two methods are calculated at the sample sizes reported for the proposed approach. The results are presented in Table 6. The approximate coverage probabilities of .8033, .9005, and .9507 for the proposed method are almost identical to the desired values of .80, .90, and .95 for sample sizes 74, 116, and 162, respectively, whereas the corresponding coverage probabilities of .8484, .9274, and .9661 computed with the simplified method are somewhat greater than the desired values. In the second step, the sample size N calculated by the proposed approach is used as a benchmark to assess the simulated coverage probability. Estimates of the true coverage probability for the given sample size and parameter configuration are computed through a Monte Carlo simulation of 10,000 independent data sets. For each replicate, N pairs of predictor and moderator values are generated from the designated bivariate normal distribution. These values, in turn, determine the mean responses from which N normal outcomes are generated according to the MMR model.
Next, the estimate $\hat\beta_{XZ}$ is computed, and the simulated coverage probability is the proportion of the 10,000 replicates whose values of $\hat\beta_{XZ}$ fall between 2.43 and 2.73. The adequacy of each procedure for coverage probability and sample size calculation is assessed by error = simulated coverage probability − approximate coverage probability, which compares the simulated coverage probability with the approximate coverage probability computed earlier. All calculations are performed with programs written in SAS/IML (SAS Institute, 2008). The simulated coverage probabilities and errors for the proposed and simplified methods are also summarized in Table 6. The proposed method performs remarkably well across the model specifications considered in the present article. In contrast, the simplified method yields much larger errors; in particular, the error reaches 0.0501 in magnitude for the sample size of 74 with coverage probability around .80. Errors of this size are too large to be satisfactory. Turning to the second criterion, we first evaluate the approximate tolerance probabilities of the two procedures at sample sizes of 192, 239, and 285, and the resulting values are summarized in Table 7. Then, the simulated tolerance probabilities for the prescribed parameter setting and sample sizes are computed as the proportion of 10,000 replicates whose 95% two-sided confidence intervals $(\hat\beta_{XZ} - t_{N-4,.025}\{\hat\sigma^2 M\}^{1/2}, \hat\beta_{XZ} + t_{N-4,.025}\{\hat\sigma^2 M\}^{1/2})$ lie within (2.355, 2.805). The differences between the simulated and approximate tolerance probabilities, error = simulated tolerance probability − approximate tolerance probability, are also presented in Table 7. Clearly, the errors of the simplified method are substantially larger than those of the suggested approach.
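The two-step Monte Carlo check described above can be sketched in Python for the example configuration (β_I, β_X, β_Z, β_XZ) = (2.54, 1.14, 3.58, 2.58), σ² = 1, and standardized bivariate normal (X, Z) with ρ = .4. Each replicate generates responses from Equation 1, fits it by least squares, and records whether $\hat\beta_{XZ}$ falls in (2.43, 2.73) and whether the 95% interval lies within (2.355, 2.805). This is a hedged re-implementation, not the SAS/IML programs of Appendixes D and E; the function names are hypothetical, the t critical value is passed in, and smaller replicate counts than 10,000 can be used for a quick look.

```python
import math
import random

BETA = (2.54, 1.14, 3.58, 2.58)  # (beta_I, beta_X, beta_Z, beta_XZ)


def fit_interaction(x, z, y):
    """Return (beta_hat_XZ, sigma2_hat, M) from the least squares fit of Eq. 1."""
    n = len(y)
    cols = [x, z, [a * b for a, b in zip(x, z)]]
    means = [sum(c) / n for c in cols]
    dev = [[v - mu for v in c] for c, mu in zip(cols, means)]
    ybar = sum(y) / n
    yd = [v - ybar for v in y]
    a = [[sum(p * q for p, q in zip(dev[i], dev[j])) for j in range(3)]
         for i in range(3)]
    s = [sum(p * q for p, q in zip(dev[i], yd)) for i in range(3)]
    (a11, a12, a13), (_, a22, a23), (_, _, a33) = a
    # Cofactors of the symmetric matrix A, so that A^{-1} = C / det
    c11, c12, c13 = a22 * a33 - a23 ** 2, a23 * a13 - a12 * a33, a12 * a23 - a22 * a13
    c22, c23, c33 = a11 * a33 - a13 ** 2, a12 * a13 - a11 * a23, a11 * a22 - a12 ** 2
    det = a11 * c11 + a12 * c12 + a13 * c13
    inv = [[c11, c12, c13], [c12, c22, c23], [c13, c23, c33]]
    slopes = [sum(inv[i][k] * s[k] for k in range(3)) / det for i in range(3)]
    sse = sum((yd[t] - sum(slopes[k] * dev[k][t] for k in range(3))) ** 2
              for t in range(n))
    return slopes[2], sse / (n - 4), c33 / det


def simulate(n, rho, t_crit, reps=2000, seed=0, sigma=1.0,
             cover=(2.43, 2.73), within=(2.355, 2.805)):
    """Simulated coverage of `cover` by beta_hat_XZ and simulated tolerance
    probability that the two-sided interval lies inside `within`."""
    rng = random.Random(seed)
    root = math.sqrt(1.0 - rho * rho)
    b0, bx, bz, bxz = BETA
    hit_cover = hit_within = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [rho * xi + root * rng.gauss(0.0, 1.0) for xi in x]
        y = [b0 + bx * xi + bz * zi + bxz * xi * zi + rng.gauss(0.0, sigma)
             for xi, zi in zip(x, z)]
        est, s2, m = fit_interaction(x, z, y)
        half = t_crit * math.sqrt(s2 * m)
        hit_cover += cover[0] < est < cover[1]
        hit_within += within[0] < est - half and est + half < within[1]
    return hit_cover / reps, hit_within / reps
```

For example, `simulate(74, 0.4, 1.994)` and `simulate(192, 0.4, 1.973)` use approximate values of $t_{70,.025}$ and $t_{188,.025}$. The simulated values need not coincide with Table 6 or Table 7, because the article's approximate probabilities were based on the empirical pilot distribution.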
The results in Table 7 indicate that the sample sizes calculated with the conditional formula in Equation C4 are too small to ensure sufficient tolerance probability, and the phenomenon is expected to persist in other settings of random explanatory variables.

Conclusions

Given the prevalence of MMR applications in various disciplines of the social sciences, it seems prudent to verify and extend the understanding of the fundamental properties of the related inference procedures. When the extent of interaction effects between continuous predictor and moderator variables is assessed, the underlying stochastic configurations of the predictor and moderator vary from one research study to another and inevitably call for random modeling instead of the commonly used fixed or conditional model setting. Researchers who adopt MMR analyses should understand the corresponding theoretical implications for power and precision appraisal. As presented above, random MMR modeling is comparatively more complex, and more research is needed before it can be accepted in place of the commonly used fixed linear regression model. The present article aimed to present the technical development of precise interval estimation and the related sample size methodology with sufficient clarity that MMR practitioners can appreciate its applicability and usefulness. Specifically, the proposed approach fully accommodates arbitrary distributional formulations of the stochastic explanatory variables. The differences, and the impact of failing to account for the randomness of the predictor and moderator variables, are elucidated through rigorous analytical presentations and numerical assessments. It is shown that the existing fixed modeling formulation may distort the precision analysis and lead to a poor choice of sample sizes.
More importantly, although the suggested general procedures for sample size determination are derived from large-sample theory, the simulation study demonstrates their accuracy in achieving the desired levels of coverage and tolerance for interval estimation over a wide range of model settings. The generality and accuracy of the proposed methodology not only facilitate the widely recommended practice of reporting confidence intervals but also strengthen the applicability of MMR analysis. Accordingly, the results provide a basis for probing related considerations in more complicated situations, such as the three-way interactions discussed in Aiken and West (1991) and Dawson and Richter (2006).

The author thanks the editor, Gregory Francis, and the two anonymous reviewers for their valuable comments on earlier drafts of the article. This research was partially supported by National Science Council Grant NSC-97-2410-H-009-011-MY2. Correspondence concerning this article should be addressed to G. Shieh, Department of Management Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan 30050 (e-mail: ).

APPENDIX A
The Distribution of $T_{XZ}$

It follows from the standard assumption in Equation 1 under a fixed modeling framework that the variable

$$T_{XZ}(\theta) = \frac{\hat\beta_{XZ} - \theta}{\{\hat\sigma^2 M\}^{1/2}} \tag{A1}$$

has a noncentral t distribution $t(N-4, \lambda)$ with $N-4$ degrees of freedom and noncentrality parameter $\lambda$, where $\hat\sigma^2$ is the usual unbiased estimator of $\sigma^2$; $M$ is the (3, 3) element of $A^{-1}$, $A = \sum_{i=1}^{N}(\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})^T$, $\bar{\mathbf{X}} = \sum_{i=1}^{N}\mathbf{X}_i/N$, and $\mathbf{X}_i = (X_i, Z_i, X_iZ_i)^T$ is the 3 × 1 vector of values of the predictor $X_i$, the moderator $Z_i$, and their cross product $X_iZ_i$ for $i = 1, \ldots, N$; $\theta$ is a constant; and the noncentrality parameter is $\lambda = (\beta_{XZ} - \theta)/\{\sigma^2 M\}^{1/2}$. Accordingly, substituting $\beta_{XZ}$ for $\theta$ in $T_{XZ}$ gives

$$T_{XZ}(\beta_{XZ}) = \frac{\hat\beta_{XZ} - \beta_{XZ}}{\{\hat\sigma^2 M\}^{1/2}}, \tag{A2}$$

and $T_{XZ}(\beta_{XZ})$ is distributed as $t(N-4)$, a t distribution with $N-4$ degrees of freedom.
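Equation A1 can be checked numerically. The following is a minimal NumPy sketch, assuming ordinary least squares with an intercept; `txz_statistic` is a hypothetical helper, not part of the article. It forms M from the centred matrix A exactly as defined above, and the checks confirm that this equals the (4, 4) element of $(D^TD)^{-1}$ for the full design matrix $D = (1, X, Z, XZ)$, the familiar variance factor of $\hat\beta_{XZ}$.

```python
import numpy as np


def txz_statistic(x, z, y, theta):
    """T_XZ(theta) of Equation A1 for observed data (fixed-X formulation)."""
    n = len(y)
    u = np.column_stack([x, z, x * z])           # rows are (X_i, Z_i, X_i Z_i)
    d = u - u.mean(axis=0)                       # X_i - Xbar
    m = np.linalg.inv(d.T @ d)[2, 2]             # M: (3, 3) element of A^{-1}
    design = np.column_stack([np.ones(n), u])    # intercept, X, Z, XZ
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    s2 = resid @ resid / (n - 4)                 # unbiased estimator of sigma^2
    t = (coef[3] - theta) / np.sqrt(s2 * m)
    return t, m, coef[3], s2
```

Computing M through the centred matrix mirrors the definition in the text; the intercept-augmented design gives the same value, so either route can be used in practice.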
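Note that $T_{XZ}(\beta_{XZ})$ in Equation A2 provides a useful tool for statistical inference, both hypothesis testing and interval estimation, about the magnitude of the moderating effect $\beta_{XZ}$. In this case, the coverage probability of a designated interval $(\beta_{XZ} - b_L, \beta_{XZ} + b_U)$ for $\hat\beta_{XZ}$ can be computed with the simple expression

$$P\{\beta_{XZ} - b_L < \hat\beta_{XZ} < \beta_{XZ} + b_U\} = P\{t(N-4, -\lambda_U) \le 0\} - P\{t(N-4, \lambda_L) \le 0\}, \tag{A3}$$

where $b_L > 0$, $b_U > 0$, $\lambda_U = b_U/\{\sigma^2 M\}^{1/2}$, and $\lambda_L = b_L/\{\sigma^2 M\}^{1/2}$.

Instead of a purely fixed or conditional formulation, we focus on the random regression situation in which both the predictor X and the moderator Z are continuous random variables within the context of MMR. Specifically, the continuous predictor and moderator variables $\{(X_i, Z_i), i = 1, \ldots, N\}$ are assumed to have a joint probability density function $g(X_i, Z_i)$ with finite moments. Moreover, the form of $g(X_i, Z_i)$ does not depend on any of the unknown parameters $(\beta_I, \beta_X, \beta_Z, \beta_{XZ})$ and $\sigma^2$. The moments of the explanatory vectors $\mathbf{X}_i = (X_i, Z_i, X_iZ_i)^T$ are defined as

$$\boldsymbol\mu = E[\mathbf{X}_i], \quad \Sigma = E[(\mathbf{X}_i - \boldsymbol\mu)(\mathbf{X}_i - \boldsymbol\mu)^T], \quad \Psi = E[(\mathbf{X}_i - \boldsymbol\mu)(\mathbf{X}_i - \boldsymbol\mu)^T \otimes (\mathbf{X}_i - \boldsymbol\mu)(\mathbf{X}_i - \boldsymbol\mu)^T], \tag{A4}$$

where $E[\cdot]$ denotes the expectation taken with respect to the joint probability density function $g(X_i, Z_i)$ of $(X_i, Z_i)$, and $\otimes$ represents the Kronecker product. According to the formulations of $A$ and $M$ in Equation A1, both $A$ and $M$ are functions of the random variables $(X_i, Z_i)$, $i = 1, \ldots, N$, within the random regression framework; therefore, $T_{XZ}(\theta)$ has a noncentral t distribution with a random noncentrality parameter $\lambda$. It follows from Shieh (2009) that $W^* = 1/\{(N-1)M\}$ has an asymptotic normal distribution:

$$W^* \sim N(\mu_{W^*}, \sigma^2_{W^*}), \tag{A5}$$

where $\mu_{W^*} = 1/(\mathbf{c}^T\Sigma^{-1}\mathbf{c})$, $\sigma^2_{W^*} = \mu_{W^*}^4\{(\mathbf{c}^T\Sigma^{-1} \otimes \mathbf{c}^T\Sigma^{-1})\Psi(\Sigma^{-1}\mathbf{c} \otimes \Sigma^{-1}\mathbf{c}) - \mu_{W^*}^{-2}\}/(N-1)$, $\mathbf{c} = (0, 0, 1)^T$ is a 3 × 1 vector, and $\Sigma$ and $\Psi$ are defined in Equation A4. Therefore, the distribution of $T_{XZ}(\theta)$ under the random regression setting can be well approximated by the two-stage distribution

$$T_{XZ}(\theta) \mid W^* \sim t\{N-4, (\beta_{XZ} - \theta)[(N-1)W^*/\sigma^2]^{1/2}\} \quad\text{and}\quad W^* \sim N(\mu_{W^*}, \sigma^2_{W^*}).$$

This approximate distribution of $T_{XZ}(\theta)$ is particularly useful for evaluating the cumulative distribution function of $\hat\beta_{XZ}$, $F_{XZ}(c) = P\{\hat\beta_{XZ} \le c\}$, where $c$ is a constant.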
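Equation A5 requires $\Sigma$ and the fourth-moment matrix of Equation A4 (written $\Psi$ here) for the chosen $g(X, Z)$. One practical route, assumed here rather than taken from the article, is to estimate both from a large Monte Carlo sample of $(X, Z)$ and plug them into the formulas for $\mu_{W^*}$ and $\sigma^2_{W^*}$. The function name `w_star_moments` and the draw size are illustrative.

```python
import numpy as np


def w_star_moments(xz_draws, n):
    """Plug-in mu_W* and sigma^2_W* (Equation A5) from draws of (X, Z) with shape (K, 2).

    n is the study sample size N; the moments are estimated, not exact.
    """
    x, z = xz_draws[:, 0], xz_draws[:, 1]
    d = np.column_stack([x, z, x * z])
    d = d - d.mean(axis=0)                           # centred explanatory vectors
    k = d.shape[0]
    sigma = d.T @ d / k                              # Sigma, 3 x 3
    # Each (d d') kron (d d') equals (d kron d)(d kron d)'; average over draws -> Psi, 9 x 9
    dd = np.einsum("ni,nj->nij", d, d).reshape(k, 9)
    psi = dd.T @ dd / k
    c = np.array([0.0, 0.0, 1.0])
    a = np.linalg.solve(sigma, c)                    # Sigma^{-1} c
    mu_w = 1.0 / (c @ a)
    v = np.kron(a, a)                                # Sigma^{-1} c kron Sigma^{-1} c
    var_w = mu_w**4 * (v @ psi @ v - mu_w**-2) / (n - 1)
    return mu_w, var_w
```

For a bivariate normal with zero means, unit variances, and correlation ρ, X and Z are uncorrelated with XZ and Var(XZ) = 1 + ρ², so $\mu_{W^*} = 1 + \rho^2$; the checks use this value and also confirm that shifting the means of X and Z leaves both moments unchanged, in line with the remark in Appendix B that first moments do not affect the results.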
It can be readily shown from the definition of $T_{XZ}$ that the cumulative distribution function $F_{XZ}(c) = P\{\hat\beta_{XZ} \le c\}$ satisfies $F_{XZ}(c) = P\{T_{XZ}(c) \le 0\}$. Accordingly, $F_{XZ}(c)$ can be approximated by

$$F_{XZ}(c) = P\{T_{XZ}(c) \le 0\} \approx E_{W^*}[P(t\{N-4, (\beta_{XZ} - c)[(N-1)W^*/\sigma^2]^{1/2}\} \le 0)], \tag{A7}$$

where the expectation $E_{W^*}[\cdot]$ is taken with respect to the approximate normal distribution of $W^*$ presented in Equation A5.

APPENDIX B
Sample Size Calculations for Designated Interval Estimation of Moderating Effects

It follows from the definition of $T_{XZ}(\theta)$ given in Equation 5 and the associated asymptotic approximation of the cumulative distribution function $F_{XZ}$ of $\hat\beta_{XZ}$ presented in Equation A7 that the probability $P\{\beta_{XZ} - b_L < \hat\beta_{XZ} < \beta_{XZ} + b_U\}$ in Equation 6 can be approximated by

$$E_{W^*}[P\{t(N-4, -\lambda_U) \le 0\}] - E_{W^*}[P\{t(N-4, \lambda_L) \le 0\}], \tag{B1}$$

where $\lambda_U = b_U\{(N-1)W^*/\sigma^2\}^{1/2}$, $\lambda_L = b_L\{(N-1)W^*/\sigma^2\}^{1/2}$, and the expectation $E_{W^*}[\cdot]$ is taken with respect to the approximate normal distribution of $W^*$ presented in Equation A5. Hence, the smallest sample size N needed for the prescribed interval $(\beta_{XZ} - b_L, \beta_{XZ} + b_U)$ of $\hat\beta_{XZ}$ to attain a coverage probability of at least $1 - \alpha$ is the smallest N satisfying

$$E_{W^*}[P\{t(N-4, -\lambda_U) \le 0\}] - E_{W^*}[P\{t(N-4, \lambda_L) \le 0\}] \ge 1 - \alpha. \tag{B2}$$

Numerical computation of the expected values in Equation B2 requires the evaluation of a noncentral t cumulative distribution function and a one-dimensional integration with respect to a normal distribution. This procedure is not as simple as using a z or t table, but it is entirely feasible with modern computing capabilities. Moreover, two important aspects of the proposed procedure should be pointed out. First, neither of the probability functions $P\{t(N-4, -\lambda_U) \le 0\}$ and $P\{t(N-4, \lambda_L) \le 0\}$ involves the regression coefficient $\beta_{XZ}$, which corresponds to the extent of the moderating effect; instead, the cumulative probabilities are directly related to the magnitudes of the bounds $b_L$ and $b_U$. Second, the mean values of the predictor, the moderator, and their product are not included in the asymptotic distribution of $W^*$ defined in Equation A5.
Hence, the mean vector (first moments) of the joint distribution of the explanatory variables has no influence on the resulting probability levels and required sample sizes. In a similar fashion, the sample size calculations for the prescribed lower and upper one-sided intervals $(-\infty, \beta_{XZ} + b_U)$ and $(\beta_{XZ} - b_L, \infty)$ of $\hat\beta_{XZ}$ with coverage probability of at least $1 - \alpha$ can be conducted with approximate probability functions in the form of normal mixtures of noncentral t cumulative distribution functions:

$$E_{W^*}[P\{t(N-4, -\lambda_U) \le 0\}] \ge 1 - \alpha \quad\text{and}\quad 1 - E_{W^*}[P\{t(N-4, \lambda_L) \le 0\}] \ge 1 - \alpha, \tag{B3}$$

respectively, where $\lambda_U$, $\lambda_L$, and $W^*$ are as given for Equation B1. It can be readily shown from Equation B3, with $b_U = b_L = b$ and $\lambda_U = \lambda_L = \lambda = b\{(N-1)W^*/\sigma^2\}^{1/2}$, that

$$E_{W^*}[P\{t(N-4, -\lambda) \le 0\}] = 1 - E_{W^*}[P\{t(N-4, \lambda) \le 0\}],$$

so the two one-sided calculations yield the same sample size.

APPENDIX C
Sample Size Calculations for Confidence Intervals of Moderating Effects With Specified Ranges and Tolerances

According to the asymptotic results for $\hat\beta_{XZ}$ presented in Appendix A, we propose to compute the probability described in Equation 7 with an alternative formula (Equation C1) that averages noncentral t cumulative probabilities with $N-4$ degrees of freedom over the approximate normal distribution of $W^*$ presented in Equation A5, where $\lambda_U = w_U\{(N-1)W^*/\sigma^2\}^{1/2}$ and $\lambda_L = w_L\{(N-1)W^*/\sigma^2\}^{1/2}$. Thus, the minimum sample size N required to ensure that the $100(1-\alpha)\%$ two-sided confidence interval $(\hat\beta_{XZ} - t_{N-4,\alpha/2}\{\hat\sigma^2 M\}^{1/2},\ \hat\beta_{XZ} + t_{N-4,\alpha/2}\{\hat\sigma^2 M\}^{1/2})$ lies within the range $(\beta_{XZ} - w_L, \beta_{XZ} + w_U)$ with a tolerance probability of at least $1 - \gamma$ is the smallest N for which the probability in Equation C1 is at least $1 - \gamma$ (Equation C2). Moreover, the sample size calculations for the lower and upper one-sided confidence intervals $(-\infty, \hat\beta_{XZ} + t_{N-4,\alpha}\{\hat\sigma^2 M\}^{1/2})$ and $(\hat\beta_{XZ} - t_{N-4,\alpha}\{\hat\sigma^2 M\}^{1/2}, \infty)$ that fall within the ranges $(-\infty, \beta_{XZ} + w_U)$ and $(\beta_{XZ} - w_L, \infty)$ with a tolerance probability of at least $1 - \gamma$ can be performed with

$$E_{W^*}[P\{t(N-4, -\lambda_U) \le -t_{N-4,\alpha}\}] \ge 1 - \gamma \quad\text{and}\quad 1 - E_{W^*}[P\{t(N-4, \lambda_L) \le t_{N-4,\alpha}\}] \ge 1 - \gamma, \tag{C3}$$

respectively, where $\lambda_U$ and $\lambda_L$ are as given for Equation C1.
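Equations B1, B2, and C3 all reduce to averaging noncentral t probabilities over the approximate normal distribution of $W^*$. Below is a minimal Python sketch, assuming SciPy's `stats.nct` for the noncentral t and NumPy's Gauss-Hermite rule `hermegauss` for the one-dimensional normal integral; it is not the author's SAS/IML program. The inputs $\mu_{W^*}$ and $\kappa = (\mathbf{c}^T\Sigma^{-1} \otimes \mathbf{c}^T\Sigma^{-1})\Psi(\Sigma^{-1}\mathbf{c} \otimes \Sigma^{-1}\mathbf{c}) - \mu_{W^*}^{-2}$ (so that $\sigma^2_{W^*} = \mu_{W^*}^4\kappa/(N-1)$, with $\Psi$ denoting the fourth-moment matrix of Equation A4) are taken as given. Truncating negative quadrature nodes of $W^*$ at zero is a choice of this sketch, all function names are hypothetical, and the numbers in the checks are illustrative. The two-sided tolerance calculation of Equation C1 is not covered.

```python
import numpy as np
from scipy import stats


def w_star_nodes(n, mu_w, kappa, nodes=40):
    """Quadrature nodes/weights for E_{W*}[.] with W* ~ N(mu_w, mu_w**4 * kappa / (N - 1))."""
    z, wts = np.polynomial.hermite_e.hermegauss(nodes)   # weight function exp(-z**2 / 2)
    sd_w = np.sqrt(mu_w**4 * kappa / (n - 1))
    w_star = np.clip(mu_w + sd_w * z, 0.0, None)         # truncate negative W* nodes at zero
    return w_star, wts / np.sqrt(2 * np.pi)              # rescaled weights sum to one


def coverage_two_sided(n, b_l, b_u, sigma2, mu_w, kappa):
    """Equation B1: approximate P{beta_XZ - b_L < beta_hat_XZ < beta_XZ + b_U}."""
    w_star, wts = w_star_nodes(n, mu_w, kappa)
    scale = np.sqrt((n - 1) * w_star / sigma2)           # lambda per unit bound, given W*
    df = n - 4
    p = stats.nct.cdf(0.0, df, -b_u * scale) - stats.nct.cdf(0.0, df, b_l * scale)
    return float(wts @ p)


def tolerance_one_sided(n, w_bound, sigma2, mu_w, kappa, alpha=0.05, side="upper"):
    """Equation C3: approximate tolerance probability of a one-sided 100(1 - alpha)% interval."""
    w_star, wts = w_star_nodes(n, mu_w, kappa)
    lam = w_bound * np.sqrt((n - 1) * w_star / sigma2)
    df = n - 4
    t_a = stats.t.ppf(1 - alpha, df)
    if side == "upper":   # (-inf, beta_hat + t*se) inside (-inf, beta_XZ + w_U)
        p = stats.nct.cdf(-t_a, df, -lam)
    else:                 # (beta_hat - t*se, inf) inside (beta_XZ - w_L, inf)
        p = stats.nct.sf(t_a, df, lam)
    return float(wts @ p)


def smallest_n(prob_fn, target, n_start=6, n_max=100_000):
    """Smallest N with prob_fn(N) >= target, by a simple upward search (Equations B2, C3)."""
    for n in range(n_start, n_max + 1):
        if prob_fn(n) >= target:
            return n
    raise ValueError("target probability not reached for N <= n_max")
```

For instance, `smallest_n(lambda n: coverage_two_sided(n, 0.15, 0.15, 1.0, 1.09, 2.0), 0.95)` searches for the Equation B2 sample size under these hypothetical inputs. Because a noncentral t variate $t(\nu, \delta)$ has the sign of $Z + \delta$ with $Z$ standard normal, $P\{t(\nu, \delta) \le 0\} = \Phi(-\delta)$ for any degrees of freedom, so the coverage calculation could equally use the normal CDF; the first check relies on this when the variance of $W^*$ is set to zero. The one-sided tolerance probabilities, by contrast, depend on $N - 4$ through $t_{N-4,\alpha}$ and the noncentral t.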
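It can be readily shown from Equation C3, with $w_U = w_L = w$ and $\lambda_U = \lambda_L = \lambda = w\{(N-1)W^*/\sigma^2\}^{1/2}$, that

$$E_{W^*}[P\{t(N-4, -\lambda) \le -t_{N-4,\alpha}\}] = 1 - E_{W^*}[P\{t(N-4, \lambda) \le t_{N-4,\alpha}\}].$$

In contrast, the tolerance probability with respect to the conditional distribution in Equation A1 (Equation C4) is expressed in terms of the observed M through $\lambda_U = w_U/\{\sigma^2 M\}^{1/2}$ and $\lambda_L = w_L/\{\sigma^2 M\}^{1/2}$, rather than averaged over the distribution of $W^*$.

APPENDIX D
SAS Program to Perform Sample Size Calculations for Designated Interval Estimation of Moderating Effects

APPENDIX E
SAS Program to Perform Sample Size Calculations for Confidence Intervals of Moderating Effects With Specified Ranges and Tolerances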



Gwowen Shieh. Sample size determination for confidence intervals of interaction effects in moderated multiple regression with continuous predictor and moderator variables, Behavior Research Methods, 2010, 824-835, DOI: 10.3758/BRM.42.3.824