The impact of ignoring random features of predictor and moderator variables on sample size for precise interval estimation of interaction effects

Behavior Research Methods, May 2011

The influence of the joint distribution of predictor and moderator variables on the identification of interactions has been well described, but the impact on sample size determinations has received rather limited attention within the framework of moderated multiple regression (MMR). This article investigates the deficiency in sample size determinations for precise interval estimation of interaction effects that can result from ignoring the stochastic nature of continuous predictor and moderator variables in MMR. The primary finding of our examinations is that failure to accommodate the distributional properties of regressors can lead to underestimation of the necessary sample size and distortion of the desired interval precision. In order to take account of the randomness of regressor variables, two general and effective procedures for computing sample size estimates are presented. Moreover, corresponding programs are provided to facilitate use of the suggested approaches. This exposition helps to correct drawbacks in the existing techniques and to advance the practice of reporting confidence intervals in MMR analyses.

http://link.springer.com/content/pdf/10.3758%2Fs13428-011-0103-y.pdf

Gwowen Shieh
Department of Management Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan 30050, Republic of China

Moderated multiple regression (MMR) has been extensively employed to study the interaction effects between predictor and moderator variables in management, psychology, education, and related disciplines. The comprehensive reviews of Stone-Romero, Alliger, and Aguinis (1994), Aguinis (1995), Aguinis and Stone-Romero (1997), Aguinis, Beaty, Boik, and Pierce (2005), and related work show that most of the methodological research in MMR has been concerned with the statistical power of hypothesis tests for detecting moderating effects.
Although null hypothesis significance testing is useful in various applications, the dichotomous accept-reject decision ignores other useful information in the analysis. As an alternative, interval estimation has been stressed in studies such as Hahn and Meeker (1991), Steiger and Fouladi (1997), and Smithson (2003). Accordingly, the inferential procedures of interval estimators are strongly recommended by Wilkinson and the American Psychological Association Task Force on Statistical Inference (1999), as well as the Publication Manual of the American Psychological Association (American Psychological Association, 2001). Since confidence intervals constructed with the desired reliability are more informative about the location of a targeted parameter, they should be the preferred reporting strategy in practice. However, the methodological artifacts and statistical implications associated with interval estimation of moderating effects have received little attention within the framework of MMR.

The interactional formulation of MMR can be viewed as a special case of the statistical linear model, so hypothesis testing and interval estimation of moderation can be conducted with standard methods and software packages for linear regression analysis. In this article, we consider the simple interaction model with criterion variable Y, predictor variable X, moderator variable Z, their cross-product term XZ, and a normal error term $\varepsilon$ in the formulation $Y = \beta_I + X\beta_X + Z\beta_Z + XZ\beta_{XZ} + \varepsilon$, where both the predictor X and the moderator Z are continuous variables. The special consideration of continuous moderator and predictor variables brings in the important notion of two different regression formulations. Because the two regressor variables are continuous measurements, their values are typically not fixed in advance and are available only after the data have been collected.
To recognize this stochastic feature of the regressor variables, the appropriate strategy is to consider a random regression or unconditional formulation, rather than the fixed or conditional setting suited to experimental designs in which the factors are under the control of investigators. The intrinsic appropriateness and theoretical properties of fixed and random regression models have been discussed in Cramer and Appelbaum (1978) and Sampson (1974). Essentially, the inferential procedures of hypothesis testing and interval estimation are the same under both fixed and random formulations. The distinction between the two modeling approaches, however, becomes crucial when power, coverage probability, and corresponding sample size calculations are to be made.

In the context of MMR, the distinct formulations of fixed and random modeling of the simple interaction model were especially emphasized in Shieh (2009, 2010). Specifically, Shieh (2009) considered the power calculation and sample size determination for significance tests of moderating effects. The procedure takes account of three critical factors: the strength of the moderator effect, the magnitude of error variation, and the distributional properties of the predictor and moderator variables. Shieh (2010), in turn, incorporated the random nature of continuous moderator and predictor variables into two approaches to sample size computation for precise interval estimation of interaction effects. One approach provides the necessary sample size so that the designated interval for the least squares estimator of moderating effects attains the specified coverage probability. The other approach gives the sample size required to ensure, with a given tolerance probability, that a confidence interval of moderating effects will be within a specified range. The vital discrepancies between the conditional and unconditional settings in power and precision analyses are also closely evaluated in Shieh (2009, 2010).
The results reveal substantial detrimental effects of failing to account for the randomness of predictor and moderator variables. Thus, the conventional fixed modeling formulation may not be applicable to MMR with continuous regressor variables, because it tends to give insufficient sample sizes and inevitably leads to poor statistical performance.

Notably, there is a considerable recent literature on applications of precise interval estimation in multiple linear regression (see Kelley, 2008; Krishnamoorthy & Xia, 2008; Kelley & Maxwell, 2008; and the references therein). Instead of the direct accept-or-reject conclusion of a simple hypothesis test, confidence intervals arguably provide more information about the parameter of interest, in the form of a quantitative bound and an assurance level. Accordingly, researchers should be aware that the mere statistical significance of a targeted parameter is inadequate to warrant the conclusion that the effect is substantial and practically important. In view of the relatively scarce description of interval estimation in MMR, it is prudent to document and examine confidence intervals from different perspectives. The precision considerations in Shieh (2010) may not be the only criteria of practical importance. As in Kelley and Maxwell (2008), two other useful principles related to the statistical properties of a confidence interval concern the control of the expected width and of the tolerance probability that the interval width falls within a designated value. However, the explication of Kelley and Maxwell is confined to a fixed regressor modeling framework, so the corresponding sample size procedures may not be appropriate for the interval estimation of interaction effects between continuous predictor and moderator variables.
Moreover, Kelley and Maxwell also note that the sampling distribution of an estimated regression coefficient depends on whether the regressors are fixed or random. Thus, the difference should be properly recognized in sample size planning for the simple interaction model described above.

The aim of this article is to contribute to the design of MMR studies by illustrating how the specification of the inherent features of continuous predictor and moderator variables in sample size calculations influences the resulting precision of confidence intervals for an interactive effect under the two interval width criteria just described. Consequently, the presentation here complements the work of Shieh (2010) with distinct precision criteria in sample size determinations. Because of the complexity of the random regression framework, the literature appears to lack applicable sample size procedures that accommodate the considerations of expected confidence interval width and tolerance probability of interval width within a designated value. Therefore, it is essential to extend the development and exposition of sample size methodology for precise interval estimation of interaction effects in two distinct aspects. One method gives the minimum sample size such that the expected confidence interval width is within the designated bound. The other provides the sample size needed to guarantee, with a given tolerance probability, that the width of a confidence interval will not exceed the planned range. The simplicity of a fixed setup may be appealing because it permits computational shortcuts, but it does not involve all of the key factors in sample size calculation and, thus, is generally error prone.
Accordingly, theoretical implications and numerical examinations are presented to demonstrate that the sample size procedures for fixed regression, although they share many similarities with those for random regression, have some distinct disadvantages in MMR applications.

Confidence intervals of interaction effects

Consider the simple interaction model or MMR model within the fixed modeling framework:

$$Y_i = \beta_I + X_i\beta_X + Z_i\beta_Z + X_iZ_i\beta_{XZ} + \varepsilon_i, \tag{1}$$

where $Y_i$ is the value of the response variable Y; $X_i$ and $Z_i$ are the values of the continuous predictor X and moderator Z; $\varepsilon_i$ are iid $N(0, \sigma^2)$ random errors for $i = 1, \ldots, N$; and $\beta_I$, $\beta_X$, $\beta_Z$, and $\beta_{XZ}$ are unknown parameters. To examine the existence and magnitude of a moderating effect, we are concerned with the distributional property of the least squares estimator $\hat\beta_{XZ} \sim N(\beta_{XZ}, V(\hat\beta_{XZ}))$ of $\beta_{XZ}$, where $V(\hat\beta_{XZ})$ is the variance of $\hat\beta_{XZ}$. One useful expression is $V(\hat\beta_{XZ}) = \sigma^2/SSE_{XZ}$, where $SSE_{XZ} = (1 - R^2_{XZ})S^2_{XZ}$ is the residual sum of squares for the regression of the product term XZ on the X and Z variables, $R^2_{XZ}$ is the corresponding coefficient of determination, $S^2_{XZ} = \sum_{i=1}^{N}(U_i - \bar U)^2$, $U_i = X_iZ_i$, $i = 1, \ldots, N$, and $\bar U = \sum_{i=1}^{N}U_i/N$. Moreover, the natural estimator $\hat V(\hat\beta_{XZ})$ of $V(\hat\beta_{XZ})$ is $\hat V(\hat\beta_{XZ}) = \hat\sigma^2/SSE_{XZ}$, where $\hat\sigma^2 = (1 - R^2)S^2_Y/(N - 4)$ is the usual unbiased estimator of $\sigma^2$, $R^2$ is the coefficient of determination of the model in Eq. 1, $S^2_Y = \sum_{i=1}^{N}(Y_i - \bar Y)^2$, and $\bar Y = \sum_{i=1}^{N}Y_i/N$.

For inferential purposes, we focus on the interval estimation procedure of $\beta_{XZ}$ here. According to the standard results (Rencher, 2000, Section 8.6), a $100(1 - \alpha)\%$ confidence interval of $\beta_{XZ}$ is

$$\left(\hat\beta_{XZ} - t_{N-4,\alpha/2}\{\hat\sigma^2/SSE_{XZ}\}^{1/2},\ \hat\beta_{XZ} + t_{N-4,\alpha/2}\{\hat\sigma^2/SSE_{XZ}\}^{1/2}\right),$$

where $t_{N-4,\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the t distribution with N - 4 degrees of freedom.
The half-width of the $100(1 - \alpha)\%$ confidence interval is denoted by

$$H = t_{N-4,\alpha/2}\{\hat\sigma^2/SSE_{XZ}\}^{1/2}. \tag{2}$$

Thus, the actual half-width H depends on the sample size N, the confidence coefficient $1 - \alpha$, the variance estimate $\hat\sigma^2$, and the observed values of the predictor and moderator variables through the quantity $SSE_{XZ}$. Because of the continuous measurements encountered in practical research, the regressor variables typically cannot be controlled and are available only after observation. Hence, to extend this concept and its applicability to MMR, the continuous predictor and moderator variables $\{(X_i, Z_i), i = 1, \ldots, N\}$ in Eq. 1 are assumed to have a joint probability function $g(X_i, Z_i)$ with finite moments. Moreover, the form of $g(X_i, Z_i)$ does not depend on any of the unknown parameters $(\beta_I, \beta_X, \beta_Z, \beta_{XZ})$ or $\sigma^2$. Consequently, both $\hat\sigma^2$ and $SSE_{XZ}$ are realized values of random variables. It readily follows from Eq. 2 that the statistical properties of $\hat\sigma^2$ and $SSE_{XZ}$ jointly determine the underlying distribution of the half-width H of a confidence interval.

When planning a study, researchers wish to ensure that the confidence interval is narrow enough to produce meaningful findings despite the stochastic nature of the interval width. To obtain an informative interval, it is necessary to specify not only the required confidence level and desired precision, but also the appropriate sample size. The most common approach is to determine the required sample size such that the expected half-width of a $100(1 - \alpha)\%$ confidence interval is within the designated bound

$$E[H] \le \delta, \tag{3}$$

where the expectation $E[H]$ is taken with respect to the joint distribution of $\hat\sigma^2$ and $SSE_{XZ}$, and $\delta$ (> 0) is a constant. Alternatively, one may compute the sample size needed to guarantee, with a given tolerance probability, that the half-width of a $100(1 - \alpha)\%$ confidence interval will not exceed the planned range

$$P\{H < \omega\} \ge 1 - \gamma, \tag{4}$$

where $1 - \gamma$ is the specified tolerance level and $\omega$ (> 0) is a constant.
Note that these two concerns of expected magnitude and tolerance probability of the half-width have been discussed in Kupper and Hafner (1989) for the simpler situations of one- and two-sample problems and in Kelley and Maxwell (2008) for multiple linear regression. Although the notion of expected width is widely covered in standard texts on sample size determination, Kupper and Hafner recommend the tolerance probability approach. It is noteworthy, however, that the two principles are closely related to the two distinct criteria of unbiasedness and consistency in statistical point estimation. Therefore, each measure is of theoretical importance and practical interest in its own right. Furthermore, although the results are not completely comparable, it typically requires a larger sample size to meet the necessary assurance of tolerance probability than to control a designated expected width.

Sample size procedures for precise interval estimation

It was mentioned above that Kelley and Maxwell (2008, p. 173) focused on the situation in which regressor variables are fixed. This particular conditional design assumes that the same set of regressor values would be examined in repeated studies. Since $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$, the use of population notation in the expression of $V(\hat\beta_{XZ})$ in Kelley and Maxwell (2008, Equation 34) implicitly assumes that $E[SSE_{XZ}] \doteq (N - 4)(1 - \rho^2_{XZ})\sigma^2_{XZ}$, where $\rho^2_{XZ}$ is the coefficient of determination for the regression of the product term XZ on the X and Z variables and $\sigma^2_{XZ}$ is the variance of the product term XZ. Although the form of $V(\hat\beta_{XZ})$ involves some unstated clarifications, the formulation suggests the simplified approximation $E[\hat V(\hat\beta_{XZ})] \doteq \sigma^2/\{(N - 4)(1 - \rho^2_{XZ})\sigma^2_{XZ}\}$ under a random regression setup.
Hence, the corresponding approximate expected half-width is

$$E[H] \doteq t_{N-4,\alpha/2}\left[\sigma^2/\{(N - 4)(1 - \rho^2_{XZ})\sigma^2_{XZ}\}\right]^{1/2}. \tag{5}$$

Accordingly, the sample size $N_{EHS}$ needed for the expected half-width of a $100(1 - \alpha)\%$ confidence interval to fall within the designated bound $\delta$ is the minimum integer N such that

$$t_{N-4,\alpha/2}\left[\sigma^2/\{(N - 4)(1 - \rho^2_{XZ})\sigma^2_{XZ}\}\right]^{1/2} \le \delta. \tag{6}$$

A similar expression is presented in Kelley and Maxwell (Equation 35). An analogous argument applies to the tolerance probability consideration given in Eq. 4. Thus, $P\{H < \omega\} \doteq P\{t_{N-4,\alpha/2}[\hat\sigma^2/\{(N - 4)(1 - \rho^2_{XZ})\sigma^2_{XZ}\}]^{1/2} < \omega\}$, or equivalently,

$$P\{H < \omega\} \doteq P\left\{K < (\omega/t_{N-4,\alpha/2})^2 (N - 4)^2 (1 - \rho^2_{XZ})\sigma^2_{XZ}/\sigma^2\right\}, \tag{7}$$

where $K = (N - 4)\hat\sigma^2/\sigma^2 \sim \chi^2(N - 4)$ and $\chi^2(\nu)$ is a chi-square distribution with $\nu$ degrees of freedom. The sample size $N_{PHS}$ required to guarantee, with a given tolerance probability $(1 - \gamma)$, that the half-width of a $100(1 - \alpha)\%$ confidence interval will not exceed the planned range is the smallest integer N such that

$$\chi^2_{N-4,\gamma} \le (\omega/t_{N-4,\alpha/2})^2 (N - 4)^2 (1 - \rho^2_{XZ})\sigma^2_{XZ}/\sigma^2, \tag{8}$$

where $\chi^2_{N-4,\gamma}$ is the $100(1 - \gamma)$th percentile of a $\chi^2(N - 4)$ distribution. The formula in Kelley and Maxwell (Equation 35) basically provides the same inequality, although the degrees of freedom of the chi-square distribution there were incorrectly expressed as N - 1 instead of N - 4. To compute the necessary sample sizes, a standard iterative search can be conducted to find $N_{EHS}$ and $N_{PHS}$ with the inequalities given in Eqs. 6 and 8, respectively.

As noted above, these two simplified procedures rely primarily on the straightforward approximation $E[SSE_{XZ}] \doteq (N - 4)(1 - \rho^2_{XZ})\sigma^2_{XZ}$, and they can alternatively be viewed as the exact methods under the fixed regressor setting, as in Kelley and Maxwell. However, their use in the random regression scenario raises natural concerns about the resulting accuracy, for the obvious reason that the underlying variability of the residual sum of squares $SSE_{XZ}$ is ignored. The potential deficiency of the sample sizes $N_{EHS}$ and $N_{PHS}$ for precise interval estimation of interaction effects will be examined later in the numerical investigations.
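The iterative search over Eqs. 6 and 8 can be sketched in a few lines of Python. This is only an illustration of the simplified (fixed-regressor) procedures, not the SAS/IML programs that accompany the article. The function names `n_ehs` and `n_phs`, the argument `mu_w` (standing for $(1 - \rho^2_{XZ})\sigma^2_{XZ}$), and the small standard-library helpers for the t and chi-square distributions are all assumptions made for this sketch; a statistics library would normally supply the distribution functions.

```python
import math

def t_cdf(t, df):
    """Student's t CDF at t >= 0, by Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))
    steps = 2 * max(50, int(10 * t))   # even count, step size of about 0.05 or less
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile of Student's t for 0.5 < p < 1, by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def n_ehs(delta, alpha, sigma2, mu_w, n_max=100000):
    """Smallest N satisfying Eq. 6; mu_w = (1 - rho^2_XZ) * sigma^2_XZ (assumed input)."""
    for n in range(5, n_max + 1):
        df = n - 4
        half = t_quantile(1 - alpha / 2, df) * math.sqrt(sigma2 / (df * mu_w))
        if half <= delta:
            return n
    raise ValueError("no N up to n_max meets the expected half-width bound")

def n_phs(omega, alpha, gamma, sigma2, mu_w, n_max=100000):
    """Smallest N satisfying Eq. 8, checked through the probability in Eq. 7."""
    for n in range(5, n_max + 1):
        df = n - 4
        t = t_quantile(1 - alpha / 2, df)
        bound = (omega / t) ** 2 * df ** 2 * mu_w / sigma2
        if chi2_cdf(bound, df) >= 1 - gamma:
            return n
    raise ValueError("no N up to n_max meets the tolerance probability")
```

For standard bivariate normal regressors with correlation $\rho$, the product XZ is uncorrelated with X and Z and has variance $1 + \rho^2$, so `mu_w = 1 + rho**2`. With $\sigma^2 = 1$ and $\rho = .1$, `n_ehs(0.15, 0.05, 1.0, 1.01)` and `n_phs(0.15, 0.05, 0.10, 1.0, 1.01)` give 176 and 198, which agree with the largest simplified-method sample sizes quoted in the simulation study.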
The preceding discussion emphasizes that the underlying statistical properties of a confidence interval half-width are determined by the joint distribution of the regressors. Accordingly, the distributional properties of the predictor and moderator variables should be incorporated into the sample size procedures as much as possible. To provide a unified and feasible technique, we adopt a large-sample approach that avoids reliance on any specific distributional assumption for the predictor and moderator variables. To this end, the theoretical properties of the suggested asymptotic approximation to the distribution of $W = SSE_{XZ}/(N - 1)$ are presented in Eq. 11 in the Appendix. The results facilitate the proposed sample size procedures for constructing precise confidence intervals under the two precision criteria described in Eqs. 3 and 4.

With the approximate expected half-width presented in Eq. 12, the sample size $N_{EHP}$ needed for the expected half-width of a $100(1 - \alpha)\%$ confidence interval to fall within the designated bound $\delta$ is the minimum integer N such that

$$t_{N-4,\alpha/2}(N - 1)^{-1/2}E[\hat\sigma]\,E[W^{-1/2}] \le \delta. \tag{9}$$

In contrast, the approximate probability in Eq. 13 suggests another useful approach to sample size determination. The sample size $N_{PHP}$ required to guarantee, with a given tolerance probability $(1 - \gamma)$, that the half-width of a $100(1 - \alpha)\%$ confidence interval will not exceed the planned range is the smallest integer N such that the approximate probability in Eq. 13 satisfies

$$P\{H < \omega\} \ge 1 - \gamma. \tag{10}$$

Clearly, the two inequalities in Eqs. 9 and 10 are more involved than the corresponding formulas in Eqs. 6 and 8. Instead of substituting a constant estimate for the term $SSE_{XZ}$, as in the simplified procedures, the suggested formulations use the extra complexity to reflect the stochastic character of $SSE_{XZ}$. The usefulness of these more elaborate inequalities is shown next in the numerical example.
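A brief side note on why substituting a constant for $SSE_{XZ}$ tends to understate the half-width: the display below is a sketch based on Jensen's inequality, not a derivation taken from the article. It assumes that $SSE_{XZ}$ has a finite positive mean and uses the fact that, under the normal error model, the conditional distribution of $\hat\sigma^2$ given the regressors does not depend on them, so $\hat\sigma^2$ and $SSE_{XZ}$ are independent.

```latex
% x^{-1/2} is convex on (0, \infty), so Jensen's inequality gives
% E[SSE_{XZ}^{-1/2}] >= (E[SSE_{XZ}])^{-1/2}.
\begin{aligned}
E[H] &= t_{N-4,\alpha/2}\, E[\hat\sigma]\, E\!\left[SSE_{XZ}^{-1/2}\right] \\
     &\ge t_{N-4,\alpha/2}\, E[\hat\sigma]\, \left(E[SSE_{XZ}]\right)^{-1/2}.
\end{aligned}
```

The more variable $SSE_{XZ}$ is, the larger the gap in this inequality. Eq. 5 additionally replaces $E[\hat\sigma]$ with the slightly larger value $\sigma$, which works in the opposite direction, so the inequality alone does not settle the net effect.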
Numerical example

To illustrate a typical sample size problem frequently encountered in the planning stage of an MMR study, we consider the hypothetical research framework presented in Shieh (2010) for assessing interaction effects between the length of time in the position (X) and managerial ability (Z) on the self-assurance of managers (Y). Because advance research planning is prospective, the general guideline is that a successful pilot study can offer plausible and reasonable planning values of the vital characteristics for calculating the necessary sample size. On the basis of the pilot data in Table 5 in Shieh (2010), the empirical distribution of the 60 observed configurations of predictor and moderator variables may be used to reconstruct or approximate the actual distribution of the two variables. However, the sample size procedures described above differ in their use of the pilot information and lead to substantially different results. The key difference is that the simplified method treats the regressor variables as constants, whereas the suggested procedures regard them, more reasonably, as random variables drawn from a multivariate population. For the 60 observed values of $\mathbf{X}_i = (X_i, Z_i, X_iZ_i)^T$ with empirical probability 1/60 for $i = 1, \ldots, 60$, the moment matrices for the quantities in the Appendix are obtained as averages over the 60 observations. Moreover, it follows that $\mu_W = (1 - \rho^2_{XZ})\sigma^2_{XZ} \doteq 1.2348$ and $\sigma^2_W \doteq 22.6511/(N - 1)$, as shown in Shieh (2010). In planning a research study with these prior inputs, the minimum sample size needed to control the expected half-width of a 95% confidence interval of interaction effects within the designated bound 0.15 can be computed with the inequalities in Eqs. 6 and 9.
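The planning value $\mu_W = (1 - \rho^2_{XZ})\sigma^2_{XZ}$ can be obtained from pilot observations as sketched below. The function name `mu_w` and the use of moments with weight 1/n for each observed pair are assumptions of this illustration (matching the empirical probability 1/60 described above); the pilot data themselves are in Shieh (2010) and are not reproduced here.

```python
def mu_w(pairs):
    """(1 - rho^2_XZ) * sigma^2_XZ under the empirical distribution of (x, z) pairs.

    Each pair gets probability 1/n, so variances and covariances divide by n.
    The result is the residual variance of U = X*Z after regression on X and Z.
    """
    n = len(pairs)
    xs = [x for x, _ in pairs]
    zs = [z for _, z in pairs]
    us = [x * z for x, z in pairs]

    def cov(a, b):
        ma, mb = sum(a) / n, sum(b) / n
        return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n

    sxx, szz, sxz = cov(xs, xs), cov(zs, zs), cov(xs, zs)
    sux, suz, suu = cov(us, xs), cov(us, zs), cov(us, us)
    det = sxx * szz - sxz ** 2            # zero if X and Z are perfectly collinear
    explained = (szz * sux ** 2 - 2 * sxz * sux * suz + sxx * suz ** 2) / det
    return suu - explained
```

As a small check on a hypothetical 2 x 2 layout, `mu_w([(0, 0), (1, 0), (0, 1), (1, 1)])` returns 0.0625, the mean squared residual of XZ after fitting -0.25 + 0.5X + 0.5Z.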
The resulting sample sizes are $N_{EHS} = 145$ and $N_{EHP} = 156$, and the corresponding approximate expected half-widths given in Eq. 12 are 0.1560 and 0.1497, respectively. Hence, the sample size $N_{EHS}$ determined by the simplified method is too small to satisfy the desired precision bound. This phenomenon should continue to exist in other settings. Moreover, the smallest sample sizes required to guarantee, with a given tolerance probability of .90, that the half-width of a 95% confidence interval will not exceed the planned range 0.15 are $N_{PHS} = 165$ and $N_{PHP} = 216$, based on the inequalities in Eqs. 8 and 10, respectively. Notably, the approximate tolerance probabilities obtained with Eq. 13 are .6762 and .9019. The sizable difference $N_{PHP} - N_{PHS} = 216 - 165 = 51$ corresponds to a major deficit of $.9019 - .6762 = .2257$ in tolerance probability. This numerical investigation exemplifies the fundamental deficiency: overlooking the stochastic nature of the regressor variables may lead to a serious underestimation of the sample size required to obtain a designated expected half-width or to ensure an adequate probability of achieving the desired half-width for the confidence intervals.

To facilitate the application of the proposed approaches, the SAS/IML (SAS Institute, 2008) programs employed to perform the sample size calculations are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental. Users can easily identify the statements containing the key values in this exposition and then modify the program to accommodate their own specifications.

Simulation study

To further evaluate the performance of the sample size formulas with respect to the precision criteria in Eqs. 3 and 4 under various parameter specifications, the MMR model defined in Eq. 1 with bivariate normal predictor and moderator variables is used as the basis for a Monte Carlo exposition. A similar bivariate normal assumption was made for related MMR treatments in McClelland and Judd (1993), O'Connor (2006), and Shieh (2009, 2010). Specifically, the coefficient parameters and variance of the simple interaction model are set as $\beta_I = \beta_X = \beta_Z = \beta_{XZ} = 1$ and $\sigma^2 = 1$, respectively. Moreover, the predictor and moderator variables (X, Z) are jointly normally distributed with mean (0, 0), variance (1, 1), and correlation $\rho$. The correlation parameter between the predictor variable and the moderator variable is set at the five levels from .1 to .9 in increments of .2.

With the specifications described above, the simulation study is conducted in two steps. First, under the selected values of the coefficient parameters, the error variance, and the configurations of the bivariate predictor and moderator distribution, the estimates of sample sizes required for precise interval estimation are computed with confidence levels $(1 - \alpha) = .90$ and .95, $\delta = \omega = 0.15$ and 0.20, and tolerance probability $(1 - \gamma) = .90$.

Table 1 Computed sample size, approximate expected half-width, and simulated expected half-width for the half-width of a 90% two-sided interval of $\hat\beta_{XZ}$ at specified half-width $\delta = 0.15$ with bivariate normal predictor and moderator variables ($\beta_{XZ} = 1$, $\sigma^2 = 1$, $\mu_X = \mu_Z = 0$, $\sigma^2_X = \sigma^2_Z = 1$, correlation $\rho$). Columns: Simplified Method and Proposed Method, each with Approximate Expected Half-Width and Simulated Expected Half-Width.
These levels were selected to represent reasonably the range of specifications and sample sizes used in typical research settings. The sample sizes $N_{EHS}$ and $N_{EHP}$ required for controlling the expected half-width are presented in Tables 1, 2, 3 and 4, while the sample sizes $N_{PHS}$ and $N_{PHP}$ ensuring the probability of a bounded confidence interval are summarized in Tables 5, 6, 7 and 8. It can be readily seen that the computed sample sizes $N_{EHP}$ are generally larger than $N_{EHS}$ for controlling the expected half-width in Tables 1, 2, 3 and 4, and the difference is more pronounced between the sample size estimates $N_{PHS}$ and $N_{PHP}$ for the assurance of tolerance probability in Tables 5, 6, 7 and 8. With the reported sample sizes $N_{EHS}$ and $N_{PHS}$, the actual values of the approximate expected half-width and approximate tolerance probability given in Eqs. 5 and 7, respectively, are also calculated. Similarly, the approximate expected half-width and approximate tolerance probability associated with the computed sample sizes $N_{EHP}$ and $N_{PHP}$ are calculated on the basis of Eqs. 12 and 13, respectively. As expected, the resulting approximate expected half-width is slightly less than the designated expected half-width, whereas the approximate tolerance probability is marginally greater than the chosen tolerance probability. In the second step, the accuracy of the sample size procedures is examined through a Monte Carlo simulation study.

Table 3 Computed sample size, approximate expected half-width, and simulated expected half-width for the half-width of a 95% two-sided interval of $\hat\beta_{XZ}$ at specified half-width $\delta = 0.15$ with bivariate normal predictor and moderator variables ($\beta_{XZ} = 1$, $\sigma^2 = 1$, $\mu_X = \mu_Z = 0$, $\sigma^2_X = \sigma^2_Z = 1$, correlation $\rho$). Columns: Simplified Method and Proposed Method, each with Approximate Expected Half-Width and Simulated Expected Half-Width.
Estimates of the true expected half-width and tolerance probability associated with a given sample size N and various parameter configurations are computed through a Monte Carlo simulation of 10,000 independent data sets. For each replicate, N sets of predictor and moderator values are generated from the designated bivariate normal distribution. These values of the predictor and moderator, in turn, determine the mean responses for generating N normal outcomes with the simple interaction model. Next, the half-width estimate H is computed, and the simulated expected half-width is the mean of the 10,000 replicates of H. Similarly, the simulated tolerance probability is the proportion of the 10,000 replicates whose values of H are less than or equal to the specified bound $\omega$.

Table 2 Computed sample size, approximate expected half-width, and simulated expected half-width for the half-width of a 90% two-sided interval of $\hat\beta_{XZ}$ at specified half-width $\delta = 0.20$ with bivariate normal predictor and moderator variables ($\beta_{XZ} = 1$, $\sigma^2 = 1$, $\mu_X = \mu_Z = 0$, $\sigma^2_X = \sigma^2_Z = 1$, correlation $\rho$). Columns: Simplified Method and Proposed Method, each with Approximate Expected Half-Width and Simulated Expected Half-Width.

Table 4 Computed sample size, approximate expected half-width, and simulated expected half-width for the half-width of a 95% two-sided interval of $\hat\beta_{XZ}$ at specified half-width $\delta = 0.20$ with bivariate normal predictor and moderator variables ($\beta_{XZ} = 1$, $\sigma^2 = 1$, $\mu_X = \mu_Z = 0$, $\sigma^2_X = \sigma^2_Z = 1$, correlation $\rho$). Columns: Simplified Method and Proposed Method, each with Approximate Expected Half-Width and Simulated Expected Half-Width.
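The Monte Carlo step described above can be sketched as follows. This is an illustrative re-implementation using only the Python standard library, not the article's SAS/IML code. The function names (`solve`, `half_width`, `simulate`), the default of 1,000 replicates instead of 10,000, the seed, and the requirement that the caller supply the critical value `t_crit` (for example, about 1.9739 for 172 degrees of freedom at the 95% level) are all assumptions of the sketch. It relies on the standard identity that the last diagonal element of $(\mathbf{X}^T\mathbf{X})^{-1}$ equals $1/SSE_{XZ}$.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def half_width(n, rho, t_crit, beta=(1.0, 1.0, 1.0, 1.0), sigma=1.0, rng=random):
    """Draw one data set from the simple interaction model and return H (Eq. 2)."""
    rows, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0, 1)  # corr(X, Z) = rho
        row = (1.0, x, z, x * z)
        rows.append(row)
        ys.append(sum(b * v for b, v in zip(beta, row)) + rng.gauss(0, sigma))
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(4)]
    coef = solve(xtx, xty)                                # least squares estimates
    rss = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, y in zip(rows, ys))
    sigma2_hat = rss / (n - 4)                            # unbiased error variance
    inv44 = solve(xtx, [0.0, 0.0, 0.0, 1.0])[3]           # equals 1 / SSE_XZ
    return t_crit * math.sqrt(sigma2_hat * inv44)

def simulate(n, rho, t_crit, bound, reps=1000, seed=1):
    """Return (simulated expected half-width, simulated tolerance probability)."""
    rng = random.Random(seed)
    hs = [half_width(n, rho, t_crit, rng=rng) for _ in range(reps)]
    return sum(hs) / reps, sum(h <= bound for h in hs) / reps
```

A call such as `simulate(176, 0.1, 1.9739, 0.15)` returns the simulated mean of H and the proportion of replicates with H no larger than 0.15 for one configuration; more replicates reduce the Monte Carlo error at the cost of run time.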
The adequacy of the sample size procedure for precise interval estimation is determined by the following formula: error = approximate expected half-width - simulated expected half-width, or error = approximate tolerance probability - simulated tolerance probability. The simulated expected half-width, simulated tolerance probability, and associated error are also summarized in Tables 1, 2, 3, 4, 5, 6, 7 and 8. Examination of the sample sizes in these tables reveals the general pattern that, when all other factors remain constant, the sample size increases with increasing confidence level $(1 - \alpha)$, with decreasing half-width bound $\delta$ or $\omega$, and with decreasing correlation $\rho$. Therefore, for both the simplified and proposed methods, the largest sample sizes are $N_{EHS} = 176$ and $N_{EHP} = 179$ for $(1 - \alpha) = .95$, $\delta = 0.15$ and $\rho = .1$ in Table 3, whereas the smallest sample sizes are $N_{EHS} = 44$ and $N_{EHP} = 52$ for $(1 - \alpha) = .90$, $\delta = 0.20$ and $\rho = .9$ in Table 2. Correspondingly, the largest and smallest sample sizes $N_{PHS}$ are 198 and 54, and those of $N_{PHP}$ are 232 and 84, in Tables 7 and 6, respectively.

Furthermore, as can be seen from the errors in Tables 1, 2, 3 and 4 concerning the precision of the expected half-width, the inequality in Eq. 6 for computing the sample size $N_{EHS}$ produces an accurate expected half-width for the cases of $\delta = 0.15$ in Tables 1 and 3. Most of the resulting absolute errors are less than 0.01, with only two exceptions (0.0102 and 0.0131) in Table 1. In contrast, the performance for the situations of $\delta = 0.20$ in Tables 2 and 4 degrades slightly but is still reasonable, with absolute errors between 0.0073 and 0.0279.

Table 5 Computed sample size, approximate tolerance probability, and simulated tolerance probability for the half-width of a 90% two-sided interval of $\hat\beta_{XZ}$ at specified half-width $\omega = 0.15$ and tolerance probability .90 with bivariate normal predictor and moderator variables ($\beta_{XZ} = 1$, $\sigma^2 = 1$, $\mu_X = \mu_Z = 0$, $\sigma^2_X = \sigma^2_Z = 1$, correlation $\rho$). Columns: Simplified Method and Proposed Method, each with Approximate Tolerance Probability and Simulated Tolerance Probability.
In short, the simplified approach tends to give reliable yet marginally smaller than required sample sizes. Its accuracy improves as the sample size grows; the larger errors occur at the smaller sample sizes. Comparatively, the errors of the expected half-width associated with the sample sizes $N_{EHP}$ in Tables 1, 2, 3 and 4 clearly show that the inequality of Eq. 9 performs extremely well, because all absolute errors are less than or equal to 0.0055 for the 20 cases examined here. Although the simplified inequality of Eq. 6 is accurate enough for practical use, it is consistently outperformed by the improved formula in Eq. 9.

On the other hand, the performance of the simplified method is extremely poor for the tolerance probability of a bounded interval half-width. The discrepancies between the approximate tolerance probabilities and simulated tolerance probabilities in Tables 5, 6, 7, and 8 range from 0.2409 to 0.3321. Hence, the inequality in Eq. 8 severely overestimates the attained tolerance probability, and thus it underestimates the sample size necessary to meet the selected criterion. Similar findings were reported in Kupper and Hafner (1989) for interval estimation in one- and two-sample problems. In the present case, the parameter value $(N - 4)(1 - \rho^2_{XZ})\sigma^2_{XZ}$ is used in place of $SSE_{XZ}$, so the variability in $SSE_{XZ}$ has been neglected in the sample size calculations. The repercussions of ignoring the random feature of predictor and moderator variables on sample size calculation are detrimental and substantial. Hence, the simplified procedure should not be used in such a random regression setting, because it can lead to underallocation of sample size or overconfidence in the interval precision.

Regarding the behavior of the proposed method, an inspection of Tables 5, 6, 7 and 8 shows that the corresponding differences between the approximate and simulated tolerance probabilities are fairly small. Since the approach uses a large-sample approximation, the accuracy is affected, to some extent, in situations with small sample sizes. The two largest deviations of 0.0258 and 0.0287 occur with the sample sizes 94 and 84 in Table 6 for $\rho = .7$ and .9, respectively. Clearly, the advantage of the proposed procedure over the simplified method persists in the case of ensuring the required tolerance level. Given the behavior of the simplified method, the accurate performance of the suggested inequality outweighs its extra computational requirement. In light of these detailed empirical comparisons, the proposed methods are clearly superior to the simplified procedures in sample size calculations for precise interval estimation.

Table 6 Computed sample size, approximate tolerance probability, and simulated tolerance probability for the half-width of a 90% two-sided interval of $\hat\beta_{XZ}$ at specified half-width $\omega = 0.20$ and tolerance probability .90 with bivariate normal predictor and moderator variables ($\beta_{XZ} = 1$, $\sigma^2 = 1$, $\mu_X = \mu_Z = 0$, $\sigma^2_X = \sigma^2_Z = 1$, correlation $\rho$). Columns: Simplified Method and Proposed Method, each with Approximate Tolerance Probability and Simulated Tolerance Probability.

Table 7 Computed sample size, approximate tolerance probability, and simulated tolerance probability for the half-width of a 95% two-sided interval of $\hat\beta_{XZ}$ at specified half-width $\omega = 0.15$ and tolerance probability .90 with bivariate normal predictor and moderator variables ($\beta_{XZ} = 1$, $\sigma^2 = 1$, $\mu_X = \mu_Z = 0$, $\sigma^2_X = \sigma^2_Z = 1$, correlation $\rho$). Columns: Simplified Method and Proposed Method, each with Approximate Tolerance Probability and Simulated Tolerance Probability.

Table 8 Computed sample size, approximate tolerance probability, and simulated tolerance probability for the half-width of a 95% two-sided interval of $\hat\beta_{XZ}$ at specified half-width $\omega = 0.20$ and tolerance probability .90 with bivariate normal predictor and moderator variables ($\beta_{XZ} = 1$, $\sigma^2 = 1$, $\mu_X = \mu_Z = 0$, $\sigma^2_X = \sigma^2_Z = 1$, correlation $\rho$). Columns: Simplified Method and Proposed Method, each with Approximate Tolerance Probability and Simulated Tolerance Probability.

Concluding remarks

The purpose of the present article was to discuss the sample size issues surrounding the use of confidence intervals for the inference of interaction effects in MMR studies. The focus is on the information contained in the joint probability function of continuous predictor and moderator variables in a simple interaction model. During the planning stage for MMR research with limited resources, it is important to consider all possible features. We demonstrate that sample size estimates for precise interval estimation of interaction effects will generally be inadequate and misleading if they are based solely on the anticipated characteristics of the regressor variables. Thus, the simplified procedure based on a fixed modeling setup should not be applied indiscriminately. With increased computing power and the general availability of statistical software, computational simplicity is no longer an adequate criterion. An appropriate approach should involve all of the critical factors in sample size determination. Therefore, a more prudent strategy is to account for the stochastic behavior of the regressor variables.
This article gives explicit formulas for calculating the necessary sample size with respect to two considerations: the expected confidence interval half-width, and the tolerance probability of the interval half-width falling within a designated value. The proposed approaches have clear advantages in the flexibility of the joint distribution of predictor and moderator variables and the unification of a normal approximation for ease of computation. More important, the performance of the suggested methods appears to be remarkably good for the range of model specifications considered in the present article. The proposed methodology not only facilitates the advocated practice of interval procedures, but also further reinforces the potential usefulness of MMR analysis.

Author Note

The author thanks the editor, Gregory Francis, and the two anonymous reviewers for their valuable comments on earlier drafts of the article.

The properties of the confidence interval half-width

It follows from the standard assumption in Eq. 1 for the simple interaction model that the half-width of the $100(1 - \alpha)\%$ confidence interval is

$$H = t_{N-4,\,\alpha/2}\left\{\hat{\sigma}^2 / SSE_{XZ}\right\}^{1/2},$$

so the statistical property of $H$ is determined by the joint distribution of $\hat{\sigma}^2$ and $SSE_{XZ}$. Since the distribution $\hat{\sigma}^2 \sim \{\sigma^2/(N-4)\}\,\chi^2(N-4)$ does not depend on the predictor and moderator variables, $\hat{\sigma}^2$ and $SSE_{XZ}$ are independent. The remaining issue in describing the feature of $H$ is to obtain the distribution of $SSE_{XZ}$. Note that the distribution of $SSE_{XZ}$ is somewhat more complex, and an explicit expression generally does not exist. However, much of the complexity is removed if we consider the asymptotic behavior. For ease of discussion, the moments of the explanatory vectors $\mathbf{X}_i = (X_i, Z_i, X_iZ_i)^{T}$ are defined in terms of $E[\mathbf{X}_i]$ and the associated higher-order expectations, where the expectations are taken with respect to the joint probability density function $g(X_i, Z_i)$ of $(X_i, Z_i)$, and $\otimes$ represents the Kronecker product.
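The behavior of the half-width $H = t_{N-4,\alpha/2}\{\hat{\sigma}^2/SSE_{XZ}\}^{1/2}$ can also be explored by simulation. The following Python sketch is only an illustration under stated assumptions and is not the program supplied with the article: it assumes standardized bivariate normal $X$ and $Z$ with correlation `rho`, and it draws $\hat{\sigma}^2$ directly from its scaled chi-square distribution, relying on the independence of $\hat{\sigma}^2$ and $SSE_{XZ}$ noted above. The function name `simulate_half_width` and its arguments are hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_half_width(n, rho, alpha=0.10, sigma2=1.0, reps=2000, seed=0):
    """Monte Carlo draws of H = t_{n-4, alpha/2} * sqrt(sigma_hat^2 / SSE_XZ).

    Assumes X and Z are bivariate normal with zero means, unit variances,
    and correlation rho (an illustrative setting).
    """
    rng = np.random.default_rng(seed)
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - 4)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    h = np.empty(reps)
    for r in range(reps):
        xz = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        x, z = xz[:, 0], xz[:, 1]
        # SSE_XZ: residual sum of squares from regressing XZ on (1, X, Z).
        design = np.column_stack([np.ones(n), x, z])
        product = x * z
        coef, *_ = np.linalg.lstsq(design, product, rcond=None)
        resid = product - design @ coef
        sse_xz = resid @ resid
        # sigma_hat^2 ~ sigma^2 * chi2(n - 4) / (n - 4), independent of SSE_XZ.
        sigma2_hat = sigma2 * rng.chisquare(n - 4) / (n - 4)
        h[r] = t_crit * np.sqrt(sigma2_hat / sse_xz)
    return h


# Example: a 95% interval with N = 179 and rho = .1.
h = simulate_half_width(n=179, rho=0.1, alpha=0.05)
print("simulated expected half-width:", h.mean())
print("simulated P(H < 0.15):", np.mean(h < 0.15))
```

Averaging the draws estimates the expected half-width, and the proportion of draws below a chosen bound estimates the tolerance probability, so the two sample size criteria can be checked for any candidate $N$.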
Analogous to the practical standpoint of Shieh (2009) for providing a generally useful and versatile solution without being specifically confined to any particular joint probability function $g(X_i, Z_i)$, we consider the large-sample normal distribution of $W^{*} = SSE_{XZ}/(N-1)$ given in Eq. 11, in which $(0, 0, 1)^{T}$ is a $3 \times 1$ vector and the moment quantities are defined above. It is noted in Shieh (2009) that the mean value $\mu_W$ of $W^{*}$ is equivalent to the extra variance of the product $XZ$ after controlling for $X$ and $Z$. Therefore, it is more informative to express it as $\mu_W = (1 - \rho^2_{XZ})\sigma^2_{XZ}$ here. As $SSE_{XZ} = (N-1)W^{*}$ is the residual sum of squares for the regression of the product term $XZ$ on the $X$ and $Z$ variables, the values of both $SSE_{XZ}$ and $W^{*}$ are necessarily nonnegative. It appears that the probability of a negative $W^{*}$, $P(W^{*} < 0)$, is often small enough that the large-sample normal approximation of $W^{*}$ is adequate for practical purposes. Thus, the evaluations with respect to $g(X_i, Z_i)$ are transformed to and approximated by the corresponding assessments with respect to $W^{*}$.

According to the aforementioned theoretical results, the expected half-width can be simplified as

$$E[H] \doteq t_{N-4,\,\alpha/2}\,(N-1)^{-1/2}\,E[\hat{\sigma}]\,E\!\left[W^{*\,-1/2}\right].$$

It follows from the fact that $\hat{\sigma}^2$ is a multiple of a chi-square random variable that

$$E[\hat{\sigma}] = \sigma\left\{\frac{2}{N-4}\right\}^{1/2}\frac{\Gamma\{(N-3)/2\}}{\Gamma\{(N-4)/2\}}.$$

Since there is no analytic expression for $E[W^{*\,-1/2}]$, the actual quantity needs to be evaluated by numerical integration with respect to the normal distribution in Eq. 11. In addition, the probability $P\{H < \omega\}$ for $\omega > 0$ can be rewritten as

$$P\{H < \omega\} = P\{K < \omega^{*}W^{*}\},$$

where $K = (N-4)\hat{\sigma}^2/\sigma^2 \sim \chi^2(N-4)$ and $\omega^{*} = (N-1)(N-4)\omega^2/\{t^2_{N-4,\,\alpha/2}\,\sigma^2\}$. To permit computational simplifications, the approximation is alternatively expressed as

$$P\{H < \omega\} \doteq E\!\left[P\{K < \omega^{*}W^{*} \mid W^{*}\}\right],$$

where $P\{K < c\}$, $c > 0$, is the cumulative distribution function of $\chi^2(N-4)$ and the expectation is taken with respect to the normal distribution of $W^{*}$. Since all related functions of the normal and chi-square distributions are embedded in contemporary statistical packages, the expressions for the approximate values of $E[H]$ and $P\{H < \omega\}$ given in Eqs. 12 and 13, respectively, can be readily implemented.
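As a rough illustration of how the two approximations might be evaluated numerically, the Python sketch below integrates over a normal distribution for $W^{*}$. It is a minimal sketch, not the article's program: the mean `mu_w` and standard deviation `sd_w` of that normal distribution are treated as user-supplied inputs (their expressions are not reproduced here), the integration is restricted to positive values of $W^{*}$, and all function names are hypothetical. The numerical values in the example are illustrative and are not taken from the article.

```python
import numpy as np
from scipy import integrate, special, stats


def expected_sigma_hat(n, sigma=1.0):
    """E[sigma_hat] when sigma_hat^2 ~ sigma^2 * chi2(n - 4) / (n - 4)."""
    nu = n - 4
    log_ratio = special.gammaln((nu + 1) / 2.0) - special.gammaln(nu / 2.0)
    return sigma * np.sqrt(2.0 / nu) * np.exp(log_ratio)


def _positive_range(mu_w, sd_w):
    # Assumption: truncate the normal approximation of W* to positive values.
    return max(mu_w - 8.0 * sd_w, 0.0), mu_w + 8.0 * sd_w


def approx_expected_half_width(n, alpha, mu_w, sd_w, sigma=1.0):
    """Approximate E[H] = t * (n-1)^(-1/2) * E[sigma_hat] * E[W*^(-1/2)]."""
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - 4)
    lo, hi = _positive_range(mu_w, sd_w)
    # Substituting w = u^2 removes the w^(-1/2) singularity at zero:
    # integral of w^(-1/2) f(w) dw equals integral of 2 f(u^2) du.
    integrand = lambda u: 2.0 * stats.norm.pdf(u * u, loc=mu_w, scale=sd_w)
    e_inv_sqrt_w, _ = integrate.quad(integrand, np.sqrt(lo), np.sqrt(hi))
    return t_crit * (n - 1) ** -0.5 * expected_sigma_hat(n, sigma) * e_inv_sqrt_w


def approx_tolerance_probability(n, alpha, omega, mu_w, sd_w, sigma=1.0):
    """Approximate P{H < omega} = E[P{K < omega* W* | W*}], K ~ chi2(n - 4)."""
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - 4)
    omega_star = (n - 1) * (n - 4) * omega ** 2 / (t_crit ** 2 * sigma ** 2)
    lo, hi = _positive_range(mu_w, sd_w)
    integrand = lambda w: (stats.chi2.cdf(omega_star * w, df=n - 4)
                           * stats.norm.pdf(w, loc=mu_w, scale=sd_w))
    prob, _ = integrate.quad(integrand, lo, hi)
    return prob


# Illustrative inputs only (hypothetical mu_w and sd_w).
print(approx_expected_half_width(n=179, alpha=0.05, mu_w=1.01, sd_w=0.22))
print(approx_tolerance_probability(n=179, alpha=0.05, omega=0.15,
                                   mu_w=1.01, sd_w=0.22))
```

A sample size search would then increase $N$ until the first function falls below the desired expected half-width, or until the second reaches the desired tolerance probability; note that `sd_w` would generally change with $N$, which this sketch leaves to the caller.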



Gwowen Shieh. The impact of ignoring random features of predictor and moderator variables on sample size for precise interval estimation of interaction effects, Behavior Research Methods, 2011, 1075-1084, DOI: 10.3758/s13428-011-0103-y