A comparison of two indices for the intraclass correlation coefficient

Behavior Research Methods, Mar 2012

In the present study, we examined the behavior of two indices for measuring the intraclass correlation in the one-way random effects model: the prevailing ICC(1) (Fisher, 1938) and the corrected eta-squared (Bliese & Halverson, 1998). These two procedures differ both in their methods of estimating the variance components that define the intraclass correlation coefficient and in their performance of bias and mean squared error in the estimation of the intraclass correlation coefficient. In contrast with the natural unbiased principle used to construct ICC(1), in the present study it was analytically shown that the corrected eta-squared estimator is identical to the maximum likelihood estimator and the pairwise estimator under equal group sizes. Moreover, the empirical results obtained from the present Monte Carlo simulation study across various group structures revealed the mutual dominance relationship between their truncated versions for negative values. The corrected eta-squared estimator performs better than the ICC(1) estimator when the underlying population intraclass correlation coefficient is small. Conversely, ICC(1) has a clear advantage over the corrected eta-squared for medium and large magnitudes of population intraclass correlation coefficient. The conceptual description and numerical investigation provide guidelines to help researchers choose between the two indices for more accurate reliability analysis in multilevel research.


https://link.springer.com/content/pdf/10.3758%2Fs13428-012-0188-y.pdf


Gwowen Shieh
Department of Management Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan 30050

Keywords: Multilevel modeling; Random effect; Reliability

Because of the inherently hierarchical nature of experimental randomization by cluster and clustered sampling designs, multilevel phenomena and modeling are common in the behavioral, educational, and social sciences.
Although the underlying structure of multilevel models is more complex than that of traditional linear regression models, their basic concept and modeling framework provide a straightforward mechanism for examining the interrelation of individual and group influences. Thus, the class of hierarchical linear models has become the most widely accepted basis for multilevel analysis (Goldstein, 2002; Raudenbush & Bryk, 2002; Snijders & Bosker, 1999). The relevant methodological and theoretical issues have been further addressed throughout various areas of research by Cools, Van den Noortgate, and Onghena (2009); Hedges and Hedberg (2007); Hoffman and Rovine (2007); Hofmann (1997, 2002); Hofmann, Griffin, and Gavin (2000); Kozlowski and Klein (2000); Murray, Varnell, and Blitstein (2004); and Tasoluk, Droge, and Calantone (2009).

One of the distinctive features of multilevel data is that the primary outcomes for study participants within the same unit are more similar than the responses by individuals in different units. Hence, a problem of interest with regard to hierarchical data is the assessment of similarity or degree of resemblance among cluster members. The clustering can be expressed in terms of the correlation among the measurements on individuals within the same group. The fundamental properties and general guidelines related to the intraclass correlation coefficient (ICC) as an interrater reliability measure are described in Bartko (1976), McGraw and Wong (1996), and Shrout and Fleiss (1979). In particular, McGraw and Wong emphasized the distinction between various forms of the ICC developed using data from one-way and two-way random and mixed-effect ANOVA models. In a typical interrater reliability study, each of a random sample of g participants is rated independently by n judges. For the simplest case, the one-way random effect model assumes that the random participant represents the only systematic source of variation (without a judge effect).
On the other hand, the two-way random or mixed effect model assumes that there is a systematic source of random or fixed judge effect as well as the random participant variation. Moreover, the two-way random and mixed models can be extended to accommodate an additional participant–judge interaction effect. Notably, these four two-way models differ in whether judge represents random or fixed effects and in whether the model includes an interaction component. (For ease of reference, the prescribed five models were noted as Cases 1, 2A, 3A, 2, and 3 in Table 1 of McGraw and Wong, 1996.) Consequently, the conceptual modeling difference among these five models ultimately leads to a distinct and unique definition of ICCs. The corresponding standard ANOVA estimators can be constructed with the mean square expectation (as shown in Tables 3 and 4 of McGraw & Wong, 1996). However, McGraw and Wong's primary focus was to provide a comprehensive account of procedures for computing confidence intervals and constructing significance tests. Hence, the existence and comparison of alternative estimators of the ICC reliability measure were not considered.

Note that there is a considerable amount of recent literature pertaining to both the theoretical and practical problem of meaningfully and efficiently estimating correlations among individuals in nested designs (e.g., Alferes & Kenny, 2009; Beal & Dawson, 2007; Bliese, 1998, 2000; Bliese & Halverson, 1998; Castro, 2002; Courrieu, Brand-D'abrescia, Peereman, Spieler, & Rey, 2011; Hedges & Rhoads, 2011; LeBreton & Senter, 2008; O'Connor, 2004; Wampold & Serlin, 2000). In the present article, we focused on the intraclass correlation coefficient that quantifies the homogeneity level in responses by individuals from the same group for the outcome of a one-way random-effects model, which is often useful as an initial step in the development of a multilevel analysis.

Accordingly, the ICC(1) index, introduced by Fisher (1938), is the most frequently adopted measure of intraclass correlation. A comprehensive review of various inference procedures for the ICC in the one-way random-effects model was given by Donner (1986). Moreover, an extensive description of various methods for estimating and testing the within-groups and between-groups variances was presented in Searle, Casella, and McCulloch (1992). The purposes of the present study were (a) to explicate the essential issues concerning the estimation of the ICC, and (b) to provide specific guidance for the use of a proper ICC index in the light of new empirical evidence based on simulation.

Table 1 The bias of ICC indices for g = 4 and N = 20 (n = 5), by group size structure. Entries in each row correspond to ρ = 0, 0.05, 0.10, ..., 0.95, and 0.99.
Equal, $\hat{\rho}_{TA}$: 0.063759 0.041917 0.023034 0.006223 −0.009490 −0.023619 −0.037888 −0.049613 −0.061150 −0.074410 −0.081787 −0.090286 −0.095447 −0.101594 −0.106299 −0.104033 −0.101521 −0.091493 −0.077342 −0.050889 −0.014940
Equal, $\hat{\eta}^2_{TC}$: 0.037186 0.007477 −0.018711 −0.041795 −0.063138 −0.082057 −0.100120 −0.115218 −0.129277 −0.143853 −0.152420 −0.161014 −0.165508 −0.170155 −0.172178 −0.166232 −0.158582 −0.141216 −0.117232 −0.076458 −0.022268
Linear, $\hat{\rho}_{TA}$: 0.067002 0.044562 0.024420 0.007667 −0.009490 −0.026342 −0.040228 −0.054630 −0.067756 −0.078682 −0.088148 −0.097730 −0.105590 −0.111976 −0.113498 −0.114432 −0.110571 −0.101177 −0.085953 −0.056408 −0.016491
Linear, $\hat{\eta}^2_{TC}$: 0.039329 0.009193 −0.017895 −0.041060 −0.063435 −0.084796 −0.102587 −0.119937 −0.135419 −0.148031 −0.158560 −0.168433 −0.175620 −0.180779 −0.179889 −0.177056 −0.168241 −0.151939 −0.127172 −0.083098 −0.024252
Extreme, $\hat{\rho}_{TA}$: 0.086986 0.061296 0.036878 0.015692 −0.005943 −0.026351 −0.043018 −0.060379 −0.075783 −0.088404 −0.102256 −0.112250 −0.121990 −0.127277 −0.129185 −0.127992 −0.122973 −0.112605 −0.093294 −0.061256 −0.017316
Extreme, $\hat{\eta}^2_{TC}$: 0.051589 0.019523 −0.010710 −0.037663 −0.063879 −0.088472 −0.108729 −0.129223 −0.147017 −0.161189 −0.176245 −0.186532 −0.195793 −0.199802 −0.199079 −0.194387 −0.184157 −0.166284 −0.136623 −0.089314 −0.025345

Despite the widespread use and apparent utility of the ICC in several different fields of research, the most fundamental interpretation is that it is a measure of the proportion of the total variance in the outcome that is accounted for by group membership. In multiple linear regression, it is well known that the coefficient of determination $R^2$ represents the overall usefulness of the regression model, making it an appealing measure of the strength of association or the percentage of total variance that is explained by the predictor variables. However, the advocated contention regarding the role of $R^2$ in linear regression models does not generalize to a one-way random-effects model in a straightforward manner. Interestingly, the corresponding measure for the proportionate reduction in error is denoted by $\hat{\eta}^2$ in the context of ANOVA designs, and both $R^2$ and $\hat{\eta}^2$ have the identical formulation, as was noted by Maxwell, Camp, and Arvey (1981). Although $\hat{\eta}^2$ is widely used in organizational research to draw inferences about group membership cohesion, it was shown in Bliese and Halverson (1998) that the sample eta-squared estimator $\hat{\eta}^2$ is an inaccurate measure of the amount of the total variance that is due to group-level properties. Specifically, their simulation study revealed that $\hat{\eta}^2$ provides a positively biased estimate of the ICC and that its performance varies with group size and the underlying magnitude of the population intraclass correlation. Although the inappropriateness of $\hat{\eta}^2$ as an estimate of the ICC was effectively demonstrated in the empirical results of Bliese and Halverson, there remains the problem: Which other indices should be used, and which ones are preferable under particular conditions? Accordingly, Bliese and Halverson (1998) suggested the corrected eta-squared Eta2(C) formula as a modification of $\hat{\eta}^2$ to provide more accurate estimates of the intraclass correlation.
Although the proposed estimator Eta2(C) may be useful in many settings, it was noted that Eta2(C) systematically underestimates the population group-level property and that this underestimation arises because the formula does not adjust for the degrees of freedom associated with a sample when drawing inferences about the population intraclass correlation. Instead, Bliese and Halverson recommended the well-known ICC(1) formula because ICC(1) estimates are not biased by either group size or the number of groups in the sample. Since the empirical investigation of Bliese and Halverson focused on the behavior of $\hat{\eta}^2$, they did not present any numerical examinations regarding the properties of Eta2(C) and ICC(1) in their estimation of intraclass correlation. From a methodological standpoint, the prescribed features of Eta2(C) and ICC(1) in Bliese and Halverson, therefore, should be clarified.

In addition, the following three caveats seem to have been overlooked. First, although ICC(1) is strongly tied to normal theory and unbiased estimation of variance components, the derivation of Eta2(C) is not clearly conveyed relative to the existing estimation principles and formulas. Second, a serious disadvantage of the two estimators Eta2(C) and ICC(1) is that they can assume negative values even though the ICC is defined as a nonnegative parameter. In practice, the estimate is often set equal to zero when this occurs. Although this simple and intuitive adjustment is of practical meaning, the fundamental behavior of the ICC estimator is inherently altered. Third, regarding the estimation appraisal, unbiasedness is certainly not the only criterion of theoretical importance. Mean squared error (MSE) is another useful performance criterion that incorporates both the bias (accuracy) and the variability (precision) of an estimator. Realistically, however, the existing results are arguably not detailed enough to demonstrate the explicit and relative estimation performance of the two indices.
In the present research, we took up the aforementioned methodological and theoretical issues with technical clarifications and numerical investigations. Finally, a Monte Carlo simulation study was conducted to assess the bias and MSE of the truncated counterparts of Eta2(C) and ICC(1) for negative values under several model configurations in terms of varied group structure and magnitude of intraclass correlation. The present account helps to clarify their unique and contrasting behavior and to choose an appropriate ICC measure in the context of studying the degree of clustering.

Measures of intraclass correlation

In a two-level study design, suppose the response variable is measured on each of $n_i$ individuals within each of $g$ groups that arise from the one-way random effects model assuming the following form:

$$Y_{ij} = \mu + \gamma_i + \varepsilon_{ij}, \quad i = 1, \ldots, g; \; j = 1, \ldots, n_i, \qquad (1)$$

where $\mu$ is the grand mean, and the group effects $\gamma_i$ and the errors $\varepsilon_{ij}$ are mutually independent normal random variables with mean zero and variances $\sigma_g^2$ and $\sigma_\varepsilon^2$, respectively. The variance of $Y_{ij}$ is then given by $\sigma_g^2 + \sigma_\varepsilon^2$, where $\sigma_g^2$ represents the between-groups variance and $\sigma_\varepsilon^2$ represents the within-groups variance. Accordingly, the ICC $\rho$ is defined as $\rho = \sigma_g^2/(\sigma_g^2 + \sigma_\varepsilon^2)$, which can be interpreted as a simple correlation coefficient $\mathrm{Corr}(Y_{ij}, Y_{ij'})$ between any two observations, $Y_{ij}$ and $Y_{ij'}$, in the same group with $j \neq j'$. We are interested in assessing the magnitude of the ICC, which is directly interpretable as the proportion of the total variance of the response that is accounted for by the clustering or group cohesion.

The sample eta-squared is defined as

$$\hat{\eta}^2 = \frac{SSB}{SST}, \qquad (2)$$

where SSB is the between-groups sum of squares, and SST is the total sum of squares. Although $\hat{\eta}^2$ accurately describes the effect of group membership in a sample, it systematically overestimates the ICC (Bliese & Halverson, 1998).
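As a minimal sketch of the two quantities just described, the following Python fragment computes the population ICC from given variance components and the sample eta-squared SSB/SST from grouped data. The function names (`population_icc`, `sums_of_squares`, `eta_squared`) and the toy data are illustrative assumptions, not part of the original study.

```python
def population_icc(var_between, var_within):
    """Population ICC: between-groups variance over total variance."""
    return var_between / (var_between + var_within)


def sums_of_squares(groups):
    """Return (SSB, SSW, SST) for a list of groups of observations."""
    all_obs = [y for grp in groups for y in grp]
    grand_mean = sum(all_obs) / len(all_obs)
    means = [sum(grp) / len(grp) for grp in groups]
    # Between-groups: squared deviations of group means, weighted by group size.
    ssb = sum(len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, means))
    # Within-groups: squared deviations of observations from their group mean.
    ssw = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp)
    return ssb, ssw, ssb + ssw


def eta_squared(groups):
    """Sample eta-squared, SSB / SST."""
    ssb, _, sst = sums_of_squares(groups)
    return ssb / sst


# Hypothetical toy data: two groups of three observations each.
toy_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
toy_eta2 = eta_squared(toy_groups)   # SSB = 13.5, SST = 17.5
```

Because eta-squared describes the sample at hand, it tends to be larger than the population ICC when groups are small, which is the overestimation noted above.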
The well-established ANOVA estimator $\hat{\rho}_A$ is obtained by replacing the variance parameters in the population ICC $\rho$ with the corresponding unbiased estimators:

$$\hat{\rho}_A = \frac{MSB - MSW}{MSB + (n_0 - 1)MSW} = \frac{F^* - 1}{F^* + n_0 - 1}, \qquad (3)$$

where MSB is the between-groups mean square, MSW is the within-groups mean square, $n_0 = \left(N - \sum_{i=1}^{g} n_i^2/N\right)/(g - 1)$, $N = \sum_{i=1}^{g} n_i$ is the total number of observations, and $F^* = MSB/MSW$. For groups of equal size, $n_i = n$, it is readily apparent that $\hat{\rho}_A$ reduces to the familiar formula

$$ICC(1) = \frac{MSB - MSW}{MSB + (n - 1)MSW}. \qquad (4)$$

Although $\hat{\rho}_A$ is the most frequently adopted measure of intraclass correlation, it is not unbiased (Donner, 1986). Unfortunately, this general result contradicts the position of Bliese and Halverson that ICC(1) is not biased by either group size or the number of groups. Such conflicting results will be discussed later in the numerical investigation. Note that Olkin and Pratt (1958) derived the minimum variance unbiased estimator of the intraclass correlation, but its use has been impeded by the lack of a closed-form expression. The corresponding computation requires a special-purpose computer program (see, e.g., Donoghue & Collins, 1990). Apparently, the computational complexity of the unbiased estimator has resulted in limited acceptance for practical use. Thus, it is worthwhile to consider alternative formulas that might yield similar results with less computation.

Table 2 The bias of ICC indices for g = 4 and N = 200 (n = 50), by group size structure. Entries in each row correspond to ρ = 0, 0.05, 0.10, ..., 0.95, and 0.99.
Equal, $\hat{\rho}_{TA}$: 0.006112 −0.000862 −0.006034 −0.012225 −0.020230 −0.027562 −0.036033 −0.045367 −0.054033 −0.063636 −0.071081 −0.077796 −0.086447 −0.091229 −0.094891 −0.095397 −0.093118 −0.086670 −0.073959 −0.048859 −0.014521
Equal, $\hat{\eta}^2_{TC}$: 0.003033 −0.015910 −0.030454 −0.044496 −0.059061 −0.072111 −0.085420 −0.098606 −0.110586 −0.122564 −0.131833 −0.139577 −0.148293 −0.152451 −0.154479 −0.152197 −0.145744 −0.133180 −0.111822 −0.073378 −0.021695
Linear, $\hat{\rho}_{TA}$: 0.006619 −0.001105 −0.006738 −0.013321 −0.021807 −0.030141 −0.040701 −0.050266 −0.060300 −0.068735 −0.078283 −0.086244 −0.094372 −0.098157 −0.102811 −0.103688 −0.101316 −0.094413 −0.081207 −0.054263 −0.016146
Linear, $\hat{\eta}^2_{TC}$: 0.003305 −0.016212 −0.031093 −0.045407 −0.060301 −0.074218 −0.089263 −0.102756 −0.115918 −0.126966 −0.138156 −0.147152 −0.155643 −0.158948 −0.162163 −0.160549 −0.154073 −0.141377 −0.119888 −0.079711 −0.023725
Extreme, $\hat{\rho}_{TA}$: 0.059756 0.034656 0.012869 −0.007387 −0.028802 −0.046770 −0.063765 −0.079014 −0.093468 −0.105734 −0.118549 −0.127686 −0.134336 −0.140529 −0.141644 −0.140476 −0.133989 −0.122011 −0.101866 −0.066561 −0.019153
Extreme, $\hat{\eta}^2_{TC}$: 0.031544 −0.001356 −0.030265 −0.056856 −0.083135 −0.105845 −0.126897 −0.145713 −0.162646 −0.176998 −0.190907 −0.200781 −0.207366 −0.212266 −0.211475 −0.206726 −0.195347 −0.176291 −0.146213 −0.095479 −0.027661

Alternatively, the maximum likelihood estimator $\hat{\rho}_M$ of $\rho$ can be derived, although an explicit expression generally does not exist. However, much of the complexity is removed if each group is of the same size. As was noted by Donner (1986) concerning the special case of $n_i = n$, the maximum likelihood estimator $\hat{\rho}_M$ coincides with the pairwise estimator $\hat{\rho}_P$, defined as the Pearson product-moment correlation computed over all possible pairs of observations that can be constructed within groups, which can be expressed as

$$\hat{\rho}_P = \frac{F^* - 1 - 1/(g - 1)}{F^* + n - 1 + (n - 1)/(g - 1)}. \qquad (5)$$

Moreover, in line with the estimation of the ICC, the corrected eta-squared estimator Eta2(C) suggested by Bliese and Halverson (1998, Eq. 4) can be written in our notation as

$$\mathrm{Eta2(C)} = \hat{\eta}^2 - \frac{1 - \hat{\eta}^2}{n - 1}. \qquad (6)$$

It is straightforward, then, to show from Eq. 2 that $\hat{\eta}^2 = SSB/SST = F^*/\{F^* + g(n - 1)/(g - 1)\}$ for $SST = SSB + SSW$, $MSB = SSB/(g - 1)$, and $MSW = SSW/\{g(n - 1)\}$ under a balanced design. Hence, an alternative form of Eta2(C) given in Eq. 6 can be readily established by using the new expression of $\hat{\eta}^2$: $\mathrm{Eta2(C)} = \{F^* - 1 - 1/(g - 1)\}/\{F^* + n - 1 + (n - 1)/(g - 1)\}$. Consequently, the formulation of Eta2(C) yields the equivalence property for the three estimators, $\mathrm{Eta2(C)} = \hat{\rho}_P = \hat{\rho}_M$. Interestingly, this fundamental correspondence among the three indices Eta2(C), $\hat{\rho}_P$, and $\hat{\rho}_M$ was not addressed in Bliese and Halverson.

In view of the strong connection of Eta2(C) with the maximum likelihood principle, it is natural to present a simplified approximation for the case of unequal group sizes with the advantage of giving an accessible expression. Accordingly, a formula with computational ease and general accessibility, denoted $\hat{\eta}^2_C$, is considered for this case (Eq. 7). Although a similar notion was presented in Bliese and Halverson, no numerical evidence was provided to demonstrate its estimation property. Thus, it is easy to see from Eqs. 4, 5, and 6 that Eta2(C) ($= \hat{\rho}_P$) and ICC(1) are virtually equivalent if the number of groups $g$ is large, although Eta2(C) is always slightly less than ICC(1) for all values of $g > 1$.

Table 3 The bias of ICC indices for g = 10 and N = 50 (n = 5), by group size structure. Entries in each row correspond to ρ = 0, 0.05, 0.10, ..., 0.95, and 0.99.
Equal, $\hat{\rho}_{TA}$: 0.040430 0.021726 0.008721 −0.000804 −0.007738 −0.013300 −0.018631 −0.022554 −0.025994 −0.029491 −0.032805 −0.034672 −0.035881 −0.037051 −0.036726 −0.034275 −0.032257 −0.027931 −0.021124 −0.012217 −0.002759
Equal, $\hat{\eta}^2_{TC}$: 0.029886 0.006245 −0.011003 −0.023969 −0.033598 −0.041106 −0.047839 −0.052614 −0.056485 −0.060053 −0.063009 −0.064180 −0.064279 −0.063948 −0.061684 −0.056751 −0.051805 −0.043893 −0.032745 −0.018616 −0.004153
Linear, $\hat{\rho}_{TA}$: 0.041376 0.022530 0.009178 −0.000361 −0.008178 −0.013904 −0.019630 −0.025247 −0.029714 −0.034535 −0.037516 −0.040376 −0.041911 −0.042699 −0.043349 −0.040767 −0.037458 −0.033449 −0.025105 −0.014946 −0.003358
Linear, $\hat{\eta}^2_{TC}$: 0.030547 0.006808 −0.010744 −0.023660 −0.034064 −0.041737 −0.048793 −0.055284 −0.060165 −0.065062 −0.067721 −0.069915 −0.070355 −0.069691 −0.068463 −0.063466 −0.057228 −0.049722 −0.037005 −0.021584 −0.004816
Extreme, $\hat{\rho}_{TA}$: 0.058236 0.036237 0.017448 0.002056 −0.010448 −0.021566 −0.031613 −0.039272 −0.047902 −0.052596 −0.057896 −0.061135 −0.063899 −0.063305 −0.062957 −0.059778 −0.054829 −0.047159 −0.035804 −0.020691 −0.004688
Extreme, $\hat{\eta}^2_{TC}$: 0.043240 0.016995 −0.005492 −0.024008 −0.039045 −0.052167 −0.063575 −0.072217 −0.081294 −0.086023 −0.090879 −0.093320 −0.094875 −0.092622 −0.090171 −0.084397 −0.076286 −0.064807 −0.048769 −0.027929 −0.006291

Although there are several comparative studies about the finite-sample properties of the aforementioned estimators (e.g., Bliese & Halverson, 1998; Donner & Koval, 1980; Swiger, Harvey, Everson, & Gregory, 1964), they appear to overlook the fact that $\hat{\rho}_A$ and $\hat{\eta}^2_C$ permit negative values whenever $F^* < 1$ or $F^* < g/(g - 1)$, respectively. In practice, the estimate is set to zero when this occurs. This modification yields a truncated form of the estimator, and the associated estimation performance differs from that of the original counterpart, especially when the underlying ICC is small. In fact, it was reported in Bliese (2000) that ICC(1) values are typically between 0.05 and 0.20, which implies that the true magnitudes of the ICC tend to be small in practical applications. Consequently, the existing numerical results for untruncated formulas do not necessarily apply to the truncated versions of the two estimators. To extend the concept and applicability of the ICC indices, it is prudent to investigate the estimation behavior of their truncated forms.

Table 4 The bias of ICC indices for g = 10 and N = 500 (n = 50), by group size structure. Entries in each row correspond to ρ = 0, 0.05, 0.10, ..., 0.95, and 0.99.
Equal, $\hat{\rho}_{TA}$: 0.003703 −0.000724 −0.002105 −0.004339 −0.006790 −0.010553 −0.013988 −0.016939 −0.020367 −0.024954 −0.026395 −0.029165 −0.031594 −0.032192 −0.033590 −0.032443 −0.030226 −0.026428 −0.020785 −0.012264 −0.002750
Equal, $\hat{\eta}^2_{TC}$: 0.002560 −0.007134 −0.012461 −0.018084 −0.023463 −0.029654 −0.035135 −0.039748 −0.044405 −0.049819 −0.051703 −0.054514 −0.056522 −0.056264 −0.056331 −0.053317 −0.048613 −0.041664 −0.032068 −0.018580 −0.004140
Linear, $\hat{\rho}_{TA}$: 0.003831 −0.000868 −0.002893 −0.005363 −0.008356 −0.012434 −0.015945 −0.020141 −0.024371 −0.028059 −0.031578 −0.035008 −0.036857 −0.039413 −0.039046 −0.038557 −0.036352 −0.031510 −0.024820 −0.014728 −0.003377
Linear, $\hat{\eta}^2_{TC}$: 0.002650 −0.007291 −0.013201 −0.019019 −0.024884 −0.031351 −0.036894 −0.042685 −0.048134 −0.052668 −0.056631 −0.060129 −0.061628 −0.063400 −0.061762 −0.059484 −0.054905 −0.046959 −0.036346 −0.021246 −0.004832
Extreme, $\hat{\rho}_{TA}$: 0.040675 0.019723 0.002781 −0.010845 −0.022985 −0.033868 −0.043239 −0.051911 −0.059563 −0.065700 −0.071498 −0.075914 −0.078591 −0.080164 −0.078759 −0.074874 −0.069559 −0.060039 −0.045925 −0.026552 −0.006122
Extreme, $\hat{\eta}^2_{TC}$: 0.028632 0.002813 −0.018112 −0.034973 −0.049626 −0.062462 −0.073184 −0.082846 −0.091035 −0.097389 −0.102944 −0.106819 −0.108556 −0.108808 −0.105668 −0.099467 −0.091335 −0.078190 −0.059506 −0.034276 −0.007875

Numerical study

In this section, we consider the performance appraisal in point estimation of the ICC, following the notion of choosing an appropriate index for the level of reliability. In view of the common practice of reporting zero estimates of $\rho$ when ICC measures yield negative values, it is natural to consider the truncated version for such occurrences. The corresponding modified formulas $\hat{\rho}_{TA}$ and $\hat{\eta}^2_{TC}$ for the two prominent indices $\hat{\rho}_A$ and $\hat{\eta}^2_C$ defined in Eqs. 3 and 7 are defined as

$$\hat{\rho}_{TA} = \begin{cases} \hat{\rho}_A & \text{if } F^* \geq 1, \\ 0 & \text{if } F^* < 1, \end{cases} \qquad (8)$$

and

$$\hat{\eta}^2_{TC} = \begin{cases} \hat{\eta}^2_C & \text{if } F^* \geq g/(g - 1), \\ 0 & \text{if } F^* < g/(g - 1), \end{cases} \qquad (9)$$

respectively. Although this natural modification is intuitive and heuristic, the theoretical properties of $\hat{\rho}_{TA}$ and $\hat{\eta}^2_{TC}$ are substantially different from those of their counterparts, $\hat{\rho}_A$ and $\hat{\eta}^2_C$, when the underlying coefficient parameter $\rho$ is small. However, a unified and rigorous presentation of these truncated measures of $\rho$ does not exist to our knowledge. For pedagogical purposes, it is constructive to provide informative results that not only permit new insights into their relationships but also allow clear representations of various methodological issues.

Note that the estimators $\hat{\rho}_A$ and $\hat{\eta}^2_C$ are functions of $F^*$, as are the two modifications $\hat{\rho}_{TA}$ and $\hat{\eta}^2_{TC}$. It follows from the model assumption with balanced group sizes $n_i = n$ that the $F^*$ statistic is distributed as a multiple of an F distribution, $F^* \sim \{1 + n\rho/(1 - \rho)\}\,F(g - 1, N - g)$, where $F(g - 1, N - g)$ is an F distribution with $(g - 1)$ and $(N - g)$ degrees of freedom. Thus, the bias and MSE of an estimator $\hat{\rho}(F^*)$ of $\rho$ are defined as

$$\mathrm{Bias}\{\hat{\rho}(F^*), \rho\} = E[\hat{\rho}(F^*) - \rho] \quad \text{and} \quad \mathrm{MSE}\{\hat{\rho}(F^*), \rho\} = E[\{\hat{\rho}(F^*) - \rho\}^2], \qquad (10)$$

where the expectation is taken with respect to the distribution of $F^*$. However, the distributional result does not extend directly to unbalanced group size settings (see Searle et al., 1992).

Table 5 The root mean squared error of ICC indices for g = 4 and N = 20 (n = 5), by group size structure. Entries in each row correspond to ρ = 0, 0.05, 0.10, ..., 0.95, and 0.99.
Equal, $\hat{\rho}_{TA}$: 0.126871 0.140086 0.156674 0.174914 0.191161 0.206616 0.220869 0.232773 0.242599 0.252932 0.258661 0.263807 0.265233 0.265266 0.263710 0.253199 0.240586 0.218076 0.187181 0.132226 0.051079
Equal, $\hat{\eta}^2_{TC}$: 0.089515 0.104359 0.126649 0.151800 0.175451 0.198060 0.219414 0.238065 0.254201 0.270399 0.281411 0.291430 0.296829 0.300634 0.301729 0.293074 0.281121 0.257323 0.222667 0.159427 0.062428
Linear, $\hat{\rho}_{TA}$: 0.133112 0.145052 0.160605 0.178628 0.195257 0.210428 0.224894 0.238514 0.249982 0.259515 0.266743 0.272634 0.276465 0.276603 0.271952 0.266337 0.252732 0.230615 0.197833 0.140040 0.053840
Linear, $\hat{\eta}^2_{TC}$: 0.094334 0.107948 0.129519 0.154568 0.178874 0.201982 0.223384 0.243910 0.261740 0.276870 0.289664 0.300768 0.308705 0.312825 0.310909 0.306847 0.294049 0.271103 0.234880 0.168739 0.065997
Extreme, $\hat{\rho}_{TA}$: 0.167778 0.174133 0.184015 0.197357 0.211774 0.225851 0.241047 0.254138 0.266261 0.277532 0.286215 0.292647 0.297787 0.298265 0.295369 0.285841 0.271348 0.249634 0.212656 0.151303 0.057369
Extreme, $\hat{\eta}^2_{TC}$: 0.120544 0.129706 0.146398 0.168087 0.191747 0.214796 0.237872 0.259121 0.278439 0.295883 0.311136 0.322756 0.332836 0.337392 0.337194 0.329905 0.316285 0.293104 0.252089 0.181783 0.070089
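The estimators discussed above can be sketched in a few lines of Python. The fragment below computes $F^* = MSB/MSW$, $n_0$, and eta-squared from grouped data, then the ANOVA estimator $(F^* - 1)/(F^* + n_0 - 1)$, the equal-group-size form of the corrected eta-squared, and the truncation at zero. All function names and the toy data are hypothetical, and only the balanced form of the corrected eta-squared is covered; this is an illustration under those assumptions, not the author's program.

```python
def anova_summary(groups):
    """Return (F*, n0, eta-squared) for a list of groups of observations."""
    g = len(groups)
    sizes = [len(grp) for grp in groups]
    total = sum(sizes)                                   # N
    means = [sum(grp) / len(grp) for grp in groups]
    grand = sum(sum(grp) for grp in groups) / total
    ssb = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ssw = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp)
    f_star = (ssb / (g - 1)) / (ssw / (total - g))       # MSB / MSW (needs SSW > 0)
    n0 = (total - sum(n * n for n in sizes) / total) / (g - 1)
    return f_star, n0, ssb / (ssb + ssw)


def rho_anova(f_star, n0):
    """ANOVA estimator; equals ICC(1) when all groups have size n0 = n."""
    return (f_star - 1.0) / (f_star + n0 - 1.0)


def eta2_corrected_balanced(f_star, g, n):
    """Corrected eta-squared for g groups of equal size n, in terms of F*."""
    c = g / (g - 1.0)
    return (f_star - c) / (f_star + (n - 1.0) * c)


def eta2_corrected_from_eta2(eta2, n):
    """Same quantity written in terms of eta-squared (equal group size n)."""
    return eta2 - (1.0 - eta2) / (n - 1.0)


def truncate(estimate):
    """Report zero whenever the estimate is negative."""
    return max(0.0, estimate)


# Hypothetical balanced toy data: g = 3 groups of n = 3 observations.
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
f_star, n0, eta2 = anova_summary(toy)
icc1 = rho_anova(f_star, n0)
eta2_c = eta2_corrected_balanced(f_star, g=3, n=3)
```

For these toy data the corrected eta-squared is smaller than the ANOVA estimate, and both ways of writing it agree, which mirrors the equivalence argument above.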
Because of the complexity of the estimation problem, analytical justifications of the theoretical discrepancies of the two estimators are generally not feasible. Thus, a large-scale simulation study is employed to assess the estimation properties of the prescribed truncated formulas $\hat{\rho}_{TA}$ and $\hat{\eta}^2_{TC}$ given in Eqs. 8 and 9, respectively.

Study design

To demonstrate the potential extent of characteristics that an applied work may reflect in clustering research, the number of groups and average group size are set as g = 4 and 10, and n = N/g = 5 and 50, respectively. Moreover, the group sizes are designated to have three different characteristics: equal, linear, and extreme structures, so as to exemplify notable circumstances of practical importance. Specifically, the group sizes {n_i, i = 1, …, g} for the combination of g = 4 and N = 20 (n = 5) are: Equal structure: {5, 5, 5, 5}; Linear structure: {2, 4, 6, 8}; Extreme structure: {14, 2, 2, 2}. In the case of g = 4 and N = 200 (n = 50), the group sizes are chosen as: Equal structure: {50, 50, 50, 50}; Linear structure: {20, 40, 60, 80}; Extreme structure: {194, 2, 2, 2}. For g = 10 and N = 50 (n = 5), the group sizes are: Equal structure: {5, 5, 5, 5, 5, 5, 5, 5, 5, 5}; Linear structure: {1, 2, 3, 4, 5, 5, 6, 7, 8, 9}; Extreme structure: {32, 2, 2, 2, 2, 2, 2, 2, 2, 2}.
For g = 10 and N = 500 (n = 50), the group sizes are: Equal structure: {50, 50, 50, 50, 50, 50, 50, 50, 50, 50}; Linear structure: {10, 20, 30, 40, 50, 50, 60, 70, 80, 90}; Extreme structure: {482, 2, 2, 2, 2, 2, 2, 2, 2, 2}. Note that the linear structures are approximately but not exactly linear in group sizes because of a minor adjustment made so that their sum meets the selected total number of observations. We also conducted a simulation study for g = 100 with n = 5 and 50; however, the general phenomenon between the two indices $\hat{\rho}_{TA}$ and $\hat{\eta}^2_{TC}$ was similar to that of g = 10, and the discrepancies were rather small. To conserve space, the details are not provided here.

Table 6 The root mean squared error of ICC indices for g = 4 and N = 200 (n = 50), by group size structure. Entries in each row correspond to ρ = 0, 0.05, 0.10, ..., 0.95, and 0.99.
Equal, $\hat{\rho}_{TA}$: 0.013350 0.049347 0.078526 0.103505 0.124526 0.144244 0.160106 0.175440 0.187202 0.198804 0.207423 0.214131 0.220902 0.223116 0.222967 0.218452 0.209001 0.193769 0.167460 0.120026 0.046116
Equal, $\hat{\eta}^2_{TC}$: 0.008141 0.041277 0.070023 0.096094 0.119775 0.142323 0.162077 0.181537 0.197743 0.213728 0.226570 0.237218 0.247812 0.253217 0.255745 0.253016 0.244480 0.228458 0.199434 0.144552 0.056583
Linear, $\hat{\rho}_{TA}$: 0.014501 0.051273 0.082135 0.107847 0.129959 0.149043 0.166379 0.181137 0.194575 0.205096 0.215882 0.223845 0.229392 0.231298 0.231800 0.227135 0.219206 0.203304 0.176145 0.127292 0.049520
Linear, $\hat{\eta}^2_{TC}$: 0.008901 0.042786 0.073059 0.099914 0.124637 0.146963 0.168416 0.187437 0.205328 0.220308 0.235208 0.247168 0.256710 0.261747 0.265094 0.262532 0.255252 0.238759 0.209235 0.153045 0.060565
Extreme, $\hat{\rho}_{TA}$: 0.119797 0.130562 0.148244 0.167467 0.186498 0.205312 0.222946 0.238632 0.253912 0.266164 0.278007 0.285798 0.290336 0.294199 0.290770 0.285366 0.271318 0.249731 0.214228 0.153682 0.058767
Extreme, $\hat{\eta}^2_{TC}$: 0.078759 0.092691 0.118187 0.146282 0.174265 0.201024 0.226225 0.249005 0.270426 0.288560 0.305756 0.318484 0.327359 0.334536 0.334144 0.330067 0.316553 0.293567 0.254353 0.184417 0.071774
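To make the effect of the group size structures concrete, the short Python sketch below evaluates the quantity $n_0 = (N - \sum n_i^2/N)/(g - 1)$ used in the ANOVA estimator for the structures with average group size 5. The dictionary layout and function name are illustrative assumptions; the group sizes are those listed in the study design.

```python
def effective_group_size(sizes):
    """n0 = (N - sum(n_i^2) / N) / (g - 1) for a list of group sizes."""
    total = sum(sizes)
    g = len(sizes)
    return (total - sum(n * n for n in sizes) / total) / (g - 1)


# Group size structures for (g = 4, N = 20) and (g = 10, N = 50).
structures = {
    ("g=4", "equal"): [5, 5, 5, 5],
    ("g=4", "linear"): [2, 4, 6, 8],
    ("g=4", "extreme"): [14, 2, 2, 2],
    ("g=10", "equal"): [5] * 10,
    ("g=10", "linear"): [1, 2, 3, 4, 5, 5, 6, 7, 8, 9],
    ("g=10", "extreme"): [32] + [2] * 9,
}

n0_values = {key: effective_group_size(sizes) for key, sizes in structures.items()}
for key, value in n0_values.items():
    print(key, round(value, 4))
```

With equal groups $n_0$ equals the common group size, and it falls below the average group size as the allocation becomes more uneven, so the same $F^*$ is converted into a different ICC estimate under the linear and extreme structures.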
Accordingly, the bias and MSE of $\hat{\rho}_{TA}$ and $\hat{\eta}^2_{TC}$ are examined for $\rho$ = 0 to 0.95 in increments of 0.05, and for $\rho$ = 0.99, for each of the 12 combined model configurations of two numbers of groups, two average group sizes, and three group size structures. Without loss of generality, the one-way random effects model with parameter values $\mu = 1$, $\sigma_g^2 = \rho/(1 - \rho)$, and $\sigma_\varepsilon^2 = 1$ is used as the base for the Monte Carlo assessment. With the selected model configurations, the estimates of bias and MSE defined in Eq. 10 are computed through simulation of 100,000 replicate data sets. For each replicate, N observations are generated from the designated one-way random effects model. These values in turn determine the $F^*$ statistic and the estimates $\hat{\rho}_{TA}$ and $\hat{\eta}^2_{TC}$. Then, the resulting errors $(\hat{\rho}_{TA} - \rho)$ and $(\hat{\eta}^2_{TC} - \rho)$, and squared errors $(\hat{\rho}_{TA} - \rho)^2$ and $(\hat{\eta}^2_{TC} - \rho)^2$, are computed. The simulated bias and MSE associated with the two ICC indices are the arithmetic means of the corresponding 100,000 replicated values. The computed biases are presented in Tables 1, 2, 3, and 4 for (g, n) = (4, 5), (4, 50), (10, 5), and (10, 50), respectively. In addition, the corresponding root mean squared errors ($\mathrm{RMSE} = \mathrm{MSE}^{1/2}$) are summarized in Tables 5, 6, 7, and 8. These numerical results reveal essential, and perhaps unfamiliar, relations between the competing formulas.
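The simulation procedure just described can be sketched as follows for the equal-group-size case. This is a minimal illustration, not the program used in the study: it relies only on Python's standard `random` module, the function name `simulate_truncated` and the seed are assumptions, it uses far fewer replicates than the 100,000 reported, and it covers only balanced designs, where the corrected eta-squared has the closed form in terms of $F^*$.

```python
import math
import random


def simulate_truncated(rho, g, n, reps, seed=0):
    """Monte Carlo bias and RMSE of the two truncated estimators (balanced case).

    Data follow the one-way random effects model with mu = 1,
    sigma_g^2 = rho / (1 - rho), and sigma_eps^2 = 1.
    Returns {"TA": (bias, rmse), "TC": (bias, rmse)}.
    """
    rng = random.Random(seed)
    sd_group = math.sqrt(rho / (1.0 - rho))
    c = g / (g - 1.0)
    sums = {"TA": [0.0, 0.0], "TC": [0.0, 0.0]}   # [sum of errors, sum of squared errors]
    for _ in range(reps):
        groups = []
        for _ in range(g):
            effect = rng.gauss(0.0, sd_group)      # random group effect
            groups.append([1.0 + effect + rng.gauss(0.0, 1.0) for _ in range(n)])
        means = [sum(grp) / n for grp in groups]
        grand = sum(means) / g
        msb = n * sum((m - grand) ** 2 for m in means) / (g - 1)
        msw = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp) / (g * (n - 1))
        f_star = msb / msw
        est_ta = max(0.0, (f_star - 1.0) / (f_star + n - 1.0))        # truncated ICC(1)
        est_tc = max(0.0, (f_star - c) / (f_star + (n - 1.0) * c))    # truncated corrected eta-squared
        for key, est in (("TA", est_ta), ("TC", est_tc)):
            sums[key][0] += est - rho
            sums[key][1] += (est - rho) ** 2
    return {key: (s / reps, math.sqrt(q / reps)) for key, (s, q) in sums.items()}


# Example: g = 4 groups of size n = 5 with rho = 0 (equal structure).
result = simulate_truncated(rho=0.0, g=4, n=5, reps=20000, seed=1)
```

With enough replicates, the simulated values for this configuration can be compared with the equal-structure entries at ρ = 0 in Tables 1 and 5; at ρ = 0 both truncated estimators are necessarily positively biased, since neither can fall below zero.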
Empirical results

Although it was noted previously that the $\hat{\eta}^2_C$ index tends to underestimate $\rho$, it can be readily seen from Tables 1, 2, 3, and 4 that the modified version $\hat{\eta}^2_{TC}$ incurs slightly positive bias when $\rho$ is near zero because of truncation for negative values. In general, however, it remains negatively biased, and the estimation behavior varies with group structures. Unlike the documented claim that modified ICC(1) is not affected by group size and the number of groups (Bliese & Halverson, 1998, p. 168), the performance of $\hat{\rho}_{TA}$ differs across the aforementioned four diverse group characteristics. Explicitly, the extensive empirical results showed that the performance of the truncated indices improves with an increasing number of groups and increasing average group size. On the other hand, it is noteworthy that their accuracy deteriorates as the group size allocation changes from equal to linear to extreme structures. The same phenomenon can be seen for the RMSE results in Tables 5, 6, 7, and 8 as well. Furthermore, there is an important distinction between the two estimators.

The absolute bias of $\hat{\eta}^2_{TC}$ is smaller than that of $\hat{\rho}_{TA}$ at very small values of $\rho$. Specifically, the simulated biases associated with four groups and 20 total observations in Table 1 show that $|\mathrm{Bias}(\hat{\eta}^2_{TC}, \rho)| < |\mathrm{Bias}(\hat{\rho}_{TA}, \rho)|$ for all three group size structures when $\rho \le 0.10$. In addition, Table 3 (10 groups and 50 total observations) shows the same dominance at $\rho = 0$ and $0.05$ for equal and linear structures, and at $\rho \le 0.10$ for extreme structure. In the cases of four and 10 groups with the average sample size $n = 50$ of Tables 2 and 4, the situation occurs only at $\rho = 0$ for equal and linear structures, and at $\rho = 0$ and $0.05$ for extreme structure. In contrast, it can be concluded that $\hat{\rho}_{TA}$ is consistently more accurate than $\hat{\eta}^2_{TC}$ at values of $\rho \ge 0.15$ for all three group size structures with respect to bias assessment.

On the other hand, the computed RMSE results in Tables 5, 6, 7, and 8 basically reveal the same phenomenon that $\hat{\eta}^2_{TC}$ performs better than $\hat{\rho}_{TA}$ for small values of $\rho$, whereas the opposite is true for moderate and large values of $\rho$. However, the condition is slightly more prevalent than that in the evaluation of bias. For example, this is the case for all three group size structures with the number of groups $g = 4$ and $N = 20$ in Table 5. The dominance situations only occurred at the values of $\rho \le 0.25$ in Tables 6 and 7. Moreover, the results in Table 8 basically maintain almost the same pattern for $\rho \le 0.25$, with the only exception at $\rho = 0.25$ for the extreme group size structure.

Overall, the truncated estimator $\hat{\eta}^2_{TC}$ has smaller RMSE for small values of $\rho$ ($\le 0.20$) for all group structures, whereas $\hat{\rho}_{TA}$ gives better RMSE for moderate and large values of $\rho$ ($\ge 0.35$). Hence, the dominance relationship between $\hat{\rho}_{TA}$ and $\hat{\eta}^2_{TC}$ is closely related to group structures when the true value of $\rho$ is between 0.20 and 0.35. In other words, each estimator is better only for certain combined configurations of $\rho$, numbers of groups, average group sizes, and group size structures.

Table 7 The root mean squared error of ICC indices for $g = 10$ and $N = 50$ ($n = 5$)

Group size structure
        Equal                                     Linear                                    Extreme
$\rho$  $\hat{\rho}_{TA}$    $\hat{\eta}^2_{TC}$  $\hat{\rho}_{TA}$    $\hat{\eta}^2_{TC}$  $\hat{\rho}_{TA}$    $\hat{\eta}^2_{TC}$
0.00    0.077203             0.063381             0.078705             0.064584             0.108642             0.089752
0.05    0.091038             0.078514             0.092546             0.079784             0.117766             0.100542
0.10    0.107809             0.098571             0.109342             0.099922             0.131370             0.118227
0.15    0.122816             0.116985             0.125067             0.118975             0.146871             0.138197
0.20    0.134992             0.132239             0.138483             0.135523             0.161665             0.157158
0.25    0.144663             0.144556             0.148650             0.148351             0.173859             0.173307
0.30    0.151184             0.153640             0.157313             0.159502             0.184608             0.187593
0.35    0.156988             0.161452             0.163122             0.167694             0.192277             0.198410
0.40    0.160238             0.166538             0.167632             0.174158             0.198188             0.207274
0.45    0.160818             0.168937             0.169142             0.177598             0.201242             0.212624
0.50    0.161210             0.170844             0.169768             0.179784             0.201992             0.215301
0.55    0.158012             0.168918             0.167367             0.178782             0.199958             0.214901
0.60    0.153497             0.165404             0.163898             0.176282             0.196233             0.212428
0.65    0.147126             0.159791             0.157069             0.170288             0.187682             0.204586
0.70    0.137813             0.150885             0.149000             0.162696             0.178417             0.195627
0.75    0.125649             0.138510             0.136353             0.149902             0.163600             0.180551
0.80    0.111141             0.123478             0.120557             0.133540             0.145377             0.161328
0.85    0.092836             0.103935             0.102261             0.114079             0.121473             0.135759
0.90    0.068708             0.077609             0.075967             0.085474             0.091298             0.102710
0.95    0.039268             0.044737             0.044259             0.050241             0.052779             0.059823
0.99    0.008913             0.010250             0.010148             0.011628             0.012127             0.013872

Table 8 The root mean squared error of ICC indices for $g = 10$ and $N = 500$ ($n = 50$)

Group size structure
        Equal                                     Linear                                    Extreme
$\rho$  $\hat{\rho}_{TA}$    $\hat{\eta}^2_{TC}$  $\hat{\rho}_{TA}$    $\hat{\eta}^2_{TC}$  $\hat{\rho}_{TA}$    $\hat{\eta}^2_{TC}$
0.00    0.007438             0.005794             0.007657             0.005962             0.077493             0.061586
0.05    0.030336             0.028572             0.032134             0.030201             0.091051             0.077255
0.10    0.049069             0.046975             0.052162             0.049924             0.110191             0.100805
0.15    0.064754             0.062939             0.069436             0.067387             0.130046             0.124842
0.20    0.078536             0.077332             0.084285             0.082879             0.147426             0.145995
0.25    0.090571             0.090470             0.097413             0.097075             0.162264             0.164287
0.30    0.100408             0.101553             0.107773             0.108672             0.175663             0.180589
0.35    0.108305             0.110756             0.117004             0.119351             0.185883             0.193601
0.40    0.115122             0.119002             0.124124             0.128040             0.193650             0.203788
0.45    0.119736             0.125354             0.129420             0.134884             0.198600             0.210923
0.50    0.123002             0.129764             0.133404             0.140309             0.202120             0.216182
0.55    0.123577             0.131766             0.134870             0.143272             0.202617             0.218257
0.60    0.123818             0.133182             0.134154             0.143737             0.200319             0.217084
0.65    0.120291             0.130537             0.132308             0.143012             0.195680             0.213270
0.70    0.116203             0.127183             0.126309             0.137650             0.185344             0.203324
0.75    0.107326             0.118561             0.118666             0.130334             0.172579             0.190281
0.80    0.096749             0.107689             0.107089             0.118609             0.155069             0.172040
0.85    0.081501             0.091549             0.090588             0.101184             0.132144             0.147391
0.90    0.062190             0.070459             0.069551             0.078380             0.100334             0.112714
0.95    0.036210             0.041422             0.040916             0.046556             0.058804             0.066561
0.99    0.008223             0.009512             0.009527             0.010961             0.013900             0.015871

Implication for intraclass correlation analysis

In the present article, we focused our attention on the two general formulas $\hat{\rho}_{TA}$ and $\hat{\eta}^2_{TC}$ for their appealing features of overall accuracy and computational ease. The detailed numerical investigation further helps differentiate the use of a proper ICC index from others under various circumstances. First, the ICC(1) or $\hat{\rho}_A$ index is used by authors and requested by reviewers almost reflexively in multilevel analysis. The empirical evidence demonstrates that further improvement may be obtained by adopting the corrected eta-squared estimator when the magnitude of an intraclass correlation is small. Second, according to the body of accumulated knowledge in field research (Bliese, 2000), the underlying magnitudes of ICC are typically less than 0.20. Thus, Eta2(C) or $\hat{\eta}^2_C$ appears to be a more appropriate estimate of the strength of intraclass correlation. With the reported values of $F^*$, the number of groups, and actual group sizes in actual practice, the corrected eta-squared estimators can be readily computed with the prescribed formulas. On the other hand, it can be shown that Eta2(C) and $\hat{\eta}^2_C$ can be recovered directly from ICC(1) and $\hat{\rho}_A$, respectively. For Eta2(C), the identity is

$$\mathrm{Eta2(C)} = \frac{n\,\mathrm{ICC(1)} - \{1 - \mathrm{ICC(1)}\}/(g - 1)}{n + \{1 - \mathrm{ICC(1)}\}(n - 1)/(g - 1)}. \qquad (11)$$

Thus, for example, an ICC(1) of 0.20 with the number of groups $g = 10$ and group size $n = 5$ translates into an Eta2(C) value of 0.17 with Eq. 11. Similarly, the correspondence between $\hat{\rho}_A$ and $\hat{\eta}^2_C$ can be established through Eq. 12 for a given value $n_0$. In short, Bliese and Halverson (1998) asserted that ICC(1) and Eta2(C) provide significant improvements over $\hat{\eta}^2$ for describing the effect of group membership.
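The conversion from ICC(1) to Eta2(C) in Eq. 11 can be checked numerically. The short Python sketch below encodes the identity as Eta2(C) = [n ICC(1) - (1 - ICC(1))/(g - 1)] / [n + (1 - ICC(1))(n - 1)/(g - 1)], which assumes g groups of equal size n, and reproduces the worked example from the text. The function name `eta2c_from_icc1` is hypothetical and chosen only for illustration.

```python
def eta2c_from_icc1(icc1, g, n):
    """Convert ICC(1) to Eta2(C) via Eq. 11 (g groups of equal size n)."""
    # Shared term (1 - ICC(1)) / (g - 1) in numerator and denominator
    shrink = (1.0 - icc1) / (g - 1.0)
    return (n * icc1 - shrink) / (n + shrink * (n - 1.0))


# Worked example from the text: ICC(1) = 0.20, g = 10, n = 5
value = eta2c_from_icc1(0.20, g=10, n=5)
print(round(value, 2))  # 0.17
```

Because the subtracted term shrinks as g grows, the two indices move closer together when the number of groups is large, which agrees with the smaller gaps between the estimators seen for larger designs.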
In the present article, we presented a comprehensive treatment to provide operational guidelines and practical implications for choosing between the two indices ICC(1) and Eta2(C) and their extended formulas.

The present article concerns the use of two ICC indices as strength of association measures for clustering studies. Despite their routine and common application in empirical studies, the fundamental properties of ICC indices are not sufficiently illustrated in the literature. The well-known ANOVA estimator and the previously suggested corrected eta-squared formulas were closely examined with respect to their point estimation principle, unbalanced data extension, and practical truncation consideration. In terms of estimation principle, ICC(1) was obtained by substituting the variance components in the population ICC with the corresponding unbiased estimators. In contrast, the corrected eta-squared estimator was identical to the maximum likelihood estimator and the pairwise estimator under equal group sizes. Although their expressions were tractable and easy to compute, the intrinsic estimation complexity prevented analytical justification. Therefore, contemporary computer capabilities were used to conduct an intensive simulation study of the bias and MSE performance of the two competing formulas. The numerical examinations revealed their critical and subtle discrepancy in estimating the population ICC: Their estimation properties varied with the number of groups, average group size, and group size structure. Moreover, the modified corrected eta-squared estimator performed better for small values of ICC, and the adjusted ICC(1) was preferred for moderate and large ICC values. Recognizing the different behavior of the two estimators helps to clarify the issue of evaluating the strength of the group property and how to choose an appropriate estimate in multilevel analysis.
This information may be useful in selecting an appropriate measure of the intraclass correlation when a researcher has a basic conceptual idea about the underlying ICC in conjunction with the fundamental configurations of the obtained sample, such as the number of groups, average group size, and group size structure.

Author Note

The author thanks Professor T. K. Peng of I-Shou University for helpful comments on an earlier version of this manuscript, and the action editor, Ira Bernstein, and two anonymous referees for constructive suggestions, which resulted in a clearer exposition.

References

Alferes, V. R., & Kenny, D. A. (2009). SPSS programs for the measurement of nonindependence in standard dyadic designs. Behavior Research Methods, 41, 47–54.

Bartko, J. J. (1976). On various intraclass correlation reliability. Psychological Bulletin, 83, 762–765.

Beal, D. J., & Dawson, J. F. (2007). On the use of Likert-type scales in multilevel data: Influence on aggregate variables. Organizational Research Methods, 10, 657–672.

Bliese, P. D. (1998). Group size, ICC values and group-level correlations: A simulation. Organizational Research Methods, 1, 355–373.

Bliese, P. D. (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions (pp. 349–381). San Francisco, CA: Jossey-Bass.

Bliese, P. D., & Halverson, R. R. (1998). Group size and measurements of group-level properties: An examination of eta-squared and ICC values. Journal of Management, 24, 157–172.

Castro, S. L. (2002). Data analytic methods for the analysis of multilevel questions: A comparison of intraclass correlation coefficients, rwg(j), hierarchical linear modeling, within- and between-analysis, and random group resampling. The Leadership Quarterly, 13, 69–93.

Cools, W., Van den Noortgate, W., & Onghena, P. (2009). Design efficiency for imbalanced multilevel data. Behavior Research Methods, 41, 192–203.

Courrieu, P., Brand-D'Abrescia, M., Peereman, R., Spieler, D., & Rey, A. (2011). Validated intraclass correlation statistics to test item performance models. Behavior Research Methods, 43, 37–55.

Donner, A. (1986). A review of inference procedures for the intraclass correlation coefficient in the one-way random effects model. International Statistical Review, 54, 67–82.

Donner, A., & Koval, J. J. (1980). The estimation of intraclass correlation in analysis of family data. Biometrics, 36, 19–25.

Donoghue, J. R., & Collins, L. M. (1990). A note on the unbiased estimation of the intraclass correlation. Psychometrika, 55, 159–164.

Fisher, R. A. (1938). Statistical methods for research workers (7th ed.). Edinburgh, Scotland: Oliver and Boyd.

Goldstein, H. (2002). Multilevel statistical models (3rd ed.). New York, NY: Wiley.

Hedges, L. V., & Hedberg, E. C. (2007). Intraclass correlation values for planning group-randomized trials in education. Educational Evaluation and Policy Analysis, 29, 60–87.

Hedges, L. V., & Rhoads, C. H. (2011). Correcting an analysis of variance for clustering. British Journal of Mathematical and Statistical Psychology, 64, 20–37.

Hoffman, L., & Rovine, M. (2007). Multilevel models for the experimental psychologist: Foundations and illustrative examples. Behavior Research Methods, 39, 101–117.

Hofmann, D. A. (1997). An overview of the logic and rationale of hierarchical linear models. Journal of Management, 23, 723–744.

Hofmann, D. A. (2002). Issues in multilevel research: Theory development, measurement and analysis. In S. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (pp. 247–274). New York, NY: Blackwell.

Hofmann, D. A., Griffin, M. A., & Gavin, M. B. (2000). The application of hierarchical linear modeling to organizational research. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions (pp. 467–511). San Francisco, CA: Jossey-Bass.

Kozlowski, S. W. J., & Klein, K. J. (2000). A multilevel approach to theory and research in organizations: Contextual, temporal, and emergent processes. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions (pp. 467–511). San Francisco, CA: Jossey-Bass.

LeBreton, J. M., & Senter, J. L. (2008). Answers to 20 questions about interrater reliability and interrater agreement. Organizational Research Methods, 11, 815–852.

Maxwell, S. E., Camp, C. J., & Arvey, R. D. (1981). Measures of strength of association: A comparative examination. Journal of Applied Psychology, 66, 525–534.

McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30–46.

Murray, D. M., Varnell, S. P., & Blitstein, J. L. (2004). Design and analysis of group-randomized trials: A review of recent methodological developments. American Journal of Public Health, 94, 423–432.

O'Connor, B. P. (2004). SPSS and SAS programs for addressing interdependence and basic levels-of-analysis issues in psychological data. Behavior Research Methods, 36, 17–28.

Olkin, I., & Pratt, J. W. (1958). Unbiased estimation of certain correlation coefficients. Annals of Mathematical Statistics, 29, 201–211.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

Searle, S. R., Casella, G., & McCulloch, C. E. (1992). Variance components. New York, NY: Wiley.

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420–428.

Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London, England: Sage.

Swiger, L. A., Harvey, W. R., Everson, D. O., & Gregory, K. E. (1964). The variance of intraclass correlation involving groups with one observation. Biometrics, 20, 818–826.

Tasoluk, B., Droge, C., & Calantone, R. J. (2009). Interpreting interrelations across multiple levels in HGLM models: An application in international marketing research. International Marketing Review, 28, 34–56.

Wampold, B. E., & Serlin, R. C. (2000). The consequence of ignoring a nested factor on measures of effect size in analysis of variance. Psychological Methods, 5, 425–433.


Gwowen Shieh. A comparison of two indices for the intraclass correlation coefficient, Behavior Research Methods, 2012, 1212-1223, DOI: 10.3758/s13428-012-0188-y