Improving confidence intervals for normed test scores: Include uncertainty due to sampling variability



Behavior Research Methods (2019) 51:826–839
https://doi.org/10.3758/s13428-018-1122-8

Lieke Voncken¹ · Casper J. Albers¹ · Marieke E. Timmerman¹

¹ Department Psychometrics & Statistics, Faculty of Behavioural and Social Sciences, University of Groningen, Grote Kruisstraat 2/1, 9712 TS Groningen, The Netherlands

Published online: 6 November 2018
© The Author(s) 2018

Electronic supplementary material: The online version of this article (https://doi.org/10.3758/s13428-018-1122-8) contains supplementary material, which is available to authorized users.

Abstract

Test publishers usually provide confidence intervals (CIs) for normed test scores that reflect the uncertainty due to the unreliability of the tests. The uncertainty due to sampling variability in the norming phase is ignored. To express uncertainty due to norming, we propose a flexible method that is applicable in continuous norming and allows for a variety of score distributions, using Generalized Additive Models for Location, Scale, and Shape (GAMLSS; Rigby & Stasinopoulos, 2005). We assessed the performance of this method in a simulation study, by examining the quality of the resulting CIs. We varied the population model, procedure of estimating the CI, confidence level, sample size, value of the predictor, extremity of the test score, and type of variance-covariance matrix. The results showed that good quality of the CIs could be achieved in most conditions. The method is illustrated using normative data of the SON-R 6-40 test. We recommend test developers to use this approach to arrive at CIs, and thus properly express the uncertainty due to norm sampling fluctuations, in the context of continuous norming. Adopting this approach will help (e.g., clinical) practitioners to obtain a fair picture of the person assessed.

Keywords

Continuous norming · GAMLSS · Box-Cox power exponential distribution · Posterior simulation · Psychological tests

Introduction

Norms are needed to give an interpretation of someone’s test score. A normed score can be expressed in different ways, like a percentile and z score. It indicates the person’s relative standing on the test to other people in the population. For instance, the normed scores of intelligence tests are typically expressed as normalized intelligence quotient (IQ) scores, with a population mean of 100 and standard deviation of 15, yielding an immediate interpretation of any observed IQ score.

Normed tests are often applied as high-stakes tests, meaning that they are used to make important decisions about individuals. A clear example relates to the fact that mentally retarded individuals are exempted from death penalty in 18 of the United States (Death Penalty Information Center, 2015). Some states, like Idaho and Florida, use IQ scores to identify mental retardation, applying a rigid cutoff (i.e., observed IQ score ≤ 70). Another instance of the use of a rigid cutoff can be found in the Netherlands, where mental retardation indicated by an observed IQ score of 85 or below qualifies for the long-term care act (Zorginstituut Nederland, 2017), allowing the financing of supervised living and debt repayment programs.

In using test scores for important individual decisions, it is essential to acknowledge the uncertainty in observed test scores. There is an increasing awareness of the importance of reflecting this uncertainty. For instance, in the fifth edition of the DSM (Diagnostic and Statistical Manual of Mental Disorders; American Psychiatric Association, 2013), unlike earlier editions, a standard error of 5 IQ points was explicitly included in defining the upper range of intellectual disability.
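As a minimal illustration of how a standard error translates into an interval around a rigid cutoff, the sketch below converts a z score to the IQ metric (mean 100, standard deviation 15) and builds a symmetric normal-theory interval around the observed score. The function names, the example z score of -1.8, and the normal approximation are assumptions made for illustration; only the standard error of 5 IQ points and the cutoff of 70 are taken from the text. The interval reflects measurement error only, not sampling variability in the norms.

```python
from statistics import NormalDist


def iq_from_z(z: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Convert a z score to the IQ metric (population mean 100, SD 15)."""
    return mean + sd * z


def measurement_ci(observed: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Symmetric normal-theory interval: observed +/- z_crit * se.

    Assumption: the standard error applies equally at every score level.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z_crit * se
    return observed - half_width, observed + half_width


# Hypothetical example: a person scoring 1.8 SD below the population mean.
observed_iq = iq_from_z(-1.8)
lower, upper = measurement_ci(observed_iq, se=5.0)  # SE of 5 IQ points, as in the text

cutoff = 70.0  # rigid cutoff mentioned in the text
cutoff_inside_interval = lower <= cutoff <= upper

print(f"Observed IQ: {observed_iq:.1f}")
print(f"95% interval: [{lower:.1f}, {upper:.1f}]")
print(f"Cutoff {cutoff:.0f} inside interval: {cutoff_inside_interval}")
```

With these assumed inputs the interval spans roughly 63 to 83 and therefore contains the cutoff of 70, so the observed score alone cannot settle on which side of the cutoff the person falls.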
These expressions of uncertainty in observed test scores reflect the notion that observed scores may differ across assessments, even if the individual assessed would remain exactly the same, or two individuals would be exactly the same, on the characteristic measured. In line with this increased awareness, the Dutch Committee on Testing (COTAN) recommends that test publishers report information regarding the accuracy of the test (i.e., standard error of measurement, standard error of estimate, or test information function/standard error) and the appropriate intervals (Evers et al., 2009). Nowadays, many test publishers express this uncertainty related to test reliability, e.g., for the WISC-IV (Wechsler, 2003) and the Bayley-III (Bayley, 2006).

Nevertheless, this is insufficient for normed scores, because it ignores another source of uncertainty, namely the uncertainty due to the test norming itself. Test norming takes place on the basis of a norming sample, rather than the full population, implying that the norms themselves are subject to sampling fluctuations.

This source of uncertainty in normed test scores has been acknowledged only recently, with the proposal of two methods to estimate CIs for normed test scores, under the assumption that the norming sample stems from a single population. Crawford et al. (2011) proposed a method to obtain CIs around percentile norms, under the assumption that the scores in the norm population are normally distributed. Recently, Oosterhuis et al. (2017) derived standard errors for four different norm statistics (standard deviation, percentile ranks, stanine boundaries, and z scores), under the assumption that the scores in the norm population stem from a multinomial distribution. As described by Oosterhuis et al.
(2016), this method can be applied to residuals of raw test scores in the context of regression-based norming, in which relevant personal characteristics (e.g., age) are used to estimate the raw test score distribution. Even though the method of Oosterhuis et al. (2017) has less strict assumptions than the method of Crawford et al. (2011), it still assumes normally distributed errors and homoscedasticity of the error variances, which are often unrealistic assumptions in practice. For instance, floor and ceiling effects may introduce skewness.

We propose a method to derive CIs indicating uncertainty in normed scores that does not rely on those strict assumptions. To this end, we use the flexible Generalized Additive Models for Location, Scale, and Shape (GAMLSS; Rigby & Stasinopoulos, 2005), which has been advocated as a useful approach to continuous norming (e.g., Bayley-III (Bayley, 2006) and SON-R 2-8 (Tellegen & Laros, 2017)). GAMLSS includes a broad range of distributions, yielding a good chance of finding a well-fitting distribution for empirical normative data. Interestingly, the ordinary linear regression model described by Oosterhuis et al. (2016) is a restricted, special case of a model within the GAMLSS framework.

GAMLSS

Applying GAMLSS implies that the score distribution is modelled conditional on predictor(s) of interest (e.g., age), based on certain distributional parameters. For instan (...truncated)