Regression Models for Count Data: Illustrations using Longitudinal Predictors of Childhood Injury


Bryan T. Karazsia, MA and Manfred H. M. van Dulmen, PhD
Kent State University

Key words: count data; injury; regression.

Count data with a preponderance of zeros are frequently analyzed by pediatric psychologists. Common examples of such count data include number of patient hospitalizations (Logan, Radcliffe, & Smith-Whitley, 2002), frequency of adolescent alcohol use (Audrain-McGovern, Rodriguez, Tercyak, Neuner, & Moss, 2006), and number of childhood injuries (Morrongiello, Ondejko, & Littlejohn, 2004; Schwebel, Brezausek, Ramey, & Ramey, 2004). Distributions of such data violate fundamental assumptions of many commonly used multivariate statistical techniques [e.g., ordinary least squares (OLS) regression], leading to results that do not accurately reflect the observed data (Hammer & Landau, 1981). Fairly recently, statistical techniques that overcome these problems have been developed (Hall, 2000; Lambert, 1992). Although these techniques are better suited than, for example, OLS regression to handling a count dependent variable, few pediatric psychologists are familiar with them. The goal of the present article is therefore to illustrate the use of these techniques by offering a practical demonstration using prospective data from the National Institute of Child Health and Human Development (NICHD) Study of Early Child Care.

Understanding Count Data

A count refers to the number of specified events that occur in a given interval of time. By definition, count data consist of only nonnegative integers. The specified event can include any behavior of interest, and counts are utilized frequently in the field of pediatric psychology. For example, in a recent analysis of service use among adolescents with sickle cell disease, Logan and colleagues (2002) reported frequencies of hospitalizations over a one-year period.
Data collected from medical chart reviews were summed to create a single variable depicting the number of hospitalizations. As is common with count variables, the authors reported that >50% of participants had not been hospitalized (Logan et al., 2002). Because such a large number of individuals had not experienced this event, we would refer to this count variable as zero-inflated. Other recent examples of zero-inflated count data within pediatric psychology include adolescent substance use (Audrain-McGovern et al., 2006), number of sexual partners (Prinstein, Meade, & Cohen, 2003), and children’s history of injuries (Hagan & Kuebli, 2007).

*Portions of this article were presented at the 2008 National Conference in Child Health Psychology, Miami, FL. All correspondence concerning this article should be addressed to Bryan T. Karazsia, Department of Psychology, Kent State University, Kent, OH 44242, USA. E-mail:

Journal of Pediatric Psychology 33(10) pp. 1076–1084, 2008. doi:10.1093/jpepsy/jsn055. Advance Access publication June 3, 2008. © The Author 2008. Published by Oxford University Press on behalf of the Society of Pediatric Psychology. All rights reserved. For permissions, please e-mail:

Objective: To offer a practical demonstration of regression models recommended for count outcomes using longitudinal predictors of children’s medically attended injuries.

Method: Participants included 708 children from the NICHD child care study. Measures of temperament, attention, parent–child relationship, and safety of physical environment were used to predict medically attended injuries.

Results: Statistical comparisons among five estimation methods revealed that a zero-inflated Poisson (ZIP) model provided the best fit with observed data.
ZIP models simultaneously model dichotomous and continuous outcomes of count variables, and different constellations of predictors emerged for each aspect of the estimated model.

Conclusions: This study offers a practical demonstration of techniques designed to handle dependent count variables. The conceptual and statistical advantages of these methods are emphasized, and Stata script is provided to facilitate adoption of these techniques.

Analysis of Count Data

Potential "Solutions"

Traditionally, researchers have used two solutions to deal with zero-inflated count data. First, researchers have opted to transform such data. A square root transformation has been recommended for count data (Johnson & Wichern, 1998), though several problems with transformations of count variables are documented (see Sturman, 1999 for review). Most notably, transformations do not address the preponderance of zeros, so models fit to transformed data can still predict meaningless values (e.g., negative values, even though counts cannot be negative; Hammer & Landau, 1981; Harrison & Hulin, 1989). In addition, transformed data are more difficult to interpret than nontransformed data (Tabachnick & Fidell, 2007).

Another commonly used approach is to dichotomize data into groups: those who performed the behavior (nonzero counts) and those who did not (zero counts). For example, one may be interested in the factors that predict whether or not adolescents are hospitalized. This approach is problematic because dichotomization ignores meaningful variation, and as such, occasions on which dichotomization is appropriate are rare (MacCallum, Zhang, Preacher, & Rucker, 2002).

Alternative Models

Fortunately, numerous models have been developed specifically for count data (Long & Freese, 2006; Sano, Jeong, Acock, & Zvonkovic, 2005). These models can handle nonnormality on the dependent variable and do not require the researcher to either dichotomize or transform the dependent variable.
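The drawbacks of the two traditional approaches described above can be illustrated on a small example. The sketch below is not taken from the article: the ten data points, the variable names, and the helper functions `ols_fit` and `count_negative_predictions` are hypothetical, chosen only to mimic a zero-heavy count outcome. It fits a simple OLS line to the raw counts and to their square roots, counts how many fitted values fall below zero, and shows what dichotomization discards.

```python
import math

# Hypothetical toy data (NOT from the NICHD study): a risk score x and a
# zero-heavy injury count y for ten children.
x = list(range(10))
y = [0, 0, 0, 0, 0, 0, 0, 1, 2, 5]


def ols_fit(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def count_negative_predictions(xs, ys):
    """Number of observed x values whose OLS fitted value is below zero."""
    intercept, slope = ols_fit(xs, ys)
    return sum(1 for xi in xs if intercept + slope * xi < 0)


# OLS on the raw counts: some fitted values are negative, which is
# impossible for a count.
negative_raw = count_negative_predictions(x, y)

# Square-root transformation: the zeros stay zeros, and OLS on the
# transformed outcome can still produce negative fitted values.
sqrt_y = [math.sqrt(v) for v in y]
zeros_after_sqrt = sum(1 for v in sqrt_y if v == 0)
negative_sqrt = count_negative_predictions(x, sqrt_y)

# Dichotomization: counts of 1, 2, and 5 all collapse to the same value.
dichotomized = [1 if v > 0 else 0 for v in y]

print("negative fitted values, raw counts:", negative_raw)
print("zeros remaining after sqrt:", zeros_after_sqrt)
print("negative fitted values, sqrt counts:", negative_sqrt)
print("dichotomized outcome:", dichotomized)
```

In this toy data the pile of zeros survives the transformation unchanged, and the children with one, two, and five injuries become indistinguishable once the outcome is dichotomized. Count regression models typically avoid the negative-prediction problem by modeling the logarithm of the expected count, so the predicted mean is always positive.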
We focus on four of these models (Atkins & Gallop, 2007; Long & Freese, 2006; Sano et al., 2005): Poisson, negative binomial, zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB).

1 While there is no explicit assumption about distributions of dependent variables in OLS regression (Tabachnick & Fidell, 2007), these distributions have a strong influence on the distribution of residuals (Atkins & Gallop, 2007).

Poisson

The Poisson distribution was developed to model discrete counts, and because Poisson regression is similar to linear regression in many respects, it is relatively easy to interpret.2 This distribution becomes increasingly positively skewed as the mean of the dependent variable decreases (Long & Freese, 2006), reflecting a common property of count data. The apparent simplicity of Poisson comes with two restrictive assumptions (Sturman, 1999). First, the variance and mean of the count variable are assumed to be equal. In reality, however, the variance is usually much greater than the mean (i.e., overdispersion; Cameron & Trivedi, 1986), and therefore Poisson models, though widely used to handle count data, may not be well suited to handle some types of count outcomes. Another restrictive assumption of Poisson models is that occurrenc (...truncated)