Mixed effects: a unifying framework for statistical modelling in fisheries biology
ICES Journal of Marine Science (2015), 72(5), 1245–1256. doi:10.1093/icesjms/fsu213
Review Article
James T. Thorson 1* and Cóilín Minto 2
1 Fisheries Resource Analysis and Monitoring Division, Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic
and Atmospheric Administration, 2725 Montlake Boulevard East, Seattle, WA 98112, USA
2 Marine and Freshwater Research Centre, Galway-Mayo Institute of Technology, Dublin Road, Galway, Ireland
*Corresponding author: tel: 1 206 302 1772; fax: 1 206 860 6792; e-mail:
Thorson, J. T., and Minto, C. Mixed effects: a unifying framework for statistical modelling in fisheries biology. – ICES Journal of
Marine Science, 72: 1245–1256.
Received 11 June 2014; revised 30 October 2014; accepted 31 October 2014; advance access publication 4 December 2014.
Fisheries biology encompasses a tremendous diversity of research questions, methods, and models. Many sub-fields use observational or experimental data to make inference about biological characteristics that are not directly observed (called “latent states”), such as heritability of phenotypic traits, habitat suitability, and population densities to name a few. Latent states will generally cause model residuals to be correlated, violating
the assumption of statistical independence made in many statistical modelling approaches. In this exposition, we argue that mixed-effect modelling
(i) is an important and generic solution to non-independence caused by latent states; (ii) provides a unifying framework for disparate statistical
methods such as time-series, spatial, and individual-based models; and (iii) is increasingly practical to implement and customize for problem-specific models. We proceed by summarizing the distinctions between fixed and random effects, reviewing a generic approach for parameter estimation, and distinguishing general categories of non-linear mixed-effect models. We then provide four worked examples, including state-space,
spatial, individual-level variability, and quantitative genetics applications (with working code for each), while providing comparison with conventional fixed-effect implementations. We conclude by summarizing directions for future research in this important framework for modelling and
statistical analysis in fisheries biology.
Keywords: Gaussian random field, hierarchical, individual-level variability, integration, latent variable, measurement error, mixed-effects model,
random effects, spatial variation, state space.
Introduction
Counting fish is like counting trees - except they are invisible
and they keep moving
John Shepherd, quoted in Hilborn (2002)
Role of models in fisheries science
Marine populations are dispersed across varied habitats and are
influenced by ecosystem and climatic factors that change over
time (Hilborn and Walters, 1992). Population and community dynamics arise from somatic growth, reproduction, maturation,
natural and fishing mortality, movement, between-species interactions, and spatial variation in habitat quality (Hilborn and
Walters, 1992; Quinn and Deriso, 1999). These processes are
generally not possible to measure directly at the spatial scale of the
population or community. Predicting the net effect of all these processes simultaneously therefore requires statistical models that are
used to reconcile population and community theory with available
data. These statistical models must, explicitly or implicitly, make inference from data to biological characteristics that are not directly
observed (i.e. statistical models are often used to count invisible
moving trees).
The existence of states that are not directly observed (termed
“latent states” in the following) causes complications for many statistical models in fisheries biology (see Millar and Anderson, 2004).
Spatial variability may cause unexplained residuals in data from
multiple sampling units to be statistically correlated (“spatial non-independence”), and interannual changes in population-dynamics
Published by Oxford University Press on behalf of International Council for the Exploration of the Sea 2014. This work is written by (a) US
Government employee(s) and is in the public domain in the US.
(i) Models using random effects are important for inference when
analysing fisheries data that exhibit non-independence;
(ii) Random effects provide a unifying statistical framework for
models that might otherwise seem unrelated, for example,
time-series models for populations, spatial models, genetics
models, and models for variation among individuals; and
(iii) Models that include random effects are increasingly easy to
build and customize for specific fisheries problems using publicly available modelling tools and software.
Discussions of random effects are available elsewhere (Searle et al.,
1992; Gelman and Hill, 2007; Pinheiro and Bates, 2009) so we
instead seek to provide an accessible introduction for fisheries biologists using fishery and aquatic examples. The remainder of this
paper is devoted to quickly reviewing inference and parameter estimation for models with random effects, followed by four case-study
applications that include example code for flexible model development and parameter estimation.
Random effects
In this review, we use random effects to broadly refer to parameters
that are assumed to arise from a shared stochastic process, where
the distribution of likely values can be estimated. Fixed effects
refer to other model parameters (including the parameters governing the distribution of random effects) where there are insufficient
data to estimate these parameters as arising from a shared distribution of likely values. Finally, a mixed-effects model is any model that
has a mix of random and fixed effects (see Gelman, 2005 for more
discussion). For example, an ecologist might analyse data from a
growth experiment involving many individual fish subject to
similar environmental conditions (e.g. Shelton et al., 2013).
Because individual growth arises from a shared stochastic process
(i.e. the conditions of the experiment), parameters representing
growth rate for each fish can be treated as if they are randomly
drawn from a shared distribution of likely values. By treating
growth parameters as if they are drawn from a shared distribution,
the ecologist can then estimate characteristics of the distribution
of likely growth rates (i.e. its mean and standard deviation). The
parameter representing average growth rate is in this case fixed,
while individual growth rate parameters are random (i.e. the
growth rates of individual fish are replicated in the experiment, so
the magnitude of variation in growth rate among individuals can
be estimated).
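The growth-experiment example can be illustrated with a minimal simulation sketch. It assumes a balanced design (the same number of growth measurements per fish), normally distributed growth rates and measurement errors, and simple method-of-moments estimators; these choices, the function names, and all numerical values are illustrative assumptions and are not the estimation approach reviewed in this paper.

```python
import random
import statistics


def simulate_growth(n_fish, n_obs, mu, sd_among, sd_obs, seed=1):
    """Simulate repeated growth-rate measurements for each fish (hypothetical values)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_fish):
        # Random effect: this fish's growth rate, drawn from the shared distribution
        rate = rng.gauss(mu, sd_among)
        # Noisy measurements of that fish's growth rate
        data.append([rng.gauss(rate, sd_obs) for _ in range(n_obs)])
    return data


def estimate_distribution(data):
    """Method-of-moments estimates for a balanced one-way random-effects design."""
    n_obs = len(data[0])
    fish_means = [statistics.mean(obs) for obs in data]
    # Fixed effect: the mean of the distribution of growth rates
    mu_hat = statistics.mean(fish_means)
    # Average within-fish variance estimates the measurement-error variance
    var_within = statistics.mean([statistics.variance(obs) for obs in data])
    # Variance of fish means = among-fish variance + measurement variance / n_obs
    var_means = statistics.variance(fish_means)
    var_among = max(0.0, var_means - var_within / n_obs)
    return mu_hat, var_among ** 0.5, var_within ** 0.5


data = simulate_growth(n_fish=200, n_obs=5, mu=2.0, sd_among=0.5, sd_obs=0.3)
mu_hat, sd_among_hat, sd_obs_hat = estimate_distribution(data)
print(f"mean growth rate: {mu_hat:.2f}")
print(f"SD among fish:    {sd_among_hat:.2f}")
print(f"SD of measurement: {sd_obs_hat:.2f}")
```

Because growth rate is replicated across many fish, the standard deviation among individuals can be separated from measurement error; with a single fish, only the measurement-error variance would be estimable.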
Random effects are particularly useful thanks to the phenomenon called shrinkage. Shrinkage occurs when random effects are
shrunk towards the average value f (...truncated)