Evaluation of the spatial linear model, random forest and gradient nearest-neighbour methods for imputing potential productivity and biomass of the Pacific Northwest forests
Forestry
An International Journal of Forest Research
Forestry 2015; 88, 131 – 142, doi:10.1093/forestry/cpu036
Advance Access publication 15 October 2014
Hailemariam Temesgen1* and Jay M. Ver Hoef2
1 Department of Forest Engineering, Resources and Management, Oregon State University, Corvallis, OR, USA
2 National Marine Mammal Laboratory, NOAA-NMFS Alaska Fisheries Science Center, 7600 Sand Point Way NE, Bldg 4,
Seattle, WA 98115-6349, USA
*Corresponding author. Tel: +1 5417378548; Fax: +1 5417374613; E-mail: .
Increasingly, forest management and conservation plans require spatially explicit information within a management or conservation unit. Forest biomass and potential productivity are critical variables for forest planning and
assessment in the Pacific Northwest. Their values are often estimated from ground-measured sample data. For
unsampled locations, forest analysts and planners lack forest productivity and biomass values, so values must
be predicted. Using simulated data and forest inventory and analysis data collected in Oregon and Washington,
we examined the performance of the spatial linear model (SLM), random forest (RF) and gradient nearest neighbour (GNN) for mapping and estimating biomass and potential productivity of Pacific Northwest forests. Simulations of artificial populations and subsamplings of forest biomass and productivity data showed that the SLM
had smaller empirical root-mean-squared prediction errors (RMSPE) for a wide variety of data types, with generally
less bias and better interval coverage than RF and GNN. These patterns held for both point predictions and population averages, with the SLM reducing RMSPE by 30.0 and 52.6 per cent over two GNN methods in predicting point
estimates for forest biomass and potential productivity.
Introduction
To manage forest resources in perpetuity, forest analysts, forest
managers and policy-makers need to plan for the future using high-quality, spatially explicit, up-to-date inventory estimates. Forest
biomass and potential productivity are two variables that are critical
for land transactions, locating timber resources and new processing
facilities, writing silvicultural prescriptions, drafting conservation
plans and linking to climate change and carbon accounting (Latta
et al., 2010). Yet, due to the prohibitive cost of collecting detailed information over an extensive land base, most forests lack productivity
and biomass values for unsampled locations.
Forest productivity is measured in a wide variety of ways. In the
western US, forest productivity is assessed by the forest inventory
and analysis (FIA) program (Czaplewski, 1999; Roesch and
Reams, 1999) whereby site trees are selected to produce a site
index for forested field plots. The site index is then used in combination with normal yield tables to determine potential productivity,
i.e. the maximum potential cubic metre volume per hectare per
year (potential mean annual increment; PMAI) that would be produced over the long term at a given site for a normally stocked
stand (Hanson et al., 2002). In this study, PMAI was used as the response variable to represent productivity, indicating the average
annual productivity of wood volume (m³ ha⁻¹ year⁻¹) that
would be realized over time.
Forest biomass is an important attribute for quantifying the roles of
forests as a carbon source or sink and for sustainable forest management. The emergence of biomass as a critical variable in assessing
sequestration of atmospheric carbon and in providing critical
information to forest resource management and policy decision-making has focussed attention on its estimation and prediction
for non-sampled sites. Due to variation in moisture content,
dry forest biomass estimates (DRYBIOT) are the basis for forest
carbon inventories and most international negotiations. In this
study, DRYBIOT was used as the response variable and represents
the total above-ground oven-dry biomass of live trees >2.5 cm in
diameter.
Different parametric and non-parametric methods have been
proposed for imputing PMAI and biomass for unsampled locations
by linking measured ground variables and available auxiliary variables. Recently, nearest-neighbour (NN) methods, such as gradient
nearest-neighbour (GNN, Ohmann and Gregory, 2002), k-nearest
neighbour (k-NN, McRoberts et al., 2002) and random forests (RF,
Breiman, 2001; Eskelson et al., 2009b), have been developed and
have gained widespread use in imputing (augmenting) data for
point (mapping) and total (block) predictions.
Published by Oxford University Press on behalf of Institute of Chartered Foresters 2014. This work is written by (a) US Government employee(s) and is
in the public domain in the US.
Received 25 April 2014
The reasons for the wide use of the NN methods in forestry
include:
Despite their wide use, NN methods are neither unbiased nor consistent (LeMay and Temesgen, 2005). While in certain forest inventory applications with comprehensive field data, bias has generally
not been an issue (Packalen and Maltamo, 2007), recent studies
that compared NN methods using light detection and ranging (LiDAR) data
reported bias that ranged from −8.78 to 0.25 per cent and 0.16
to 16.16 per cent when using most similar and RF imputation,
respectively (Breidenbach et al., 2010; Gagliasso et al., 2014). In
addition, NN methods neither extrapolate nor interpolate well for
conditions with limited samples.
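
The extrapolation limit noted above can be illustrated with a minimal sketch. It uses a generic k-NN imputation rule, not the GNN or RF procedures evaluated in this paper, and the reference plots, the single auxiliary variable and the function name knn_impute are all hypothetical. Because an imputed value is an average of responses already observed on reference plots, it can never fall outside the observed range, however far the target lies from the sampled conditions.

```python
import math

def knn_impute(ref_x, ref_y, target_x, k=1):
    """Impute a response for target_x as the mean response of its k nearest
    reference plots, with distance measured in auxiliary-variable space."""
    ranked = sorted((math.dist(x, target_x), y) for x, y in zip(ref_x, ref_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical reference plots: one auxiliary variable and a response
# that increases linearly with it (response = 10 + 10 * x).
ref_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
ref_y = [10.0, 20.0, 30.0, 40.0]

# Target inside the sampled range: the nearest plot is x = 2.0.
inside = knn_impute(ref_x, ref_y, (1.9,), k=1)

# Target far outside the sampled range: the nearest plot is still x = 3.0,
# so the imputed value is capped at the largest observed response.
outside = knn_impute(ref_x, ref_y, (6.0,), k=1)

print(inside, outside)
```

In this toy setting the linear trend would give 70 at x = 6, whereas the imputed value stays at 40, the largest response among the reference plots.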
An alternative to the NN approach to mapping and estimating
total biomass and productivity is to use a spatial linear model
(SLM), which includes kriging and universal kriging. This approach
was initially developed for a similar goal: predicting geographic
values or totals for mining resources. Kriging has also been used
to preserve spatial and attribute correlation of FIA data (Moeur
and Hershey, 1999) and to account for spatial dependence in
forest inventory and monitoring (Lappi, 2001). Other applications
of kriging in forestry include mapping of forest resources (Gunnarsson et al., 1998), analysis of insect pests (Aleong et al., 1991) and
predicting forest biomass and productivity (Ver Hoef and Temesgen, 2013). Numerous other examples and applications can be
found in Cressie (1990, 1993).
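
As a rough illustration of the kriging idea, the sketch below implements ordinary kriging, i.e. an SLM with a constant mean, in plain Python. It is a sketch under stated assumptions rather than the model fitted in this study: the exponential covariance function, the fixed sill and range values, the absence of a nugget effect, the toy coordinates and the function names (solve, exp_cov, ordinary_krige) are all illustrative choices. In practice the covariance parameters would be estimated from the data, and covariates would enter through the mean (universal kriging).

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def exp_cov(p, q, sill=1.0, range_=1.0):
    """Exponential covariance between two locations (assumed form, no nugget)."""
    return sill * math.exp(-math.dist(p, q) / range_)

def ordinary_krige(coords, y, target, sill=1.0, range_=1.0):
    """Predict the response at target from observations y at coords."""
    n = len(y)
    sigma = [[exp_cov(coords[i], coords[j], sill, range_) for j in range(n)]
             for i in range(n)]
    c0 = [exp_cov(coords[i], target, sill, range_) for i in range(n)]
    # Generalized least squares estimate of the constant mean:
    # mu = (1' Sigma^-1 y) / (1' Sigma^-1 1)
    mu = sum(solve(sigma, y)) / sum(solve(sigma, [1.0] * n))
    # Add the spatially weighted residuals: c0' Sigma^-1 (y - mu)
    weights = solve(sigma, [yi - mu for yi in y])
    return mu + sum(c * w for c, w in zip(c0, weights))

# Hypothetical plots on the corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
y = [10.0, 12.0, 14.0, 16.0]

pred_centre = ordinary_krige(coords, y, (0.5, 0.5))   # unsampled location
pred_at_plot = ordinary_krige(coords, y, (1.0, 0.0))  # sampled location
print(pred_centre, pred_at_plot)
```

With no nugget effect the predictor reproduces the observed value at a sampled location, and at the centre of this symmetric layout it returns the mean of the four observations. Unlike NN imputation, the prediction is built from an estimated mean plus spatially weighted residuals, which is what allows the SLM to attach a model-based prediction variance to each location.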
Ver Hoef and Temesgen (2013) compared the suitability and
performance of k-NN and SLM theoretically and empirically,
and reported that SLM is a better option for point and total prediction. Despite the growing research on GNN and RF and their
wide use for mapping and estimating totals for the Pacific Northwest forests, detailed analyses that compare the performance, efficiency and suitability of RF, GNN and SLM for predicting (or
mapping) biomass and forest productivity at point and block-level
Gradient nearest neighbour
GNN methods are also devised for prediction (mapping), (...truncated)