Evaluation of the spatial linear model, random forest and gradient nearest-neighbour methods for imputing potential productivity and biomass of the Pacific Northwest forests

Forestry, Jan 2015

Increasingly, forest management and conservation plans require spatially explicit information within a management or conservation unit. Forest biomass and potential productivity are critical variables for forest planning and assessment in the Pacific Northwest. Their values are often estimated from ground-measured sample data. For unsampled locations, forest analysts and planners lack forest productivity and biomass values, so values must be predicted. Using simulated data and forest inventory and analysis data collected in Oregon and Washington, we examined the performance of the spatial linear model (SLM), random forest (RF) and gradient nearest neighbour (GNN) for mapping and estimating biomass and potential productivity of Pacific Northwest forests. Simulations of artificial populations and subsamplings of forest biomass and productivity data showed that the SLM had smaller empirical root-mean-squared prediction errors (RMSPE) for a wide variety of data types, with generally less bias and better interval coverage than RF and GNN. These patterns held for both point predictions and for population averages, with the SLM reducing RMSPE by 30.0 and 52.6 per cent over two GNN methods in predicting point estimates for forest biomass and potential productivity.
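
The comparison criteria named in the abstract (bias, RMSPE, interval coverage, and percentage reduction in RMSPE) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy numbers, the normal-theory 95 per cent interval (z = 1.96) and the reading of "reduction" as relative to the reference method's RMSPE are all assumptions made here.

```python
import numpy as np


def prediction_summary(y_true, y_pred, se_pred, z=1.96):
    """Empirical bias, RMSPE and interval coverage for a set of predictions.

    z=1.96 assumes nominal 95% normal-theory prediction intervals;
    the nominal level used in the paper is not stated in this excerpt.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    se_pred = np.asarray(se_pred, dtype=float)

    err = y_pred - y_true                      # prediction errors
    bias = err.mean()                          # average signed error
    rmspe = np.sqrt(np.mean(err ** 2))         # root-mean-squared prediction error
    covered = np.abs(err) <= z * se_pred       # is the truth inside pred +/- z*se?
    return {"bias": bias, "rmspe": rmspe, "coverage": covered.mean()}


def rmspe_reduction_percent(rmspe_reference, rmspe_new):
    """Percentage reduction of RMSPE relative to a reference method (assumed definition)."""
    return 100.0 * (rmspe_reference - rmspe_new) / rmspe_reference


# Hypothetical values, for illustration only.
summary = prediction_summary(
    y_true=[10, 20, 30, 40],
    y_pred=[12, 18, 33, 39],
    se_pred=[1.5, 1.5, 1.0, 1.0],
)
print(summary)                                 # bias 0.5, rmspe ~2.121, coverage 0.75
print(rmspe_reduction_percent(10.0, 7.0))      # 30.0
```

Under this definition, a reference RMSPE of 10.0 and a new RMSPE of 7.0 correspond to a 30 per cent reduction.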

Forestry: An International Journal of Forest Research
Forestry 2015; 88, 131-142, doi:10.1093/forestry/cpu036
Advance Access publication 15 October 2014

Hailemariam Temesgen1* and Jay M. Ver Hoef2

1 Department of Forest Engineering, Resources and Management, Oregon State University, Corvallis, OR, USA
2 National Marine Mammal Laboratory, NOAA-NMFS Alaska Fisheries Science Center, 7600 Sand Point Way NE, Bldg 4, Seattle, WA 98115-6349, USA

*Corresponding author. Tel: +1 5417378548; Fax: +1 5417374613.

Introduction

To manage forest resources in perpetuity, forest analysts, forest managers and policy-makers need to plan for the future using high-quality, spatially explicit, up-to-date inventory estimates. Forest biomass and potential productivity are two variables that are critical for land transactions, locating timber resources and new processing facilities, writing silvicultural prescriptions, drafting conservation plans and linking to climate change and carbon accounting (Latta et al., 2010). Yet, due to the prohibitive cost of collecting detailed information over an extensive land base, most forests lack productivity and biomass values for unsampled locations.

Forest productivity is measured in a wide variety of ways. In the western US, forest productivity is assessed by the forest inventory and analysis (FIA) program (Czaplewski, 1999; Roesch and Reams, 1999), whereby site trees are selected to produce a site index for forested field plots. The site index is then used in combination with normal yield tables to determine potential productivity, i.e. the maximum potential cubic metre volume per hectare per year (potential mean annual increment; PMAI) that would be produced over the long term at a given site for a normally stocked stand (Hanson et al., 2002). In this study, PMAI was used as the response variable to represent productivity, indicating the average annual productivity of wood volume (m³ ha⁻¹ year⁻¹) that would be realized over time.

Forest biomass is an important attribute for quantifying the roles of forests as a carbon source or sink and for sustainable forest management. The emergence of biomass as a critical variable in assessing sequestration of atmospheric carbon and in providing critical information to forest resource management and policy decision-making has focussed attention on its estimation and prediction for non-sampled sites.
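
As a rough illustration of the PMAI definition in this section, mean annual increment (MAI) at a given stand age is cumulative volume divided by age, and the potential value is the maximum of MAI over the ages in a yield table. The sketch below assumes that reading; the function name and the yield-table values are hypothetical, and the step that selects a yield table from the site index is not modelled.

```python
import numpy as np


def potential_mai(ages_years, volumes_m3_per_ha):
    """Return (PMAI, age at which it occurs) from one yield-table column.

    MAI(age) = cumulative volume / age; PMAI is taken as the maximum MAI.
    Inputs are hypothetical yield-table entries for a single site index.
    """
    ages = np.asarray(ages_years, dtype=float)
    vols = np.asarray(volumes_m3_per_ha, dtype=float)
    if ages.shape != vols.shape or ages.size == 0:
        raise ValueError("ages and volumes must be non-empty and the same length")
    if np.any(ages <= 0):
        raise ValueError("ages must be positive")

    mai = vols / ages                  # m3 per ha per year at each age
    i = int(np.argmax(mai))            # index of the culmination of MAI
    return float(mai[i]), float(ages[i])


# Hypothetical yield table (not taken from the paper).
ages = [20, 40, 60, 80, 100]           # stand age, years
volumes = [60, 280, 540, 680, 760]     # cumulative volume, m3 per ha

pmai, age_at_max = potential_mai(ages, volumes)
print(pmai, age_at_max)                # 9.0 60.0
```

With these made-up values, MAI rises to 9.0 m³ ha⁻¹ year⁻¹ at age 60 and declines afterwards, so 9.0 is the potential value.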
Due to variation in moisture content, dry forest biomass estimates (DRYBIOT) are the basis for forest carbon inventories and most international negotiations. In this study, DRYBIOT was used as the response variable, and represents the total above-ground oven-dry biomass of live trees >2.5 cm in diameter.

Different parametric and non-parametric methods have been proposed for imputing PMAI and biomass for unsampled locations by linking measured ground variables and available auxiliary variables. Recently, nearest-neighbour (NN) methods, such as gradient nearest-neighbour (GNN; Ohmann and Gregory, 2002), k-nearest neighbour (k-NN; McRoberts et al., 2002) and random forests (RF; Breiman, 2001; Eskelson et al., 2009b), have been developed and have gained widespread use in imputing (augmenting) data for point (mapping) and total (block) predictions.

Received 25 April 2014. Published by Oxford University Press on behalf of Institute of Chartered Foresters 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

The reasons for the wide use of the NN methods in forestry include:

Despite their wide use, NN methods are neither unbiased nor consistent (LeMay and Temesgen, 2005). While in certain forest inventory applications with comprehensive field data, bias has generally not been an issue (Packalen and Maltamo, 2007), recent studies that compared NN methods using light detection and ranging reported bias that ranged from −8.78 to 0.25 per cent and 0.16 to 16.16 per cent when using most similar and RF imputation, respectively (Breidenbach et al., 2010; Gagliasso et al., 2014). In addition, NN methods neither extrapolate nor interpolate well for conditions with limited samples. An alternative to the NN approach to mapping and estimating total biomass and productivity is to use a spatial linear model (SLM), which includes kriging and universal kriging.
This approach was initially developed for a similar goal: predicting geographic values or totals for mining resources. Kriging has also been used to preserve spatial and attribute correlation of FIA data (Moeur and Hershey, 1999) and to account for spatial dependence in forest inventory and monitoring (Lappi, 2001). Other applications of kriging in forestry include mapping of forest resources (Gunnarsson et al., 1998), analysis of insect pests (Aleong et al., 1991) and predicting forest biomass and productivity (Ver Hoef and Temesgen, 2013). Numerous other examples and applications can be found in Cressie (1990, 1993). Ver Hoef and Temesgen (2013) compared the suitability and performance of k-NN and SLM theoretically and empirically, and reported that SLM is a better option for point and total prediction.

Despite the growing research on GNN and RF and their wide use for mapping and estimating totals for the Pacific Northwest forests, detailed analyses that compare the performance, efficiency and suitability of RF, GNN and SLM for predicting (or mapping) biomass and forest productivity at point and block-level

Gradient nearest neighbour

GNN methods are also devised for prediction (mapping), (...truncated)
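
The point made in the introduction that NN methods do not extrapolate well can be illustrated with a plain k-nearest-neighbour imputation in Euclidean auxiliary-variable space. This is a generic sketch, not the GNN or RF procedures evaluated in the paper; the function name and the toy data are hypothetical.

```python
import numpy as np


def nn_impute(x_ref, y_ref, x_target, k=1):
    """Impute y at target locations as the mean y of the k nearest reference plots.

    Distances are Euclidean in the space of auxiliary variables (an assumption
    of this sketch; GNN and RF define similarity differently).
    """
    x_ref = np.asarray(x_ref, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    x_target = np.asarray(x_target, dtype=float)

    # Pairwise distances: one row per target, one column per reference plot.
    dist = np.linalg.norm(x_target[:, None, :] - x_ref[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]   # indices of the k closest plots
    return y_ref[nearest].mean(axis=1)


# Hypothetical reference plots following a linear trend y = 10 * x.
x_ref = [[0.0], [1.0], [2.0], [3.0]]
y_ref = [0.0, 10.0, 20.0, 30.0]

# One target inside the sampled range and one far outside it.
pred = nn_impute(x_ref, y_ref, [[1.4], [10.0]], k=1)
print(pred)   # [10. 30.]
```

For the target at x = 10, the trend in the toy data would give 100, but the imputed value is 30, the largest observed reference value: an NN imputation is always an average of observed values and cannot fall outside their range.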


This is a preview of a remote PDF: https://forestry.oxfordjournals.org/content/88/1/131.full.pdf
Article home page: http://forestry.oxfordjournals.org/content/88/1/131.abstract

Hailemariam Temesgen, Jay M. Ver Hoef. Evaluation of the spatial linear model, random forest and gradient nearest-neighbour methods for imputing potential productivity and biomass of the Pacific Northwest forests, Forestry, 2015, pp. 131-142, 88/1, DOI: 10.1093/forestry/cpu036