Spring onion seed demand forecasting using a hybrid Holt-Winters and support vector machine model
Yihang Zhu 0 1
Yinglei Zhao 0 1
Jingjin Zhang 0 1
Na Geng 1
Danfeng Huang 0 1
0 Dept. of Plant Science, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, People's Republic of China, 2 Dept. of Industrial Engineering & Management, Shanghai Jiao Tong University, Shanghai, People's Republic of China
1 Editor: A. P. Dimri, Jawaharlal Nehru University, INDIA
Demand for spring onion seeds is variable, and maintaining supply is crucial to the success of seed companies. Spring onion seed demand forecasting, which can help reduce the high operational costs caused by long propagation periods and complex logistics, has not previously been investigated. This paper provides a novel perspective on spring onion seed demand forecasting and proposes a hybrid Holt-Winters and support vector machine (SVM) forecasting model. The model uses dynamic factors, including historical seed sales, seed inventory, spring onion crop market price and weather data, as inputs to forecast spring onion seed demand. Forecasting error, i.e. the difference between actual and forecasted demand, is assessed. Two advanced machine learning models are trained on the same dataset as benchmark models. Numerical experiments using actual commercial sales data for three spring onion seed varieties show that the proposed hybrid model outperformed the statistical-based models on all three forecasting error measurements. Seed inventory, spring onion crop market price and historical seed sales are the most important dynamic factors, among which seed inventory has a short-term influence while the other two have a mid-term influence on seed demand forecasting. The absolute minimum temperature is the only factor with a long-term influence. This study provides a promising spring onion seed demand forecasting model that helps explain the relationships between seed demand and other dynamic factors. The model could potentially be applied to demand forecasting of other crop seeds to reduce total operational costs.
Data Availability Statement: All relevant data are within the manuscript and its Supporting Information files.
Funding: This research was funded by the Shanghai Agriculture Applied Technology Development Program of China (www.shac.gov.cn), through project T20170304 - Construction and Application of big data system of whole production chain for green leaf vegetable industry in Shanghai (Y.Zhu, J.Z. and D.H. were funded), and by the Shanghai Municipal Seed Industry Development Project (www.shanghai.gov.cn) through project (2017)4-2 - Integration and application of vegetable seed initiation and factory seedling technology system (Y.Zhu, Y.Zhao and D.H. were funded). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Spring onion (Allium fistulosum L., also known as Welsh onion or scallion) probably originated in north-western China and is widely cultivated throughout South-East Asia and Europe. Spring onion maintains vegetative growth all year round, except in winter, and is commercially grown as an annual that is usually sown and/or transplanted as seedlings in early spring, summer or autumn. The majority of spring onion seeds are bought by seedling companies, agents or growers' cooperatives.
There is large demand for spring onion seeds in China. Seed companies aim to provide a
reliable supply of seed in the right quantities at the right time to growers. However, seed
procurement, storage, packaging and logistics are complicated because of various reasons such as
marketing, supply chain and seed deterioration. These activities require seed companies to use significant
manpower and finance to maintain their inventory and satisfy their customers, which leads to
high operational costs. Moreover, it usually takes three years to propagate spring onion seeds
from the parent plant, highlighting the importance of demand forecasting during seed
production planning. Therefore, accurate demand forecasting is crucial for inventory control, in
order to reduce operational costs and ensure market share growth for seed companies, as well
as ensuring sufficient seed supply for customers and growers. It is essential for seed companies
to develop more accurate demand forecasting methods for spring onion seeds (Fig 1).
Since 1990, both empirical and statistical-based models have been developed in an attempt
to forecast demand for a variety of crop seeds and other agricultural products. However, the
performance of these models varies dramatically from case to case, as various factors affect
demand, such as weather conditions, crop prices, seed inventory, and even growers'
Although the term "demand" is used when referring to forecasting in this paper, the actual demand is usually considered unknown and sales information is used as an approximation of demand. Thus, sales information is used to forecast future demand, and the terms "demand" and "sales" are used interchangeably.
Since demand forecasting problems were first proposed and studied in the 1970s, numerous statistical-based models such as autoregressive integrated moving average (ARIMA), seasonal, Holt-Winters and other regression models have been developed. Many researchers have attempted to forecast demand in agriculture using time-series and statistical-based models, such as models for forecasting wheat production or rice yield based on historical yield and weather data. The forecasting errors were 32.5%-41.6% and 23.6%-25.7%, respectively. Naidu (2015) used historical sales data and an ARIMA model to forecast wholesale data for potato and onion crops, for which the forecasting errors were 28.30% and 29.51%, respectively. Da Veiga et al. (2014) found the Holt-Winters model performed well for forecasting food demand, with forecasting errors of 14.97%-15.66% among various products. Research has shown that the forecasting performance of statistical-based models is unstable when applied to actual data, as statistical-based models employ hard computing based on exact models, and most are based on linear analysis. However, the demand for agricultural products is usually affected by many non-linear factors, which can significantly reduce the accuracy of statistical-based models.
The use of advanced machine learning models in forecasting has developed rapidly. For example, an artificial neural network (ANN) was used to forecast water resource variables in river systems. The forecasting errors were smaller than 18.00% and the authors suggested that the inputs should be highly independent in order to reduce uncertainty in the model outputs. Fuzzy rule-based systems were used to predict storage times for pork based on five pork quality parameters, with a forecasting accuracy of 93.93%-94.41%. A random forest (RF) model was used to predict sugarcane yield based on simulated biomass indices, observed climate and seasonal climate prediction indices. The forecasting accuracy reached 95.45%. A support vector machine (SVM) approach was used to forecast sales of five computer products based on weekly sales data, and the forecasting errors were 4.09%-8.62%. Compared with statistical-based models, whose forecasting errors are difficult to reduce below 20%, all these advanced machine learning models have demonstrated better performance in terms of forecasting error or accuracy within their specific contexts.
Fig 1. Spring onion seed demand forecasting methodology and its importance. The left side shows the relationships between demand forecasting and other processes. The right side shows the proposed hybrid Holt-Winters (HW) and support vector machine (SVM) demand forecasting methodology.
The non-linearity of ANN models, which are inspired by biological systems, enables more accurate demand forecasting compared to statistical-based models. Researchers have used advanced machine learning models to address agricultural forecasting problems. Co and Boosarawongse (2007) found an ANN performed well in forecasting the weekly export price of Thai rice, but could not explain how agricultural and environmental factors influence the price of rice. In general, the accuracy and robustness of statistical-based models in agricultural contexts vary dramatically from case to case, and advanced machine learning models outperform statistical-based models in most cases. However, Fortin et al. (2011) suggested advanced machine learning models may not replace previous statistical-based models, and a good forecasting model should not completely abandon statistical methods.
The SVM approach was first proposed as a new statistical learning tool for pattern classification by Boser et al. in 1992. SVM models have achieved higher accuracy and provided global optimal solutions with fewer over-fitting problems than ANN models in many areas. Unlike ANN models, which follow the empirical risk minimization principle, SVM models try to minimize the upper bound of the generalization error rather than the training error; this approach is called the structural risk minimization principle. SVM models offer the advantage of converting complex nonlinear regression problems into linear regression problems in a high-dimensional feature space, which is useful for forecasting.
Since the introduction of the ε-insensitive loss function, SVM models have been extended to solve regression problems and have become another important tool for forecasting problems. Many studies of demand forecasting in areas outside agriculture have demonstrated the outstanding performance of SVM models based on different kernel functions compared to ANN models and other models. Kumar and Thenmozhi (2014) also found SVM models outperformed other models and stated that SVM is well suited to building hybrid forecasting models with statistical-based models. In agricultural research, SVM has been widely applied in image and sensor data detection. However, SVM applications have not yet been reported for crop product or seed demand forecasting.
Spring onion seed demand forecasting has a huge impact on seed processing and marketing. Thus, the main objective of this paper is to compare the performance of statistical-based methods (ARIMA and Holt-Winters), advanced machine learning methods (ANN, random forest and SVM) and the hybrid model in forecasting the demand for spring onion seeds based on dynamic factors including historical sales, seed inventory, spring onion price and weather data. Secondly, the most accurate method for spring onion seed demand forecasting is proposed and the influences of dynamic factors on forecasting are discussed. Finally, the contributions, perspectives and remarks are presented.
Materials and methods
The ARIMA model, also known as the Box-Jenkins model, has been a popular forecasting model since the late 1970s. The key hypothesis in the ARIMA model is that the future value of the time series is a linear combination of past values and errors. The model is expressed as follows:
$$D_t = a_0 + a_1 D_{t-1} + a_2 D_{t-2} + \cdots + a_p D_{t-p} + \varepsilon_t - b_1 \varepsilon_{t-1} - b_2 \varepsilon_{t-2} - \cdots - b_q \varepsilon_{t-q} \tag{1}$$
where $D_t$ is the actual value of demand and $\varepsilon_t$ is the random error at time t, $a_i$ and $b_i$ are coefficients, and p and q are integers called the autoregressive and moving average parameters, respectively. The ARIMA model is a linear data-based approach that adapts its parameters from the actual time series data. Therefore, non-linearity in the data significantly affects the performance of the ARIMA model; hybrid ARIMA and advanced machine learning models have been widely proposed to deal with non-linear data.
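The linear structure of the ARIMA hypothesis can be illustrated with a minimal sketch of a one-step forecast from past demand values and past errors. The function name `arma_one_step`, the coefficient values and the convention that moving average terms are subtracted are assumptions made for illustration; differencing and parameter estimation are not shown.

```python
def arma_one_step(history, residuals, a0, ar_coefs, ma_coefs):
    """One-step forecast as a linear combination of past values and past errors.

    history   : past demand values, most recent last
    residuals : past one-step errors, most recent last
    ar_coefs  : [a_1, ..., a_p]; ma_coefs : [b_1, ..., b_q]
    The unknown current error is replaced by its expected value of zero.
    Assumption: moving average terms enter with a minus sign.
    """
    p, q = len(ar_coefs), len(ma_coefs)
    if len(history) < p or len(residuals) < q:
        raise ValueError("not enough past values for the requested orders")
    ar_part = sum(ar_coefs[i] * history[-1 - i] for i in range(p))
    ma_part = sum(ma_coefs[j] * residuals[-1 - j] for j in range(q))
    return a0 + ar_part - ma_part


# Illustrative numbers only: p = 2, q = 1
forecast = arma_one_step([10.0, 12.0], [0.5], 1.0, [0.5, 0.2], [0.4])
```

Because the forecast is a weighted sum of past values and errors, any non-linear relationship in the data cannot be represented by the coefficients alone.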
The Holt-Winters model is an exponential smoothing-based forecasting technique proposed in 1960 by Holt and Winters. Unlike ARIMA, which uses parameters in the equations to address the seasonal trend in the original data, Holt's linear equations have a built-in seasonal factor equation that can capture the seasonality directly. The Holt-Winters model is widely applied to time series that show seasonal increases and decreases. Three smoothing equations are designed to calculate and estimate the deseasonalized series, trend and seasonal factors. Unlike ARIMA, Holt-Winters forecast values are generated from iterative steps, instead of calculations based on fitting to statistical models. There are two methods of seasonal factor modeling: additive and multiplicative Holt-Winters models. Although the multiplicative Holt-Winters (mul-HW) model cannot be applied to data with null or negative values, its results are not reproduced by ARIMA, so it is necessary to use it. In contrast, the additive trend and seasonality found by the additive Holt-Winters model are covered by the output of ARIMA models, so the additive model is not considered. The multiplicative Holt-Winters equations are:
$$\text{Series: } S_t = \alpha \frac{D_t}{c_{t-N}} + (1-\alpha)(S_{t-1} + G_{t-1}) \tag{2}$$
$$\text{Trend: } G_t = \beta (S_t - S_{t-1}) + (1-\beta) G_{t-1} \tag{3}$$
$$\text{Seasonal factors: } c_t = \gamma \frac{D_t}{S_t} + (1-\gamma) c_{t-N} \tag{4}$$
$$\text{Forecast: } f_{t+m} = (S_t + G_t m)\, c_{t-N+m} \tag{5}$$
where N is the length of the seasonal cycle, $D_t$ is the actual value of demand, $S_t$ is the deseasonalized series, $G_t$ is the trend, $c_t$ is the seasonal factor, $f_{t+m}$ is the forecast value for m periods ahead, and α, β and γ are smoothing constants that are theoretically between 0 and 1. Since there are no general rules for choosing the smoothing constants and large smoothing constants will result in less stable forecasts, the optimal values are obtained via iteration by minimizing the squared one-step prediction error.
As previously stated, SVM models are based on the structural risk minimization principle, and attempt to minimize the upper bound of the generalization error rather than minimizing the training error. SVM maps data in a non-linear manner onto a high-dimensional feature space and conducts linear regression in this space. The regression function is:
$$y = \omega \varphi(X) + b \tag{6}$$
where $\varphi(X)$ is the high-dimensional feature space into which the input X is non-linearly mapped. The coefficients ω and b are estimated by minimizing the risk function, R(C):
$$\text{Minimize } R(C) = C \frac{1}{N} \sum_{i=1}^{N} L_\varepsilon(d_i, y_i) + \frac{1}{2} \lVert \omega \rVert^2 \tag{7}$$
$$\text{s.t. } L_\varepsilon(d, y) = \begin{cases} |d - y| - \varepsilon, & |d - y| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases}$$
where $d_i$ is the actual demand value in period i and N is the length of the total data. C is the regularized constant determining the trade-off between the empirical error and the regularization term, and ε is a prescribed parameter that determines the upper bound of the error penalty. $L_\varepsilon(d, y)$ is called the ε-insensitive loss function. The first term in Eq (7) is the empirical error and the second term is used to measure the function flatness.
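The multiplicative Holt-Winters recursions (series, trend, seasonal factors and forecast) can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study: the function name, the initialization from the first two seasonal cycles and the fixed smoothing constants are assumptions, and the smoothing constants are not optimized here.

```python
def holt_winters_multiplicative(demand, season_length, alpha, beta, gamma, horizon):
    """Multiplicative Holt-Winters forecast for `horizon` periods ahead.

    Assumptions (hypothetical choices for this sketch): strictly positive
    demand, at least two full seasonal cycles, and a simple initialization
    from the averages of the first two cycles.
    """
    n = season_length
    if len(demand) < 2 * n:
        raise ValueError("need at least two full seasonal cycles")
    if min(demand) <= 0:
        raise ValueError("multiplicative model needs strictly positive data")

    first_mean = sum(demand[:n]) / n
    second_mean = sum(demand[n:2 * n]) / n
    level = first_mean                               # S: deseasonalized series
    trend = (second_mean - first_mean) / n           # G: trend per period
    seasonals = [d / first_mean for d in demand[:n]]  # c for the first cycle

    for t in range(n, len(demand)):
        c_prev = seasonals[t - n]
        new_level = alpha * demand[t] / c_prev + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        seasonals.append(gamma * demand[t] / new_level + (1 - gamma) * c_prev)
        level = new_level

    last = len(demand) - 1
    forecasts = []
    for m in range(1, horizon + 1):
        # Seasonal factor of the matching phase in the most recent cycle
        idx = last - n + 1 + (m - 1) % n
        forecasts.append((level + trend * m) * seasonals[idx])
    return forecasts


# Purely seasonal toy series with no trend: the pattern should be reproduced
toy = [10.0, 20.0, 30.0, 40.0] * 3
toy_forecast = holt_winters_multiplicative(toy, 4, 0.3, 0.1, 0.2, 4)
```

In practice the three smoothing constants would be chosen by minimizing the squared one-step prediction error over the training series rather than fixed by hand.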
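By introducing Lagrange multipliers $\alpha_i$, $\alpha_i^*$ ($\alpha_i \alpha_i^* = 0$; $\alpha_i, \alpha_i^* \ge 0$), and setting the partial derivatives with respect to ω, b and the slack variables $\zeta_i$ to zero, the problem can be expressed as:
$$\text{Maximize } R(\alpha_i, \alpha_i^*) = \sum_{i=1}^{N} d_i (\alpha_i - \alpha_i^*) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(X_i, X_j)$$
$$\text{s.t. } \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0; \quad 0 \le \alpha_i, \alpha_i^* \le C$$
where $K(X_i, X_j) = \varphi(X_i) \cdot \varphi(X_j)$ is called the kernel function. The basic advantage of using a kernel function is that it avoids having to find and evaluate the mapping φ(X) explicitly. Hence, applying kernel functions gives the solution directly, regardless of the actual mapping. Note that any function that satisfies Mercer's condition can be used as the kernel function.
Finally, the regression function from Eq (6) can be formulated explicitly as:
$$y = f(X; \alpha_i, \alpha_i^*) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) K(X, X_i) + b$$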
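As a rough illustration of the quantities in this formulation, the sketch below evaluates the ε-insensitive loss and a kernel-based regression function for given dual coefficients. The function names and the numerical values are hypothetical, and solving the optimization problem for the multipliers is deliberately left out.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, bias, kernel):
    """Kernel expansion of the regression function.

    dual_coefs[i] stands for the difference of the two Lagrange
    multipliers of training point i (assumed to be already solved for).
    """
    return sum(c * kernel(x, sv) for c, sv in zip(dual_coefs, support_vectors)) + bias


def dot(u, v):
    """Linear kernel used only to keep the example small."""
    return sum(a * b for a, b in zip(u, v))


# Errors smaller than epsilon are not penalized at all
inside_tube = epsilon_insensitive_loss(10.0, 10.05, 0.1)
outside_tube = epsilon_insensitive_loss(10.0, 12.0, 0.1)

# Illustrative dual coefficients and bias (not fitted values)
prediction = svr_predict([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.25], 1.0, dot)
```

Only training points with a non-zero coefficient contribute to the sum, which is why the fitted function depends on a subset of the data (the support vectors).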
There are various kernel functions, including linear, radial basis and polynomial. The linear kernel ($K(X_i, X_j) = X_i \cdot X_j$), the simplest, is equivalent to a statistical autoregressive model. The radial basis function (RBF) kernel ($K(X_i, X_j) = \exp(-\gamma \lVert X_i - X_j \rVert^2)$, where γ > 0 is a free parameter) evaluates the similarity of two samples based on their Euclidean distance, is used to find outliers in a time series, and has proven promising in time series forecasting. The polynomial kernels ($K(X_i, X_j) = (X_i \cdot X_j + C)^d$, where the integer d is the degree of the kernel function, determined before model training) are extremely useful in non-linear data training. Low-degree polynomial kernels tend to save computing time without sacrificing accuracy, while high-degree polynomial kernels require more computing time yet cannot promise to increase accuracy. Since d = 1 is equivalent to the linear kernel, d is usually set to 2 and is generally kept small.
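The three kernels can be written directly from their definitions. The following sketch uses plain Python lists as feature vectors; the function and parameter names are illustrative choices rather than the interface of any particular library.

```python
import math


def linear_kernel(u, v):
    """Inner product of the two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma):
    """exp(-gamma * squared Euclidean distance); gamma must be positive."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(u, v, offset, degree):
    """(u . v + offset) ** degree, with the degree fixed before training."""
    return (linear_kernel(u, v) + offset) ** degree


x1, x2 = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(x1, x2)                 # 1*3 + 2*4
k_rbf_same = rbf_kernel(x1, x1, 0.5)          # identical samples give 1
k_poly = polynomial_kernel(x1, x2, 1.0, 2)    # (11 + 1) ** 2
```

The RBF value decays from 1 towards 0 as two samples move apart, so a very small γ makes distant samples look similar, while a large γ makes the kernel respond only to near neighbours.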
Training and testing datasets
Commercial spring onion seed sales exhibit complex trends that are affected by biological,
seasonal and economic factors. However, seed companies usually only have limited datasets for
seed demand forecasting. Firstly, only monthly spring onion seed sales data are generally
available. Secondly, seed procurement occurs during specific windows of time, as different varieties
have different growth cycles (i.e. the planting times in each year; e.g. March, June and
September for one variety, and May, August and October for another variety) and seed inventories.
Thus, these seasonal trends present as seasonal peaks and valleys in monthly sales data. It is
worth noting that different varieties of spring onion may have different seasonal trends. For
example, seed variety A may have peak sales between September and November, whereas
variety B may have peak sales in March. Thirdly, both seed price and spring onion crop market
price influence seed sales. However, the seed prices of different spring onion varieties will
remain similar if the seed company has a fixed seed procurement plan. Spring onion seed sales are influenced by fluctuating spring onion crop market prices. Finally, while data on the climate and weather at the production sites, including monthly temperature (average, absolute maximum and minimum) and precipitation, are available, the correlations between these meteorological data and seed demand are unclear.
In this study, monthly sales data for three varieties of spring onion seeds between August 2011 and December 2016 from a vegetable seed company in China were selected for numerical experiments. Together, these three varieties account for more than 85% of the company's spring onion seed sales. The authors have obtained the consent of the company to publicly use these data. The detailed sales data are presented in Fig 2 and S1 Table. The seed inventory data are presented in S2 Table. The varieties are named A, B and C. The three varieties represent three different types of spring onion seed demand: variety A has high annual demand that increases yearly, variety B has high annual demand that barely increases yearly, and variety C has relatively low annual demand that increases yearly. Since variety A has a different growth cycle (one-year rotation cycle starting in January) to the two other varieties (one-year rotation cycles starting in August), the data from August 2011 to July 2016 were used as the training set and the remaining data (August to December 2016) were employed as the testing set to measure forecasting performance. Standard leave-one-out or 10-fold cross-validation could not be used as the data are time-series based; forecasting of the future could only depend on historical data. The spring onion crop market price data were obtained from the Chinese agriculture information website (jgsb.agri.cn), which monitors the price of agricultural products every month. The meteorological data of Shanghai, where the production sites are located, were obtained from the Chinese weather data website (data.cma.cn). These data are listed in S3 Table.
Fig 2. Historical monthly sales data for the three spring onion seed varieties between August 2011 and December 2016.
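The chronological split described above can be sketched as follows. The helper names are hypothetical and the sales values are left as placeholders; only the month labels and the cut-off between July and August 2016 come from the text.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def chronological_split(records, test_start):
    """Split (month, value) records so that the test set lies strictly after the training set."""
    train = [r for r in records if r[0] < test_start]
    test = [r for r in records if r[0] >= test_start]
    return train, test


# Placeholder values (None) stand in for the monthly sales figures
records = [(m, None) for m in month_range((2011, 8), (2016, 12))]
train, test = chronological_split(records, (2016, 8))
```

Shuffled cross-validation would let the model see months that follow the ones it is asked to predict, which is why the split keeps the time order intact.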
Although the basic idea of time series forecasting is to investigate patterns in the historical data
and predict future trends, variables like sales can be influenced by one or more dynamic
factors. To address the influence from dynamic factors, advanced machine learning models are
applied. In the case of spring onion seeds, the sales not only follow historical seasonal trends,
but may also be influenced by seed inventories and spring onion crop market prices. As
statistical-based models only use historical data, their results reflect the linearity and seasonal trends
in the data. Inputting the results of statistical-based models into an SVM model is the key point
of forecasting using hybrid SVM models.
In the hybrid forecasting model (Fig 1), the monthly sales training set data are input into the Holt-Winters model. These statistical-based forecasting results, which reflect the linearity and seasonal trends in the historical data, are then prepared for the SVM model. Then, the results of the Holt-Winters model for the training set and the dynamic factors are input into the SVM model as variables. Since there are no general rules for selecting the parameters (C, ε, γ) for the SVM model, the model is tuned to adjust the parameters to optimal values using a grid search method. The starting point and the boundaries of the grid search are determined by a method proposed by Frohlich and Zell. Seasonal trends in the sales data, which would probably be recognized as outliers in statistical-based models, are learned in the SVM model using the RBF kernel. Thus, during training, the RBF kernel is used to obtain the forecasting values. In addition, due to the non-linearity of the sales data, the polynomial kernel (d is set to 2) is used to train the SVM model again and recalculate the results after the initial forecasting values are obtained.
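A grid search over C and γ can be sketched generically as an exhaustive scan of a logarithmic grid. In the sketch below the error function is a toy stand-in, since fitting an SVM is outside the scope of the example; the function names, grid bounds and the toy error surface are all assumptions, and the method for choosing the starting point and boundaries is not reproduced.

```python
import itertools
import math


def log_grid(min_exp, max_exp):
    """Powers of ten from 10**min_exp to 10**max_exp, inclusive."""
    return [10.0 ** e for e in range(min_exp, max_exp + 1)]


def grid_search(validation_error, c_values, gamma_values):
    """Return (error, C, gamma) for the pair with the lowest error."""
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        error = validation_error(c, gamma)
        if best is None or error < best[0]:
            best = (error, c, gamma)
    return best


def toy_error(c, gamma):
    """Hypothetical error surface with its minimum at C = 1E+05, gamma = 1E-07."""
    return (math.log10(c) - 5) ** 2 + (math.log10(gamma) + 7) ** 2


best_error, best_c, best_gamma = grid_search(toy_error, log_grid(0, 6), log_grid(-8, -1))
```

In a real run, `validation_error` would train the SVM with the given pair and return an error such as MAPE on held-out months of the training period.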
The spring onion seed price, seed inventory and weather data values of the current period are not considered as variables in the hybrid model, as the current values are unknown when forecasting. Therefore, 27 dynamic factor variables, including historical seed sales, seed inventories, spring onion crop market prices and weather data, are constructed in the hybrid demand forecasting model (Table 1). For seed inventories and spring onion crop market prices, short-term, mid-term and long-term information is provided by the variables (t-1) and (t-2), MA3 and MA6, and (t-12), respectively. Moving averages of the temperature and precipitation data were not calculated, as these data are already average values for the time periods.
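The lagged and moving-average variables can be derived from a monthly series as in the sketch below. The function name and the dictionary keys are illustrative; the moving averages are assumed to cover the three or six months immediately before the current month t, so that no value from month t itself is used.

```python
def lag_features(series, t, prefix):
    """Build the (t-1), (t-2), MA3, MA6 and (t-12) variables for month index t.

    series[t] itself is never read, because the current value is unknown
    at forecasting time. Requires at least 12 months of history before t.
    """
    if t < 12 or t > len(series):
        raise ValueError("need 12 past months and t within the series range")
    return {
        f"{prefix}(t-1)": series[t - 1],
        f"{prefix}(t-2)": series[t - 2],
        f"{prefix}-MA3": sum(series[t - 3:t]) / 3,
        f"{prefix}-MA6": sum(series[t - 6:t]) / 6,
        f"{prefix}(t-12)": series[t - 12],
    }


# Toy series 1, 2, ..., 24 standing in for monthly seed sales
toy_sales = [float(v) for v in range(1, 25)]
features = lag_features(toy_sales, 12, "S")
```

Applying the same function to sales, inventory and price, and only the three lags to the four weather series, gives 3 x 5 + 4 x 3 = 27 variables.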
In order to estimate the performance of the hybrid forecasting model, the results of the ARIMA, mul-HW, RF and ANN forecasting models (as benchmarks) were compared with the proposed hybrid model. For the RF and ANN models, all 27 dynamic factor variables were input. To determine the effect of spring onion crop market price, seed inventory and weather data on forecasting accuracy, the results of the proposed hybrid model including different combinations of dynamic factor variables were compared. The Morris method, which analyzes the changes in output due solely to changes in a particular input, was used to estimate the influence and interaction between dynamic factors and the forecasting results. The ARIMA model was run in IBM SPSS Statistics and the other models were run in R version 3.3.1 on an Intel Core i7 PC running at 2.90 GHz with 4 GB memory. For the advanced machine learning models and the Morris method, the R packages tseries, e1071, neuralnet, randomForest and sensitivity were used.
Notes to Table 1:
a This row represents the results of the proposed hybrid model with all variables inputted.
b S(t-1) refers to seed sales one month before the current month t; S-MA3 = $\sum_{i=t-3}^{t-1} S(i)/3$ and S-MA6 = $\sum_{i=t-6}^{t-1} S(i)/6$; other variables are expressed in a similar manner.
Forecasting performance was evaluated using three error measurements: mean absolute error (MAE), mean squared error (MSE) and mean absolute percentage error (MAPE), where $F_i$ and $A_i$ represent the forecasting values and actual demand values, respectively. Notice that MAPE is calculated from the ratio of the absolute error to the actual value, while MAE and MSE are calculated in terms of the absolute error. This means that MAE and MSE can compare different forecasting methods based on the same data set, whereas MAPE is able to compare the forecasting methods even if different data sets are used.
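A minimal sketch of the three error measurements, assuming their standard definitions (the mean of |F - A|, of (F - A)^2, and of |F - A| / A expressed as a percentage), is given below. The function names and the two-point example are illustrative only.

```python
def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean squared error."""
    return sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs(f - a) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Illustrative values, not taken from the sales data
actual_demand = [100.0, 200.0]
forecast_demand = [110.0, 180.0]
errors = (mae(actual_demand, forecast_demand),
          mse(actual_demand, forecast_demand),
          mape(actual_demand, forecast_demand))
```

MAE and MSE carry the units of the data (kg and kg squared for seed sales), whereas MAPE is scale-free, which is what allows it to be compared across varieties with different sales volumes.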
Results and discussion
Forecasting performance of the different models
PLOS ONE | https://doi.org/10.1371/journal.pone.0219889
Notes to Table 2:
a The one-year growth cycle of variety A starts in January, while the growth cycles of varieties B and C start in August.
b Variety B (Sep.-Dec. 2016) refers to the results for variety B excluding the sales value for August 2016.
After all of the models were trained on the training set, forecasting was conducted on the testing set and the three error measurements were calculated (Table 2 and Fig 3). As shown in Table 2, the forecasting results for the two time-series models (ARIMA and mul-HW) had larger error measurements than the other three models. The mul-HW model outperformed the ARIMA model for all three seed varieties based on all three error measurements, with the exception that the MAE and MSE values of variety B were lower for the ARIMA model than for the mul-HW model. The MAPE values of both time-series models were relatively large, in some cases even larger than 100%, suggesting time-series models have unacceptable accuracy for spring onion seed demand forecasting. In addition, the MAPE for variety B was unreasonably high, while the MAPE for variety B excluding the sales value for August 2016 (31.50 kg; the mul-HW forecast value was 100.03 kg) was 42.28%, which is close to the MAPE of the two other varieties. This is because the sales value for August 2016 was extremely low compared to the historical sales data (Fig 3), which is a typical example of non-linearity in spring onion seed sales data. Overall, these results suggest that statistical-based models cannot be directly applied for spring onion seed demand forecasting.
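The effect of the August 2016 value on the MAPE of variety B can be checked with a short worked calculation. Assuming the standard definition of the absolute percentage error, the single-month error of the mul-HW forecast is:

$$\text{APE}_{\text{Aug 2016}} = \frac{|31.50 - 100.03|}{31.50} \times 100\% = \frac{68.53}{31.50} \times 100\% \approx 217.6\%$$

If MAPE is taken as the mean over the five test months (August to December 2016), this one month alone contributes roughly 217.6 / 5 ≈ 43.5 percentage points, which illustrates how a single unusually low actual value can dominate the measure.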
Despite the poor performance of the two statistical-based models in processing non-linear
data, the results of the mul-HW model reflected the linear trends in the sales data, which
provides important information for training of, and forecasting by, the proposed hybrid model.
Table 2 compares the forecasting performance of the mul-HW model and proposed hybrid
model. The proposed hybrid model had dramatically lower error measurement values than the
mul-HW model, indicating the forecasting accuracy of the proposed hybrid model is
promising for agricultural production planning and supply chain management in terms of MAPE.
Compared to the three advanced machine learning models (ANN, RF, SVM), the proposed
hybrid SVM model had the best forecasting performance in terms of all three error
measurements. The RF model for demand forecasting consisted of 200 trees with three variables
sampled in each tree; the ANN model was constructed with two hidden layers, each having 200 neurons.
This analysis raises three points worth discussing. Firstly, the actual sales value for August 2016 was much lower than the other historical sales values, confirming that extreme variations in spring onion seed sales are possible and do influence the forecasting error of the proposed hybrid model. Secondly, the MAPE values of the RF model, ANN model and proposed hybrid SVM model were higher for variety B than the other two varieties, whereas the MAE values of all three varieties were similar. This suggests that the influence of extreme variation in sales values is non-negligible and needs to be evaluated using other error measurements. However, the MAE and MSE values of the proposed hybrid model were much lower for variety B than the other varieties. Thirdly, with the exception of the sales value for August 2016, the MAPE values of these three models for variety B are close to the MAPE values for the other two varieties. Thus, it is believed that the proposed hybrid model remarkably reduces spring onion seed demand forecasting error, though MAPE may be high, but still acceptable, in some extreme cases.
SVM parameters on forecasting performance
In order to analyze the influence of SVM parameters on the forecasting performance of the proposed hybrid model, MAPE was selected as the criterion to evaluate the forecasting performance of different parameters, and variety C was selected because it had the lowest MAPE of the three varieties. The candidate parameters are ε, γ and C. However, during training and testing of spring onion seed demand forecasting using the proposed hybrid model, ε was found to have little effect on forecasting error. Thus, ε was set to the default value (ε = 0.1). Table 3 shows the influence of parameters γ and C on the forecasting performance of the proposed hybrid model for variety C. Generally, when γ was larger than 1E-03, MAPE was insensitive to γ and C. When γ was within the range of 1E-06 to 1E-07 and C was larger than 1E+04, MAPE remained within the small range of 13.90 ± 0.60% and was insensitive to the parameters C and γ. In the numerical experiment, the parameters of the proposed hybrid model were set to C = 1E+05, γ = 1E-07 and ε = 0.1, and the MAPE of the forecasting result was 13.35%. This analysis indicates it is necessary to properly tune these parameters for each specific case, and the grid search method can be used for this process.
Dynamic factors on forecasting performance
The spring onion crop market price, seed inventory, temperature and precipitation data were input into the proposed hybrid model. These dynamic factors provide additional information for demand forecasting beyond the results of the historical sales data. Analysis of the influence and interaction between dynamic factors using the Morris method (factors = 6, r = 4, where "factors" is the number of factors and "r" is the number of repetitions) is shown in Fig 4. All dynamic factors were located below the dashed line, indicating the factors influence the output of the model independently, more so than via interaction with other factors. Based on the results of the Morris method, it is clear that historical seed sales, spring onion crop market price and seed inventory form a cluster that has the highest importance for seed demand, while the three temperature factors form another cluster with medium importance, and precipitation has the lowest importance.
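The two quantities used in this kind of screening, the mean absolute elementary effect and the standard deviation of the elementary effects, can be illustrated with a simplified one-at-a-time sketch. This is not the trajectory design of the R `sensitivity` package: each repetition here perturbs every factor from the same random base point, inputs are assumed to be scaled to [0, 1], and the function name, step size and toy model are assumptions.

```python
import random
import statistics


def elementary_effects(model, n_factors, repetitions, delta=0.25, seed=0):
    """Return (mu_star, sigma) per factor from one-at-a-time perturbations.

    mu_star : mean absolute elementary effect (overall influence)
    sigma   : standard deviation of the effects (interaction or non-linearity)
    """
    rng = random.Random(seed)
    effects = [[] for _ in range(n_factors)]
    for _ in range(repetitions):
        base = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_factors)]
        y_base = model(base)
        for i in range(n_factors):
            moved = list(base)
            moved[i] += delta
            effects[i].append((model(moved) - y_base) / delta)
    mu_star = [statistics.mean([abs(e) for e in ee]) for ee in effects]
    sigma = [statistics.stdev(ee) for ee in effects]
    return mu_star, sigma


def toy_model(x):
    """Factor 0 acts independently; factors 1 and 2 act only through their product."""
    return 2.0 * x[0] + x[1] * x[2]


mu_star, sigma = elementary_effects(toy_model, n_factors=3, repetitions=4)
```

In the toy model, factor 0 has a constant effect (sigma close to zero), whereas the effect of factor 1 depends on the value of factor 2 (sigma above zero), which mirrors the distinction between independent influence and interaction.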
Since the dynamic factors influence the output independently, the forecasting performance
of the proposed hybrid model with individual dynamic factor variables omitted was compared
with the results of the original hybrid model that included all 27 variables. In Table 1, each row
shows the MAPE values for forecasting using the proposed hybrid model with the individual
variables excluded. The first row shows the MAPE value of the model that includes all 27 variables.
With respect to historical seed sales (S), omission of S(t-1) and S(t-2) increased MAPE from 17.64% to 49.89% and 8.13% to 57.55%, respectively. Exclusion of S-MA3 and S-MA6 led to the largest increases in MAPE, from 14.90% to 67.24% and 16.72% to 82.50%, respectively. Exclusion of S(t-12) increased MAPE from 10.70% to 33.50%. Thus, historical seed sales has a strong influence on forecasting accuracy, with mid-term historical seed sales data having the largest influence.
Fig 3. Comparison of actual sales and the forecasting results of the different models in the testing set.
With respect to seed inventory (I), omission of I(t-1) and I(t-2) led to the largest increases
in MAPE, from 3.44% to 16.95% and 3.27% to 14.77%, respectively. Exclusion of I-MA3 and
I-MA6 did not alter, or even decreased, MAPE, and exclusion of I(t-12) increased MAPE from
1.16% to 4.12%. Thus, seed inventory has a strong influence on forecasting accuracy, with
short-term seed inventory data having the largest influence.
With respect to spring onion crop market price (P), exclusion of P(t-1) and P(t-2) increased
MAPE from 9.37% to 41.57% and 9.00% to 35.50%, respectively. Exclusion of P-MA3 and
P-MA6 led to the largest increases in MAPE, from 14.51% to 45.28% and 11.79% to 61.09%,
respectively. Exclusion of P(t-12) increased MAPE from 6.61% to 26.90%. In short, spring
onion crop market price has a strong influence on forecasting accuracy, with mid-term spring
onion crop market price data having the largest influence.
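To make the variable naming concrete, the following minimal Python sketch assembles one month's input vector from the lag and moving-average variables named in this section: five per factor for S, I and P, and three per factor for T, TX, TN and PC, which gives 5 x 3 + 3 x 4 = 27. The function names, the dictionary layout and the definition of MA3 and MA6 as the mean of the previous three and six months are assumptions made for illustration, not details taken from the authors' implementation.

```python
def moving_average(series, t, window):
    """Mean of the `window` observations before month t (assumed definition)."""
    return sum(series[t - window:t]) / window


def build_features(data, t):
    """Return the 27 input variables for month t as a name -> value dict.

    `data` maps a factor code (S, I, P, T, TX, TN, PC) to a list of
    monthly values; this layout is hypothetical.
    """
    if t < 12:
        raise ValueError("need at least 12 months of history before month t")
    feats = {}
    # Sales, inventory and price: two short lags, two moving averages, one yearly lag.
    for name in ("S", "I", "P"):
        series = data[name]
        feats[f"{name}(t-1)"] = series[t - 1]
        feats[f"{name}(t-2)"] = series[t - 2]
        feats[f"{name}-MA3"] = moving_average(series, t, 3)
        feats[f"{name}-MA6"] = moving_average(series, t, 6)
        feats[f"{name}(t-12)"] = series[t - 12]
    # Weather factors: lags only, no moving averages.
    for name in ("T", "TX", "TN", "PC"):
        for k in (1, 2, 12):
            feats[f"{name}(t-{k})"] = data[name][t - k]
    return feats


# Synthetic placeholder series (0, 1, 2, ...), not the study's data.
toy = {code: list(range(24)) for code in ("S", "I", "P", "T", "TX", "TN", "PC")}
print(len(build_features(toy, 12)))
```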
With respect to average temperature (T), exclusion of T(t-1) increased MAPE from 4.92%
to 16.58%. Exclusion of T(t-2) increased MAPE from 4.83% to 16.26%. Exclusion of T(t-12)
increased MAPE from 4.33% to 14.42%. Thus, average temperature has limited influence on
forecasting accuracy, with short-term average temperature data having the largest influence.
With respect to absolute maximum temperature (TX), exclusion of TX(t-1) increased
MAPE from 6.86% to 21.40%, exclusion of TX(t-2) increased MAPE from 6.85% to 21.36%,
and exclusion of TX(t-12) increased MAPE from 5.74% to 17.78%. Therefore, absolute
maximum temperature has limited influence on forecasting accuracy, with the short-term data
having the largest influence.
With respect to absolute minimum temperature (TN), exclusion of TN(t-1) increased
MAPE from 6.67% to 26.82%, exclusion of TN(t-2) increased MAPE from 7.01% to 27.90%,
and exclusion of TN(t-12) increased MAPE from 9.31% to 35.15%. In short, absolute
minimum temperature has a strong influence on forecasting accuracy, with the long-term data
having the largest influence.
Fig 4 plot (image not reproduced). Factors shown: absolute max. temperature, absolute min. temperature, spring onion price, average temperature, seed inventory, historical seed sales, precipitation. Horizontal axis: factor importance (μ*).
Fig 4. Analysis of the influence and interaction between dynamic factors using the Morris method. The horizontal axis is the mean absolute value of the elementary
effect (μ*), which represents the influence of a factor on output. The vertical axis is the standard deviation (σ), which represents the interaction of a factor with other
factors. The dashed line σ = μ* represents the point at which interaction with other factors equals the influence on output.
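As an illustration of the two statistics plotted in Fig 4, the sketch below estimates μ* (mean absolute elementary effect) and σ (standard deviation of the elementary effects) for each input of a toy function. It is a simplified one-at-a-time variant of Morris screening on a p-level grid; the function name morris_screening, the toy model and the settings r, p and seed are hypothetical and do not reproduce the authors' analysis.

```python
import random
import statistics


def morris_screening(f, k, r=20, p=4, seed=0):
    """Estimate mu* and sigma for each of k inputs scaled to [0, 1].

    f    : callable taking a list of k floats
    r    : number of random base points (assumed setting)
    p    : number of grid levels, even (assumed setting)
    """
    rng = random.Random(seed)
    delta = p / (2.0 * (p - 1))                      # standard Morris step size
    base_levels = [j / (p - 1) for j in range(p // 2)]  # keeps x + delta <= 1
    effects = [[] for _ in range(k)]
    for _ in range(r):
        x = [rng.choice(base_levels) for _ in range(k)]
        fx = f(x)
        for i in range(k):
            x_step = list(x)
            x_step[i] += delta                       # perturb one input only
            effects[i].append((f(x_step) - fx) / delta)
    mu_star = [statistics.mean([abs(e) for e in ee]) for ee in effects]
    sigma = [statistics.stdev(ee) for ee in effects]
    return mu_star, sigma


def toy_model(x):
    # x[0] acts additively, x[1] and x[2] interact, x[3] is inert.
    return 2.0 * x[0] + x[1] * x[2]


mu_star, sigma = morris_screening(toy_model, k=4)
for i, (m, s) in enumerate(zip(mu_star, sigma)):
    print(f"x{i}: mu*={m:.3f} sigma={s:.3f}")
```

In this toy setting the additive input has a constant elementary effect, so its σ is essentially zero and it would fall below the σ = μ* line, while the interacting inputs have σ greater than zero.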
With respect to precipitation (PC), exclusion of PC(t-1) and PC(t-2) did not affect, or even
decreased, MAPE, and exclusion of PC(t-12) increased MAPE from 3.01% to 4.10%. In short,
precipitation has barely any effect on forecasting accuracy.
Among all these variables, historical seed sales has the largest influence on forecasting
error, followed by spring onion crop market price and seed inventory, in agreement with the
results of the Morris method. On the other hand, seed inventory, average temperature and
absolute maximum temperature have short-term influences on forecasting error and absolute
minimum temperature has a long-term influence. Moreover, I-MA3, I-MA6 and the
precipitation data barely influenced the forecasting results.
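The variable-omission comparison summarized above can be sketched in a few lines of Python: compute MAPE with all variables, then refit and rescore with each variable removed in turn. The sketch uses a 1-nearest-neighbour regressor purely as a stand-in forecaster, and the four training rows are synthetic; the hybrid model itself, the helper names and the data are all assumptions for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def nearest_neighbour_predict(train_X, train_y, test_X):
    """Stand-in forecaster (assumption): 1-nearest-neighbour regression."""
    preds = []
    for x in test_X:
        dists = [sum((xi - ti) ** 2 for xi, ti in zip(x, row)) for row in train_X]
        preds.append(train_y[dists.index(min(dists))])
    return preds


def drop_column(X, j):
    """Return a copy of X without column j."""
    return [row[:j] + row[j + 1:] for row in X]


def omission_study(names, train_X, train_y, test_X, test_y, predict):
    """MAPE with all variables, then with each variable omitted in turn."""
    results = {"all variables": mape(test_y, predict(train_X, train_y, test_X))}
    for j, name in enumerate(names):
        preds = predict(drop_column(train_X, j), train_y, drop_column(test_X, j))
        results["without " + name] = mape(test_y, preds)
    return results


# Synthetic example: the target depends on "informative" only.
names = ["informative", "noise"]
train_X = [[1, 5], [2, 1], [3, 5], [4, 1]]
train_y = [10, 20, 30, 40]
test_X = [[2, 1], [4, 5]]
test_y = [20, 40]
for label, value in omission_study(names, train_X, train_y, test_X, test_y,
                                   nearest_neighbour_predict).items():
    print(label, value)
```

A variable whose omission raises MAPE is contributing to accuracy; one whose omission leaves MAPE unchanged or lowers it, as reported here for I-MA3, I-MA6 and precipitation, is a candidate for removal.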
As mentioned in section Training and testing datasets, the three spring onion seed varieties
selected for the numerical experiment represent three types of demand, thus the forecasting
performance for different seed varieties reflects the performance of the proposed hybrid model
for different types of seed demand. As shown in Table 2, the three error measurements of the
hybrid model were larger for varieties A and B than variety C. Noting that varieties A and B
both have high annual demand, this suggests the proposed hybrid model has better forecasting
performance for varieties with low, increasing demand (variety C) than varieties with high,
constant demand (varieties A and B). When the data for August 2016 were excluded, the
MAPE values for varieties A and B were almost the same and the two other error measurements
were similar, indicating the annual demand of a seed variety does not influence the forecasting
performance of the proposed hybrid model. Consider a growing seed company with high
spring onion seed sales. In order to maintain growth in sales over the long term, it is more
important to forecast the trends for varieties that will have increased demand in the future
(variety C in this case) than those with high demand at present (varieties A and B). Because the
proposed hybrid model better forecasts the demand for varieties with low, but increasing,
annual demand, it may help a seed company determine whether a seed
variety with low demand will undergo increased demand, and what the increase in demand will be.
The idea of applying SVM to spring onion seed demand forecasting has been realized in this
paper, and a hybrid Holt-Winters and SVM model for spring onion seed demand forecasting
is proposed. Our analysis showed the proposed hybrid model outperforms statistical-based
models and two other advanced machine learning models, and provides accurate forecasting
results for three spring onion seed varieties with different growth cycles and levels of demand.
In addition, we discussed the forecasting performance for different varieties when different
dynamic factors were used as inputs. Analysis suggested the proposed hybrid model is more
promising for forecasting the demand of seed varieties with low, growing annual demand than
varieties with high, constant annual demand. Seed inventory, spring onion crop market price,
and historical seed sales were the three most important dynamic factors, among which seed
inventory has short-term influence while the other two have mid-term influence on seed demand
forecasting. The absolute minimum temperature is the only dynamic factor having long-term
influence. The influence of the parameters of the SVM model on the proposed hybrid model
was also explored. The forecasting performance was insensitive to ?, while ? and C play
important roles in spring onion seed demand forecasting performance. While the proposed
forecasting model is based on spring onion seed production, it could also be applied to other
crop seeds whose demand is also influenced by dynamic factors such as historical sales,
inventories, crop market prices and weather conditions.
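For readers unfamiliar with the Holt-Winters component, the sketch below implements the additive level, trend and seasonal recursions in plain Python. It covers only the smoothing stage: how the authors couple it with the SVM is not restated in this section and is not shown. The function name, the initialization heuristic, the smoothing constants and the toy series with a season length of 4 are assumptions for illustration; monthly seed sales would more plausibly use a season length of 12.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast for `horizon` steps past the end of y.

    y       : list of observations
    m       : season length (e.g. 12 for monthly data, an assumption)
    alpha, beta, gamma : smoothing constants in [0, 1]
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons")
    # Simple initialization heuristic from the first two seasons.
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    trend = (second - first) / m
    level = first + trend * (m - 1) / 2.0            # level at time m - 1
    season = [y[i] - (level - trend * (m - 1 - i)) for i in range(m)]
    # Update level, trend and seasonal terms for each later observation.
    for t in range(m, len(y)):
        prev_level = level
        s_old = season[t % m]
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_old
    n = len(y)
    return [level + h * trend + season[(n - 1 + h) % m] for h in range(1, horizon + 1)]


# Synthetic noise-free series: linear trend plus a repeating pattern.
pattern = [2.0, -1.0, -3.0, 2.0]
series = [10 + 0.5 * t + pattern[t % 4] for t in range(16)]
print(holt_winters_additive(series, m=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4))
```

In a hybrid scheme, a forecast of this kind captures the regular trend and seasonality, leaving a second model to account for the influence of dynamic factors such as inventory, price and weather.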
In real life, it is difficult to predict and consider the growers' reaction to a specific event
such as an increase in crop market prices or extreme weather events, either of which may be
the reason for the sudden reduction in the sale of variety B in August 2016. To explore this
point, future work should focus on additional factors, such as weather and region. However,
the key findings of this paper support the use of a hybrid SVM model for high accuracy
demand forecasting to guide decision-making processes in sustainable agricultural
S1 Table. Historical monthly sales data (kg) of the three spring onion seed varieties.
S2 Table. Historical seed inventory data (kg) of the three spring onion seed varieties.
S3 Table. Spring onion market price, temperature and precipitation data.
The authors acknowledge Mr. Ruogang Shen for arranging the data used in this study and
providing comments on this paper.
Conceptualization: Yihang Zhu, Yinglei Zhao, Na Geng, Danfeng Huang.
Data curation: Yihang Zhu, Na Geng.
Formal analysis: Yihang Zhu.
Funding acquisition: Danfeng Huang.
Investigation: Yihang Zhu.
Methodology: Yihang Zhu, Jingjin Zhang, Na Geng.
Project administration: Yihang Zhu, Danfeng Huang.
Resources: Yinglei Zhao.
Software: Yihang Zhu, Na Geng.
Supervision: Jingjin Zhang, Danfeng Huang.
Validation: Yihang Zhu, Yinglei Zhao, Jingjin Zhang, Na Geng, Danfeng Huang.
Visualization: Yihang Zhu.
Writing – original draft: Yihang Zhu.
Writing – review & editing: Yihang Zhu.