Using Harris hawk optimization towards support vector regression to ozone prediction
Stochastic Environmental Research and Risk Assessment (2022) 36:429–449
https://doi.org/10.1007/s00477-022-02178-2
ORIGINAL PAPER
Robert Kurniawan1 • I. Nyoman Setiawan2 • Rezzy Eko Caraka3,4 • Bahrul Ilmi Nasution5
Accepted: 13 September 2021 / Published online: 30 January 2022
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022
Abstract
As an area experiencing air pollution, especially ozone concentrations that often exceed the threshold or are unhealthy,
JABODETABEK (Jakarta, Bogor, Depok, Tangerang, and Bekasi) seeks to prevent and control pollution as well as restore
air quality. Therefore, this study aims to build a predictive model of ozone concentration using Harris hawks optimization-support vector regression (HHO-SVR) in 14 sub-districts in JABODETABEK. This goal is achieved by collecting data on
ozone concentration as a response variable and meteorological factors as predictor variables from the website that provides
the data. Other predictor variables such as time and significant lag detected with partial autocorrelation function of ozone
concentration were also used. The variables are then selected using recursive feature elimination-support vector
regression (RFE-SVR) to obtain the significant predictor variables that affect the ozone concentration. After that, the
prediction model is built using the HHO-SVR method, that is, support vector regression (SVR) whose parameter values are
optimized with the Harris hawks optimization (HHO) algorithm. Once the model has been formed, the best model is
determined using several evaluation metrics: mean absolute error (MAE), root mean square error (RMSE), mean
absolute percentage error (MAPE), coefficient of determination (R2), variance ratio (VR), and the Diebold–Mariano test.
The results of this study indicate that lag 1, lag 2, air temperature, humidity, and UV index are the significant predictor
variables selected by RFE-SVR for most sub-districts. In general, the HHO process takes longer than other metaheuristic
algorithms. On average, 7 of the 14 sub-districts using the HHO-SVR model yielded the best predictions with MAE below
10, RMSE and MAPE below 20, R2 around 0.97, and VR around 0.98. The results of the Diebold–Mariano test also
show that the prediction accuracy and performance stability of the HHO-SVR model are better,
especially for the Ciputat and South Bekasi sub-districts. This indicates that HHO-SVR is well suited to
predicting ozone concentrations in these two sub-districts.
Keywords Ozone • SVR • HHO • RFE • JABODETABEK
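As a minimal sketch of four of the evaluation metrics named in the abstract (MAE, RMSE, MAPE, and R2), the following Python functions use their standard textbook definitions. The function names and the sample values are hypothetical and are not taken from the study. MAPE is expressed in percent and assumes that no observed value is zero. The variance ratio and the Diebold–Mariano test are left out because their exact formulations are not stated in this section.

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the prediction errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean square error: penalizes large errors more than MAE does.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent (assumes no zero in y_true).
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted ozone concentrations (illustrative only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(mae(observed, predicted), rmse(observed, predicted),
      mape(observed, predicted), r2(observed, predicted))
```

Lower MAE, RMSE, and MAPE and an R2 closer to 1 indicate a better fit, which is how the thresholds reported in the abstract should be read.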
✉ Rezzy Eko Caraka
1 Department of Statistical Computing, Polytechnic Statistics STIS, 13330, DKI Jakarta, Indonesia
2 Directorate of Statistical Analysis and Development, BPS-Statistics Indonesia, 10710, DKI Jakarta, Indonesia
3 National Research and Innovation Agency (BRIN), Gedung BJ Habibie, 10340 DKI Jakarta, Indonesia
4 Faculty of Economics and Business, Universitas Indonesia, Campus UI Depok, 16424 Depok, West Java, Indonesia
5 Department of Communication, Informatics, and Statistics, Jakarta Smart City, 10110, Jakarta, Indonesia
1 Introduction
According to the Republic of Indonesia’s Government Regulation No. 41 Year 1999, air pollution is the entry of substances, energy, and other components into ambient air by
human activities, such that the quality of ambient air drops to
a level at which it can no longer
fulfill its function. Ambient air is free air in the
troposphere, the layer of the atmosphere closest to the earth’s surface. Currently, poor ambient air quality is a problem
faced by various countries around the world.
Good or bad ambient air quality is strongly influenced
by human activities. According to the World Health
Organization (WHO 2006), human activities that are the
main factors affecting ambient air quality are transportation, industry, agriculture, and energy generation and use.
Most of these activities occur in urban areas and produce
hazardous waste that can increase the concentration of air
pollutants and thus affect ambient air quality (Permadi and
Kim Oanh 2008). Several types of air pollutants affect
ambient air quality, including particulate matter (PM),
ozone (O3), nitrogen dioxide (NO2), and sulfur dioxide
(SO2).
One of the most significant pollutants in the atmosphere
is ozone (Zhao et al. 2015). Ozone is formed by photochemical reactions in the troposphere. Ozone is a secondary pollutant formed from the reaction between
nitrogen oxides (NOx) and volatile organic compounds
(VOCs) in the atmosphere with solar irradiation (Zhang
et al. 2019). In addition, the decomposition process and
ozone concentration are influenced by meteorological
factors with very dynamic changes (Wasi’ah and Driejana
2017). Air temperature, solar radiation, and air pressure can
increase ozone formation, while air humidity can reduce
ozone concentrations (Souza et al. 2018). In addition, the
ozone concentration in the future also tends to be correlated
or influenced by the ozone concentration in the past.
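The dependence of future ozone concentration on past values can be expressed as lagged predictors. The sketch below is illustrative only: the helper name `make_lagged` and the sample series are hypothetical, and the lags 1 and 2 are chosen because the abstract reports them as significant for most sub-districts. Each row of predictors holds the concentrations observed one and two steps before the target value.

```python
def make_lagged(series, lags):
    # Build (X, y) pairs where each row of X holds the values observed
    # `k` steps before the target, for every k in `lags`.
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - k] for k in lags])
        y.append(series[t])
    return X, y

# Hypothetical hourly ozone readings (illustrative values only).
ozone = [31.0, 35.0, 42.0, 40.0, 38.0]
X, y = make_lagged(ozone, lags=(1, 2))
for row, target in zip(X, y):
    print(row, "->", target)
```

The first `max(lags)` observations have no complete history and are therefore dropped, so a series of length n yields n minus the largest lag training rows.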
High ground level ozone concentrations can affect
health and the environment (World Bank Group 1998).
Exposure to ozone pollutants can impair the performance of the human body by disrupting the respiratory system. Acute effects
include eye and nose irritation, respiratory diseases, and
decreased lung function (Zhang et al. 2019). Moreover, the
environment, especially agricultural crops and
trees, also suffers growth disturbances. The visible
responses of these plants are defoliation and changes in leaf
color, which reduce plant productivity.
Air pollution control is already a program of
the Indonesian government, carried out through pollution prevention
and control as well as restoration of air quality. However, these
activities must begin with continuous monitoring and
research to determine developments in the ambient air
condition (Masseran and Safari 2020). Developments in
the air condition can be tracked by observing data on air
pollutants, one of which is ozone, over time. Past,
present, and future data can all be
used in such observations. Past data are available from
measurements that have already been made, whereas current and
future data can only be obtained through prediction
techniques.
A prediction technique estimates something
that is happening now or will happen in the future. The concentration
of air pollutants such as ozone is one of the conditions that
can be predicted through prediction techniques. Ozone
concentrations can be predicted by constructing models
that utilize suitable predic (...truncated)