Using Harris hawk optimization towards support vector regression to ozone prediction

Stochastic Environmental Research and Risk Assessment, Jan 2022

JABODETABEK (Jakarta, Bogor, Depok, Tangerang, and Bekasi) experiences air pollution, in particular ozone concentrations that often exceed the threshold or reach unhealthy levels, and the region seeks to prevent and control pollution and to restore air quality. This study therefore builds a predictive model of ozone concentration for 14 sub-districts in JABODETABEK using Harris hawks optimization-support vector regression (HHO-SVR). Ozone concentration, the response variable, and meteorological factors, the predictor variables, were collected from a website that provides the data. Time and the significant lags of ozone concentration, detected with the partial autocorrelation function, were used as additional predictors. Recursive feature elimination-support vector regression (RFE-SVR) was then used to select the predictors that significantly affect ozone concentration. The prediction model was built with HHO-SVR, that is, support vector regression (SVR) whose parameter values are optimized with the Harris hawks optimization (HHO) algorithm. The models were compared using mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), the coefficient of determination (R2), the variance ratio (VR), and the Diebold–Mariano test. The results indicate that, for most sub-districts, RFE-SVR selects lag 1, lag 2, air temperature, humidity, and UV index as significant predictors. In general, the HHO process takes longer than other metaheuristic algorithms. On average, 7 of the 14 sub-districts using the HHO-SVR model yielded the best predictions, with MAE below 10, RMSE and MAPE below 20, R2 around 0.97, and VR around 0.98. The Diebold–Mariano test also shows that the HHO-SVR model gives more accurate predictions and more stable performance, especially for the Ciputat and South Bekasi sub-districts, which suggests that HHO-SVR is particularly well suited to predicting ozone concentrations in these two sub-districts.
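
The error metrics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `evaluation_metrics` is hypothetical, and VR is assumed here to be the ratio of the variance of the predictions to the variance of the observations, since this excerpt does not give the formula. The Diebold–Mariano test is not covered.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return MAE, RMSE, MAPE (in percent), R2 and VR for two sequences.

    Hypothetical helper for illustration. Observed values must be non-zero
    (MAPE divides by them) and must not all be equal (R2 and VR divide by
    their spread).
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_res = sum(e * e for e in errors)                     # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    # Assumption: VR = variance of predictions / variance of observations.
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    vr = ss_pred / ss_tot

    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2, "VR": vr}
```

Under these definitions, lower MAE, RMSE, and MAPE indicate smaller errors, while R2 and VR values close to 1 indicate that the predictions track both the level and the variability of the observed series.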

Stochastic Environmental Research and Risk Assessment (2022) 36:429–449
https://doi.org/10.1007/s00477-022-02178-2

ORIGINAL PAPER

Robert Kurniawan (1), I. Nyoman Setiawan (2), Rezzy Eko Caraka (3, 4), Bahrul Ilmi Nasution (5)

Accepted: 13 September 2021 / Published online: 30 January 2022
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022

Corresponding author: Rezzy Eko Caraka

1 Department of Statistical Computing, Polytechnic Statistics STIS, 13330, DKI Jakarta, Indonesia
2 Directorate of Statistical Analysis and Development, BPS-Statistics Indonesia, 10710, DKI Jakarta, Indonesia
3 National Research and Innovation Agency (BRIN), Gedung BJ Habibie, 10340 DKI Jakarta, Indonesia
4 Faculty of Economics and Business, Universitas Indonesia, Campus UI Depok, 16424 Depok, West Java, Indonesia
5 Department of Communication, Informatics, and Statistics, Jakarta Smart City, 10110, Jakarta, Indonesia

Keywords: Ozone, SVR, HHO, RFE, JABODETABEK

1 Introduction

According to the Republic of Indonesia's Government Regulation No. 41 of 1999, air pollution is the entry of substances, energy, and other components into ambient air through human activities, such that ambient air quality drops to a level at which the air can no longer fulfill its function. Ambient air is the free air in the troposphere, the layer of the atmosphere closest to the earth's surface. Poor ambient air quality is currently a problem faced by many countries, and ambient air quality is strongly influenced by human activities.
According to the World Health Organization (WHO 2006), the human activities that most affect ambient air quality are transportation, industry, agriculture, and energy generation and use. Most of these activities occur in urban areas and produce hazardous waste that can increase the concentration of air pollutants and thus degrade ambient air quality (Permadi and Kim Oanh 2008). Several types of air pollutants affect ambient air quality, including particulate matter (PM), ozone (O3), nitrogen dioxide (NO2), and sulfur dioxide (SO2).

One of the most significant pollutants in the atmosphere is ozone (Zhao et al. 2015). Tropospheric ozone is a secondary pollutant formed by photochemical reactions between nitrogen oxides (NOx) and volatile organic compounds (VOCs) under solar irradiation (Zhang et al. 2019). The decomposition and concentration of ozone are also influenced by meteorological factors, which change very dynamically (Wasi'ah and Driejana 2017). Air temperature, solar radiation, and air pressure can increase ozone formation, while air humidity can reduce ozone concentrations (Souza et al. 2018). In addition, future ozone concentrations tend to be correlated with past ozone concentrations.

High ground-level ozone concentrations can affect health and the environment (World Bank Group 1998). Exposure to ozone can impair the performance of the human body by disrupting the respiratory system. The resulting acute conditions include eye and nose irritation, respiratory diseases, and decreased lung function (Zhang et al. 2019). In the environment, agricultural crops and trees in particular suffer growth disturbances; the visible responses are defoliation and changes in leaf color, which reduce plant productivity.
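
The dependence of ozone concentration on its own past values, which the study detects with the partial autocorrelation function (PACF) and encodes as lag predictors, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: all function names are hypothetical, the PACF is computed with the standard Durbin-Levinson recursion, and the significance bound of 1.96 / sqrt(n) is a common rule of thumb that the excerpt does not confirm as the criterion used.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def partial_autocorrelation(series, max_lag):
    """PACF at lags 1..max_lag via the Durbin-Levinson recursion."""
    rho = autocorrelation(series, max_lag)
    pacf = []
    phi_prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf


def significant_lags(series, max_lag):
    """Lags whose PACF exceeds the assumed bound 1.96 / sqrt(n) in absolute value."""
    bound = 1.96 / math.sqrt(len(series))
    pacf = partial_autocorrelation(series, max_lag)
    return [k for k, value in enumerate(pacf, start=1) if abs(value) > bound]


def make_lagged_rows(series, lags):
    """Pair each value with its lagged predecessors: ([y[t-lag], ...], y[t])."""
    max_lag = max(lags)
    return [
        ([series[t - lag] for lag in lags], series[t])
        for t in range(max_lag, len(series))
    ]
```

For example, `make_lagged_rows(ozone, [1, 2])` would produce the lag 1 and lag 2 predictors for a hypothetical list `ozone` of consecutive measurements; the meteorological variables would then be appended to each feature row before feature selection.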
Air pollution control has become a program of the Indonesian government, carried out through pollution prevention and control and the restoration of air quality. However, these activities must begin with continuous monitoring and research to track developments in ambient air conditions (Masseran and Safari 2020). These developments can be followed by observing data on air pollutants, one of which is ozone, over time. Past, present, and future data can all be used for this purpose. Past data are known from measurements already made, whereas current and future data can only be obtained through prediction techniques, that is, techniques for estimating what is happening now and what will happen in the future. The concentration of air pollutants such as ozone is one of the conditions that can be predicted in this way. Ozone concentrations can be predicted by constructing models that utilize suitable predic (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007/s00477-022-02178-2.pdf
Article home page: https://link.springer.com/article/10.1007/s00477-022-02178-2

Kurniawan, Robert, Setiawan, I. Nyoman, Caraka, Rezzy Eko, Nasution, Bahrul Ilmi. Using Harris hawk optimization towards support vector regression to ozone prediction, Stochastic Environmental Research and Risk Assessment, 2022, pp. 429-449, Volume 36, Issue 2, DOI: 10.1007/s00477-022-02178-2