A hybrid deep learning framework for air quality prediction with spatial autocorrelation during the COVID-19 pandemic (pdf)

https://www.nature.com/articles/s41598-023-28287-8.pdf

www.nature.com/scientificreports

OPEN

A hybrid deep learning framework for air quality prediction with spatial autocorrelation during the COVID-19 pandemic

Zixi Zhao 1, Jinran Wu 2, Fengjing Cai 1*, Shaotong Zhang 3 & You-Gan Wang 2

China implemented a strict lockdown policy to prevent the spread of COVID-19 in the worst-affected regions, including Wuhan and Shanghai. This study aims to investigate the impact of these lockdowns on the air quality index (AQI) using a deep learning framework. In addition to historical pollutant concentrations and meteorological factors, we incorporate social and spatio-temporal influences in the framework. In particular, spatial autocorrelation (SAC), which combines temporal autocorrelation with spatial correlation, is adopted to reflect the influence of neighbouring cities and historical data. Our deep learning analysis estimates the lockdown effects as −25.88 in Wuhan and −20.47 in Shanghai. The corresponding prediction errors are reduced by about 47% for Wuhan and by 67% for Shanghai, which enables much more reliable AQI forecasts for both cities.

Air pollution has long been a major matter of concern in China1. Long-term exposure to harmful air pollution can result in a range of respiratory ailments, cardiovascular diseases, and even lung cancer in humans2. Furthermore, high concentrations of air pollutants harm food production and imperil animal survival3. Hence, rational prediction of air quality provides a level of protection for humans and nature. As a typical time series, air quality is affected not only by seasonal factors but also by significant social factors4. For example, at the end of 2019, a novel coronavirus that was easily transmitted through the air broke out in Wuhan, China.
To cut off the transmission of the virus, the Wuhan government implemented a 76-day lockdown policy limiting human activities, which in turn improved air quality5,6, because the concentrations of PM10, PM2.5, NO2 and CO from vehicle exhaust and industry decreased dramatically7. According to Lian et al.8, the NO2 concentration and AQI decreased by 53.2% and 33.9%, respectively, during the lockdown period in Wuhan. To some extent, the improvement of air quality during the epidemic is an opportunity for the government to develop new pollution management ideas, such as the scheduling of traffic and industrial production. Therefore, accurate air quality prediction during the epidemic is of social importance.

Literature review

Air quality prediction is a hot topic in the environmental field, and common prediction methods fall into three main categories: numerical simulation, statistical methods, and machine learning.

Earlier studies on air quality prediction mostly used numerical simulation, which builds mathematical models that simulate changes in air quality based on chemical and physical processes in the atmosphere. The classical models are the nested air quality prediction modelling system9, the weather research and forecasting model10,11, and the community multiscale air quality model12,13. However, these models place high demands on the dataset and assume that the pollution discharge is constant, which does not hold in practice because pollutants are emitted randomly14. Besides, numerical simulation methods often involve complex calculations and are not user-friendly.

In view of these inadequacies, statistical methods to predict air quality have become increasingly popular among researchers. The statistical method does not involve meteorological theories; instead, it mainly explores patterns from the data to construct prediction models15–17.
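As a rough illustration of what constructing a prediction model purely from patterns in the data can look like, the sketch below fits a first-order autoregressive model by ordinary least squares and produces a one-step-ahead forecast. It is a simplified stand-in, not a method from any of the cited studies: it has no moving-average term, the series is synthetic rather than measured AQI, and the names `fit_ar` and `forecast_next` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares fit of an AR(order) model with an intercept.

    Returns [intercept, coefficient of y[t-1], ..., coefficient of y[t-order]].
    """
    y = np.asarray(series, dtype=float)
    # Column k holds the series lagged by k steps, aligned with the targets y[order:].
    lags = [y[order - k : len(y) - k] for k in range(1, order + 1)]
    X = np.column_stack([np.ones(len(y) - order)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[order:], rcond=None)
    return coef

def forecast_next(series, coef):
    """One-step-ahead forecast from the most recent observations."""
    order = len(coef) - 1
    y = np.asarray(series, dtype=float)
    recent = y[-1 : -order - 1 : -1]  # y[t-1], y[t-2], ..., y[t-order]
    return coef[0] + coef[1:] @ recent

# Synthetic daily "AQI-like" series (an assumption, not data from the article):
# each value depends on the previous one plus random noise.
rng = np.random.default_rng(0)
aqi = [60.0]
for _ in range(199):
    aqi.append(20.0 + 0.6 * aqi[-1] + rng.normal(scale=5.0))

coef = fit_ar(aqi, order=1)
print("intercept and lag-1 coefficient:", coef)
print("one-step-ahead forecast:", forecast_next(aqi, coef))
```

No atmospheric chemistry or physics enters the model; the coefficients come entirely from the historical series, which is the contrast with numerical simulation drawn above.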
Considering that air quality data is a typical time series, the autoregressive moving average (ARMA) model is widely used. Kumar et al.18 used ARMA to predict O3, CO, NO and NO2 concentrations, and the model achieved good performance at an urban traffic site in Delhi, India. Regression models are also well suited to address prediction problems. Stadlober et al.19 constructed a multiple linear regression model that combined current data with the next-day meteorological forecasts to predict the daily PM10 concentrations, which assisted the government in making traffic control decisions. However, most statistical methods require the independent and dependent variables to be linearly correlated, whereas the relationships among air quality data are significantly nonlinear20. Therefore, statistical methods sometimes do not achieve satisfying results.

1 College of Mathematics and Physics, Wenzhou University, Wenzhou 325035, People’s Republic of China. 2 The Institute for Learning Sciences and Teacher Education, Australian Catholic University, Brisbane 4000, Australia. 3 Frontiers Science Center for Deep Ocean Multispheres and Earth System, Key Lab of Submarine Geosciences and Prospecting Techniques, MOE and College of Marine Geosciences, Ocean University of China, Qingdao 266100, People’s Republic of China.

Scientific Reports | (2023) 13:1015 | https://doi.org/10.1038/s41598-023-28287-8

Machine learning has been a popular choice for air quality forecasting because it is good at dealing with nonlinear problems. Dai et al.21 set up a hybrid model using a multilayer perceptron that could predict the PM2.5 concentration and fluctuation in different regions more effectively. Ketu et al.22 combined the adjustment of kernel scales with a support vector machine23, which allows for an accurate classification of air quality. Lim et al.24
combined multiple machine learning algorithms to construct a land use regression model for PM2.5 concentration prediction in Seoul, Korea, and experimentally demonstrated that machine learning can further improve model performance. Ma et al.25 used nonlinear extreme gradient boosting to predict air quality in the U.S., which also measured the importance of the variables. Although machine learning algorithms have usually performed well, they still have limitations in their capacity to make multistep predictions and to capture long-term properties of the data.

Deep learning is a branch of machine learning26. Among the many algorithms for deep learning, the long short-term memory network (LSTM) is often used to predict air quality due to its effectiveness in solving long-distance dependence27–29. For example, Li et al.30 used LSTM to predict hourly PM2.5 concentrations in Beijing, and the experimental results showed that the model outperformed ARMA and support vector regression. Cheng et al.31 used a variant of LSTM, the bidirectional LSTM (Bi-LSTM), for air quality prediction at stations with missing data, and the strategy reduced the root mean square error by 35.21% on average. Given the above, it is viable to adopt deep learning models for air quality studies.

Feature selection is often used in combination with deep learning to improve algorithm efficiency. Metaheuristic algorithm (...truncated)