A hybrid deep learning framework for air quality prediction with spatial autocorrelation during the COVID-19 pandemic
www.nature.com/scientificreports
Zixi Zhao1, Jinran Wu2, Fengjing Cai1*, Shaotong Zhang3 & You‑Gan Wang2
China implemented a strict lockdown policy to prevent the spread of COVID-19 in the worst-affected
regions, including Wuhan and Shanghai. This study aims to investigate the impact of these lockdowns
on the air quality index (AQI) using a deep learning framework. In addition to historical pollutant
concentrations and meteorological factors, we incorporate social and spatio-temporal influences in
the framework. In particular, spatial autocorrelation (SAC), which combines temporal autocorrelation
with spatial correlation, is adopted to reflect the influence of neighbouring cities and historical data.
Our deep learning analysis estimates the lockdown effects on AQI as −25.88 in Wuhan and −20.47 in
Shanghai. The corresponding prediction errors are reduced by about 47% for Wuhan and by
67% for Shanghai, which enables much more reliable AQI forecasts for both cities.
Air pollution has long been a major matter of concern in China1. Long-term exposure to harmful air pollution
can result in a range of respiratory ailments, cardiovascular diseases, and even lung cancer in humans2.
Furthermore, high concentrations of air pollutants harm food production and imperil animal survival3. Hence,
rational prediction of air quality provides a level of protection for humans and nature.
As a typical time series, air quality is affected not only by seasonal factors but also by significant social factors4.
For example, at the end of 2019, a new coronavirus broke out in Wuhan, China, which was easily transmitted
through the air. To cut off the transmission of the virus, the Wuhan government implemented a 76-day lockdown
policy limiting human activities, which in turn improved air quality5,6, because the concentrations
of PM10, PM2.5, NO2 and CO from vehicle exhaust and industry decreased dramatically7. According to Lian
et al.8, the NO2 concentration and AQI decreased by 53.2% and 33.9%, respectively, during the lockdown period
in Wuhan. To some extent, the improvement of air quality during the epidemic is an opportunity to spark new
pollution management ideas from the government, such as the scheduling of traffic and industrial production.
Therefore, accurate air quality prediction during the epidemic is of social importance.
Literature review. Air quality prediction is a hot topic in the environmental field, and the common prediction methods fall into three main categories: numerical simulation, statistical methods, and machine learning.
Earlier studies on air quality prediction mostly used numerical simulation, which builds mathematical
models to simulate changes in air quality based on chemical and physical processes in the atmosphere.
The classical models are the nested air quality prediction modelling system9, the weather research and forecasting
model10,11, and the community multiscale air quality model12,13. However, these models place high demands on
the dataset and assume that the pollution discharge is constant, which does not hold in practice because
pollutants are emitted randomly14. Besides, numerical simulation methods often involve complex calculations and are not
user-friendly. In view of these inadequacies, statistical methods to predict air quality have become increasingly
popular among researchers.
1College of Mathematics and Physics, Wenzhou University, Wenzhou 325035, People’s Republic of China.
2The Institute for Learning Sciences and Teacher Education, Australian Catholic University, Brisbane 4000, Australia.
3Frontiers Science Center for Deep Ocean Multispheres and Earth System, Key Lab of Submarine Geosciences and Prospecting Techniques, MOE and College of Marine Geosciences, Ocean University of China, Qingdao 266100, People’s Republic of China.
*email:
Scientific Reports | (2023) 13:1015 | https://doi.org/10.1038/s41598-023-28287-8
Statistical methods do not involve meteorological theories; instead, they mainly explore patterns in the
data to construct prediction models15–17. Considering that air quality data is a typical time series, the autoregressive
moving average (ARMA) model is widely used. Kumar et al.18 used ARMA to predict O3, CO, NO and NO2
concentrations, and the model achieved good performance at an urban traffic site in Delhi, India. Regression
models are also well suited to prediction problems. Stadlober et al.19 constructed a multiple linear regression
model that combined current data with next-day meteorological forecasts to predict daily PM10
concentrations, which assisted the government in making traffic control decisions. However, most statistical
methods require the independent and dependent variables to be linearly correlated, while there is significant
nonlinearity in air quality data20. Therefore, statistical methods sometimes do not achieve satisfying results.
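To make the autoregressive idea behind ARMA concrete, the following minimal Python sketch fits only a first-order autoregressive model, x[t] = c + phi * x[t-1], by ordinary least squares. It is an illustration under stated assumptions and not the procedure of any cited study: the moving-average part of ARMA is omitted, and the function names and the toy series are hypothetical.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Fit x[t] = intercept + phi * x[t-1] by ordinary least squares.

    Simplification: this is only the AR(1) part of an ARMA model;
    no moving-average term is estimated.
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    if var == 0:
        raise ValueError("series is constant; phi is not identifiable")
    phi = cov / var
    intercept = mean_curr - phi * mean_prev
    return intercept, phi


def forecast_ar1(last_value: float, intercept: float, phi: float,
                 steps: int) -> List[float]:
    """Iterate the fitted recursion to produce multi-step forecasts."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = intercept + phi * value
        forecasts.append(value)
    return forecasts


# Hypothetical noise-free series generated by x[t] = 10 + 0.8 * x[t-1].
toy = [100.0]
for _ in range(20):
    toy.append(10.0 + 0.8 * toy[-1])

c_hat, phi_hat = fit_ar1(toy)
next_two = forecast_ar1(toy[-1], c_hat, phi_hat, steps=2)
```

Because the fitted relation is linear in the lagged value, a model of this form cannot represent the nonlinear dependence noted above, which is the limitation that motivates the machine learning methods reviewed next.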
Machine learning has been a popular choice for air quality forecasting because it is good at dealing with
nonlinear problems. Dai et al.21 set up a hybrid model by using a multilayer perceptron that could predict the
PM2.5 concentration and fluctuation in different regions more effectively. Ketu et al.22 combined the adjustment
of kernel scales with a support vector machine23, which allows for an accurate classification of air quality. Lim
et al.24 combined multiple machine learning algorithms to construct a land use regression model for PM2.5
concentration prediction in Seoul, Korea, and experimentally demonstrated that machine learning can further
improve model performance. Ma et al.25 used nonlinear extreme gradient boosting to predict air quality in the
U.S. and also measured the importance of the variables. Although machine learning algorithms have usually
performed well, they still have limitations in their capacity to make multistep predictions and to capture
long-term properties of the data.
Deep learning is a branch of machine learning26. Among the many algorithms for deep learning, the long
short-term memory network (LSTM) is often used to predict air quality due to its effectiveness in handling
long-distance dependencies27–29. For example, Li et al.30 used LSTM to predict hourly PM2.5 concentrations in Beijing,
and the experimental results showed that the model outperformed ARMA and support vector regression. Cheng
et al.31 used a variant of LSTM, the bidirectional LSTM (Bi-LSTM), for air quality prediction at stations with
missing data, and the strategy reduced the root mean square error by 35.21% on average. Given the
above, it is viable to adopt deep learning models for air quality studies.
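As a rough illustration of why an LSTM can retain information over long distances, the following Python sketch implements one step of the standard LSTM cell equations for a scalar input and scalar state. It is a simplified sketch and not the architecture of any study cited here: the class names, the scalar setting, and all parameter values are hypothetical assumptions chosen for readability.

```python
import math
from dataclasses import dataclass
from typing import Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class GateParams:
    w_x: float  # weight on the current input
    w_h: float  # weight on the previous hidden state
    b: float    # bias

    def pre_activation(self, x: float, h_prev: float) -> float:
        return self.w_x * x + self.w_h * h_prev + self.b


@dataclass
class ScalarLSTMCell:
    forget_gate: GateParams
    input_gate: GateParams
    candidate: GateParams
    output_gate: GateParams

    def step(self, x: float, h_prev: float, c_prev: float) -> Tuple[float, float]:
        f = sigmoid(self.forget_gate.pre_activation(x, h_prev))   # how much memory to keep
        i = sigmoid(self.input_gate.pre_activation(x, h_prev))    # how much new content to write
        g = math.tanh(self.candidate.pre_activation(x, h_prev))   # proposed new content
        o = sigmoid(self.output_gate.pre_activation(x, h_prev))   # how much memory to expose
        c = f * c_prev + i * g       # additive cell-state update
        h = o * math.tanh(c)         # hidden state passed to the next step
        return h, c


# Hypothetical parameters: forget gate close to 1, input gate close to 0,
# so the cell state is carried almost unchanged across many time steps.
cell = ScalarLSTMCell(
    forget_gate=GateParams(0.0, 0.0, 10.0),
    input_gate=GateParams(0.0, 0.0, -10.0),
    candidate=GateParams(1.0, 0.0, 0.0),
    output_gate=GateParams(0.0, 0.0, 0.0),
)

h_final, c_final = 0.0, 1.0  # start with a stored value of 1.0 in the cell state
for _ in range(100):
    h_final, c_final = cell.step(0.0, h_final, c_final)
```

With these assumed values, the cell state stays close to its initial value after 100 steps, because the update multiplies the previous state by a forget gate near one and adds almost nothing. In a trained network the gate values are learned from data rather than fixed by hand.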
Feature selection is often used in combination with deep learning to improve algorithm efficiency. Metaheuristic algorithm (...truncated)