Abstract
In the age of big data, Internet big data can finely reflect public attention to air pollution, which greatly affects ambient PM2.5 concentrations; however, such data have not yet been applied to PM2.5 prediction. This study therefore introduces informative Internet big data as an effective predictor of PM2.5, alongside big data from other sources. To capture the multi-scale relationship between PM2.5 concentrations and multi-source big data, a novel multi-source big data and multi-scale forecasting methodology is proposed for PM2.5. It comprises three major steps: 1) multi-source big data processing, which collects big data from different sources (e.g., devices and the Internet) and extracts the hidden predictive features; 2) multi-scale analysis, which addresses the non-uniformity and misalignment of timescales across sources by extracting the scale-aligned modes hidden in the multi-source data; and 3) PM2.5 prediction, which makes an individual prediction at each timescale and combines these in an ensemble prediction to obtain the final result. The empirical study focuses on the most heavily polluted cities and shows that the proposed multi-source big data and multi-scale forecasting method is more accurate than its original forms (with neither big data nor multi-scale analysis), semi-extended variants (with big data but without multi-scale analysis) and similar counterparts (with multi-scale analysis but with big data from a single source).
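The three steps follow the general decomposition-ensemble pattern, which can be illustrated with a minimal sketch. This is not the authors' implementation, and several substitutions are assumptions made only for illustration: multivariate empirical mode decomposition, a common tool for extracting scale-aligned modes from several series at once, is replaced by a much simpler trailing moving-average decomposition applied with the same windows to every source, so the s-th component of each series covers roughly the same timescale; ordinary least squares stands in for the individual predictors; and the ensemble step is taken to be a plain sum of the per-scale forecasts. The function names, window lengths and the `search_index` predictor are hypothetical.

```python
import numpy as np


def causal_ma_decompose(x, windows=(3, 7, 30)):
    """Split a series into additive components with trailing moving averages.

    Simplified stand-in for a multi-scale decomposition such as MEMD: each
    stage peels off the detail around a smoother trend, so components run from
    fast to slow and sum exactly back to x. Only values up to t are used at t.
    """
    residual = np.asarray(x, dtype=float)
    components = []
    for w in windows:
        # Left edge-padding keeps the output the same length as x.
        padded = np.pad(residual, (w - 1, 0), mode="edge")
        trend = np.convolve(padded, np.ones(w) / w, mode="valid")
        components.append(residual - trend)  # detail at this timescale
        residual = trend
    components.append(residual)  # slowest remaining trend
    return components


def forecast_next(pm25, predictors, windows=(3, 7, 30)):
    """One-step-ahead PM2.5 forecast: decompose, predict per scale, then sum.

    pm25       -- 1-D array of PM2.5 concentrations.
    predictors -- dict name -> 1-D array aligned with pm25 (hypothetical
                  sources, e.g. a daily search-volume index).
    """
    # Same windows for every series, so the s-th components are scale-aligned.
    pm_modes = causal_ma_decompose(pm25, windows)
    src_modes = [causal_ma_decompose(v, windows) for v in predictors.values()]

    forecast = 0.0
    for s, pm_mode in enumerate(pm_modes):
        # Features at time t: the s-th component of PM2.5 and of every source.
        feats = np.column_stack([pm_mode] + [modes[s] for modes in src_modes])
        X = np.column_stack([np.ones(len(pm_mode)), feats])  # add intercept
        # Individual model at scale s: fit features at t -> component at t + 1.
        coef, *_ = np.linalg.lstsq(X[:-1], pm_mode[1:], rcond=None)
        forecast += X[-1] @ coef  # prediction for the next, unseen step
    return float(forecast)  # ensemble: sum of the per-scale forecasts


if __name__ == "__main__":
    # Synthetic illustration only; these are not data from the study.
    rng = np.random.default_rng(0)
    t = np.arange(365)
    search = 50 + 10 * np.sin(2 * np.pi * t / 30) + rng.normal(0, 2, t.size)
    pm = 40 + 0.8 * search + rng.normal(0, 5, t.size)
    print(forecast_next(pm, {"search_index": search}))
```

Trailing rather than centred windows keep each component at time t dependent only on data up to t, so the sketch does not peek at future observations when it forecasts; the cost is a phase lag in the smoother components.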
Additional information
This research was supported by the National Natural Science Foundation of China under Grant Nos. 72004144 and 71971007, and the Fundamental Research Funds for the Beijing Municipal Colleges and Universities in Capital University of Economics and Business under Grant No. XRZ2020026.
Supplementary information is available at https://doi.org/10.57760/sciencedb.07778 and https://github.com/dhc12a/DHC.
Cite this article
Yuan, W., Du, H., Li, J. et al. A Multi-Scale Method for PM2.5 Forecasting with Multi-Source Big Data. J Syst Sci Complex 36, 771–797 (2023). https://doi.org/10.1007/s11424-023-1378-7
DOI: https://doi.org/10.1007/s11424-023-1378-7