
A Multi-Scale Method for PM2.5 Forecasting with Multi-Source Big Data

Journal of Systems Science and Complexity

Abstract

In the age of big data, Internet big data can finely reflect public attention to air pollution, which greatly impacts ambient PM2.5 concentrations; however, such data have not yet been applied to PM2.5 prediction. This study therefore introduces Internet big data as an effective predictor for PM2.5, in addition to other big data. To capture the multi-scale relationship between PM2.5 concentrations and multi-source big data, a novel multi-source big data and multi-scale forecasting methodology is proposed for PM2.5. It comprises three major steps: 1) multi-source big data processing, which collects big data from different sources (e.g., devices and the Internet) and extracts the hidden predictive features; 2) multi-scale analysis, which addresses the non-uniformity and misalignment of timescales by extracting the scale-aligned modes hidden in the multi-source data; and 3) PM2.5 prediction, which entails individual prediction at each timescale and ensemble prediction for the final results. The empirical study focuses on the most highly polluted cities and shows that the proposed method is more accurate than its original forms (with neither big data nor multi-scale analysis), semi-extended variants (with big data but without multi-scale analysis), and similar counterparts (with multi-scale analysis but with big data from only a single source).
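
The three steps can be illustrated with a minimal sketch. It is not the authors' implementation, since the abstract does not name the decomposition or the prediction models. The trailing moving-average decomposition, the linear per-scale models, the summation ensemble, the function names (decompose, forecast_next), and the synthetic series are all assumptions chosen only to show the decompose, predict per scale, then ensemble pattern.

```python
import numpy as np

def trailing_average(x, window):
    """Trailing moving average (uses only current and past values); same length as x."""
    padded = np.pad(x, (window - 1, 0), mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def decompose(x, windows=(3, 12)):
    """Split x into len(windows) + 1 components, fastest scale first.

    Using the same windows for every series yields components that are
    aligned by scale across series. The components sum back to x.
    """
    residual = np.asarray(x, dtype=float)
    components = []
    for w in windows:
        smooth = trailing_average(residual, w)
        components.append(residual - smooth)  # detail at this scale
        residual = smooth
    components.append(residual)  # slowest component (trend)
    return np.array(components)

def forecast_next(target, predictors, windows=(3, 12)):
    """One-step-ahead forecast: fit one linear model per scale, then sum the results."""
    target_parts = decompose(target, windows)
    predictor_parts = [decompose(p, windows) for p in predictors]
    forecast = 0.0
    for k in range(len(target_parts)):
        # Features at time t: bias, target component, and predictor components at scale k.
        features = np.column_stack(
            [np.ones(len(target_parts[k])), target_parts[k]]
            + [p[k] for p in predictor_parts]
        )
        # Least-squares fit of the component at t+1 on the features at t.
        coef, *_ = np.linalg.lstsq(features[:-1], target_parts[k][1:], rcond=None)
        forecast += float(features[-1] @ coef)  # ensemble by summation
    return forecast

# Synthetic data only (hypothetical stand-ins for an Internet attention
# index, a device-measured variable, and PM2.5 concentrations).
rng = np.random.default_rng(0)
t = np.arange(200)
search_index = 50 + 10 * np.sin(2 * np.pi * t / 30) + rng.normal(0, 1, t.size)
wind_speed = 3 + np.cos(2 * np.pi * t / 7) + rng.normal(0, 0.3, t.size)
pm25 = 80 + 0.8 * (search_index - 50) - 5 * (wind_speed - 3) + rng.normal(0, 2, t.size)

print(forecast_next(pm25, [search_index, wind_speed]))
```

The trailing average is used so that each component at time t depends only on observations up to t. A full implementation would also need an out-of-sample evaluation, which this sketch omits.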



Author information

Corresponding author

Correspondence to Ling Li.

Additional information

This research was supported by the National Natural Science Foundation of China under Grant Nos. 72004144 and 71971007, and the Fundamental Research Funds for the Beijing Municipal Colleges and Universities in Capital University of Economics and Business under Grant No. XRZ2020026.

Supplementary information is available at https://doi.org/10.57760/sciencedb.07778 and https://github.com/dhc12a/DHC.

About this article

Cite this article

Yuan, W., Du, H., Li, J. et al. A Multi-Scale Method for PM2.5 Forecasting with Multi-Source Big Data. J Syst Sci Complex 36, 771–797 (2023). https://doi.org/10.1007/s11424-023-1378-7

  • DOI: https://doi.org/10.1007/s11424-023-1378-7
