Prediction of PM2.5 concentration based on the weighted RF-LSTM model

  • Review
Earth Science Informatics

Abstract

Accurate prediction of PM2.5 concentrations provides a solid foundation for preventing and controlling air pollution. When a Long Short-Term Memory (LSTM) network is used to predict PM2.5 concentration, the factors strongly correlated with PM2.5 concentration are typically fed into the network directly. However, these factors do not all influence PM2.5 concentration to the same degree. To address this issue, this study proposes a weighted Random Forest (RF)-LSTM model that predicts PM2.5 concentration for the next six hours. The model first uses the RF to select the factors that are more important for predicting PM2.5 concentration and then uses a fully connected neural network to learn a weight for each factor. Finally, the weighted data are fed into the LSTM network. The model is trained, validated, and tested on hourly air pollutant and meteorological data collected at four monitoring stations in Beijing, China, from November 1, 2019 to February 28, 2022. Its prediction performance is compared with that of the RF-LSTM and LSTM models. For the six-hour-ahead prediction of PM2.5 concentration at all four stations, the weighted RF-LSTM model has the smallest RMSE and MAE and the largest R². Compared with the LSTM model, it reduces RMSE by 2.3%-5.3% and MAE by 5.6%-9.6% and improves R² by 2.0%-4.8%, indicating that the proposed model achieves better prediction performance and generalizes well.
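The data flow described in the abstract (rank the factors, keep the more important ones, weight each one, then hand fixed-length sequences to the LSTM) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the importance scores stand in for what a trained RF would report, the per-factor weights stand in for what the fully connected network would learn, and the factor names, the lookback length, and the helper names `select_factors`, `apply_weights`, and `make_windows` are all hypothetical. No RF, FC network, or LSTM is trained here.

```python
from typing import Dict, List, Sequence, Tuple

HORIZON = 6  # hours to predict ahead, as stated in the abstract


def select_factors(importance: Dict[str, float], k: int) -> List[str]:
    """Keep the k factors with the highest importance scores."""
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    return ranked[:k]


def apply_weights(
    rows: Sequence[Dict[str, float]],
    factors: Sequence[str],
    weights: Dict[str, float],
) -> List[List[float]]:
    """Scale every selected factor by its weight, one row per hour."""
    return [[row[f] * weights[f] for f in factors] for row in rows]


def make_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    lookback: int,
    horizon: int = HORIZON,
) -> List[Tuple[List[Sequence[float]], List[float]]]:
    """Pair `lookback` hours of inputs with the next `horizon` target values."""
    samples = []
    last_start = len(features) - lookback - horizon
    for start in range(last_start + 1):
        x = list(features[start:start + lookback])
        y = list(target[start + lookback:start + lookback + horizon])
        samples.append((x, y))
    return samples


# Synthetic hourly records (made-up values, 12 hours).
rows = [
    {"PM10": float(h), "CO": 0.1 * h, "wind_speed": 2.0, "pressure": 1000.0}
    for h in range(12)
]
pm25 = [float(h) for h in range(12)]

# Placeholder scores and weights; in the paper these come from the RF and FC network.
importance = {"PM10": 0.5, "CO": 0.3, "wind_speed": 0.15, "pressure": 0.05}
selected = select_factors(importance, k=2)
weights = {"PM10": 0.7, "CO": 0.3}

weighted = apply_weights(rows, selected, weights)
samples = make_windows(weighted, pm25, lookback=4)
print(selected, len(samples))
```

Each sample pairs a short history of weighted factors with the six PM2.5 values that follow it, which is the shape of input and target a sequence model for six-hour-ahead prediction would need.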

Data Availability

The data that support the findings of this study are available on request from the corresponding author.

Abbreviations

RF: Random Forest
LSTM: Long Short-Term Memory
Weighted RF-LSTM: weighted Random Forest-Long Short-Term Memory
RF-LSTM: Random Forest-Long Short-Term Memory
PM2.5: PM2.5 concentration
PM10: PM10 concentration
NO2: NO2 concentration
CO: CO concentration
SO2: SO2 concentration
O3: O3 concentration
RNN: Recurrent Neural Network
BP: Back Propagation
FC: Fully Connected
RMSE: root mean square error
MAE: mean absolute error
R²: coefficient of determination
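
The three evaluation metrics listed above can be computed as in the following sketch. It assumes the usual textbook definitions (in particular, R² as one minus the ratio of residual to total sum of squares); the page does not state the exact formulas used in the paper. The helper `relative_change` is a hypothetical name that shows one common way a percentage change against a baseline model can be expressed, and the sample values are made up.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline` (negative = decrease)."""
    return (new - baseline) / baseline * 100.0


# Made-up observations and predictions, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Lower RMSE and MAE and higher R² indicate a better fit, which is the sense in which the abstract compares the three models.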

Funding

This work was supported by the National Natural Science Foundation (project "Research on air pollution and ecological toughness and time and space in the Yellow River Basin City", Grant 12361108), the Ningxia Natural Science Foundation (Grant no. 2023AAC03278), the First Class Disciplines Foundation of Ningxia (Grant NXYLXK2017B09), and the 2023 Ningxia Youth Top Talent Program.

Author information

Contributions

Weifu Ding: funding acquisition, investigation, visualization. Huihui Sun: methodology, resources, supervision, data availability. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Huihui Sun.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Communicated by: H. Babaie

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ding, W., Sun, H. Prediction of PM2.5 concentration based on the weighted RF-LSTM model. Earth Sci Inform 16, 3023–3037 (2023). https://doi.org/10.1007/s12145-023-01111-7

  • DOI: https://doi.org/10.1007/s12145-023-01111-7
