Regression trees modeling of time series for air pollution analysis and forecasting

  • Original Article
Neural Computing and Applications

Abstract

Solving problems related to air pollution is crucial for human health and ecosystems in many urban areas throughout the world. The accumulation of large datasets with measurements of various air pollutants makes it possible to analyze them in order to predict and control pollution. This study presents a common approach for building high-quality nonlinear models of environmental time series using the data mining technique of classification and regression trees (CART). The predictors are time series of meteorological, atmospheric or other data, date-time variables, and lagged variables of the dependent variable and of the predictors, entered as groups. The proposed approach is tested in empirical studies of the daily average concentrations of atmospheric PM10 (particulate matter 10 μm or less in diameter) in the cities of Ruse and Pernik, Bulgaria. One-day-ahead forecasts are obtained. All models are cross-validated to guard against overfitting. The best models are selected using goodness-of-fit measures such as root-mean-square error and the coefficient of determination. The relative importance of the predictors and predictor groups is obtained and interpreted. The CART models are compared with the corresponding models built using the ARIMA transfer function methodology, and the superiority of CART over ARIMA is demonstrated. The practical applicability of the models is assessed using 2 × 2 contingency tables. The results show that the CART models fit the data well and correctly predict about 90% of the measured PM10 values with respect to the European average daily threshold value of 50 μg/m³.
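The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the convention that an exceedance means a value strictly above 50 μg/m³, and the sample numbers are all assumptions made for illustration, and the numbers are not results from the study.

```python
from math import sqrt

# Daily PM10 threshold quoted in the abstract (micrograms per cubic metre).
EU_DAILY_LIMIT = 50.0


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def contingency_table(observed, predicted, threshold=EU_DAILY_LIMIT):
    """2 x 2 table of observed versus predicted threshold exceedances.

    Assumption: an exceedance is a value strictly greater than the threshold.
    """
    table = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_negative": 0}
    for o, p in zip(observed, predicted):
        obs_exceeds = o > threshold
        pred_exceeds = p > threshold
        if obs_exceeds and pred_exceeds:
            table["hit"] += 1
        elif obs_exceeds:
            table["miss"] += 1
        elif pred_exceeds:
            table["false_alarm"] += 1
        else:
            table["correct_negative"] += 1
    return table


def proportion_correct(table):
    """Share of days on which the forecast falls on the correct side of the threshold."""
    total = sum(table.values())
    return (table["hit"] + table["correct_negative"]) / total


# Hypothetical daily averages, invented for this example only.
observed = [30.0, 62.0, 48.0, 75.0, 51.0, 20.0]
predicted = [35.0, 58.0, 52.0, 70.0, 45.0, 25.0]

table = contingency_table(observed, predicted)
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
print(table, proportion_correct(table))
```

In this sketch the proportion correct plays the role of the "about 90%" figure reported in the abstract: it counts the days on which the forecast and the measurement fall on the same side of the threshold, regardless of how close the two values are.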

Acknowledgements

This work was supported by Grant No. BG05M2OP001-1.001-0003, financed by the Science and Education for Smart Growth Operational Program (2014–2020) and co-financed by the European Union through the European Structural and Investment Funds. We thank the independent reviewers for their valuable advice and feedback, which helped improve the scientific value of this study.

Author information

Corresponding author

Correspondence to Snezhana Georgieva Gocheva-Ilieva.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gocheva-Ilieva, S.G., Voynikova, D.S., Stoimenova, M.P. et al. Regression trees modeling of time series for air pollution analysis and forecasting. Neural Comput & Applic 31, 9023–9039 (2019). https://doi.org/10.1007/s00521-019-04432-1

  • DOI: https://doi.org/10.1007/s00521-019-04432-1
