Model Selection of Symbolic Regression to Improve the Accuracy of PM2.5 Concentration Prediction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9441)

Abstract

PM2.5, one of the main components of haze, has recently attracted wide public attention in China. In this paper, we predict PM2.5 concentrations in Dalian, China, using symbolic regression (SR) based on genetic programming (GP). The key problem in prediction is how to select accurate models using appropriate interestingness measures. In addition to commonly used measures such as the R-squared value, mean squared error, and number of parameters, we study the effectiveness of a set of potentially useful measures: AIC, BIC, HQC, AICc, and EDC. We also propose a new interestingness measure, Interestingness Elasticity (IE). The experimental results show that the new measure performs best at selecting candidate models and shows promising extrapolative capability.
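As a rough illustration of how the information criteria named in the abstract can rank candidate models, the sketch below computes AIC, AICc, BIC, and HQC from a residual sum of squares. It assumes the common least-squares forms under Gaussian errors, with additive constants dropped; the paper's exact formulations may differ. EDC and IE are omitted because their definitions are not given here. The function name, candidate names, and numbers are hypothetical.

```python
import math


def information_criteria(rss, n, k):
    """Return AIC, AICc, BIC and HQC for a least-squares fit.

    rss: residual sum of squares on the training data
    n:   number of observations
    k:   number of fitted parameters

    Assumption: Gaussian errors, so the fit term is n * ln(rss / n);
    constants shared by all models are dropped.
    """
    if rss <= 0:
        raise ValueError("rss must be positive")
    if n < 3 or n - k - 1 <= 0:
        raise ValueError("need n >= 3 and n > k + 1")
    fit = n * math.log(rss / n)
    aic = fit + 2 * k
    return {
        "AIC": aic,
        # Small-sample correction added to AIC
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),
        "BIC": fit + k * math.log(n),
        "HQC": fit + 2 * k * math.log(math.log(n)),
    }


# Hypothetical candidate models: (name, rss, number of parameters)
candidates = [("small", 120.0, 3), ("large", 100.0, 12)]
n = 50

scores = {name: information_criteria(rss, n, k) for name, rss, k in candidates}

# Lower is better for every criterion; here we pick by BIC.
best_by_bic = min(scores, key=lambda name: scores[name]["BIC"])
```

In this made-up case the larger model fits the training data more closely, but the penalty on its extra parameters outweighs the gain, so the smaller model is selected.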

Acknowledgements

This work is supported by the National Natural Science Foundation of China (71001016).

Author information

Corresponding author

Correspondence to Guangfei Yang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yang, G., Huang, J. (2015). Model Selection of Symbolic Regression to Improve the Accuracy of PM2.5 Concentration Prediction. In: Li, XL., Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D. (eds) Trends and Applications in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, vol 9441. Springer, Cham. https://doi.org/10.1007/978-3-319-25660-3_16

  • DOI: https://doi.org/10.1007/978-3-319-25660-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25659-7

  • Online ISBN: 978-3-319-25660-3

  • eBook Packages: Computer Science, Computer Science (R0)
