
Nonparametric Statistical Analysis of Machine Learning Algorithms for Regression Problems

  • Conference paper
Knowledge-Based and Intelligent Information and Engineering Systems (KES 2010)

Abstract

Several experiments were conducted to apply recently proposed statistical procedures that are recommended for analysing multiple 1×n and n×n comparisons of machine learning algorithms. Eleven regression algorithms, five deterministic and six based on neural networks, all implemented in the data mining system KEEL, were employed. All experiments were performed using 29 benchmark datasets for regression. The investigation demonstrated the usefulness and strength of multiple comparison statistical procedures for analysing and selecting machine learning algorithms.
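The abstract does not spell out which procedures were applied. As a minimal sketch, assuming the commonly recommended Friedman test on average ranks followed by Holm's step-down adjustment, the two kinds of comparison can be computed as below. Here 1×n means one control algorithm (the best-ranked one) against each of the others, and n×n means all pairs; the rank-difference statistic is z = (R_i − R_j) / sqrt(k(k+1)/(6N)) for k algorithms over N datasets. All function names are hypothetical, and the toy error values are invented for illustration, not data from the paper. The n×n part applies plain Holm to every pair; more powerful procedures for that setting (such as Shaffer's or Bergmann-Hommel's) are not shown.

```python
import math
from itertools import combinations


def average_ranks(errors):
    """Average Friedman rank of each algorithm.

    errors[d][j] is the error of algorithm j on dataset d (lower is better).
    Tied values receive the mean of the ranks they span.
    """
    k = len(errors[0])
    totals = [0.0] * k
    for row in errors:
        order = sorted(range(k), key=row.__getitem__)
        i = 0
        while i < k:
            # extend j over the run of values tied with position i
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1  # ranks are 1-based
            for pos in range(i, j + 1):
                totals[order[pos]] += mean_rank
            i = j + 1
    return [t / len(errors) for t in totals]


def friedman_statistics(avg_ranks, n_data):
    """Friedman chi-square (k-1 df) and Iman-Davenport F ((k-1), (k-1)(N-1) df).

    No tie correction is applied (simplifying assumption).
    """
    k = len(avg_ranks)
    chi2 = 12 * n_data / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    denom = n_data * (k - 1) - chi2
    # perfect agreement across datasets makes the denominator vanish
    f_id = math.inf if denom <= 1e-12 else (n_data - 1) * chi2 / denom
    return chi2, f_id


def rank_z(r_a, r_b, k, n_data):
    """z statistic for the difference between two average ranks."""
    return (r_a - r_b) / math.sqrt(k * (k + 1) / (6 * n_data))


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    adjusted = [0.0] * m
    running = 0.0
    for step, i in enumerate(sorted(range(m), key=pvalues.__getitem__)):
        running = max(running, (m - step) * pvalues[i])  # keep APVs monotone
        adjusted[i] = min(1.0, running)
    return adjusted


def compare_1xn(avg_ranks, n_data, control):
    """Control algorithm versus every other algorithm (k-1 hypotheses)."""
    k = len(avg_ranks)
    others = [j for j in range(k) if j != control]
    raw = [two_sided_p(rank_z(avg_ranks[control], avg_ranks[j], k, n_data)) for j in others]
    return dict(zip(others, holm_adjust(raw)))


def compare_nxn(avg_ranks, n_data):
    """All pairwise comparisons (k(k-1)/2 hypotheses), Holm-adjusted."""
    k = len(avg_ranks)
    pairs = list(combinations(range(k), 2))
    raw = [two_sided_p(rank_z(avg_ranks[a], avg_ranks[b], k, n_data)) for a, b in pairs]
    return dict(zip(pairs, holm_adjust(raw)))


if __name__ == "__main__":
    # Invented toy errors: 5 datasets x 4 algorithms (NOT results from the paper).
    errors = [
        [0.10, 0.12, 0.15, 0.20],
        [0.30, 0.28, 0.35, 0.40],
        [0.05, 0.07, 0.06, 0.09],
        [0.22, 0.25, 0.27, 0.30],
        [0.11, 0.11, 0.14, 0.16],
    ]
    ranks = average_ranks(errors)
    print("average ranks:", ranks)
    print("Friedman chi2, Iman-Davenport F:", friedman_statistics(ranks, len(errors)))
    control = min(range(len(ranks)), key=ranks.__getitem__)  # best-ranked algorithm
    print("1xn Holm APVs:", compare_1xn(ranks, len(errors), control))
    print("nxn Holm APVs:", compare_nxn(ranks, len(errors)))
```

Ranks are assigned per dataset (1 = lowest error), so the same code works for any regression error measure where lower is better. Working from adjusted p-values rather than fixed critical values lets one compare each hypothesis directly against any chosen significance level.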




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Graczyk, M., Lasota, T., Telec, Z., Trawiński, B. (2010). Nonparametric Statistical Analysis of Machine Learning Algorithms for Regression Problems. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15387-7_15


  • DOI: https://doi.org/10.1007/978-3-642-15387-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15386-0

  • Online ISBN: 978-3-642-15387-7

  • eBook Packages: Computer Science, Computer Science (R0)
