
Investigation of Random Subspace and Random Forest Regression Models Using Data with Injected Noise

  • Conference paper
Knowledge Engineering, Machine Learning and Lattice Computing with Applications (KES 2012)

Abstract

Ensemble machine learning methods based on random subspace and random forest, using genetic fuzzy rule-based systems as base learning algorithms, were developed in the Matlab environment. The methods were applied to the real-world regression problem of predicting the prices of residential premises from historical data on sales/purchase transactions. The accuracy of the ensembles generated by the proposed methods was compared with that of bagging, repeated holdout, and repeated cross-validation models. The tests were carried out at four levels of noise injected into the benchmark datasets. The results were analysed using statistical methodology that included nonparametric tests followed by post-hoc procedures designed specifically for multiple N×N comparisons.
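
The general idea of a random subspace ensemble evaluated on noise-injected data can be sketched as follows. This is only an illustration under stated assumptions, not the authors' Matlab implementation: the base learner is a simple k-nearest-neighbour regressor swapped in for the genetic fuzzy systems, the noise scheme (replacing a fraction of training targets with uniform random values) is hypothetical, and the four noise levels, the synthetic data, and all function names are placeholders chosen for the example.

```python
import random


def knn_predict(train_X, train_y, x, features, k=3):
    """Average the targets of the k nearest training rows, using only `features`."""
    scored = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), y)
        for row, y in zip(train_X, train_y)
    )
    nearest = scored[:k]
    return sum(y for _, y in nearest) / len(nearest)


def inject_noise(y, fraction, rng):
    """Copy y and replace `fraction` of its values with uniform draws
    from [min(y), max(y)]. Assumed noise scheme, for illustration only."""
    noisy = list(y)
    lo, hi = min(y), max(y)
    for i in rng.sample(range(len(y)), round(fraction * len(y))):
        noisy[i] = rng.uniform(lo, hi)
    return noisy


def draw_subspaces(n_features, n_models, subspace_size, rng):
    """Draw one random feature subset per ensemble member."""
    return [
        sorted(rng.sample(range(n_features), subspace_size))
        for _ in range(n_models)
    ]


def ensemble_predict(train_X, train_y, x, subspaces, k=3):
    """Average the predictions of the base learners, one per feature subset."""
    preds = [knn_predict(train_X, train_y, x, fs, k) for fs in subspaces]
    return sum(preds) / len(preds)


def mse(train_X, train_y, test_X, test_y, subspaces):
    """Mean squared error of the ensemble on the test set."""
    errors = [
        (ensemble_predict(train_X, train_y, x, subspaces) - y) ** 2
        for x, y in zip(test_X, test_y)
    ]
    return sum(errors) / len(errors)


def run_demo(seed=0):
    """Train on noisy targets, test on clean ones, for four placeholder levels."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(4)] for _ in range(80)]
    y = [3 * r[0] + 2 * r[1] - r[2] + 0.5 * r[3] for r in X]  # synthetic target
    train_X, train_y, test_X, test_y = X[:60], y[:60], X[60:], y[60:]
    subspaces = draw_subspaces(n_features=4, n_models=5, subspace_size=2, rng=rng)
    results = {}
    for level in (0.0, 0.05, 0.10, 0.20):  # placeholder noise levels
        noisy_y = inject_noise(train_y, level, rng)
        results[level] = mse(train_X, noisy_y, test_X, test_y, subspaces)
    return results


if __name__ == "__main__":
    for level, error in run_demo().items():
        print(f"noise {level:.0%}: MSE {error:.4f}")
```

Each ensemble member sees the full training set but only its own subset of features, which is what distinguishes the random subspace approach from bagging, where members see all features but a resampled set of rows.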





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lasota, T., Telec, Z., Trawiński, B., Trawiński, G. (2013). Investigation of Random Subspace and Random Forest Regression Models Using Data with Injected Noise. In: Graña, M., Toro, C., Howlett, R.J., Jain, L.C. (eds) Knowledge Engineering, Machine Learning and Lattice Computing with Applications. KES 2012. Lecture Notes in Computer Science, vol 7828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37343-5_1


  • DOI: https://doi.org/10.1007/978-3-642-37343-5_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37342-8

  • Online ISBN: 978-3-642-37343-5

  • eBook Packages: Computer Science, Computer Science (R0)
