
Ensembles of Optimum-Path Forest Classifiers Using Input Data Manipulation and Undersampling

  • Conference paper
Multiple Classifier Systems (MCS 2013)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7872)

Abstract

The combination of multiple classifiers has proven useful in many applications, both to improve classification and to stabilize results. In this paper we use the Optimum-Path Forest (OPF) classifier to investigate input data manipulation techniques that use less data from the training set without hampering classification accuracy. Data undersampling can speed up the classification task and could be especially useful with large datasets. The results indicate that the OPF-based ensemble methods allow a significant reduction in the size of the training set while maintaining or slightly improving accuracy. We provide intuition for a case of failure and report results on synthetic and real datasets.
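
The general idea of an ensemble built from undersampled training subsets can be illustrated with a minimal sketch. It is not the authors' method or the LibOPF API: the base classifier is a plain 1-nearest-neighbour rule standing in for OPF, the toy data is synthetic, and every function name and parameter (`build_ensemble`, `fraction`, `n_members`, and so on) is a hypothetical choice made for illustration.

```python
import random
from collections import Counter


def nearest_neighbor_predict(train, point):
    """Stand-in base classifier (1-NN). Assumption: this is NOT an OPF classifier."""
    # train is a list of (features, label) pairs; pick the closest training sample.
    _, label = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
    )
    return label


def build_ensemble(train, n_members, fraction, rng):
    """Draw one random subset of the training set for each ensemble member."""
    size = max(1, int(len(train) * fraction))
    return [rng.sample(train, size) for _ in range(n_members)]


def ensemble_predict(ensemble, point):
    """Combine the members' decisions by majority vote."""
    votes = Counter(nearest_neighbor_predict(subset, point) for subset in ensemble)
    return votes.most_common(1)[0][0]


def make_blobs(n_per_class, rng):
    """Two well-separated Gaussian classes (hypothetical toy data)."""
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (6.0, 6.0))):
        for _ in range(n_per_class):
            features = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            data.append((features, label))
    return data


rng = random.Random(0)
train = make_blobs(100, rng)
test = make_blobs(50, rng)

# Each of the 5 members sees only 10% of the training set.
ensemble = build_ensemble(train, n_members=5, fraction=0.1, rng=rng)
accuracy = sum(ensemble_predict(ensemble, x) == y for x, y in test) / len(test)

print(f"each member uses {len(ensemble[0])} of {len(train)} training samples")
print(f"ensemble accuracy: {accuracy:.2f}")
```

Because each member is trained on a small subset, the per-member cost drops with the subset size, and the vote across members is what is expected to recover accuracy. How well this works in practice depends on the base classifier and the data, which this toy example does not attempt to model.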




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ponti, M.P., Rossi, I. (2013). Ensembles of Optimum-Path Forest Classifiers Using Input Data Manipulation and Undersampling. In: Zhou, ZH., Roli, F., Kittler, J. (eds) Multiple Classifier Systems. MCS 2013. Lecture Notes in Computer Science, vol 7872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38067-9_21


  • DOI: https://doi.org/10.1007/978-3-642-38067-9_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38066-2

  • Online ISBN: 978-3-642-38067-9

  • eBook Packages: Computer Science, Computer Science (R0)
