Abstract
Combining multiple classifiers has proven useful in many applications for improving classification and stabilizing results. In this paper we use the Optimum-Path Forest (OPF) classifier to investigate input data manipulation techniques that use less data from the training set without hampering classification accuracy. Undersampling the data can speed up the classification task and could be especially useful with large datasets. The results indicate that the OPF-based ensemble methods allow a significant reduction in the size of the training set while maintaining or slightly improving accuracy. We provide intuition for a case of failure and report results on synthetic and real datasets.
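As a rough illustration of the general idea (an ensemble whose members are each trained on a small random subset of the training set, combined by majority vote), the sketch below may help. It is not the method evaluated in the paper: the base classifier is a plain 1-nearest-neighbour stand-in rather than OPF, and the synthetic two-blob data, the function names, the ensemble size of 5, and the 10% sampling fraction are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def nearest_neighbor_predict(train, x):
    """Stand-in base classifier (1-NN, NOT OPF): label of the closest sample."""
    best_label, best_dist = None, float("inf")
    for features, label in train:
        dist = sum((a - b) ** 2 for a, b in zip(features, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def build_ensemble(train, n_members, fraction, rng):
    """Give each member its own random subset, drawn without replacement."""
    subset_size = max(1, int(len(train) * fraction))
    return [rng.sample(train, subset_size) for _ in range(n_members)]


def ensemble_predict(ensemble, x):
    """Combine the members' decisions by majority vote."""
    votes = Counter(nearest_neighbor_predict(subset, x) for subset in ensemble)
    return votes.most_common(1)[0][0]


def make_blobs(n_per_class, rng):
    """Hypothetical synthetic data: two Gaussian blobs in 2-D."""
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (4.0, 4.0))):
        for _ in range(n_per_class):
            point = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            data.append((point, label))
    return data


rng = random.Random(0)
train = make_blobs(200, rng)
test = make_blobs(100, rng)

# Five members, each seeing only 10% of the training set (assumed values).
ensemble = build_ensemble(train, n_members=5, fraction=0.1, rng=rng)
accuracy = sum(ensemble_predict(ensemble, x) == y for x, y in test) / len(test)

print(f"training samples per member: {len(ensemble[0])} of {len(train)}")
print(f"ensemble accuracy: {accuracy:.3f}")
```

Each member here sees only a fraction of the training samples, so a base classifier whose training cost grows faster than linearly with the training set size would train faster per member. Whether accuracy is preserved depends on the data and on the base classifier.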
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ponti, M.P., Rossi, I. (2013). Ensembles of Optimum-Path Forest Classifiers Using Input Data Manipulation and Undersampling. In: Zhou, ZH., Roli, F., Kittler, J. (eds) Multiple Classifier Systems. MCS 2013. Lecture Notes in Computer Science, vol 7872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38067-9_21
DOI: https://doi.org/10.1007/978-3-642-38067-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38066-2
Online ISBN: 978-3-642-38067-9
eBook Packages: Computer Science, Computer Science (R0)