Abstract
Ensemble learning combines the predictions of multiple learners to obtain a final output. A key factor in its success is diversity among the learners. In this paper, we propose to achieve diversity in terms of classification complexity by guiding the sampling of instances in the Bagging algorithm with complexity measures. The proposed Complexity-driven Bagging algorithm complements classic Bagging by drawing training samples of different complexity so as to cover the complexity space, and it admits any complexity measure to guide the sampling. The proposal is tested on 28 real datasets with a total of 9 complexity measures, providing satisfactory and promising results and revealing that training with samples of different complexity, ranging from easy to hard, is the best strategy when sampling based on complexity.
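The idea sketched in the abstract can be illustrated with a minimal example. The code below is not the authors' exact algorithm but a hedged sketch under two assumptions: instance-level complexity is estimated with a kDN-style score (the fraction of an instance's k nearest neighbours with a different class label), and each ensemble member draws a bootstrap sample biased towards a different target complexity level, so that the samples together range from easy to hard. The function names (`knn_hardness`, `complexity_driven_samples`) and the exponential weighting with bandwidth `tau` are illustrative choices, not from the paper.

```python
import random
import math

def knn_hardness(X, y, k=3):
    # kDN-style instance hardness: fraction of the k nearest
    # neighbours (Euclidean) that carry a different class label
    n = len(X)
    scores = []
    for i in range(n):
        dists = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        neighbours = [j for _, j in dists[:k]]
        scores.append(sum(y[j] != y[i] for j in neighbours) / k)
    return scores

def complexity_driven_samples(X, y, n_estimators=5, tau=0.2, seed=0):
    # one bootstrap sample per ensemble member, each biased towards a
    # different target complexity level t in [0, 1]; instances whose
    # hardness is close to t get exponentially larger sampling weight
    rng = random.Random(seed)
    hardness = knn_hardness(X, y)
    samples = []
    for b in range(n_estimators):
        t = b / max(n_estimators - 1, 1)  # sweep from easy (0) to hard (1)
        weights = [math.exp(-abs(h - t) / tau) for h in hardness]
        idx = rng.choices(range(len(X)), weights=weights, k=len(X))
        samples.append(idx)
    return samples, hardness
```

Each index list in `samples` would then be used to train one base learner, with the usual Bagging majority vote at prediction time; any other complexity measure can replace `knn_hardness` to drive the weights.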
Acknowledgements
This research is supported by grants from Rey Juan Carlos University (Ref: C1PREDOC2020) and the Spanish Ministry of Science and Innovation, under the Knowledge Generation Projects program: XMIDAS (Ref: PID2021-122640OB-100). A. C. Lorena would also like to thank the financial support of the FAPESP research agency (grant 2021/06870-3).
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Lancho, C., de Souto, M.C.P., Lorena, A.C., Martín de Diego, I. (2023). Complexity-Driven Sampling for Bagging. In: Quaresma, P., Camacho, D., Yin, H., Gonçalves, T., Julian, V., Tallón-Ballesteros, A.J. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2023. IDEAL 2023. Lecture Notes in Computer Science, vol 14404. Springer, Cham. https://doi.org/10.1007/978-3-031-48232-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48231-1
Online ISBN: 978-3-031-48232-8
eBook Packages: Computer Science (R0)