
Complexity-Driven Sampling for Bagging

  • Conference paper
  • In: Intelligent Data Engineering and Automated Learning – IDEAL 2023

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14404)


Abstract

Ensemble learning combines the predictions of different learners to obtain a final output. A key factor in its success is diversity among the learners. In this paper, we propose to achieve diversity in terms of classification complexity by guiding the instance sampling of the Bagging algorithm with complexity measures. The proposed Complexity-driven Bagging algorithm complements classic Bagging by drawing training samples of different complexity, thereby covering the complexity space. Moreover, the algorithm admits any complexity measure to guide the sampling. The proposal is tested on 28 real datasets with a total of 9 complexity measures, providing satisfactory and promising results and revealing that training with samples of different complexity, ranging from easy to hard, is the best strategy when sampling based on complexity.
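The abstract describes the idea at a high level: instead of uniform bootstrap sampling, each base learner draws its training sample with probabilities derived from a per-instance complexity measure, and the target difficulty varies from easy to hard across the ensemble. As a hedged illustration only, the sketch below implements that idea with an assumed kNN-disagreement hardness score and a linear easy-to-hard schedule; these are illustrative choices, not the measures or schedule used in the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def knn_disagreement(X, y, k=5):
    # Illustrative instance-hardness score: fraction of the k nearest
    # neighbours that carry a different label (0 = easy, 1 = hard).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    # idx[:, 0] is the point itself, so skip it.
    return (y[idx[:, 1:]] != y[:, None]).mean(axis=1)

def complexity_driven_bagging(X, y, n_estimators=10, k=5, rng=None):
    # Bagging variant where each bootstrap sample is biased toward a
    # target difficulty t, swept linearly from easy (t=0) to hard (t=1).
    rng = np.random.default_rng(rng)
    hardness = knn_disagreement(X, y, k)
    n = len(y)
    ensemble = []
    for t in np.linspace(0.0, 1.0, n_estimators):
        w = (1 - t) * (1 - hardness) + t * hardness
        p = w / w.sum()
        idx = rng.choice(n, size=n, replace=True, p=p)
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def predict(ensemble, X):
    # Plain majority vote over the base learners.
    votes = np.stack([est.predict(X) for est in ensemble])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

Because each learner sees a sample concentrated at a different point of the complexity spectrum, the ensemble is diverse by construction, which is the mechanism the abstract highlights.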




Acknowledgements

This research is supported by grants from Rey Juan Carlos University (Ref: C1PREDOC2020) and the Spanish Ministry of Science and Innovation, under the Knowledge Generation Projects program: XMIDAS (Ref: PID2021-122640OB-100). A. C. Lorena also thanks the FAPESP research agency for financial support (grant 2021/06870-3).

Author information

Corresponding author: Carmen Lancho.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lancho, C., de Souto, M.C.P., Lorena, A.C., Martín de Diego, I. (2023). Complexity-Driven Sampling for Bagging. In: Quaresma, P., Camacho, D., Yin, H., Gonçalves, T., Julian, V., Tallón-Ballesteros, A.J. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2023. IDEAL 2023. Lecture Notes in Computer Science, vol 14404. Springer, Cham. https://doi.org/10.1007/978-3-031-48232-8_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-48232-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48231-1

  • Online ISBN: 978-3-031-48232-8

  • eBook Packages: Computer Science, Computer Science (R0)
