On Properties of Undersampling Bagging and Its Extensions for Imbalanced Data

  • Conference paper
Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 403)

Abstract

Undersampling bagging ensembles specialized for class-imbalanced data are considered. Particular attention is paid to Roughly Balanced Bagging, as it leads to better classification performance than other extensions of bagging. We experimentally analyze its properties with respect to bootstrap construction, the choice of the number of component classifiers, their diversity, and their ability to deal with the most difficult types of minority examples. We also discuss further extensions of undersampling bagging in which data difficulty factors influence the sampling of examples into bootstraps.
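
To make the notion of bootstrap construction concrete, the following is a minimal sketch of how a single Roughly Balanced Bagging bootstrap might be drawn. It assumes the usual formulation of the method, in which the minority part keeps the size of the minority class and the size of the majority part is drawn from a negative binomial distribution with success probability 0.5, so that the two classes are balanced only on average. The function names, the choice to sample with replacement, and the toy data are illustrative assumptions, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def negative_binomial(n_successes: int, p: float, rng: random.Random) -> int:
    """Return the number of failures seen before n_successes successes occur."""
    failures = 0
    successes = 0
    while successes < n_successes:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def roughly_balanced_bootstrap(
    minority: Sequence[T], majority: Sequence[T], rng: random.Random
) -> Tuple[List[T], List[T]]:
    """Draw one bootstrap: (minority sample, majority sample).

    Assumption: both parts are sampled with replacement.
    """
    if not minority or not majority:
        raise ValueError("both classes must contain at least one example")
    n_min = len(minority)
    # Majority sample size varies between bootstraps; its mean equals n_min.
    n_maj = negative_binomial(n_min, 0.5, rng)
    min_sample = [minority[rng.randrange(n_min)] for _ in range(n_min)]
    maj_sample = [majority[rng.randrange(len(majority))] for _ in range(n_maj)]
    return min_sample, maj_sample


if __name__ == "__main__":
    rng = random.Random(42)
    minority = [f"min{i}" for i in range(5)]    # hypothetical toy data
    majority = [f"maj{i}" for i in range(50)]   # hypothetical toy data
    for _ in range(3):
        mins, majs = roughly_balanced_bootstrap(minority, majority, rng)
        print(len(mins), len(majs))
```

In an ensemble, one component classifier would be trained on each such bootstrap and the predictions aggregated, as in standard bagging. Because the majority sample size varies from bootstrap to bootstrap, individual training sets are only roughly balanced.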

Notes

  1. We are grateful to our Master's students Lukasz Idkowiak and Mateusz Lango for their help in implementing and testing these algorithms.

Acknowledgments

The paper was partially funded by the Polish National Science Center Grant No. DEC-2013/11/B/ST6/00963.

Author information

Corresponding author

Correspondence to Jerzy Stefanowski.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Stefanowski, J. (2016). On Properties of Undersampling Bagging and Its Extensions for Imbalanced Data. In: Burduk, R., Jackowski, K., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015. Advances in Intelligent Systems and Computing, vol 403. Springer, Cham. https://doi.org/10.1007/978-3-319-26227-7_38

  • DOI: https://doi.org/10.1007/978-3-319-26227-7_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26225-3

  • Online ISBN: 978-3-319-26227-7

  • eBook Packages: Engineering, Engineering (R0)