Dynamic Centroid Insertion and Adjustment for Data Sets with Multiple Imbalanced Classes

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning (ICANN 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11728)

Abstract

The class imbalance problem is receiving increasing attention in the literature. Studies of the binary case are common, but work on the multiclass case remains limited. Solutions for imbalanced domains may be divided into two groups: data-level approaches and algorithmic approaches. The first group is more common and changes the training data to balance the data set by oversampling the smallest classes, undersampling the largest ones, or combining both. Instance reduction is another approach to the problem: it tries to find the best reduced set of instances that represents the original training set. In this work, we propose a new Prototype Generation method called DCIA. It dynamically inserts new prototypes for each class and then adjusts their positions with a search algorithm. The set of generated prototypes may be used to train any classifier. Experiments showed its potential: it enabled a 1NN classifier to perform sometimes as well as, or even better than, some ensemble classifiers created for different multiclass imbalanced domains.
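The general idea of generating prototypes, adjusting them with a search, and classifying with 1NN can be illustrated with a minimal sketch. This is not the authors' DCIA implementation: the abstract does not specify the insertion rule or the search algorithm, so the sketch assumes one centroid per class as the starting prototypes and substitutes plain random-perturbation hill climbing for the search. All function names, the toy data, and the mean-recall objective are hypothetical choices made for illustration.

```python
import math
import random

def nearest_label(prototypes, point):
    """1NN over prototypes: return the label of the closest prototype."""
    return min(prototypes, key=lambda p: math.dist(p[0], point))[1]

def mean_recall(prototypes, X, y):
    """Average per-class recall, so small classes weigh as much as large ones.
    (Assumed objective; the abstract does not name the fitness function.)"""
    hits, totals = {}, {}
    for point, label in zip(X, y):
        totals[label] = totals.get(label, 0) + 1
        if nearest_label(prototypes, point) == label:
            hits[label] = hits.get(label, 0) + 1
    return sum(hits.get(c, 0) / n for c, n in totals.items()) / len(totals)

def centroid_prototypes(X, y):
    """Assumed starting point: one prototype per class, the class centroid."""
    groups = {}
    for point, label in zip(X, y):
        groups.setdefault(label, []).append(point)
    return [([sum(col) / len(pts) for col in zip(*pts)], label)
            for label, pts in groups.items()]

def adjust(prototypes, X, y, steps=200, scale=0.5, seed=0):
    """Stand-in search: nudge one prototype at random and keep the move
    only if mean recall on the training data does not decrease."""
    rng = random.Random(seed)
    best = mean_recall(prototypes, X, y)
    for _ in range(steps):
        i = rng.randrange(len(prototypes))
        vec, label = prototypes[i]
        candidate = list(prototypes)  # copy, so the input list is untouched
        candidate[i] = ([v + rng.gauss(0, scale) for v in vec], label)
        score = mean_recall(candidate, X, y)
        if score >= best:
            prototypes, best = candidate, score
    return prototypes, best

# Toy imbalanced data: six points of class "a", two of "b", one of "c".
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.4, 0.0),
     (3.0, 3.0), (3.2, 2.9),
     (0.0, 3.0)]
y = ["a"] * 6 + ["b"] * 2 + ["c"]

start = centroid_prototypes(X, y)
tuned, score = adjust(start, X, y)
```

The three prototypes replace the nine training points as the 1NN reference set, which is the sense in which the generated prototypes "may be used to train any classifier". Scoring by mean per-class recall rather than plain accuracy keeps the single instance of class "c" from being ignored during the search.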

Notes

  1. https://github.com/EvandroJRSilva/VDBC.

  2. https://github.com/chongshengzhang/Multi_Imbalance.

  3. https://github.com/liyijing024/AMCS.

Acknowledgment

The authors would like to thank CNPq and FACEPE (Brazilian research agencies) for financial support.

Corresponding author

Correspondence to Evandro J. R. Silva.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Silva, E.J.R., Zanchettin, C. (2019). Dynamic Centroid Insertion and Adjustment for Data Sets with Multiple Imbalanced Classes. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science, vol 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_60

  • DOI: https://doi.org/10.1007/978-3-030-30484-3_60

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30483-6

  • Online ISBN: 978-3-030-30484-3

  • eBook Packages: Computer Science, Computer Science (R0)
