Adaptive over-sampling method for classification with application to imbalanced datasets in aluminum electrolysis

  • Original Article
  • Neural Computing and Applications

Abstract

The class imbalance problem arises often in practical applications, where one class has numerous instances and the other has only a few. The Synthetic Minority Over-sampling TEchnique (SMOTE) is the most popular and commonly used sampling method for this problem. It has two important parameters: the over-sampling rate N and the number of nearest neighbors k. However, users typically choose these two parameters arbitrarily, so the chosen values are not optimal in practical applications. In addition, imbalance ratios differ widely across datasets, which makes parameter selection in SMOTE even more difficult. To overcome this problem, this study proposes an adaptive over-sampling method based on SMOTE. The method transforms parameter selection in SMOTE into a multi-objective optimization problem. A new selection strategy, named absolute dominance-based selection, is then proposed to obtain the current optimal solution. Finally, the state transition algorithm is used to search for the parameter values of SMOTE that best achieve the objectives. Four imbalanced benchmark datasets and four class-imbalanced aluminum electrolysis datasets are used to verify the validity of the proposed method. Compared with other methods, the proposed method achieves good classification performance. Numerical results also show that the proposed method can successfully solve the class imbalance problem in aluminum electrolysis.
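
To make the roles of the two SMOTE parameters concrete, the following is a minimal sketch of the basic SMOTE interpolation step as it is commonly described, not the adaptive method proposed in the article. The function name `smote`, the arguments `n_percent` and `seed`, and the toy data are illustrative assumptions; N is treated here as a percentage that is a multiple of 100, and distances are Euclidean.

```python
import math
import random


def smote(minority, n_percent=200, k=5, seed=0):
    """Return synthetic minority samples in the style of basic SMOTE.

    minority  : list of feature vectors (lists of floats) from the minority class
    n_percent : over-sampling rate N as a percentage; 200 means two synthetic
                samples per original sample (assumed to be a multiple of 100)
    k         : number of nearest neighbors considered for each sample
    seed      : hypothetical convenience argument for reproducibility
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    per_sample = n_percent // 100
    k = min(k, len(minority) - 1)  # cannot use more neighbors than other points
    synthetic = []
    for i, x in enumerate(minority):
        # Rank all other minority samples by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbors)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_percent=200, k=2)
print(len(new_points))  # 8: two synthetic samples for each of the four originals
```

In this sketch, a larger N produces more synthetic points per minority sample, while a larger k lets each synthetic point be drawn toward more distant neighbors. These are the two values the abstract describes as being tuned by the optimization.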

Acknowledgements

The authors thank the National Natural Science Foundation of China (Grant Nos. 61773405, 61751312, 61533020) and the 111 Project (Grant No. B17048) for their funding support.

Author information

Corresponding author

Correspondence to Xiaofang Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Huang, Z., Yang, C., Chen, X. et al. Adaptive over-sampling method for classification with application to imbalanced datasets in aluminum electrolysis. Neural Comput & Applic 32, 7183–7199 (2020). https://doi.org/10.1007/s00521-019-04208-7

  • DOI: https://doi.org/10.1007/s00521-019-04208-7
