An improved dynamic sampling back-propagation algorithm based on mean square error to face the multi-class imbalance problem

Alejo, R.; Monroy-de-Jesús, J.; Ambriz-Polo, J. C.; Pacheco-Sánchez, J. H.

doi:10.1007/s00521-017-2938-3


New Trends in data pre-processing methods for signal and image classification
Published: 16 March 2017

Volume 28, pages 2843–2857, (2017)

Neural Computing and Applications

R. Alejo ORCID: orcid.org/0000-0002-7580-3305¹,
J. Monroy-de-Jesús²,
J. C. Ambriz-Polo¹ &
…
J. H. Pacheco-Sánchez³


Abstract

In this paper, we present an improved dynamic sampling approach (I-SDSA) for the multi-class imbalance problem. I-SDSA is a modification of the back-propagation algorithm that aims to make better use of the training samples in order to improve the classification performance of the multilayer perceptron (MLP). I-SDSA uses the mean square error and a Gaussian function to identify the best samples with which to train the neural network. The results reported in this article show that I-SDSA exploits the training dataset more effectively and improves MLP classification performance. In other words, I-SDSA is a successful technique for dealing with the multi-class imbalance problem. In addition, the results indicate that the proposed method is very competitive in classification performance with classical over-sampling methods (including when these are combined with well-known feature selection methods) and with other dynamic sampling approaches; in training time and training set size, it is better than the over-sampling methods.
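
The abstract does not give the selection rule itself, so the following is only a minimal sketch of how a per-sample mean square error and a Gaussian function could be combined to pick training samples. The function names, the parameters `mu`, `sigma` and `threshold`, and the idea of favouring samples whose error lies near `mu` are all assumptions made for illustration, not the authors' published formulation.

```python
import math
from typing import List, Sequence


def sample_mse(target: Sequence[float], output: Sequence[float]) -> float:
    """Mean square error of a single sample over all network outputs."""
    return sum((t - z) ** 2 for t, z in zip(target, output)) / len(target)


def gaussian_score(error: float, mu: float, sigma: float) -> float:
    """Gaussian score that peaks at 1.0 when error == mu.

    mu and sigma are hypothetical hyperparameters, not values from the paper.
    """
    return math.exp(-((error - mu) ** 2) / (2.0 * sigma ** 2))


def select_samples(
    targets: Sequence[Sequence[float]],
    outputs: Sequence[Sequence[float]],
    mu: float = 0.5,
    sigma: float = 0.2,
    threshold: float = 0.5,
) -> List[int]:
    """Return the indices of samples whose Gaussian score reaches the threshold."""
    selected = []
    for i, (t, z) in enumerate(zip(targets, outputs)):
        if gaussian_score(sample_mse(t, z), mu, sigma) >= threshold:
            selected.append(i)
    return selected


# Three samples with one-hot targets and the corresponding MLP outputs.
targets = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
outputs = [[0.99, 0.01], [0.3, 0.7], [0.9, 0.1]]
chosen = select_samples(targets, outputs)
print(chosen)
```

Under these assumed settings, a sample that the network already fits almost perfectly and a sample with a very large error both receive a low score, while the sample with an intermediate error is kept for the next training pass. In a dynamic sampling scheme this selection would be repeated during training as the network outputs change.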


Notes

This MLP only has two neural network outputs (\(z_{0}^{q}\) and \(z_{1}^{q}\)), because it has been designed to work with datasets of two classes.



Author information

Authors and Affiliations

Tecnológico de Estudios Superiores de Jocotitlán, Carretera Toluca-Atlacomulco KM. 44.8, Ejido de San Juan y San Agustín, 50700, Jocotitlán, Mexico
R. Alejo & J. C. Ambriz-Polo
Universidad Autónoma del Estado de México, Toluca-Atlacomulco KM. 60, 50000, Atlacomulco, Mexico
J. Monroy-de-Jesús
Instituto Tecnológico de Toluca, Av. Tecnológico s/n, Col. Agrícola Bellavista, 52149, Metepec, Mexico
J. H. Pacheco-Sánchez


Corresponding author

Correspondence to R. Alejo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article

Cite this article

Alejo, R., Monroy-de-Jesús, J., Ambriz-Polo, J.C. et al. An improved dynamic sampling back-propagation algorithm based on mean square error to face the multi-class imbalance problem. Neural Comput & Applic 28, 2843–2857 (2017). https://doi.org/10.1007/s00521-017-2938-3


Received: 01 November 2016
Accepted: 07 March 2017
Published: 16 March 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s00521-017-2938-3
