Abstract
Datasets with a considerably larger number of attributes than samples pose a serious classification challenge, and the problem becomes even harder when such high-dimensional datasets are also imbalanced. These datasets have recently attracted the interest of both industry and academia and have become a very active research area. In this paper, a new cost-sensitive classification method, CBR-PSO, is presented for high-dimensional datasets with different imbalance ratios and numbers of classes. CBR-PSO is based on particle swarm optimization and rough set theory. The robustness of the algorithm stems from applying attribute reduction and classification simultaneously; in addition, both stages are sensitive to misclassification cost. The efficiency of the algorithm is examined on publicly available datasets and compared with well-known attribute reduction and cost-sensitive classification algorithms. The statistical analysis and experiments show that CBR-PSO is better than or comparable to the other algorithms in terms of MAUC values.
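The evaluation metric mentioned above, MAUC, is the multi-class generalization of the area under the ROC curve proposed by Hand and Till (2001): the AUC is computed for every pair of classes in both directions and the results are averaged. The sketch below is only an illustration of that metric, not the paper's own implementation; the function name mauc and the use of NumPy and scikit-learn are assumptions made for this example.

from itertools import combinations

import numpy as np
from sklearn.metrics import roc_auc_score


def mauc(y_true, y_score, classes):
    """MAUC: average of pairwise AUCs over all class pairs (Hand & Till, 2001).

    y_true  -- array of shape (n,) with class labels
    y_score -- array of shape (n, c) with membership scores per class
    classes -- list of the c class labels, in the column order of y_score
    """
    pair_aucs = []
    for i, j in combinations(range(len(classes)), 2):
        # Restrict attention to samples belonging to either class of the pair.
        mask = np.isin(y_true, [classes[i], classes[j]])
        y_bin = (y_true[mask] == classes[i]).astype(int)
        if y_bin.sum() in (0, len(y_bin)):
            continue  # one class of the pair is absent; skip this pair
        # A(i|j): class i vs. class j scored by the column for class i,
        # A(j|i): the reverse direction; Hand & Till average the two.
        a_ij = roc_auc_score(y_bin, y_score[mask, i])
        a_ji = roc_auc_score(1 - y_bin, y_score[mask, j])
        pair_aucs.append((a_ij + a_ji) / 2.0)
    return float(np.mean(pair_aucs))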




Acknowledgements
The authors would like to thank the Ministry of Science, Industry and Technology (Republic of Turkey; Project No: 0777.STZ.2014) for their contributions to the study.