Abstract
Asymmetric misclassification costs are a common problem in many real-world applications. As one of the most familiar preprocessing methods, cost-sensitive resampling has drawn great attention owing to its ease of implementation and universality. However, current methods mainly concentrate on changing the size of the training set, which alters the original distribution shape and can leave the classifier over-fitted or unstable. To address this, a new method named cost-sensitive kernel shifting is proposed. First, the training data are mapped from the input space to a feature space by a kernel function, in which a distance metric is defined. Second, outliers are eliminated and the informative samples, including border and edge samples, are selected according to neighborhood and geometrical information in the mapped space. Third, the positions of all selected samples in the feature space are shifted, with a step length defined in proportion to both the ratio and the difference of the misclassification costs. Thanks to the kernel trick, every step requires only reshaping the kernel matrix. Experiments on both synthetic and public datasets verify the effectiveness of the proposed method.
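The two kernel-trick ingredients of the abstract can be sketched in a few lines: feature-space distances computed from Gram-matrix entries alone, and a sample shift toward a class mean expressed purely as an update of the kernel matrix. This is a minimal illustration, not the paper's exact algorithm; the step-length formula `step_length` and the convex shift `phi'(x_i) = (1 - lam) * phi(x_i) + lam * m` are assumed forms chosen to match the abstract's description.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def feature_space_dist(K, i, j):
    # Kernel trick: ||phi(x_i) - phi(x_j)||^2 = K[i,i] - 2*K[i,j] + K[j,j],
    # so distances in the feature space need only Gram-matrix entries.
    return np.sqrt(K[i, i] - 2 * K[i, j] + K[j, j])

def step_length(c_pos, c_neg, eta=0.1):
    # Illustrative step proportional to both the ratio and the difference
    # of the misclassification costs (assumed form, not the paper's formula).
    return eta * (c_pos / c_neg) * (c_pos - c_neg)

def shift_sample(K, i, S, lam):
    """Shift sample i toward the feature-space mean m of the index set S:
    phi'(x_i) = (1 - lam) * phi(x_i) + lam * m, realized only on K."""
    K = K.copy()
    m_row = K[S, :].mean(axis=0)              # <m, phi(x_j)> for every j
    m_norm2 = K[np.ix_(S, S)].mean()          # <m, m>
    # New inner products of the shifted phi'(x_i) with all (unshifted) samples:
    new_row = (1 - lam) * K[i, :] + lam * m_row
    # New squared norm of phi'(x_i), expanding the convex combination:
    kii = ((1 - lam) ** 2 * K[i, i]
           + 2 * lam * (1 - lam) * m_row[i]
           + lam ** 2 * m_norm2)
    K[i, :] = new_row
    K[:, i] = new_row
    K[i, i] = kii
    return K
```

With `lam = 0` the Gram matrix is returned unchanged, and the updated matrix stays symmetric, so downstream kernel classifiers can consume it directly.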
Notes
In fact, no shifting is needed since all the data are exactly classified.
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grants 61876189, 61273275, 61806219, 61703426 and 61503407.
Cite this article
Zhao, Z., Wang, X., Wu, C. et al. Cost-sensitive sample shifting in feature space. Pattern Anal Applic 23, 1689–1707 (2020). https://doi.org/10.1007/s10044-020-00890-9