
Cost-sensitive sample shifting in feature space

  • Theoretical advances
  • Published in: Pattern Analysis and Applications

Abstract

The asymmetry of misclassification costs is a common problem in many realistic applications. As one of the most familiar preprocessing approaches, cost-sensitive resampling has drawn great attention for its ease of implementation and broad applicability. However, current methods mainly concentrate on changing the size of the training set, which alters the original distribution shape and can leave classifiers over-fitted or unstable. To address this, a new method named cost-sensitive kernel shifting is proposed. First, the training data are mapped from the input space to a feature space by a particular kernel function, in which a distance metric is defined. Second, outliers are eliminated and the informative samples, including border and edge samples, are selected using neighborhood and geometrical information in the mapped space. Third, the positions of all selected samples are shifted in the feature space, with a moving step length defined in proportion to both the ratio and the difference of the misclassification costs. Thanks to the kernel trick, each step requires only reshaping the kernel matrix. Experiments on both synthetic and public datasets verify the effectiveness of the proposed method.
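The abstract compresses the method into three steps: map the data into a kernel-induced feature space where distances are measurable, select the informative samples, and shift them by a cost-dependent step length while updating only the kernel matrix. As a rough illustration of the kernel-trick bookkeeping this relies on, the sketch below expresses both the feature-space distance and a sample shift purely through Gram-matrix entries, never forming the feature map explicitly. It is a minimal sketch rather than the authors' algorithm: the RBF kernel, the choice of shift target, and the step-length formula tying eta to the cost ratio and difference are all illustrative assumptions, and the paper's outlier removal and border/edge selection are omitted.

```python
import numpy as np

def rbf_kernel(X, Y=None, gamma=1.0):
    # Gram matrix K[m, n] = exp(-gamma * ||x_m - y_n||^2).
    Y = X if Y is None else Y
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq)

def feature_space_dist2(K, i, j):
    # Squared distance ||phi(x_i) - phi(x_j)||^2, computed purely from
    # Gram-matrix entries (the distance metric mentioned in the abstract).
    return K[i, i] - 2.0 * K[i, j] + K[j, j]

def shift_in_feature_space(K, i, j, eta):
    # Move phi(x_i) a fraction eta toward phi(x_j):
    #   phi'(x_i) = (1 - eta) * phi(x_i) + eta * phi(x_j).
    # Only row/column i of K must be rewritten; the explicit feature
    # map is never formed.
    new_row = (1.0 - eta) * K[i, :] + eta * K[j, :]
    new_ii = ((1.0 - eta) ** 2 * K[i, i]
              + 2.0 * eta * (1.0 - eta) * K[i, j]
              + eta ** 2 * K[j, j])
    K[i, :] = new_row
    K[:, i] = new_row
    K[i, i] = new_ii
    return K

# Toy usage: shift one sample toward a neighbor of the costly class.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.2, 0.1]])
K = rbf_kernel(X, gamma=0.5)
c_fn, c_fp = 5.0, 1.0  # asymmetric misclassification costs
# Hypothetical step length combining the cost ratio and difference;
# the paper defines its own formula.
eta = min(1.0, 0.02 * (c_fn / c_fp) * abs(c_fn - c_fp))
print(feature_space_dist2(K, 0, 2))
shift_in_feature_space(K, 2, 0, eta)
print(feature_space_dist2(K, 0, 2))  # sample 2 is now closer to sample 0
```

The design point worth noting is that a convex shift phi'(x_i) = (1 - eta) phi(x_i) + eta phi(x_j) rewrites only row and column i of the Gram matrix, so any kernel classifier can be retrained on the reshaped matrix without ever touching the (possibly infinite-dimensional) feature space.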




Notes

  1. In fact, no shifting is needed, since all the data are correctly classified.

  2. http://prtools.tudelft.nl/.


Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants 61876189, 61273275, 61806219, 61703426 and 61503407.

Author information


Corresponding author

Correspondence to Xiaodan Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhao, Z., Wang, X., Wu, C. et al. Cost-sensitive sample shifting in feature space. Pattern Anal Applic 23, 1689–1707 (2020). https://doi.org/10.1007/s10044-020-00890-9

