Learning Misclassification Costs for Imbalanced Datasets, Application in Gene Expression Data Classification

Lu, Huijuan; Xu, Yige; Ye, Minchao; Yan, Ke; Jin, Qun; Gao, Zhigang

doi:10.1007/978-3-319-95930-6_47


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10954)

Included in the following conference series:

International Conference on Intelligent Computing


Abstract

Cost-sensitive algorithms have been widely used to solve imbalanced classification problems. However, the misclassification costs are usually determined empirically, which leads to uncertain performance. An effective method for automatically calculating the optimal cost weights is therefore desirable. Targeting the highest weighted classification accuracy (WCA), we propose two approaches to searching for the optimal cost weights: grid search and function fitting. In experiments, we classify imbalanced gene expression data with an extreme learning machine to test the cost weights obtained by the two approaches. Comprehensive experimental results show that function fitting is the more efficient approach and finds the optimal cost weights with acceptable WCA.
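The two search strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions: WCA is taken to be a weighted sum of the per-class accuracies, the data are synthetic rather than gene expression data, the classifier is a simple cost-weighted extreme learning machine, and a quadratic polynomial stands in for the unspecified fitted function. All helper names (`elm_fit`, `wca`, `grid_search`, `fitted_optimum`) are hypothetical.

```python
import numpy as np

def make_data(rng, n_major=300, n_minor=30, dim=5):
    # Synthetic imbalanced two-class data (a stand-in for gene expression data).
    X = np.vstack([rng.normal(0.0, 1.0, (n_major, dim)),
                   rng.normal(1.0, 1.0, (n_minor, dim))])
    y = np.concatenate([np.zeros(n_major, dtype=int), np.ones(n_minor, dtype=int)])
    return X, y

def elm_fit(X, y, cost_minor, rng, n_hidden=40, reg=1e-2):
    # Cost-weighted ELM: random hidden layer, then a weighted ridge solve
    # for the output weights. Minority samples carry weight cost_minor.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    c = np.where(y == 1, cost_minor, 1.0)   # per-sample misclassification cost
    t = np.where(y == 1, 1.0, -1.0)         # targets in {-1, +1}
    A = H.T @ (c[:, None] * H) + reg * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ (c * t))
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta > 0).astype(int)

def wca(y_true, y_pred, w_minor=0.5):
    # Assumed WCA definition: weighted sum of per-class accuracies.
    acc_minor = np.mean(y_pred[y_true == 1] == 1)
    acc_major = np.mean(y_pred[y_true == 0] == 0)
    return w_minor * acc_minor + (1.0 - w_minor) * acc_major

def grid_search(X_tr, y_tr, X_va, y_va, costs, seed=0):
    # Train one model per candidate cost and keep the cost with the highest WCA.
    scores = []
    for c in costs:
        model = elm_fit(X_tr, y_tr, c, np.random.default_rng(seed))
        scores.append(wca(y_va, elm_predict(model, X_va)))
    scores = np.array(scores)
    return costs[int(np.argmax(scores))], scores

def fitted_optimum(costs, scores, degree=2):
    # Fit a polynomial to a few (cost, WCA) samples and take the maximizer
    # of the fitted curve on a dense grid inside the sampled range.
    coeffs = np.polyfit(costs, scores, degree)
    dense = np.linspace(costs.min(), costs.max(), 501)
    return dense[int(np.argmax(np.polyval(coeffs, dense)))]

rng = np.random.default_rng(42)
X_tr, y_tr = make_data(rng)
X_va, y_va = make_data(rng)

fine = np.linspace(1.0, 20.0, 39)       # exhaustive grid: 39 trainings
best_grid, fine_scores = grid_search(X_tr, y_tr, X_va, y_va, fine)

coarse = np.linspace(1.0, 20.0, 6)      # function fitting: only 6 trainings
_, coarse_scores = grid_search(X_tr, y_tr, X_va, y_va, coarse)
best_fit = fitted_optimum(coarse, coarse_scores)

print("grid search cost:", best_grid, "fitted cost:", best_fit)
```

In this sketch the fitting route trains far fewer models than the fine grid, which is one plausible reading of the efficiency claim. Whether the fitted optimum matches the grid optimum depends on how well the chosen curve describes the true cost-WCA relationship.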



Acknowledgments

This study is supported by the National Natural Science Foundation of China (Nos. 61272315, 61402417, 61602431 and 61701468), the Zhejiang Provincial Natural Science Foundation (Nos. Y1110342, LY15F020037), and the International Cooperation Project of the Zhejiang Provincial Science and Technology Department (No. 2017C34003).

Author information

Authors and Affiliations

College of Information Engineering, China Jiliang University, 258 Xueyuan Street, Hangzhou, 310018, China
Huijuan Lu, Yige Xu, Minchao Ye & Ke Yan
Faculty of Human Sciences, Waseda University, Tokorozawa, 359-1192, Japan
Qun Jin
College of Computer Science, Hangzhou Dianzi University, Hangzhou, 310018, China
Zhigang Gao

Corresponding author

Correspondence to Minchao Ye.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
Polytechnic of Bari, Bari, Italy
Vitoantonio Bevilacqua
University of Wollongong, North Wollongong, New South Wales, Australia
Prashan Premaratne
Indian Institute of Technology Kanpur, Kanpur, India
Phalguni Gupta

About this paper

Cite this paper

Lu, H., Xu, Y., Ye, M., Yan, K., Jin, Q., Gao, Z. (2018). Learning Misclassification Costs for Imbalanced Datasets, Application in Gene Expression Data Classification. In: Huang, DS., Bevilacqua, V., Premaratne, P., Gupta, P. (eds) Intelligent Computing Theories and Application. ICIC 2018. Lecture Notes in Computer Science, vol 10954. Springer, Cham. https://doi.org/10.1007/978-3-319-95930-6_47

DOI: https://doi.org/10.1007/978-3-319-95930-6_47
Published: 06 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95929-0
Online ISBN: 978-3-319-95930-6
eBook Packages: Computer Science, Computer Science (R0)
