Application of large-scale L2-SVM for microarray classification

Li, Baosheng; Han, Baole; Qin, Chuandong

doi:10.1007/s11227-021-03962-7

Published: 28 June 2021

Volume 78, pages 2265–2286, (2022)

The Journal of Supercomputing

Abstract

Traditional classification algorithms work well on small-scale microarray datasets, but in large-scale scenarios ordinary machines can no longer run these algorithms because of their memory and time costs. In this paper, we design a new application framework that performs the computation as fast as possible. First, the synthetic minority over-sampling technique (SMOTE) is used to oversample the minority classes and obtain balanced data. Then, a large-scale algorithm for \(L_{2}\)-SVM based on the stochastic gradient descent method is proposed and used for microarray classification. We also give a simple proof of the convergence of the stochastic gradient descent algorithm. Next, various large-scale algorithms for support vector machines are run on the microarray datasets to identify the most appropriate one. Finally, a comparative analysis of loss functions is carried out to clarify their differences. The experimental results show that combining the stochastic gradient descent algorithm with the squared hinge loss is an attractive choice that can achieve high accuracy in seconds.
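To make the central idea concrete, the following is a minimal sketch of stochastic gradient descent on an L2-regularized squared hinge loss for a linear classifier. It is not the authors' implementation: the function names, the step-size schedule, the regularization value, the omission of a bias term, and the synthetic two-cluster data are all assumptions chosen for illustration, and the class-balancing step is left out.

```python
import numpy as np


def squared_hinge_loss(w, X, y, lam):
    """Mean squared hinge loss plus an L2 penalty; labels y are in {-1, +1}."""
    margins = np.maximum(0.0, 1.0 - y * (X @ w))
    return 0.5 * lam * (w @ w) + np.mean(margins ** 2)


def sgd_l2svm(X, y, lam=1e-3, epochs=20, eta0=0.01, seed=0):
    """Plain SGD on the squared hinge objective (hypothetical helper).

    The decaying step size eta0 / (1 + eta0 * lam * t) is an assumed
    schedule, not one taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):  # one pass over shuffled samples
            t += 1
            eta = eta0 / (1.0 + eta0 * lam * t)
            margin = 1.0 - y[i] * (X[i] @ w)
            grad = lam * w  # gradient of the L2 penalty
            if margin > 0:
                # gradient of max(0, 1 - y * w.x)^2 with respect to w
                grad = grad - 2.0 * margin * y[i] * X[i]
            w -= eta * grad
    return w


def make_toy_data(n=200, seed=0):
    """Synthetic two-cluster data, made up purely for this example."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)
    X = 2.0 * y[:, None] + 0.5 * rng.standard_normal((n, 2))
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    w = sgd_l2svm(X, y)
    accuracy = np.mean(np.sign(X @ w) == y)
    print("loss:", squared_hinge_loss(w, X, y, 1e-3), "accuracy:", accuracy)
```

Because the squared hinge loss is differentiable everywhere, each update uses an ordinary gradient rather than a subgradient, and each step touches only one sample, which is why the memory cost stays small as the dataset grows.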

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (No. 62066001), the National Natural Science Youth Science Foundation of China (No. 61907012), the Natural Science Foundation of Ningxia (No. 2021AAC03230), and the North Minzu University Major Special Projects: 201804. The authors are grateful to all the reviewers and the Editor-in-Chief for their insightful comments on this paper.

Author information

Authors and Affiliations

School of Mathematics and Information Science, North Minzu University, Yinchuan, 750021, China
Baosheng Li, Baole Han & Chuandong Qin
Ningxia Key Laboratory of Intelligent Information and Big Data Processing, Yinchuan, 750021, China
Chuandong Qin

Corresponding author

Correspondence to Chuandong Qin.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, B., Han, B. & Qin, C. Application of large-scale L₂-SVM for microarray classification. J Supercomput 78, 2265–2286 (2022). https://doi.org/10.1007/s11227-021-03962-7

Accepted: 18 June 2021
Published: 28 June 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11227-021-03962-7
