Application of large-scale L2-SVM for microarray classification

The Journal of Supercomputing

Abstract

Traditional classification algorithms work well on small-scale microarray datasets, but in large-scale scenarios their memory and time costs exceed what general-purpose machines can support. In this paper, we design a new application framework that performs the computation as fast as possible. First, the synthetic minority over-sampling technique (SMOTE) is used to oversample the minority classes and obtain balanced data. Then, a large-scale algorithm for \(L_{2}\)-SVM based on the stochastic gradient descent method is proposed and used for microarray classification. We also give a simple proof of the convergence of the stochastic gradient descent algorithm. Next, various large-scale algorithms for support vector machines are run on the microarray datasets to identify the most appropriate one. Finally, a comparative analysis of loss functions is carried out to clarify the differences between them. The experimental results show that the stochastic gradient descent algorithm combined with the squared hinge loss is an attractive choice, achieving high accuracy in seconds.
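
The two steps named in the abstract, oversampling the minority class and training an \(L_{2}\)-SVM by stochastic gradient descent on the squared hinge loss, can be illustrated with a minimal sketch. It is not the authors' implementation. The objective \(\frac{\lambda}{2}\Vert w\Vert^{2} + \frac{1}{n}\sum_{i}\max(0, 1 - y_{i}(w^{\top}x_{i} + b))^{2}\), the decaying step size, the simplified SMOTE interpolation, the function names `smote_oversample` and `sgd_l2svm`, and the synthetic toy data are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=None):
    """Simplified SMOTE (assumed variant): create n_new synthetic minority samples.

    Each synthetic point lies on the segment between a minority sample and
    one of its k nearest minority-class neighbours. Needs >= 2 samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n - 1)
    # Pairwise squared distances via the Gram matrix (avoids an n*n*d array).
    sq = (X_min ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X_min @ X_min.T)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    mate = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[mate] - X_min[base])


def sgd_l2svm(X, y, lam=1e-2, eta0=1e-2, epochs=20, seed=None):
    """SGD on (lam/2)*||w||^2 + mean_i max(0, 1 - y_i*(w.x_i + b))^2.

    Labels y must be in {-1, +1}. The step-size schedule is an assumption.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            eta = eta0 / (1.0 + lam * eta0 * t)  # decaying step size
            slack = max(0.0, 1.0 - y[i] * (X[i] @ w + b))
            w *= 1.0 - eta * lam  # gradient step on the regulariser
            if slack > 0.0:
                # d/dw of slack^2 is -2*slack*y_i*x_i (bias left unregularised)
                w += eta * 2.0 * slack * y[i] * X[i]
                b += eta * 2.0 * slack * y[i]
            t += 1
    return w, b


if __name__ == "__main__":
    # Synthetic, imbalanced toy data; not one of the paper's microarray sets.
    rng = np.random.default_rng(0)
    X_maj = rng.normal(-1.5, 1.0, size=(200, 5))  # majority class, label -1
    X_min = rng.normal(+1.5, 1.0, size=(20, 5))   # minority class, label +1
    X_syn = smote_oversample(X_min, n_new=180, seed=0)
    X = np.vstack([X_maj, X_min, X_syn])
    y = np.concatenate([-np.ones(200), np.ones(200)])
    w, b = sgd_l2svm(X, y, seed=0)
    print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```

Because the squared hinge loss is differentiable, each update uses an ordinary gradient rather than a subgradient. The step size has to stay small relative to the squared feature norms, which is why the sketch assumes roughly unit-scale features.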

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (No. 62066001), the National Natural Science Youth Science Foundation of China (No. 61907012), the Natural Science Foundation of Ningxia (No. 2021AAC03230), and the North Minzu University Major Special Projects (201804). The authors are grateful to all the reviewers and the Editor-in-Chief for their insightful comments on this paper.

Author information

Corresponding author

Correspondence to Chuandong Qin.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

About this article

Cite this article

Li, B., Han, B. & Qin, C. Application of large-scale L2-SVM for microarray classification. J Supercomput 78, 2265–2286 (2022). https://doi.org/10.1007/s11227-021-03962-7

  • DOI: https://doi.org/10.1007/s11227-021-03962-7
