Non-convex approximation based l0-norm multiple indefinite kernel feature selection

Published in: Applied Intelligence

Abstract

Multiple kernel learning (MKL) for feature selection utilizes kernels to explore complex properties of features, and has been shown to be among the most effective approaches to feature selection. A natural way to perform feature selection is to use the l0-norm to obtain sparse solutions, but the optimization problem involving the l0-norm is NP-hard. Previous MKL methods therefore typically employ the l1-norm to obtain sparse kernel combinations. However, the l1-norm, as a convex approximation of the l0-norm, cannot always attain the desired solution of the l0-norm regularized problem and may lead to a loss in prediction accuracy. In contrast, various non-convex approximations of the l0-norm have been proposed and perform better in many linear feature selection methods. In this paper, we propose a novel l0-norm based MKL method (l0-MKL) for feature selection, which imposes a non-convex approximation constraint on the kernel combination coefficients to select features automatically. Considering the better empirical performance of indefinite kernels over positive definite kernels, l0-MKL is built on the primal form of multiple indefinite kernel learning for feature selection. The resulting non-convex optimization problem is reformulated as a difference of convex functions (DC) program and solved by the DC algorithm (DCA). Experiments on real-world datasets demonstrate that l0-MKL is superior to related state-of-the-art methods in both feature selection and classification performance.
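
To make the optimization strategy concrete, here is a minimal sketch of the generic DCA recipe on a simpler stand-in problem: least squares with a capped-l1 penalty p(w) = sum_i min(|w_i|, a), a common non-convex surrogate of the l0-norm. The penalty admits the DC decomposition min(|w_i|, a) = |w_i| - max(|w_i| - a, 0), so the objective splits as g(w) - h(w) with both parts convex; each DCA step linearizes h at the current iterate and solves the resulting convex lasso-type subproblem. This illustrates only the DC/DCA machinery the abstract refers to, not the paper's l0-MKL algorithm, which operates on kernel combination coefficients in a multiple indefinite kernel objective; all names and parameter values below are hypothetical.

import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def dca_capped_l1(X, y, lam=0.5, a=0.5, n_outer=20, n_inner=200):
    # Minimize 0.5*||Xw - y||^2 + lam * sum_i min(|w_i|, a) via DCA.
    # DC split: g(w) = 0.5*||Xw - y||^2 + lam*||w||_1   (convex)
    #           h(w) = lam * sum_i max(|w_i| - a, 0)    (convex)
    # so the objective equals g(w) - h(w).
    d = X.shape[1]
    w = np.zeros(d)
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the smooth gradient
    for _ in range(n_outer):
        # Subgradient of h at the current iterate.
        v = lam * np.sign(w) * (np.abs(w) > a)
        # Convex subproblem min_w g(w) - <v, w>, solved by proximal gradient.
        for _ in range(n_inner):
            grad = X.T @ (X @ w - y) - v
            w = soft_threshold(w - grad / L, lam / L)
    return w

# Toy usage: recover a 3-sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))
w_true = np.zeros(30)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(100)
w_hat = dca_capped_l1(X, y)
print(np.flatnonzero(np.abs(w_hat) > 1e-3))  # indices of selected features

The same two-step pattern, linearizing the concave part and then solving a convex program, is what DCA applies to the full l0-MKL formulation.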

Notes

  1. http://featureselection.asu.edu/datasets.php

  2. http://datam.i2r.a-star.edu.sg/datasets/krbd/

Acknowledgments

This work was supported by the National Key R&D Program of China (Grant No. 2017YFB1002801) and the National Natural Science Foundation of China (Grant Nos. 61375057 and 61876091), as well as by the Collaborative Innovation Center of Wireless Communications Technology.

Author information

Corresponding author

Correspondence to Hui Xue.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xue, H., Song, Y. Non-convex approximation based l0-norm multiple indefinite kernel feature selection. Appl Intell 50, 192–202 (2020). https://doi.org/10.1007/s10489-018-01407-y
