Abstract
Multiple kernel learning (MKL) exploits kernels to capture complex properties of features and has proven to be among the most effective approaches to feature selection. A natural way to perform feature selection is to use the l0-norm to obtain sparse solutions. However, the optimization problem involving the l0-norm is NP-hard. Therefore, previous MKL methods typically use the l1-norm to obtain sparse kernel combinations. The l1-norm, as a convex approximation of the l0-norm, sometimes fails to attain the desired solution of the l0-norm regularized problem and may lead to a loss in prediction accuracy. In contrast, various non-convex approximations of the l0-norm have been proposed and perform better in many linear feature selection methods. In this paper, we propose a novel l0-norm based MKL method (l0-MKL) for feature selection that imposes a non-convex approximation constraint on the kernel combination coefficients so that features are selected automatically. Considering the better empirical performance of indefinite kernels over positive definite kernels, l0-MKL is built on the primal form of multiple indefinite kernel learning for feature selection. The resulting non-convex optimization problem is further reformulated as a difference of convex functions (DC) program and solved by the DC algorithm (DCA). Experiments on real-world datasets demonstrate that l0-MKL outperforms several related state-of-the-art methods in both feature selection and classification performance.
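As a minimal illustration of the DC reformulation mentioned above (the paper's specific non-convex surrogate is not stated in the abstract, so the capped-l1 penalty and the notation $\mu_m$ for the kernel combination coefficients are assumptions here), a standard non-convex approximation of the l0-norm over non-negative coefficients $\mu_m \ge 0$ splits into a difference of two convex functions:

$$\sum_{m=1}^{M} \min(\mu_m, \theta) \;=\; \underbrace{\sum_{m=1}^{M} \mu_m}_{g(\mu)\ \text{(convex)}} \;-\; \underbrace{\sum_{m=1}^{M} \max(\mu_m - \theta, 0)}_{h(\mu)\ \text{(convex)}}, \qquad \theta > 0.$$

A generic DCA iteration then alternates two steps: choose a subgradient $y^{k} \in \partial h(\mu^{k})$, and update $\mu^{k+1} \in \arg\min_{\mu \ge 0}\, \{\, \ell(\mu) + g(\mu) - \langle y^{k}, \mu \rangle \,\}$, where $\ell$ collects the remaining convex loss terms; the iteration stops once the objective value stabilizes.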
Acknowledgments
This work was supported by the National Key R&D Program of China (Grant No. 2017YFB1002801) and the National Natural Science Foundation of China (Grant Nos. 61375057, 61876091). It was also supported by the Collaborative Innovation Center of Wireless Communications Technology.
Cite this article
Xue, H., Song, Y. Non-convex approximation based l0-norm multiple indefinite kernel feature selection. Appl Intell 50, 192–202 (2020). https://doi.org/10.1007/s10489-018-01407-y