Abstract
Multiple kernel learning (MKL) exploits kernels to capture complex properties of features and has proven to be among the most effective approaches to feature selection. A natural way to perform feature selection is to use the l0-norm to obtain sparse solutions. However, the optimization problem involving the l0-norm is NP-hard. Therefore, previous MKL methods typically use the l1-norm to obtain sparse kernel combinations. The l1-norm, as a convex approximation of the l0-norm, sometimes fails to attain the desired solution of the l0-norm regularized problem and may lead to a loss in prediction accuracy. In contrast, various non-convex approximations of the l0-norm have been proposed and perform better in many linear feature selection methods. In this paper, we propose a novel l0-norm based MKL method (l0-MKL) for feature selection that imposes a non-convex approximation constraint on the kernel combination coefficients so that features are selected automatically. Considering the better empirical performance of indefinite kernels over positive definite kernels, l0-MKL is built on the primal form of multiple indefinite kernel learning for feature selection. The resulting non-convex optimization problem is further reformulated as a difference of convex functions (DC) program and solved by the DC algorithm (DCA). Experiments on real-world datasets demonstrate that l0-MKL outperforms several related state-of-the-art methods in both feature selection and classification performance.
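As a minimal illustration of the DC reformulation mentioned above (the paper's specific non-convex surrogate is not stated in the abstract, so the capped-l1 penalty and the notation $\mu_m$ for the kernel combination coefficients are assumptions here), a standard non-convex approximation of the l0-norm over non-negative coefficients $\mu_m \ge 0$ splits into a difference of two convex functions:

$$\sum_{m=1}^{M} \min(\mu_m, \theta) \;=\; \underbrace{\sum_{m=1}^{M} \mu_m}_{g(\mu)\ \text{(convex)}} \;-\; \underbrace{\sum_{m=1}^{M} \max(\mu_m - \theta, 0)}_{h(\mu)\ \text{(convex)}}, \qquad \theta > 0.$$

A generic DCA iteration then alternates two steps: choose a subgradient $y^{k} \in \partial h(\mu^{k})$, and update $\mu^{k+1} \in \arg\min_{\mu \ge 0}\, \{\, \ell(\mu) + g(\mu) - \langle y^{k}, \mu \rangle \,\}$, where $\ell$ collects the remaining convex loss terms; the iteration stops once the objective value stabilizes.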
Acknowledgments
This work was supported by the National Key R&D Program of China (Grant No. 2017YFB1002801) and the National Natural Science Foundation of China (Grant Nos. 61375057, 61876091). It was also supported by the Collaborative Innovation Center of Wireless Communications Technology.
Cite this article
Xue, H., Song, Y. Non-convex approximation based l0-norm multiple indefinite kernel feature selection. Appl Intell 50, 192–202 (2020). https://doi.org/10.1007/s10489-018-01407-y