Abstract
Stability of feature selection is an important issue in knowledge discovery from high-dimensional data. A key factor affecting the stability of a feature selection algorithm is the sample size of the training set. To alleviate the problem of small sample size in high-dimensional data, we propose a novel framework of margin-based sample weighting that makes extensive use of the available samples. Specifically, the framework exploits the discrepancy among the local profiles of feature importance at different samples and weights each sample according to the outlying degree of its local profile. We also develop an efficient algorithm under the framework. Experiments on a set of public microarray datasets demonstrate that the proposed algorithm is effective at improving the stability of state-of-the-art feature selection algorithms while maintaining comparable classification accuracy on the selected features.
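The abstract does not give the formulas, so the following is only a minimal sketch of one possible reading of the weighting idea, not the authors' algorithm. It assumes a Relief-style margin (feature-wise distance to the nearest sample of another class minus distance to the nearest sample of the same class) as the "local profile of feature importance", the mean Euclidean distance to all other profiles as the "outlying degree", and an exponential decay to turn that degree into a weight. The function names `margin_profiles` and `sample_weights` are hypothetical, and every class is assumed to contain at least two samples.

```python
import numpy as np


def margin_profiles(X, y):
    """Return one margin vector per sample (assumed Relief-style definition).

    For sample i: |x_i - nearest miss| - |x_i - nearest hit|, feature-wise,
    where neighbours are found with the Manhattan distance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    profiles = np.empty_like(X)
    for i in range(n):
        # Manhattan distance from sample i to every sample
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf  # a sample is never its own neighbour
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        profiles[i] = np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return profiles


def sample_weights(X, y):
    """Weight samples so that outlying margin profiles receive less weight."""
    P = margin_profiles(X, y)
    n = P.shape[0]
    # Outlying degree (assumption): mean distance to all other profiles
    outlying = np.array(
        [np.linalg.norm(P - P[i], axis=1).sum() / (n - 1) for i in range(n)]
    )
    mean_out = outlying.mean()
    scale = mean_out if mean_out > 0 else 1.0
    # Decay choice is illustrative; any decreasing map would fit the description
    w = np.exp(-outlying / scale)
    return w / w.sum()


if __name__ == "__main__":
    # Toy data: two tight clusters plus one sample labelled 0 inside cluster 1
    X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1], [5.05, 5.05]]
    y = [0, 0, 0, 1, 1, 1, 0]
    print(sample_weights(X, y))
```

The resulting weights sum to one and could then be handed to any feature selection method that accepts per-sample weights. The loop computes neighbours one sample at a time to avoid building a samples × samples × features array, which would be large for microarray data.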
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Han, Y., Yu, L. (2010). Margin Based Sample Weighting for Stable Feature Selection. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_65
DOI: https://doi.org/10.1007/978-3-642-14246-8_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science, Computer Science (R0)