Margin Based Sample Weighting for Stable Feature Selection

  • Conference paper
Web-Age Information Management (WAIM 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6184)

Abstract

The stability of feature selection is an important issue in knowledge discovery from high-dimensional data. A key factor affecting the stability of a feature selection algorithm is the size of the training set. To alleviate the problem of small sample size in high-dimensional data, we propose a novel framework of margin based sample weighting that makes fuller use of the available samples. Specifically, it exploits the discrepancy among the local profiles of feature importance at different samples and weights each sample according to the outlying degree of its local profile. We also develop an efficient algorithm under the framework. Experiments on a set of public microarray datasets demonstrate that the proposed algorithm improves the stability of state-of-the-art feature selection algorithms while maintaining comparable classification accuracy on the selected features.
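The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details are not given here. It assumes a Relief-style hypothesis margin (per-feature distance to the nearest sample of another class minus the distance to the nearest sample of the same class) as the local profile of feature importance, the mean distance to all other profiles as the outlying degree, and the map 1 / (1 + outlying degree) as the weight. The names `margin_profiles` and `sample_weights` are hypothetical.

```python
import numpy as np

def margin_profiles(X, y):
    """Assumed local profile of feature importance for each sample:
    |x - nearest miss| - |x - nearest hit|, computed feature by feature.
    Requires at least two samples in every class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise L1 distances; O(n^2 * d) memory, so suited to small demos only.
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    np.fill_diagonal(D, np.inf)  # a sample is never its own neighbour
    same = y[:, None] == y[None, :]
    hit = np.where(same, D, np.inf).argmin(axis=1)    # nearest same-class sample
    miss = np.where(~same, D, np.inf).argmin(axis=1)  # nearest other-class sample
    return np.abs(X - X[miss]) - np.abs(X - X[hit])

def sample_weights(X, y):
    """Assumed weighting: samples whose profile lies far from the other
    profiles (high outlying degree) receive a lower weight."""
    P = margin_profiles(X, y)
    dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    outlying = dist.sum(axis=1) / (len(P) - 1)  # mean distance to other profiles
    w = 1.0 / (1.0 + outlying)                  # assumed decreasing map
    return w / w.sum()                          # normalise to sum to 1

if __name__ == "__main__":
    # Toy data: two well-separated classes with two samples each.
    X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
    y = np.array([0, 0, 1, 1])
    print(margin_profiles(X, y))
    print(sample_weights(X, y))
```

The resulting weights could then be passed to any feature selection method that accepts per-sample weights. How the weights are consumed downstream is also an assumption and is not specified in the abstract.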

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Han, Y., Yu, L. (2010). Margin Based Sample Weighting for Stable Feature Selection. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_65

  • DOI: https://doi.org/10.1007/978-3-642-14246-8_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14245-1

  • Online ISBN: 978-3-642-14246-8

  • eBook Packages: Computer Science, Computer Science (R0)