
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3732)

Included in the following conference series: PARA (Applied Parallel Computing)

Abstract

The paper clarifies the difference between dimension-reduction and variable-selection methods in statistics and data mining. Traditional and recent modeling methods are surveyed, and a typical approach to variable selection is outlined. In addition, the need for cross-validation in modeling, and the types of cross-validation available, are sketched.
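To make the distinction concrete, the following minimal Python sketch (not taken from the paper; the scikit-learn library, the synthetic data, and all parameter values such as alpha=0.1 and n_components=5 are illustrative assumptions) contrasts dimension reduction, which replaces the original variables with a few linear combinations of all of them (here, principal components), with variable selection, which retains a subset of the original variables (here, via the lasso). K-fold cross-validation is used to compare the two resulting models out of sample.

    # A minimal sketch, assuming scikit-learn and synthetic data:
    # dimension reduction vs. variable selection, compared by 5-fold CV.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import Lasso, LinearRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    n, p = 200, 30
    X = rng.normal(size=(n, p))
    # Only 3 of the 30 predictors actually drive the response.
    y = (2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * X[:, 7]
         + rng.normal(scale=0.5, size=n))

    # Dimension reduction: regress on a few principal components,
    # i.e., new variables that mix together ALL original predictors.
    pca_model = make_pipeline(PCA(n_components=5), LinearRegression())

    # Variable selection: the lasso shrinks coefficients and sets many
    # exactly to zero, keeping a SUBSET of the original predictors.
    lasso_model = Lasso(alpha=0.1)

    # Cross-validation estimates out-of-sample fit for both approaches.
    for name, model in [("PCA + OLS", pca_model), ("lasso", lasso_model)]:
        scores = cross_val_score(model, X, y, cv=5, scoring="r2")
        print(f"{name}: mean CV R^2 = {scores.mean():.3f}")

    # Which of the original variables did the lasso keep?
    lasso_model.fit(X, y)
    print("lasso selected variables:", np.flatnonzero(lasso_model.coef_))

On data like these, the lasso typically recovers the three informative indices, whereas the principal components blend all thirty predictors; that interpretability contrast, rather than the scores themselves, is the point of the sketch.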




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hartmann, W.M. (2006). Dimension Reduction vs. Variable Selection. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2004. Lecture Notes in Computer Science, vol 3732. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558958_113


  • DOI: https://doi.org/10.1007/11558958_113

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29067-4

  • Online ISBN: 978-3-540-33498-9

  • eBook Packages: Computer Science, Computer Science (R0)
