Abstract
The paper clarifies the difference between dimension reduction and variable selection methods in statistics and data mining. Traditional and recent modeling methods are listed, and a typical approach to variable selection is outlined. In addition, the need for and the types of cross validation in modeling are sketched.
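To make the distinction concrete, the following is a minimal sketch, not code from the paper: principal component analysis reduces dimension by replacing the original variables with a few linear combinations of all of them, whereas the lasso selects a subset of the original variables, with its penalty chosen by cross validation. The sketch assumes scikit-learn's PCA and LassoCV and uses synthetic data.

```python
# Illustration only, not the paper's method: PCA as dimension reduction
# vs. the lasso as variable selection, with the lasso penalty chosen
# by cross validation. Assumes scikit-learn; the data are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))            # 100 samples, 10 variables
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(100)

# Dimension reduction: two new components, each a linear combination
# of ALL ten original variables.
Z = PCA(n_components=2).fit_transform(X)      # shape (100, 2)

# Variable selection: the lasso retains a SUBSET of the original
# variables; 5-fold cross validation picks the shrinkage penalty.
lasso = LassoCV(cv=5).fit(X, y)
selected = np.nonzero(lasso.coef_)[0]         # indices of retained variables
print(Z.shape, selected)
```

Both routes shrink the predictor space, but only variable selection keeps coordinates interpretable as the original variables; that contrast, and the role of cross validation in tuning such methods, is what the paper develops.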
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hartmann, W.M. (2006). Dimension Reduction vs. Variable Selection. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2004. Lecture Notes in Computer Science, vol 3732. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558958_113
DOI: https://doi.org/10.1007/11558958_113
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29067-4
Online ISBN: 978-3-540-33498-9