Abstract
The paper clarifies the difference between dimension reduction and variable selection methods in statistics and data mining. Traditional and recent modeling methods are listed, and a typical approach to variable selection is outlined. In addition, the need for and the types of cross validation in modeling are sketched.
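To make the distinction concrete, the following is a minimal sketch, not code from the paper: principal component analysis reduces dimension by replacing the original variables with a few linear combinations of all of them, whereas the lasso selects a subset of the original variables, with its penalty chosen by cross validation. The sketch assumes scikit-learn's PCA and LassoCV and uses synthetic data.

```python
# Illustration only, not the paper's method: PCA as dimension reduction
# vs. the lasso as variable selection, with the lasso penalty chosen
# by cross validation. Assumes scikit-learn; the data are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))            # 100 samples, 10 variables
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(100)

# Dimension reduction: two new components, each a linear combination
# of ALL ten original variables.
Z = PCA(n_components=2).fit_transform(X)      # shape (100, 2)

# Variable selection: the lasso retains a SUBSET of the original
# variables; 5-fold cross validation picks the shrinkage penalty.
lasso = LassoCV(cv=5).fit(X, y)
selected = np.nonzero(lasso.coef_)[0]         # indices of retained variables
print(Z.shape, selected)
```

Both routes shrink the predictor space, but only variable selection keeps coordinates interpretable as the original variables; that contrast, and the role of cross validation in tuning such methods, is what the paper develops.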
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hartmann, W.M. (2006). Dimension Reduction vs. Variable Selection. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2004. Lecture Notes in Computer Science, vol 3732. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558958_113
DOI: https://doi.org/10.1007/11558958_113
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29067-4
Online ISBN: 978-3-540-33498-9