Abstract
In this chapter, we introduce our recent work on feature selection for Partial Least Square based Dimension Reduction (PLSDR). Previous work on PLSDR has performed well on biomedical and chemical data sets, but some problems remain open, such as how to determine the number of principal components and how to remove irrelevant and redundant features before PLSDR is applied. First, we propose a general framework that describes how to perform feature selection for dimension reduction methods. It consists of a preprocessing step that removes irrelevant and redundant features and a postprocessing step that selects principal components. Second, as an example, we address these problems in the case of PLSDR: 1) we discuss how to determine the number of top features for PLSDR; 2) we propose to remove irrelevant features for PLSDR by using an efficient algorithm based on feature probes; 3) we investigate a supervised solution for removing redundant features; 4) we study whether the top features are important to classification and how to select the most discriminant principal components. The proposed algorithms are evaluated on several benchmark microarray data sets and show satisfactory performance.
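The framework described above (filter features, reduce dimension, then select components) can be illustrated with a minimal sketch. It is not the chapter's algorithm: the t-statistic ranking, the use of permuted real features as probes, the 0.95 quantile threshold, the PLS1 variant, and all function names are assumptions made for illustration, and redundant-feature removal is omitted for brevity.

```python
import numpy as np

def t_scores(X, y):
    """Absolute two-sample t-statistic of each column for labels y in {0, 1}."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def probe_filter(X, y, n_probes=200, quantile=0.95, seed=0):
    """Preprocessing (assumed form): keep features that outscore random probes.

    Probes are real feature columns with their rows permuted, so they keep the
    marginal distribution but carry no information about y.
    """
    rng = np.random.default_rng(seed)
    cols = rng.integers(0, X.shape[1], size=n_probes)
    probes = np.column_stack([rng.permutation(X[:, j]) for j in cols])
    threshold = np.quantile(t_scores(probes, y), quantile)
    return np.flatnonzero(t_scores(X, y) > threshold)

def pls1_weights(X, y, n_components):
    """PLS1 with deflation; returns R such that scores T = (X - mean) @ R."""
    E = X - X.mean(axis=0)
    yc = y - y.mean()
    W, P = [], []
    for _ in range(n_components):
        w = E.T @ yc                 # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                    # component scores
        p = E.T @ t / (t @ t)        # loadings
        E = E - np.outer(t, p)       # deflate X
        W.append(w)
        P.append(p)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.inv(P.T @ W)

def select_components(T, y, k):
    """Postprocessing (assumed form): keep the k most discriminant components."""
    return np.argsort(t_scores(T, y))[::-1][:k]

# Synthetic data: 60 samples, 200 features, only the first 5 are informative.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 200))
X[:, :5] += 2.0 * y[:, None]

kept = probe_filter(X, y)                         # preprocessing
Xk = X[:, kept]
R = pls1_weights(Xk, y, n_components=3)           # dimension reduction
T = (Xk - Xk.mean(axis=0)) @ R
best = select_components(T, y, k=2)               # postprocessing
```

On real microarray data, the filter and the projection would need to be fitted inside each cross-validation fold so that the test samples do not influence which features or components are selected.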
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Li, GZ., Zeng, XQ. (2009). Feature Selection for Partial Least Square Based Dimension Reduction. In: Abraham, A., Hassanien, AE., Snášel, V. (eds) Foundations of Computational Intelligence Volume 5. Studies in Computational Intelligence, vol 205. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01536-6_1
DOI: https://doi.org/10.1007/978-3-642-01536-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01535-9
Online ISBN: 978-3-642-01536-6
eBook Packages: Engineering, Engineering (R0)