Feature Selection for Partial Least Square Based Dimension Reduction

Part of the book series: Studies in Computational Intelligence (SCI, volume 205)

Abstract

In this chapter, we introduce our recent work on feature selection for Partial Least Square based Dimension Reduction (PLSDR). Previous PLSDR methods have performed well on biomedical and chemical data sets, but open problems remain, such as how to determine the number of principal components and how to remove irrelevant and redundant features before PLSDR is applied. First, we propose a general framework for performing feature selection with dimension reduction methods; it consists of a preprocessing step that removes irrelevant and redundant features and a postprocessing step that selects principal components. Second, as an example, we address these problems in the case of PLSDR: 1) we discuss how to determine the number of top features for PLSDR; 2) we propose removing irrelevant features for PLSDR with an efficient algorithm based on feature probes; 3) we investigate a supervised solution for removing redundant features; 4) we study whether the top features are important for classification and how to select the most discriminative principal components. The proposed algorithms are evaluated on several benchmark microarray data sets and show satisfactory performance.
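The preprocessing and dimension reduction steps of the framework can be illustrated with a minimal sketch. It is not the chapter's algorithm, which is not shown in this preview. It assumes a binary class label, uses row-shuffled copies of real features as probes, scores features by absolute Pearson correlation with the label, and extracts components with a plain NIPALS PLS1 loop. The function names `probe_filter` and `pls1_scores`, the 0.95 quantile threshold, and the synthetic data are all hypothetical choices made for illustration; the postprocessing step of selecting components is left out.

```python
import numpy as np

def abs_corr(A, y):
    """Absolute Pearson correlation of each column of A with y."""
    Ac = A - A.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Ac, axis=0) * np.linalg.norm(yc)
    return np.abs(Ac.T @ yc) / np.where(denom == 0, 1.0, denom)

def probe_filter(X, y, n_probes=100, quantile=0.95, seed=0):
    """Keep features that score above a quantile of random-probe scores.

    Assumption: probes are real columns with their rows shuffled, so they
    keep the marginal distribution but carry no class information.
    """
    rng = np.random.default_rng(seed)
    cols = rng.integers(0, X.shape[1], size=n_probes)
    probes = np.column_stack([rng.permutation(X[:, j]) for j in cols])
    threshold = np.quantile(abs_corr(probes, y), quantile)
    return np.flatnonzero(abs_corr(X, y) > threshold)

def pls1_scores(X, y, n_components):
    """NIPALS PLS1 for a single response; returns the score matrix T."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    scores = []
    for _ in range(n_components):
        w = Xk.T @ yk                      # weight vector: covariance with y
        norm = np.linalg.norm(w)
        if norm == 0:                      # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                         # component scores
        tt = t @ t
        Xk = Xk - np.outer(t, Xk.T @ t / tt)   # deflate X
        yk = yk - t * (t @ yk) / tt            # deflate y
        scores.append(t)
    return np.column_stack(scores)

# Synthetic stand-in for a microarray data set: 40 samples, 200 "genes",
# of which only the first 5 differ between the two classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[:, :5] += 3.0 * y[:, None]

kept = probe_filter(X, y)                        # preprocessing: drop irrelevant features
T = pls1_scores(X[:, kept], y, n_components=3)   # dimension reduction
print(len(kept), T.shape)
```

The columns of `T` would then be passed to a classifier. A stricter quantile keeps fewer features at the cost of possibly discarding weakly relevant ones.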



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Li, GZ., Zeng, XQ. (2009). Feature Selection for Partial Least Square Based Dimension Reduction. In: Abraham, A., Hassanien, AE., Snášel, V. (eds) Foundations of Computational Intelligence Volume 5. Studies in Computational Intelligence, vol 205. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01536-6_1

  • DOI: https://doi.org/10.1007/978-3-642-01536-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01535-9

  • Online ISBN: 978-3-642-01536-6

  • eBook Packages: Engineering, Engineering (R0)
