Penalized Principal Component Analysis of Microarray Data

  • Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2009)

Abstract

The high dimensionality of microarray data, which record the expression of thousands of genes across a much smaller number of samples, presents challenges that affect the validity of analytical results. Some form of dimension reduction is therefore needed to represent the data in terms of a smaller number of variables. These are often chosen to be linear combinations of the original variables (genes), called metagenes. One commonly used approach is principal component analysis (PCA), which can be implemented via a singular value decomposition (SVD). However, for a high-dimensional matrix, SVD may be very expensive in terms of computation time. We propose to reduce the SVD task to an ordinary maximisation problem with a Euclidean norm, which may be solved easily using gradient-based optimisation. We demonstrate the effectiveness of this approach in the supervised classification of gene expression data.
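The abstract does not give the exact penalized objective or optimiser. The sketch below is therefore only illustrative. It assumes each component is found by maximising ||Xv||² over vectors v of unit Euclidean norm, using projected gradient ascent followed by deflation, and it checks the result against `numpy.linalg.svd`. The function name `leading_components`, the step-size rule and the synthetic data are all hypothetical. This is not the authors' method.

```python
import numpy as np


def leading_components(X, k, eta=None, max_iter=1000, tol=1e-10, seed=0):
    """Hypothetical sketch: top-k principal directions of X (samples x genes)
    by projected gradient ascent on ||R v||^2 subject to ||v||_2 = 1, with
    deflation. Not the paper's penalized objective; X^T X is never formed."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)              # centre each gene (column)
    R = Xc.copy()                        # residual matrix, deflated after each component
    V = np.zeros((Xc.shape[1], k))
    sigma = np.zeros(k)
    for j in range(k):
        # Assumed step-size rule: 1 / ||R||_F^2, so step * lambda_max <= 1.
        step = eta if eta is not None else 1.0 / np.sum(R * R)
        v = rng.standard_normal(Xc.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(max_iter):
            grad = 2.0 * (R.T @ (R @ v))      # gradient of ||R v||^2, O(n p) per step
            v_new = v + step * grad           # ascent step
            v_new /= np.linalg.norm(v_new)    # project back onto the unit sphere
            done = np.linalg.norm(v_new - v) < tol
            v = v_new
            if done:
                break
        V[:, j] = v
        sigma[j] = np.linalg.norm(R @ v)      # j-th singular value estimate
        R = R - np.outer(R @ v, v)            # deflate: remove the found direction
    return V, sigma, Xc @ V                   # directions, singular values, metagene scores


# Synthetic data (hypothetical): 30 samples, 1000 genes, rank-3 signal plus noise.
rng = np.random.default_rng(1)
W = rng.standard_normal((30, 3)) * np.array([30.0, 20.0, 10.0])
H = rng.standard_normal((3, 1000)) / np.sqrt(1000)
X = W @ H + 0.1 * rng.standard_normal((30, 1000))

V, sigma, scores = leading_components(X, 3)
_, S, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
```

Each iteration uses two matrix-vector products, which costs O(np). It never builds the p × p gene covariance matrix, and that matters when p is in the thousands. The returned `scores` matrix (samples × k) plays the role of the metagenes and could be passed to a classifier.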

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nikulin, V., McLachlan, G.J. (2010). Penalized Principal Component Analysis of Microarray Data. In: Masulli, F., Peterson, L.E., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2009. Lecture Notes in Computer Science, vol 6160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14571-1_7

  • DOI: https://doi.org/10.1007/978-3-642-14571-1_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14570-4

  • Online ISBN: 978-3-642-14571-1

  • eBook Packages: Computer Science, Computer Science (R0)