Penalized Principal Component Analysis of Microarray Data

Nikulin, Vladimir; McLachlan, Geoffrey J.

doi:10.1007/978-3-642-14571-1_7

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6160)

Included in the following conference series:

International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics

Abstract

The high dimensionality of microarray data, in which the expression levels of thousands of genes are measured on a much smaller number of samples, presents challenges that affect the validity of the analytical results. Hence attention has to be given to some form of dimension reduction to represent the data in terms of a smaller number of variables. The latter are often chosen to be linear combinations of the original variables (genes), called metagenes. One commonly used approach is principal component analysis (PCA), which can be implemented via a singular value decomposition (SVD). However, in the case of a high-dimensional matrix, SVD may be very expensive in terms of computational time. We propose to reduce the SVD task to an ordinary maximisation problem with a Euclidean norm, which may be solved easily using gradient-based optimisation. We demonstrate the effectiveness of this approach in the supervised classification of gene expression data.
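
The idea of replacing an SVD with a norm maximisation can be illustrated on a toy example. The sketch below is not the penalized method of the chapter, whose details are not given in the abstract. It only shows the unpenalized special case: the leading principal direction found by maximising the squared Euclidean norm of the projected data over unit vectors, using gradient ascent with renormalisation after each step. The function names, step size, iteration count and toy data are all assumptions made for illustration.

```python
import math
import random


def _unit(v):
    """Rescale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def leading_component(X, steps=200, lr=0.1, seed=0):
    """Leading principal direction of X (rows = samples, columns = genes).

    Maximises ||Xc w||^2 over unit vectors w, where Xc is the
    column-centred data, by gradient ascent followed by renormalisation.
    Hypothetical helper written for this illustration only.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]

    rng = random.Random(seed)
    w = _unit([rng.gauss(0.0, 1.0) for _ in range(p)])
    for _ in range(steps):
        # Projection of every sample onto the current direction: Xc w
        scores = [sum(r[j] * w[j] for j in range(p)) for r in Xc]
        # Gradient of ||Xc w||^2 with respect to w: 2 Xc^T (Xc w)
        grad = [2.0 * sum(Xc[i][j] * scores[i] for i in range(n))
                for j in range(p)]
        # Ascent step, then projection back onto the unit sphere
        w = _unit([w[j] + lr * grad[j] for j in range(p)])

    # The metagene is the linear combination of genes defined by w
    metagene = [sum(r[j] * w[j] for j in range(p)) for r in Xc]
    return w, metagene


# Toy data (an assumption): 4 samples, 2 strongly correlated "genes"
X = [[2.1, 1.9], [0.9, 1.1], [-1.1, -0.9], [-1.9, -2.1]]
w, metagene = leading_component(X)
```

For this toy matrix the leading direction is proportional to (1, 1), up to sign, so both genes receive equal weight in the metagene. With thousands of genes, each iteration costs only two matrix-vector products, which is the kind of saving over a full SVD that the abstract alludes to.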

Author information

Authors and Affiliations

Department of Mathematics, University of Queensland,
Vladimir Nikulin & Geoffrey J. McLachlan

Editor information

Editors and Affiliations

DISI - Dipartimento di Informatica e Scienze dell’Informazione, Università di Genova, Via Dodecaneso 35, 16146, Genova, Italy
Francesco Masulli
Center for Biostatistics, The Methodist Hospital Research Institute (TMHRI), Weill Cornell Medical College, Cornell University, 6565 Fannin, Suite MGJ6-031, 77030, Houston, Texas, USA
Leif E. Peterson
Dipartimento di Matematica ed Informatica, Università di Salerno, Via Ponte don Melillo, 84084, Fisciano, (Sa), Italy
Roberto Tagliaferri

About this paper

Cite this paper

Nikulin, V., McLachlan, G.J. (2010). Penalized Principal Component Analysis of Microarray Data. In: Masulli, F., Peterson, L.E., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2009. Lecture Notes in Computer Science, vol 6160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14571-1_7

DOI: https://doi.org/10.1007/978-3-642-14571-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14570-4
Online ISBN: 978-3-642-14571-1
eBook Packages: Computer Science, Computer Science (R0)
