Abstract
The high dimensionality of microarray data, in which the expression levels of thousands of genes are measured in a much smaller number of samples, presents challenges that affect the validity of analytical results. Attention must therefore be given to some form of dimension reduction that represents the data in terms of a smaller number of variables. These variables are often chosen to be linear combinations of the original variables (genes), called metagenes. One commonly used approach is principal component analysis (PCA), which can be implemented via a singular value decomposition (SVD). For a high-dimensional matrix, however, SVD may be very expensive in computational time. We propose to reduce the SVD task to an ordinary maximisation problem with a Euclidean norm, which may be solved easily using gradient-based optimisation. We demonstrate the effectiveness of this approach in the supervised classification of gene expression data.
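The idea of replacing a full SVD with a norm maximisation can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details (including any penalty term) are not given in the abstract. It assumes the simplest unpenalised case: the leading principal direction w maximises the squared Euclidean norm ||Xw||^2 subject to ||w|| = 1, and is found by projected gradient ascent. The function name `leading_component`, the normalised step, and the iteration count are illustrative assumptions.

```python
import numpy as np


def leading_component(X, n_iter=500, seed=0):
    """Approximate the first principal direction of X (samples x genes).

    Maximises f(w) = ||Xc w||^2 subject to ||w|| = 1 by projected
    gradient ascent, where Xc is X with each column centred.
    Illustrative sketch only; not the method of the paper.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                 # centre each gene (column)
    w = rng.standard_normal(Xc.shape[1])    # random starting direction
    w /= np.linalg.norm(w)

    for _ in range(n_iter):
        # Gradient of ||Xc w||^2 is 2 Xc^T (Xc w); two matrix-vector
        # products, so the genes x genes matrix is never formed.
        grad = 2.0 * (Xc.T @ (Xc @ w))
        g_norm = np.linalg.norm(grad)
        if g_norm == 0.0:                   # degenerate case: Xc w = 0
            break
        w = w + grad / g_norm               # ascent step of unit length
        w /= np.linalg.norm(w)              # project back to the unit sphere

    # ||Xc w|| approximates the largest singular value of Xc.
    return w, np.linalg.norm(Xc @ w)
```

Under these assumptions, `Xc @ w` gives one score per sample, which plays the role of a single metagene. Each iteration costs two matrix-vector products, which is the kind of saving over a full SVD that the abstract alludes to when the number of genes is large.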
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nikulin, V., McLachlan, G.J. (2010). Penalized Principal Component Analysis of Microarray Data. In: Masulli, F., Peterson, L.E., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2009. Lecture Notes in Computer Science, vol 6160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14571-1_7
DOI: https://doi.org/10.1007/978-3-642-14571-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14570-4
Online ISBN: 978-3-642-14571-1
eBook Packages: Computer Science (R0)