Gene expression data from microarrays have very high dimensionality, which results in extremely large sample covariance matrices. In this paper, we investigate the applicability of block principal components analysis and of a variable selection method based on principal component loadings for dimension reduction prior to performing discriminant analysis on such data. Because of high correlations among the variables, the covariance matrix is ill-conditioned and the Mahalanobis distances between clusters become very large. We show that the Mahalanobis distance is unreliable when the condition number of the covariance matrix exceeds 480,000 or the natural logarithm of its determinant is less than −26.3.
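The two reliability criteria stated above can be checked numerically before any Mahalanobis distance is computed. The following is a minimal sketch, not the chapter's own code: it assumes NumPy, a data matrix with samples in rows and variables in columns, and hypothetical function names (`covariance_diagnostics`, `mahalanobis_is_reliable`). Only the two thresholds are taken from the abstract; the synthetic data are for illustration.

```python
import numpy as np

# Thresholds quoted in the abstract; everything else here is illustrative.
COND_LIMIT = 480_000.0
LOGDET_LIMIT = -26.3


def covariance_diagnostics(X):
    """Return (condition number, ln determinant) of the sample covariance of X.

    X is assumed to hold one sample per row and one variable per column.
    """
    S = np.atleast_2d(np.cov(X, rowvar=False))
    cond = np.linalg.cond(S)
    # slogdet avoids underflow that a direct determinant would hit.
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0:
        # Numerically singular or indefinite: treat ln(det) as -infinity.
        logdet = -np.inf
    return cond, logdet


def mahalanobis_is_reliable(X):
    """Apply both criteria: unreliable if either threshold is violated."""
    cond, logdet = covariance_diagnostics(X)
    return bool(cond <= COND_LIMIT and logdet >= LOGDET_LIMIT)


# Synthetic illustration (hypothetical data, not from the chapter).
rng = np.random.default_rng(0)
well = rng.normal(size=(200, 5))  # five roughly independent variables

# Append a column that nearly duplicates the first one, which mimics
# two highly correlated genes and makes the covariance ill-conditioned.
near_copy = well[:, [0]] + 1e-6 * rng.normal(size=(200, 1))
ill = np.hstack([well, near_copy])

print(mahalanobis_is_reliable(well))
print(mahalanobis_is_reliable(ill))
```

Using `np.linalg.slogdet` rather than `np.log(np.linalg.det(S))` is a deliberate choice in this sketch: with many correlated variables the determinant itself can underflow to zero long before its logarithm becomes unrepresentable.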
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lee, S.H., Singh, A.K., Gewali, L.P. (2008). Dimension reduction for performing discriminant analysis for microarrays. In: Kelemen, A., Abraham, A., Liang, Y. (eds) Computational Intelligence in Medical Informatics. Studies in Computational Intelligence, vol 85. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75767-2_6
DOI: https://doi.org/10.1007/978-3-540-75767-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75766-5
Online ISBN: 978-3-540-75767-2
eBook Packages: Engineering, Engineering (R0)