Abstract
Microarrays have unique ability to probe thousands of genes at a time that makes it a useful tool for variety of applications, ranging from diagnosis to drug discovery. However, data generated by microarrays often contains multiple missing gene expressions that affect the subsequent analysis, as most of the times these missing values are ignored. In this paper we have analyzed how accurate estimation of missing values can lead to better subsequent gene selection and class prediction. Collateral Missing Values Estimation (CMVE), which demonstrates superior imputation performance compared to Bayesian Principal Component Analysis (BPCA) Impute, K-Nearest Neighbour (KNN) algorithm, when estimating missing values in the BRCA1, BRCA2 and Sporadic genetic mutation samples present in ovarian cancer by exploiting both local/global and positive/negative correlation values. CMVE also consistently outperforms, in terms of classification accuracies, BPCA, KNN and ZeroImpute techniques. The imputation is followed by gene selection using fusion of Between Group to within Group Sum ofSquares and Weighted Partial Least Squares where Ridge Partial Least Square algorithm is used as a class predictor.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Sehgal, M.S.B., Gondal, I., Dooley, L.: Statistical Neural Networks and Support Vector Machine for the Classification of Genetic Mutations in Ovarian Cancer. In: IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2004), USA, pp. 140–146 (2004)
Bhattacharjee, A., Richards, W.G., Staunton, J., Li, C., Monti, S., Vasa, P., Ladd, C., Beheshti, J., Bueno, R., Gillette, M., Loda, M., Weber, G., Mark, E.F., Lander, E.S., Wong, W., Johnson, B.E., Golub, T.R., Sugarbaker, D.J., Meyerson, M.: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Presented at Proc. Natl. Acad. Sci, USA (2001)
Oba, S., Sato, M.A., Takemasa, I., Monden, M., Matsubara, K., Ishii, S.: A Bayesian Missing Value Estimation Method for Gene Expression Profile Data. Bioinformatics 19, 2088–2096 (2003)
Dudoit, S., Fridlyand, J., Speed, T.P.: Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 77–78 (2002)
Sehgal, M.S.B., Gondal, I., Dooley, L.: K-Ranked Covariance Based Missing Values Estimation for Microarray Data Classification. IEEE Hybrid Intelligent Systems (HIS 2004) 00, 274–279 (2004)
Acuna, E., Rodriguez, C.: The treatment of missing values and its effect in the classifier accuracy. In: Classification, Clustering and Data Mining Applications, pp. 639–648 (2004)
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., Altman, R.: Missing Value Estimation Methods for DNA Microarrays. Bioinformatics 17, 520–525 (2001)
Sehgal, M.S.B., Gondal, I., Dooley, L.: A Collateral Missing Value Estimation Algorithm for DNA Microarrays. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2005, USA, pp. 377–380 (2005)
Sehgal, M.S.B., Gondal, I., Dooley, L.: Support Vector Machine and Generalized Regression Neural Network Based Classification Fusion Models for Cancer Diagnosis. In: IEEE Hybrid Intelligent Systems, HIS 2004, Japan, pp. 49–54 (2004)
Antoniadis, A., Lambert-Lacroix, S., Leblanc, F.: Effective dimension reduction methods for tumor classification using gene expression data. Bioinformatics 19(5), 563–570 (2003)
Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasen-beek, M., Mesirov, J.P., Coller, H., Loh, M.L., Down-ing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lan-der, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), 531–537 (1999)
Broët, P., Lewin, A., Richardson, S., Dalmasso, C., Magdelenat, H.: A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments. Bioinformatics 20, 2562–2571 (2004)
Liu, X., Krishnan, A., Mondry, A.: An Entropy-based gene selection method for cancer classification using microarray data. BMC Bioinformatics 6(76) (2005)
Sehgal, M.S.B., Gondal, I., Dooley, L.: Collateral Missing Value Imputation: a new robust missing value estimation algorithm for microarray data. Bioinformatics 21(10), 2417–2423 (2005)
Hedenfalk, I., Duggan, D., Chen, Y., Radmacher, M., Bittner, M., Simon, R., Meltzer, P., Gusterson, B., Esteller, M., Kallioniemi, O.P., Wilfond, B., Borg, A., Trent, J.: Gene-expression profiles in hereditary breast cancer. N. Engl. J. Med. 344(8), 22, 539–548 (2001)
Fort, G., Lambert-Lacroix, S.: Classification using partial least squares with penalized logistic regression. Bioinformatics 21, 1104–1111 (2005)
Harvey, M., Arthur, C.: Fitting models to biological Data using linear and nonlinear regression. Oxford University Press, Oxford (2004)
Yeung, K.Y., Bumgarner, R.E., Raftery, A.E.: Bayesian Model Averaging: development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21(10), 2394–2402 (2005)
Zhou, X., Wang, X., Dougherty, E.R.: Gene Selection Using Logistic Regressions Based on AIC, BIC and MDL Criteria. New Mathematics and Natural Computation 1, 129–145 (2005)
Amir, A.J., Yee, C.J., Sotiriou, C., Brantley, K.R., Boyd, J., Liu, E.T.: Gene Expression Profiles of Brca1-Linked, Brca2-Linked, and Sporadic Ovarian Cancers. Journal of the National Cancer Institute 94(13) (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sehgal, M.S.B., Gondal, I., Dooley, L. (2005). Collateral Missing Value Estimation: Robust Missing Value Estimation for Consequent Microarray Data Processing. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science(), vol 3809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11589990_30
Download citation
DOI: https://doi.org/10.1007/11589990_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30462-3
Online ISBN: 978-3-540-31652-7
eBook Packages: Computer ScienceComputer Science (R0)