Abstract
One of the most challenging problems in microarray studies is the analysis of microarray data from different platforms. Combining platforms improves the reliability of a study because the number of samples is larger, and it enables the study of rare diseases, for which only a few microarray datasets have been published. Because different microarray platforms cover different numbers of genes, an integrative study of two platforms must be able to deal with missing values. Much work has been done on cross-platform microarray data utilization, but none of it has focused on gene-set-based microarray data classification. In this study, we applied a Bayesian-based method to reconstruct the expression levels of the missing genes before transforming them into gene-set activity. Two gene-set activity transformation methods, Negatively Correlated Feature Set (NCFS-i) and Analysis-of-Variance Feature Set (AFS), were used to evaluate the performance of this method on actual microarray datasets. The results show that imputing the missing data can improve the classification performance of a cross-platform study.
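The order of operations described above (merge the platforms, fill in the genes one platform lacks, then summarize each gene set as a single activity value) can be illustrated with a minimal sketch. It is not the authors' implementation: per-gene mean imputation stands in for the Bayesian-based method, and a mean of per-gene z-scores stands in for NCFS-i and AFS. All function names, sample names, gene names, and values are hypothetical.

```python
from statistics import mean, pstdev

def merge_platforms(a, b):
    """Stack samples from two platforms over the union of their genes.

    a and b map sample name -> {gene: expression}. Genes that a platform
    does not measure are stored as None. Assumes sample names are unique.
    """
    genes = sorted({g for s in list(a.values()) + list(b.values()) for g in s})
    merged = {name: {g: s.get(g) for g in genes} for name, s in {**a, **b}.items()}
    return genes, merged

def impute_gene_mean(genes, merged):
    """Fill each missing value with the mean of that gene's observed values.

    Simple stand-in for the Bayesian-based imputation named in the abstract.
    """
    for g in genes:
        observed = [s[g] for s in merged.values() if s[g] is not None]
        fill = mean(observed)
        for s in merged.values():
            if s[g] is None:
                s[g] = fill
    return merged

def gene_set_activity(genes, merged, gene_set):
    """Per-sample activity: mean z-score over the member genes of gene_set.

    Simple stand-in for the NCFS-i and AFS transformations.
    """
    members = [g for g in genes if g in gene_set]
    stats = {}
    for g in members:
        vals = [s[g] for s in merged.values()]
        stats[g] = (mean(vals), pstdev(vals) or 1.0)  # avoid division by zero
    return {
        name: mean((s[g] - stats[g][0]) / stats[g][1] for g in members)
        for name, s in merged.items()
    }

# Hypothetical toy data: platform B does not measure gene G3.
platform_a = {"a1": {"G1": 1.0, "G2": 2.0, "G3": 3.0},
              "a2": {"G1": 3.0, "G2": 4.0, "G3": 5.0}}
platform_b = {"b1": {"G1": 2.0, "G2": 3.0}}

genes, merged = merge_platforms(platform_a, platform_b)
merged = impute_gene_mean(genes, merged)
activity = gene_set_activity(genes, merged, {"G1", "G3"})
```

Without the imputation step, a gene set containing G3 could not be scored for sample b1; after it, every sample has one activity value per gene set, which is the form a classifier would take as input.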
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Engchuan, W., Meechai, A., Tongsima, S., Chan, J.H. (2015). Cross-Platform Pathway Activity Transformation and Classification of Microarray Data. In: Phon-Amnuaisuk, S., Au, T. (eds) Computational Intelligence in Information Systems. Advances in Intelligent Systems and Computing, vol 331. Springer, Cham. https://doi.org/10.1007/978-3-319-13153-5_14
DOI: https://doi.org/10.1007/978-3-319-13153-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13152-8
Online ISBN: 978-3-319-13153-5
eBook Packages: Engineering, Engineering (R0)