Abstract
We have recently proposed a rank-based approach as a new method for integrating microarray data. The rank-based approach converts the expression value of each gene in a sample into its rank within that sample, which enables us to directly integrate samples generated by different laboratories and microarray technologies. In this study, we show that a non-parametric scoring method can be applied efficiently to the rank-based data, and that informative genes can be extracted effectively from the integrated rank-based data. To verify the statistical significance of the scoring results, we compared the distribution of the score statistics to a set of distributions obtained from randomly column-permuted data. We also validated our method in an experimental study using publicly available prostate microarray data, comparing the informative genes extracted from each individual dataset with those extracted from the integrated data. The results show that directly integrating inter-study microarray data lets us extract important prostate marker genes that are missed in either single-study analysis.
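The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring statistic, so a Mann-Whitney-style pair count is used here as a stand-in, and "column permutation" is assumed to mean shuffling which sample carries which class label. All function names and the toy values are hypothetical and are not taken from the paper.

```python
import random


def rank_within_sample(values):
    """Replace one sample's expression values by their ranks (1 = lowest).

    Tied values receive the average of the ranks they span.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def u_score(gene_ranks, labels):
    """Count (class 1, class 0) sample pairs where the class 1 rank is higher.

    Ties count as half a pair. This is an assumed stand-in statistic.
    """
    pos = [r for r, lab in zip(gene_ranks, labels) if lab == 1]
    neg = [r for r, lab in zip(gene_ranks, labels) if lab == 0]
    return sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)


def score_genes(ranked, labels):
    """Score every gene; ranked[s][g] is the rank of gene g in sample s."""
    n_genes = len(ranked[0])
    return [u_score([s[g] for s in ranked], labels) for g in range(n_genes)]


def permutation_null(ranked, labels, n_perm=200, seed=0):
    """Score the genes n_perm times under randomly shuffled sample labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(score_genes(ranked, shuffled))
    return null


# Toy data: two studies measured on very different scales, three genes each.
study_a = [[0.1, 0.9, 0.5], [0.8, 0.2, 0.5]]                # normal, tumor
study_b = [[120.0, 900.0, 400.0], [950.0, 100.0, 500.0]]    # normal, tumor
labels = [0, 1, 0, 1]

# Ranking within each sample puts both studies on a common scale,
# so the samples can simply be concatenated.
ranked = [rank_within_sample(s) for s in study_a + study_b]
scores = score_genes(ranked, labels)
null = permutation_null(ranked, labels)
print(scores)
```

In this toy example, gene 0 ranks higher in every tumor sample than in every normal sample and so receives the maximum score, even though the raw values of the two studies differ by orders of magnitude. Comparing `scores` against the distribution in `null` gives an empirical sense of how extreme each score is under random labeling.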
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hong, D., Lee, J., Hong, S., Yoon, J., Park, S. (2008). Extraction of Informative Genes from Integrated Microarray Data. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds) Foundations of Intelligent Systems. ISMIS 2008. Lecture Notes in Computer Science, vol 4994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68123-6_68
DOI: https://doi.org/10.1007/978-3-540-68123-6_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68122-9
Online ISBN: 978-3-540-68123-6
eBook Packages: Computer Science, Computer Science (R0)