An Empirical Comparison of Dimensionality Reduction Methods for Classifying Gene and Protein Expression Datasets

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4463)

Abstract

The recent explosion in the availability of gene and protein expression data for cancer detection has necessitated the development of sophisticated machine learning tools for high-dimensional data analysis. Previous attempts at gene expression analysis have typically used a linear dimensionality reduction method such as Principal Components Analysis (PCA). Linear dimensionality reduction methods do not, however, account for the inherent nonlinearity within the data. The motivation behind this work is to demonstrate that nonlinear dimensionality reduction methods capture this nonlinearity better than linear methods, and hence would result in better classification and potentially aid in the visualization and identification of new data classes. In this paper, we therefore empirically compare the performance of three commonly used linear and three nonlinear dimensionality reduction techniques from the perspective of (a) distinguishing objects belonging to cancer and non-cancer classes and (b) new class discovery in high-dimensional gene and protein expression studies for different types of cancer. Quantitative evaluation using a support vector machine and a decision tree classifier revealed a statistically significant improvement in classification accuracy when nonlinear rather than linear dimensionality reduction methods were used.
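The kind of comparison described in the abstract can be sketched as follows. This is only an illustrative sketch under several assumptions, not the authors' pipeline: it assumes scikit-learn is available, uses synthetic data in place of the expression datasets, and picks PCA and Isomap as one representative linear and one representative nonlinear method (the abstract does not name the six methods compared). The function name `compare_reducers` and all parameter values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_reducers(X, y, n_components=3, cv=5, seed=0):
    """Cross-validated accuracy for each (reducer, classifier) pair.

    Hypothetical helper: the reducers, classifiers, and settings below
    are illustrative choices, not those reported in the paper.
    """
    reducers = {
        "PCA (linear)": PCA(n_components=n_components),
        "Isomap (nonlinear)": Isomap(n_neighbors=10, n_components=n_components),
    }
    classifiers = {
        "SVM": SVC(kernel="rbf"),
        "decision tree": DecisionTreeClassifier(random_state=seed),
    }
    scores = {}
    for r_name, reducer in reducers.items():
        for c_name, clf in classifiers.items():
            # Scaling and the reducer are refit inside every training fold,
            # so the held-out samples never influence the embedding.
            pipe = make_pipeline(StandardScaler(), reducer, clf)
            scores[(r_name, c_name)] = cross_val_score(pipe, X, y, cv=cv).mean()
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for a two-class expression dataset (assumption):
    # few samples, many features, only a handful of them informative.
    X, y = make_classification(
        n_samples=100, n_features=200, n_informative=10,
        n_redundant=0, random_state=0,
    )
    for (r_name, c_name), acc in compare_reducers(X, y).items():
        print(f"{r_name:20s} + {c_name:13s}: mean CV accuracy = {acc:.3f}")
```

Because the data here are synthetic, the printed accuracies say nothing about the paper's findings; the sketch only shows how a linear and a nonlinear embedding can be evaluated on equal footing with the same two classifiers.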

Editor information

Ion Măndoiu, Alexander Zelikovsky

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, G., Rodriguez, C., Madabhushi, A. (2007). An Empirical Comparison of Dimensionality Reduction Methods for Classifying Gene and Protein Expression Datasets. In: Măndoiu, I., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2007. Lecture Notes in Computer Science, vol 4463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72031-7_16

  • DOI: https://doi.org/10.1007/978-3-540-72031-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72030-0

  • Online ISBN: 978-3-540-72031-7

  • eBook Packages: Computer Science, Computer Science (R0)
