Nonlinear Sparse Component Analysis with a Reference: Variable Selection in Genomics and Proteomics

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9237)

Abstract

Many scenarios in genomics and proteomics involve a small number of labeled samples and a large number of variables. Variable selection is therefore necessary to build prediction models that are robust to overfitting. We propose a variable selection method based on nonlinear sparse component analysis with a reference that represents either the negative (healthy) or the positive (cancer) class. The component comprising cancer-related variables is thereby inferred automatically from the geometry of the nonlinear mixture model with a reference. The proposed method is compared with three supervised and two unsupervised variable selection methods on two-class problems using two genomic and two proteomic datasets. The results obtained, which include an analysis of the biological relevance of the selected genes, are comparable with those achieved by the supervised methods. Thus, the proposed method may perform better on unseen data of the same cancer type.

Author information

Corresponding author

Correspondence to Ivica Kopriva.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kopriva, I., Kapitanović, S., Čačev, T. (2015). Nonlinear Sparse Component Analysis with a Reference: Variable Selection in Genomics and Proteomics. In: Vincent, E., Yeredor, A., Koldovský, Z., Tichavský, P. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2015. Lecture Notes in Computer Science, vol 9237. Springer, Cham. https://doi.org/10.1007/978-3-319-22482-4_19

  • DOI: https://doi.org/10.1007/978-3-319-22482-4_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22481-7

  • Online ISBN: 978-3-319-22482-4

  • eBook Packages: Computer Science, Computer Science (R0)
