Abstract
Many problems in genomics and proteomics involve a small number of labeled samples and a large number of variables. Variable selection is therefore necessary to build prediction models that are robust to overfitting. We propose a variable selection method based on nonlinear sparse component analysis with a reference that represents either the negative (healthy) or the positive (cancer) class. The component comprising cancer-related variables is thereby inferred automatically from the geometry of the nonlinear mixture model with a reference. The proposed method is compared with three supervised and two unsupervised variable selection methods on two-class problems, using two genomic and two proteomic datasets. The results, which include an analysis of the biological relevance of the selected genes, are comparable with those achieved by the supervised methods. Thus, the proposed method may perform better on unseen data of the same cancer type.
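To make the geometric idea of selection "with a reference" more concrete, here is a minimal Python sketch. It is a simplified linear stand-in and not the nonlinear sparse component analysis described in the abstract: each variable of a sample is paired with the same variable of a reference profile, and the angle of that 2-D point is compared with the 45-degree direction that corresponds to "no change". The function names, the threshold `delta_deg`, the voting rule `min_fraction`, and the synthetic data are all assumptions introduced for illustration only.

```python
import numpy as np


def deviating_variables(sample, reference, delta_deg=10.0):
    """Indices of variables whose (reference, sample) point lies more than
    delta_deg away from the 45-degree 'unchanged' direction.

    Assumes strictly positive expression levels (hypothetical helper).
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Angle of each point (reference_i, sample_i); 45 degrees means sample == reference.
    angles = np.degrees(np.arctan2(sample, reference))
    return np.flatnonzero(np.abs(angles - 45.0) > delta_deg)


def select_variables(X, reference, delta_deg=10.0, min_fraction=0.5):
    """Keep variables that deviate from the reference in at least
    min_fraction of the samples (rows of X)."""
    X = np.asarray(X, dtype=float)
    votes = np.zeros(X.shape[1])
    for row in X:
        votes[deviating_variables(row, reference, delta_deg)] += 1
    return np.flatnonzero(votes / X.shape[0] >= min_fraction)


# Synthetic toy data: 6 samples, 5 variables, all equal to the reference
# except variables 1 and 3, which are three times higher in every sample.
reference_demo = np.ones(5)
X_demo = np.ones((6, 5))
X_demo[:, [1, 3]] = 3.0

selected = select_variables(X_demo, reference_demo)
print(selected.tolist())
```

In this toy case the two elevated variables sit at roughly 71.6 degrees, well outside the 10-degree band around 45 degrees, so they are the only ones kept. How the reference is built, and how the nonlinear model replaces this plain angle test, is specified in the paper itself and is not reproduced here.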
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kopriva, I., Kapitanović, S., Čačev, T. (2015). Nonlinear Sparse Component Analysis with a Reference: Variable Selection in Genomics and Proteomics. In: Vincent, E., Yeredor, A., Koldovský, Z., Tichavský, P. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2015. Lecture Notes in Computer Science(), vol 9237. Springer, Cham. https://doi.org/10.1007/978-3-319-22482-4_19
DOI: https://doi.org/10.1007/978-3-319-22482-4_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22481-7
Online ISBN: 978-3-319-22482-4
eBook Packages: Computer Science, Computer Science (R0)