Abstract
High-dimensional genomic and proteomic data play an important role in many medical applications, such as disease diagnosis, prognosis and prevention, as well as in molecular biology. Classifying such data is challenging because of the curse of dimensionality, noise and redundancy. Recently, several researchers have used sparse representation (SR) techniques to analyze high-dimensional biological data, notably to classify cancer patients from gene expression datasets. A common limitation of SR-based methods for biological data classification is that they do not exploit the topological (geometrical) structure of the data: they map the data into a sparse feature space without preserving the local structure of the data points. In this paper, we propose a novel SR-based cancer classification algorithm for gene expression data that takes the geometrical information of all data into account. Specifically, we incorporate the locally linear embedding algorithm into the sparse coding framework, which preserves the geometrical structure of the data. We applied our algorithm to six tumor gene expression datasets and demonstrate that the proposed method achieves higher classification accuracy than state-of-the-art SR-based tumor classification algorithms.
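
The abstract does not state the exact objective function, so the following is only a minimal sketch of one common way to combine locally linear embedding (LLE) with sparse coding, not the authors' implementation. It assumes samples are stored as columns of X, that LLE reconstruction weights W are computed from the k nearest neighbours of each sample, and that the sparse codes S are penalized for breaking those local reconstructions. The function names, the parameters k, reg, lam and beta, and the form of the combined objective are all illustrative assumptions.

```python
import numpy as np


def lle_weights(X, k=5, reg=1e-3):
    """LLE reconstruction weights for samples stored as columns of X.

    Returns W of shape (n_samples, n_samples). Row i holds the weights that
    best reconstruct sample i from its k nearest neighbours; each row sums to 1.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                      # a sample is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        Z = X[:, nbrs] - X[:, [i]]                    # neighbours centred on sample i
        G = Z.T @ Z                                   # local Gram matrix, shape (k, k)
        G = G + reg * max(np.trace(G), 1e-12) * np.eye(k)  # regularize for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()                      # enforce the sum-to-one constraint
    return W


def lle_penalty(S, W):
    """sum_i || s_i - sum_j W[i, j] s_j ||^2 for sparse codes stored as columns of S."""
    R = S - S @ W.T
    return float(np.sum(R ** 2))


def objective(X, D, S, W, lam=0.1, beta=0.1):
    """Assumed objective: reconstruction error + l1 sparsity + LLE structure penalty."""
    recon = np.sum((X - D @ S) ** 2)
    sparsity = lam * np.abs(S).sum()
    return float(recon + sparsity + beta * lle_penalty(S, W))
```

Under these assumptions, the penalty is small when each sample's sparse code can be rebuilt from its neighbours' codes with the same weights that rebuild the sample in gene expression space, which is one way to read "preserving the local structure of data points". How the dictionary D and the codes S are optimized, and how a test sample is assigned to a class, are left open here because the abstract does not describe them.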



Cite this article
Kolali Khormuji, M., Bazrafkan, M. A novel sparse coding algorithm for classification of tumors based on gene expression data. Med Biol Eng Comput 54, 869–876 (2016). https://doi.org/10.1007/s11517-015-1382-8
DOI: https://doi.org/10.1007/s11517-015-1382-8