A novel sparse coding algorithm for classification of tumors based on gene expression data

  • Original Article
  • Medical & Biological Engineering & Computing

Abstract

High-dimensional genomic and proteomic data play an important role in many areas of medicine and molecular biology, including disease diagnosis, prognosis and prevention. Classifying such data is challenging because of the curse of dimensionality, noise and redundancy. Recently, some researchers have used sparse representation (SR) techniques to analyze high-dimensional biological data, in particular to classify cancer patients from gene expression datasets. A common problem with existing SR-based methods for classifying biological data is that they cannot exploit the topological (geometrical) structure of the data: they map the data into a sparse feature space without preserving the local structure of the data points. In this paper, we propose a novel SR-based cancer classification algorithm for gene expression data that takes the geometrical information of all data into account. Specifically, we incorporate the locally linear embedding algorithm into the sparse coding framework, which preserves the geometrical structure of all data. For performance comparison, we applied our algorithm to six tumor gene expression datasets and demonstrate that the proposed method achieves higher classification accuracy than state-of-the-art SR-based tumor classification algorithms.
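The abstract does not state the objective function, so the following is only a minimal sketch of one common way to combine locally linear embedding with sparse coding, not the authors' formulation. It assumes an objective of the form ||X^T - DS||_F^2 + lam * ||S||_1 + beta * tr(S M S^T), where M = (I - W)^T (I - W) and W holds the locally linear embedding reconstruction weights. The function names, the parameters k, reg, lam and beta, and the objective itself are all hypothetical choices made for illustration.

```python
import numpy as np


def lle_weights(X, k=3, reg=1e-3):
    """Reconstruction weights of locally linear embedding.

    X has shape (n_samples, n_features). Row i of the returned matrix holds
    the weights that best rebuild sample i from its k nearest neighbours;
    each row sums to one.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k]
        Z = X[nbrs] - X[i]          # neighbours centred on sample i
        G = Z @ Z.T                 # local Gram matrix, shape (k, k)
        # Regularise so the solve stays well posed when G is rank deficient.
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()
    return W


def lle_sparse_objective(X, D, S, W, lam=0.1, beta=1.0):
    """Value of the ASSUMED objective (not taken from the paper).

    X: (n_samples, n_features), D: (n_features, n_atoms),
    S: (n_atoms, n_samples), one sparse code per column.
    """
    n = X.shape[0]
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    fit = np.linalg.norm(X.T - D @ S, "fro") ** 2   # reconstruction error
    sparsity = lam * np.abs(S).sum()                # l1 penalty on the codes
    geometry = beta * np.trace(S @ M @ S.T)         # local-structure penalty
    return fit + sparsity + geometry
```

The geometry term equals the sum over samples of the squared distance between each code and the weighted combination of its neighbours' codes, so it is small when the codes can be rebuilt from the same neighbours, with the same weights, as the original expression profiles. In this sketch an optimizer would alternate between updating S and D to minimize the value returned by lle_sparse_objective; that step is omitted here.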

Notes

  1. http://www.gems-system.org/.

Author information

Corresponding author

Correspondence to Morteza Kolali Khormuji.

About this article

Cite this article

Kolali Khormuji, M., Bazrafkan, M. A novel sparse coding algorithm for classification of tumors based on gene expression data. Med Biol Eng Comput 54, 869–876 (2016). https://doi.org/10.1007/s11517-015-1382-8

  • DOI: https://doi.org/10.1007/s11517-015-1382-8
