Multinomial event naive Bayesian modeling for SAGE data classification

  • Original Paper
Computational Statistics

Abstract

The recently developed serial analysis of gene expression (SAGE) technology makes it possible to quantify the expression levels of thousands of genes simultaneously in a population of cells, and SAGE data are useful for classifying different types of cancer. The task faces two main challenges. First, the number of samples is small relative to the number of genes, many of which are irrelevant for classification. Second, appropriate statistical methods that account for the specific properties of SAGE data are lacking. We propose an efficient solution that selects relevant genes by information gain and builds a multinomial event model for SAGE data. The proposed model achieved promising results in terms of accuracy.
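The two steps named in the abstract, gene selection by information gain followed by a multinomial event naive Bayes classifier, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy count matrix, the class labels, the presence/absence binarization used for information gain, and the Laplace smoothing constant `alpha` are all assumptions made here for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(counts, labels):
    """Information gain of one gene, splitting samples on whether the gene
    is observed at all (count > 0). Binarizing by presence is an assumption
    of this sketch; other discretizations are possible."""
    n = len(labels)
    present = [y for c, y in zip(counts, labels) if c > 0]
    absent = [y for c, y in zip(counts, labels) if c == 0]
    conditional = sum(len(part) / n * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional

def select_genes(X, y, k):
    """Return the indices of the k genes with the highest information gain."""
    n_genes = len(X[0])
    gains = [information_gain([row[j] for row in X], y) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: gains[j], reverse=True)[:k]

def train_multinomial_nb(X, y, alpha=1.0):
    """Fit a multinomial event model: each class has a probability vector
    over genes, estimated from pooled counts with Laplace smoothing."""
    n_genes = len(X[0])
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        totals = [sum(row[j] for row in rows) for j in range(n_genes)]
        denom = sum(totals) + alpha * n_genes
        log_likelihood[c] = [math.log((t + alpha) / denom) for t in totals]
    return log_prior, log_likelihood

def predict(model, row):
    """Pick the class maximizing log prior plus count-weighted log likelihood."""
    log_prior, log_likelihood = model
    return max(log_prior, key=lambda c: log_prior[c] + sum(
        n * w for n, w in zip(row, log_likelihood[c])))

# Toy data invented for illustration: rows are libraries, columns are genes.
X = [[30, 2, 5], [25, 0, 6], [1, 28, 5], [0, 31, 4]]
y = ["tumor", "tumor", "normal", "normal"]

selected = select_genes(X, y, k=2)
X_selected = [[row[j] for j in selected] for row in X]
model = train_multinomial_nb(X_selected, y)
print(predict(model, [20, 1]), predict(model, [2, 25]))
```

In this sketch the third gene is present in every library, so its information gain is zero and it is dropped before the classifier is trained. The multinomial event model uses the raw counts directly, which is why it suits count data of this kind.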

Author information

Corresponding author

Correspondence to Rongfang Bie.

About this article

Cite this article

Jin, X., Zhou, W. & Bie, R. Multinomial event naive Bayesian modeling for SAGE data classification. Computational Statistics 22, 133–143 (2007). https://doi.org/10.1007/s00180-007-0029-0
