Cancer classification from serial analysis of gene expression with event models

Jin, Xin; Xu, Anbang; Bie, Rongfang

doi:10.1007/s10489-007-0079-6

Published: 24 July 2007

Applied Intelligence, Volume 29, pages 35–46 (2008)

Abstract

Cancer class prediction and discovery can benefit cancer diagnosis, which is currently imperfect and largely non-automated, and which directly affects how patients are treated. Serial Analysis of Gene Expression (SAGE) is a relatively new method for monitoring gene expression levels and is expected to contribute significantly to progress in cancer treatment by enabling automatic, precise and early diagnosis. A promising application of SAGE gene expression data is the classification of cancers. In this paper, we build three event models (the multivariate Bernoulli model, the multinomial model and the normalized multinomial model) for SAGE gene expression profiles and compare these event-model-based methods with the standard Naïve Bayes method. Both binary and multicategory classification are investigated. Experimental results on several SAGE datasets show that the event models generally outperform standard Naïve Bayes. Normalized Information Gain (NIG), an extension of Information Gain (IG), is proposed for gene selection. The impact of gene correlation on classification performance is also investigated.
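To make the difference between the event models concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes each SAGE library is a dictionary mapping tag sequences to counts. The multivariate Bernoulli model uses only whether a tag was observed in a library, while the multinomial model treats a library as a sequence of independent tag draws and uses the counts. The "normalized" variant here simply rescales each library to a fixed total tag count (`norm_length`); this is an assumption about what normalization means, and the paper's exact scheme may differ. `info_gain` computes standard information gain on tag presence; the paper's NIG extension is not reproduced. All function names (`normalize`, `train`, `predict`, `info_gain`) are hypothetical.

```python
import math
from collections import defaultdict


def normalize(lib, length):
    """Rescale a library's tag counts to sum to `length` (assumed normalization)."""
    total = sum(lib.values())
    return {t: n * length / total for t, n in lib.items()} if total else dict(lib)


def train(libraries, labels, model="multinomial", alpha=1.0, norm_length=None):
    """Fit a naive Bayes event model with Laplace smoothing `alpha`.

    libraries: list of {tag: count} dicts; labels: one class label per library.
    model: "bernoulli" (tag presence/absence) or "multinomial" (tag counts).
    norm_length: if set, multinomial counts are length-normalized first.
    """
    classes = sorted(set(labels))
    vocab = sorted({t for lib in libraries for t in lib})
    prior = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        libs = [lib for lib, y in zip(libraries, labels) if y == c]
        if model == "bernoulli":
            # P(tag observed at least once | class)
            cond[c] = {t: (sum(lib.get(t, 0) > 0 for lib in libs) + alpha)
                          / (len(libs) + 2 * alpha) for t in vocab}
        else:
            totals = defaultdict(float)
            for lib in libs:
                if norm_length:
                    lib = normalize(lib, norm_length)
                for t, n in lib.items():
                    totals[t] += n
            # P(a single tag draw equals t | class)
            denom = sum(totals.values()) + alpha * len(vocab)
            cond[c] = {t: (totals[t] + alpha) / denom for t in vocab}
    return {"model": model, "norm_length": norm_length, "classes": classes,
            "vocab": vocab, "prior": prior, "cond": cond}


def predict(params, lib):
    """Return the class with the highest log posterior for one library."""
    if params["model"] != "bernoulli" and params["norm_length"]:
        lib = normalize(lib, params["norm_length"])
    best, best_score = None, -math.inf
    for c in params["classes"]:
        score = math.log(params["prior"][c])
        for t in params["vocab"]:  # tags unseen in training are ignored
            p = params["cond"][c][t]
            if params["model"] == "bernoulli":
                # absent tags also contribute, via log(1 - p)
                score += math.log(p if lib.get(t, 0) > 0 else 1 - p)
            else:
                # multinomial coefficient omitted: it is the same for every class
                score += lib.get(t, 0) * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(labels.count(c) / n * math.log2(labels.count(c) / n)
                for c in set(labels)) if n else 0.0


def info_gain(libraries, labels, tag):
    """Standard IG of a tag's presence/absence (not the paper's NIG)."""
    present = [y for lib, y in zip(libraries, labels) if lib.get(tag, 0) > 0]
    absent = [y for lib, y in zip(libraries, labels) if lib.get(tag, 0) == 0]
    n = len(labels)
    return (entropy(labels) - len(present) / n * entropy(present)
            - len(absent) / n * entropy(absent))
```

The Bernoulli and multinomial variants share the same prior and argmax; they differ only in what the per-tag likelihood measures. Because SAGE libraries can differ greatly in total tag count, the raw multinomial score is dominated by large libraries, which is one plausible motivation for a length-normalized variant.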

Author information

Authors and Affiliations

College of Information Science and Technology, Beijing Normal University, Beijing, 100875, China
Xin Jin, Anbang Xu & Rongfang Bie

Corresponding author

Correspondence to Rongfang Bie.

About this article

Cite this article

Jin, X., Xu, A. & Bie, R. Cancer classification from serial analysis of gene expression with event models. Appl Intell 29, 35–46 (2008). https://doi.org/10.1007/s10489-007-0079-6

Received: 24 September 2006
Accepted: 14 June 2007
Published: 24 July 2007
Issue Date: August 2008
DOI: https://doi.org/10.1007/s10489-007-0079-6