Mining gene–sample–time microarray data: a coherent gene cluster discovery approach

Jiang, Daxin; Pei, Jian; Ramanathan, Murali; Lin, Chuan; Tang, Chun; Zhang, Aidong

doi:10.1007/s10115-006-0031-9

Mining gene–sample–time microarray data: a coherent gene cluster discovery approach

Regular Paper
Published: 21 September 2006

Volume 13, pages 305–335, (2007)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Daxin Jiang¹,
Jian Pei²,
Murali Ramanathan³,
Chuan Lin³,
Chun Tang³ &
…
Aidong Zhang³

134 Accesses
9 Citations
Explore all metrics

Abstract

Extensive studies have shown that mining microarray data sets is important in bioinformatics research and biomedical applications. In this paper, we explore a novel type of gene–sample–time microarray data sets that records the expression levels of various genes under a set of samples during a series of time points. In particular, we propose the mining of coherent gene clusters from such data sets. Each cluster contains a subset of genes and a subset of samples such that the genes are coherent on the samples along the time series. The coherent gene clusters may identify the samples corresponding to some phenotypes (e.g., diseases), and suggest the candidate genes correlated to the phenotypes. We present two efficient algorithms, namely the Sample-Gene Search and the Gene–Sample Search, to mine the complete set of coherent gene clusters. We empirically evaluate the performance of our approaches on both a real microarray data set and synthetic data sets. The test results have shown that our approaches are both efficient and effective to find meaningful coherent gene clusters.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

An Incremental Linear Programming Based Tool for Analyzing Gene Expression Data

Marked Point Processes for Microarray Data Clustering

Clustering: A Novel Meta-Analysis Approach for Differentially Expressed Gene Detection

References

Alizadeh AA, Eisen MB, Davis RE et al (2000) Distinct types of diffuse large b-cell lymphoma identified by gene expression profiling. Nature 403:503–511
Article Google Scholar
Alon U, Barkai N, Notterman DA et al (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide array. Proc Natl Acad Sci USA 96(12):6745–6750
Article Google Scholar
Alter O, Brown PO, Bostein D (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci USA 97(18):10101–10106
Article Google Scholar
Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceeding of the 1998 ACM-SIGMOD international conference management of data (SIGMOD'98), Seattle, WA, pp 85–93
Ben-Dor A, Friedman N, Yakhini Z (2001) Class discovery in gene expression data. In: Proceeding of the fifth annual international conference on computational molecular biology (RECOMB 2001) ACM Press, pp 31–38
Blake JA, Harris M (2003) The gene ontology project: structured vocabularies for molecular biology and their application to genome and expression analysis. In: Current protocols in bioinformatics Wiley, New York
Google Scholar
Cheng Y, Church GM (2000) Biclustering of expression data. Proc ISMB'00 8:93–103
Google Scholar
Der SD, Zhou A, Williams BR, Silverman RH (1998) Identification of genes differentially regulated by interferon alpha, beta, or gamma using oligonucleotide arrays. Proc Natl Acad Sci USA 95(26):15623–15628
Article Google Scholar
Ding C (2002) Analysis of gene expression profiles: class discovery and leaf ordering. In: Proceeding of the international conference on computational molecular biology (RECOMB). Washington, DC, pp 127–136
Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 77–87
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95(25):14863–14868
Article Google Scholar
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588
Article MATH Google Scholar
Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(15):531–537
Article Google Scholar
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceeding of 2000 ACM-SIGMOD international conference management of data (SIGMOD'00), Dallas, TX, pp 1–12
Hartuv E, Shamir R (2000) A clustering algorithm based on graph connectivity. Inf Process Lett 76(4–6):175–181
Article MATH MathSciNet Google Scholar
Herrero J, Valencia A, Dopazo J (2001) A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics 17:126–136
Article Google Scholar
Heyer LJ, Kruglyak S, Yooseph S (1999) Exploring expression data: identification and analysis of coexpressed genes. Genome Res 9(11):1106–1115
Article Google Scholar
Holter NS, Mitra M, Maritan A, Cieplak M, Banavar JR, Fedoroff NV (2000) Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc Natl Acad Sci USA 97(15):8409–8414
Article Google Scholar
Jiang D, Pei J, Zhang A (2003) Interactive exploration of coherent patterns in time-series gene expression data. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (KDD'03), Washington, DC, USA
Jiang D, Pei J, Zhang A (2005) A general approach to mining quality pattern-based clusters from gene expression data. In: Proceedings of the 10th international conference on database systems for advanced applications (DASFAA'05), Beijing, China
Jiang D, Pei J, Ramanathan M, Tang C, Zhang A (2004) Mining coherent gene clusters from gene-sample-time microarray data. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD'04) ACM Press, pp 430–439
Kerr K, Churchill G (2001) Statistical design and the analysis of gene expression microarrays. Genet Res 77:123–128
Article Google Scholar
Liu J, Wang W (2003) Op-cluster: clustering by tendency in high dimensional space. In: Proceedings of the third IEEE international conference on data mining (ICDM'03), IEEE, Melbourne, Florida
Google Scholar
Moler EJ, Chow ML, Mian IS (2000) Analysis of molecular profile data using generative and discriminative methods. Physiol Genomics 4(2):109–126
Google Scholar
Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proceeding of 2000 ACM-SIGMOD international workshop data mining and knowledge discovery (DMKD'00), Dallas, TX, pp 11–20
Pei J, Zhang X, Cho M, Wang H, Yu PS (2003) MaPle: a fast algorithm for maximal pattern-based clusterin. In: Proceedings of the third IEEE international conference on data mining (ICDM'03)
Ralf-Herwig PA, Muller C, Bull C, Lehrach H, O'Brien J (1999) Large-scale clustering of cDNA-fingerprinting data. Genome Res 9:1093–1105
Article Google Scholar
Rymon R (1992) Search through systematic set enumeration. In: Proceeding of 1992 international conference principle of knowledge representation and reasoning (KR'92), Cambridge, MA, pp 539–550
Seo J, Shneiderman B (2002) Interactively exploring hierarchical clustering results. IEEE Comput 35(7):80–86
Google Scholar
Shamir R, Sharan R (2000) Click: a clustering algorithm for gene expression analysis. In: Proceedings of ISMB '00
Smet FD, Mathys J, Marchal K et al (2002) Adaptive quality-based clustering of gene expression profiles. Bioinformatics 18:735–746
Article Google Scholar
Stark GR, Kerr IM, Williams BR, Silverman RH, Schreiber RD (1998) How cells respond to interferons. Ann Rev Biochem 67:227–264
Article Google Scholar
Tamayo P, Solni D, Mesirov J et al (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96(6):2907–2912
Article Google Scholar
Tang C, Zhang A, Pei J (2003) Mining phenotypes and informative genes from gene expression data. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD'03), Washington, DC, USA
Tavazoie S, Hughes D, Campbell MJ et al (1999) Systematic determination of genetic network architecture. Nature Genet 281–285
Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman R (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17(6):520–525
Article Google Scholar
Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98(9):5116–5121
Article MATH Google Scholar
Wang W, Yang J, Wang H, Yu PS (2002) Clustering by pattern similarity in large data sets. In: Proceeding of 2002 ACM-SIGMOD international conference on management of data (SIGMOD'02), Madison, WI
Weinstock-Guttman B, Badgett D, Patrick K, Hartrich L, Hall D, Baier M, Feichter J, Ramanathan M (2003) Genomic effects of interferon-b in multiple sclerosis patients. J Immun 171(5):2694–2702
Google Scholar
Xing EP, Karp RM (2001) Cliff: clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts. Bioinformatics 17(1):306–315
Google Scholar
Yang J, Wang W, Wang H, Yu PS (2002) δ-cluster: capturing subspace correlation in a large data set. In: Proceedings of 18th international conference on data engineering (ICDE 2002), pp 517–528
Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics 17:977–987
Article Google Scholar
Zaki MJ, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceeding of 2002 SIAM international conference on data mining, Arlington, VA, pp 457–473

Download references

Author information

Authors and Affiliations

School of Computing Engineering, Nanyang Technological University, Blk N4, 2a-32, Nanyang Avenue, 639798, Singapore
Daxin Jiang
Simon Fraser University, Burnaby, Canada
Jian Pei
State University of New York at Buffalo, Buffalo, New York, USA
Murali Ramanathan, Chuan Lin, Chun Tang & Aidong Zhang

Authors

Daxin Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Jian Pei
View author publications
You can also search for this author in PubMed Google Scholar
Murali Ramanathan
View author publications
You can also search for this author in PubMed Google Scholar
Chuan Lin
View author publications
You can also search for this author in PubMed Google Scholar
Chun Tang
View author publications
You can also search for this author in PubMed Google Scholar
Aidong Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Daxin Jiang.

Additional information

Daxin Jiang received the Ph.D. degree in computer science and engineering from the State University of New York at Buffalo in 2005. He received the B.S. degree in computer science from the University of Science and Technology of China. From 1998 to 2000, he was a M.S. student in Software Institute, Chinese Academy of Sciences. He is currently an assistant professor at the School of Computer Engineering, Nanyang Technology University, Singapore. His research interests include data mining, bioinformatics, machine learning, and information retrieval.

Jian Pei received the Ph.D. degree in computing science from Simon Fraser University, Canada, in 2002, under Dr. Jiawei Han's supervision. He also received the B.Eng. and the M.Eng. degrees from Shanghai Jiao Tong University, China, in 1991 and 1993, respectively, both in Computer Science. He is currently an assistant professor of computing science at Simon Fraser University. His research interests include developing effective and efficient data analysis techniques for novel data intensive applications. He is currently interested in various techniques of data mining, data warehousing, online analytical processing, and database systems, as well as their applications in bioinformatics. His current research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the National Science Foundation (NSF) of the United States. Since 2000, he has published over 70 research papers in refereed journals, conferences, and workshops, has served in the organization committees and the program committees of over 60 international conferences and workshops, and has been a reviewer for some leading academic journals. He is a member of the ACM, the ACM SIGMOD, and the ACM SIGKDD.

Murali Ramanathan is an associate professor of pharmaceutical sciences and neurology. He received the B.Tech. (Honors) in chemical engineering from the Indian Institute of Technology, India, in 1983. After a 4-year stint in the chemical industry, he obtained the M.S. degree in chemical engineering from Iowa State University, Ames, IA, in 1987, and the Ph.D. degree in bioengineering from the University of California-San Francisco and University of California-Berkeley Joint Program in Bioengineering in 1994. Dr. Ramanathan research interests are primarily focused on the treatment of multiple sclerosis (MS), an inflammatory-demyelinating disease of the central nervous system that affects over 1 million patients worldwide. MS is a complex, variable disease that causes physical and cognitive disability and nearly 50% of patients diagnosed with MS are unable to walk after 15 years. The etiology and pathogenesis of MS remains poorly understood. Dr. Ramanathan's research interests include stochastic modeling of pharmaceutical systems and novel approaches to analyzing and using genetic and genomic data for improving patient care and optimizing therapy.

Chuan Lin is currently a Ph.D. student in the Department of Computer Science and Engineering, State University of New York at Buffalo. She received the B.E. and the M.S. degrees in computer science and technology from Tsinghua University in China. Her research interests include bioinformatics, data mining, and machine learning.

Chun Tang received the B.S. and M.S. degrees from Peking University, China, in 1996 and 1999, respectively, and the Ph.D. degree from State University of New York at Buffalo, USA, in 2005, all in computer science. Currently, she is a postdoctoral associate of Center for Medical Informatics, Yale University. Her research interests include bioinformatics, data mining, machine learning, database, and information retrieval.

Aidong Zhang received the Ph.D. degree in computer science from Purdue University, West Lafayette, Indiana, in 1994. She was an assistant professor from 1994 to 1999, an associate professor from 1999 to 2002, and has been a professor since 2002 in the Department of Computer Science and Engineering at State University of New York at Buffalo. Her research interests include multimedia systems, content-based image retrieval, bioinformatics, and data mining. She is an author of over 140 research publications in these areas. Dr. Zhang's research has been funded by NSF, NIH, NIMA, and Xerox. Zhang serves on the editorial boards of International Journal of Bioinformatics Research and Applications (IJBRA), ACM Multimedia Systems, International Journal of Multimedia Tools and Applications, and International Journal of Distributed and Parallel Databases. She was the editor for ACM SIGMOD DiSC (Digital Symposium Collection) from 2001 to 2003. She was co-chair of the technical program committee for ACM Multimedia in 2001. She has also served on various conference program committees. Dr. Zhang is a recipient of the National Science Foundation CAREER award and SUNY Chancellor's Research Recognition award.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jiang, D., Pei, J., Ramanathan, M. et al. Mining gene–sample–time microarray data: a coherent gene cluster discovery approach. Knowl Inf Syst 13, 305–335 (2007). https://doi.org/10.1007/s10115-006-0031-9

Download citation

Received: 17 September 2004
Revised: 14 March 2005
Accepted: 27 May 2005
Published: 21 September 2006
Issue Date: November 2007
DOI: https://doi.org/10.1007/s10115-006-0031-9

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Mining gene–sample–time microarray data: a coherent gene cluster discovery approach

Abstract

Access this article

Similar content being viewed by others

An Incremental Linear Programming Based Tool for Analyzing Gene Expression Data

Marked Point Processes for Microarray Data Clustering

Clustering: A Novel Meta-Analysis Approach for Differentially Expressed Gene Detection

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Mining gene–sample–time microarray data: a coherent gene cluster discovery approach

Abstract

Access this article

Similar content being viewed by others

An Incremental Linear Programming Based Tool for Analyzing Gene Expression Data

Marked Point Processes for Microarray Data Clustering

Clustering: A Novel Meta-Analysis Approach for Differentially Expressed Gene Detection

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation