Sleeved co-clustering of lagged data

Shaham, Eran; Sarne, David; Ben-Moshe, Boaz

doi:10.1007/s10115-011-0420-6

Regular paper
Published: 28 May 2011

Volume 31, pages 251–279 (2012)
Knowledge and Information Systems

Eran Shaham¹,
David Sarne¹ &
Boaz Ben-Moshe²

Abstract

The paper focuses on mining clusters that are characterized by a lagged relationship between the data objects. We call such clusters lagged co-clusters. A lagged co-cluster of a matrix is a submatrix determined by a subset of rows and their corresponding lags over a subset of columns. Extracting such subsets may reveal an underlying governing regulatory mechanism. Such regulatory mechanisms are quite common in real-life settings and appear in a variety of fields: meteorology, seismic activity, stock market behavior, neuronal brain activity, river flow, and navigation, to name but a few. Mining such lagged co-clusters not only helps in understanding the relationship between objects in the domain, but also assists in forecasting their future behavior. For most interesting variants of this problem, finding an optimal lagged co-cluster is an NP-complete problem. We present a polynomial-time Monte-Carlo algorithm for mining lagged co-clusters. We prove that, with fixed probability, the algorithm mines a lagged co-cluster that encompasses the optimal lagged co-cluster with a column overhead of at most a factor of 2 and no row overhead at all. Moreover, the algorithm handles noise, anti-correlations, missing values, and overlapping patterns. The algorithm is extensively evaluated using both artificial and real-world test environments. The former enable the evaluation of specific, isolated properties of the algorithm. The latter (river flow and topographic data) enable the evaluation of the algorithm's ability to efficiently mine relevant and coherent lagged co-clusters in environments that are temporal (i.e., data read over time) and non-temporal.
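To make the definition of a lagged co-cluster concrete, the following minimal Python sketch checks whether a given subset of rows, each with its own lag, forms a coherent pattern over a subset of columns. It is only an illustration and not the paper's Monte-Carlo mining algorithm: the function names, the toy matrix, and the coherence test (rows must agree within a tolerance `tol` once each row is shifted by its lag) are assumptions introduced here, and the paper's actual coherence model may differ.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def aligned_rows(matrix: Matrix, row_lags: Dict[int, int],
                 columns: Sequence[int]) -> Dict[int, List[float]]:
    """For each selected row i with lag t_i, read matrix[i][j + t_i] for j in columns."""
    return {
        row: [matrix[row][col + lag] for col in columns]
        for row, lag in row_lags.items()
    }


def is_lagged_cocluster(matrix: Matrix, row_lags: Dict[int, int],
                        columns: Sequence[int], tol: float = 0.0) -> bool:
    """Assumed coherence test: after lag alignment, every selected row matches
    the first selected row within `tol` on every selected column."""
    aligned = list(aligned_rows(matrix, row_lags, columns).values())
    if not aligned:
        return False
    reference = aligned[0]
    return all(
        abs(value - ref) <= tol
        for row in aligned
        for value, ref in zip(row, reference)
    )


# Toy data (hypothetical): the pattern 3, 1, 4, 1, 5 appears in rows 0, 1, 2
# starting at columns 0, 2, 1 respectively; row 3 is unrelated.
MATRIX = [
    [3, 1, 4, 1, 5, 0, 0, 0],
    [9, 9, 3, 1, 4, 1, 5, 9],
    [7, 3, 1, 4, 1, 5, 7, 7],
    [2, 7, 1, 8, 2, 8, 1, 8],
]
LAGS = {0: 0, 1: 2, 2: 1}      # row index -> lag
COLUMNS = [0, 1, 2, 3, 4]      # columns, before applying each row's lag

print(is_lagged_cocluster(MATRIX, LAGS, COLUMNS))
```

In this toy setting the hard part, which the paper addresses, is finding the rows, lags, and columns in the first place; the sketch only verifies a candidate that is already given. A nonzero `tol` gives a crude stand-in for noise tolerance.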

Author information

Authors and Affiliations

Department of Computer Science, Bar-Ilan University, 52900, Ramat-Gan, Israel
Eran Shaham & David Sarne
Department of Computer Science, Ariel University Center, 44837, Ariel, Israel
Boaz Ben-Moshe

Corresponding author

Correspondence to Eran Shaham.

About this article

Cite this article

Shaham, E., Sarne, D. & Ben-Moshe, B. Sleeved co-clustering of lagged data. Knowl Inf Syst 31, 251–279 (2012). https://doi.org/10.1007/s10115-011-0420-6

Received: 10 January 2011
Revised: 28 February 2011
Accepted: 13 May 2011
Published: 28 May 2011
Issue Date: May 2012
DOI: https://doi.org/10.1007/s10115-011-0420-6
