Bipartite isoperimetric graph partitioning for data co-clustering

Rege, Manjeet; Dong, Ming; Fotouhi, Farshad

doi:10.1007/s10618-008-0091-4

Published: 09 February 2008

Data Mining and Knowledge Discovery, volume 16, pages 276–312 (2008)

Manjeet Rege¹,
Ming Dong¹ &
Farshad Fotouhi¹

Abstract

Data co-clustering refers to the problem of simultaneously clustering two data types. Typically, the data is stored in a contingency or co-occurrence matrix C whose rows and columns represent the two data types to be co-clustered. An entry C_ij of the matrix signifies the relation between the data items represented by row i and column j. Co-clustering is the problem of deriving sub-matrices from the larger data matrix by simultaneously clustering its rows and columns. In this paper, we present a novel graph theoretic approach to data co-clustering. The two data types are modeled as the two sets of vertices of a weighted bipartite graph. We then propose the Isoperimetric Co-clustering Algorithm (ICA), a new method for partitioning the bipartite graph. ICA requires a simple solution to a sparse system of linear equations instead of the eigenvalue or SVD problem in the popular spectral co-clustering approach. Our theoretical analysis and extensive experiments performed on publicly available datasets demonstrate the advantages of ICA over other approaches in terms of the quality, efficiency and stability in partitioning the bipartite graph.
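
The abstract does not spell out the steps of ICA, so the following is only a minimal sketch of the idea it describes: build the weighted bipartite graph from C, then partition it by solving one sparse linear system instead of an eigenvalue or SVD problem. It assumes the general isoperimetric partitioning recipe (ground one vertex, solve the reduced Laplacian system, threshold the solution) and a connected graph. The function name, the choice of the highest-degree vertex as ground, and the threshold sweep are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.sparse import bmat, csr_matrix, diags
from scipy.sparse.linalg import spsolve


def isoperimetric_cocluster(C):
    """Split rows and columns of a nonnegative matrix C into two co-clusters.

    Hypothetical helper: a sketch of isoperimetric bipartitioning, not the
    authors' implementation. Assumes the bipartite graph of C is connected.
    """
    C = csr_matrix(C, dtype=float)
    m, n = C.shape

    # Bipartite adjacency: vertices 0..m-1 are rows, m..m+n-1 are columns.
    W = bmat([[None, C], [C.T, None]], format="csr")
    d = np.asarray(W.sum(axis=1)).ravel()      # weighted degrees
    L = (diags(d) - W).tocsr()                 # graph Laplacian

    # Assumption: ground the vertex with the largest degree.
    g = int(np.argmax(d))
    keep = np.flatnonzero(np.arange(m + n) != g)

    # Solve the reduced sparse system L0 x0 = d0; the ground keeps x = 0.
    L0 = L[keep][:, keep].tocsc()
    x = np.zeros(m + n)
    x[keep] = spsolve(L0, d[keep])

    # Sweep thresholds over the sorted solution and keep the cut with the
    # smallest ratio cut(S) / min(vol(S), vol(complement of S)).
    order = np.argsort(x)
    total = d.sum()
    best_ratio, best_mask = np.inf, None
    mask = np.zeros(m + n, dtype=bool)
    for k in range(m + n - 1):
        mask[order[k]] = True
        s = mask.astype(float)
        cut = s @ (L @ s)
        vol = d[mask].sum()
        ratio = cut / min(vol, total - vol)
        if ratio < best_ratio:
            best_ratio, best_mask = ratio, mask.copy()

    labels = best_mask.astype(int)
    return labels[:m], labels[m:]


# Toy co-occurrence matrix with two weakly linked blocks.
C = np.array([[5, 4, 0, 0],
              [4, 5, 1, 0],
              [0, 0, 5, 4],
              [0, 1, 4, 5]])
row_labels, col_labels = isoperimetric_cocluster(C)
```

Rows and columns that receive the same label form one co-cluster. The linear solve is the only numerically heavy step here; the threshold sweep is written for readability and would need a more efficient cut update on large matrices.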

Author information

Authors and Affiliations

Department of Computer Science, Wayne State University, Detroit, MI, 48202, USA
Manjeet Rege, Ming Dong & Farshad Fotouhi

Corresponding author

Correspondence to Manjeet Rege.

Additional information

Responsible editor: Charu Aggarwal.

About this article

Cite this article

Rege, M., Dong, M. & Fotouhi, F. Bipartite isoperimetric graph partitioning for data co-clustering. Data Min Knowl Disc 16, 276–312 (2008). https://doi.org/10.1007/s10618-008-0091-4

Received: 12 February 2007
Accepted: 23 January 2008
Published: 09 February 2008
Issue Date: June 2008
DOI: https://doi.org/10.1007/s10618-008-0091-4
