research-article

Symmetrizations for clustering directed graphs

Authors: Venu Satuluri (The Ohio State University), Srinivasan Parthasarathy (The Ohio State University)

EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology, March 2011, Pages 343–354
https://doi.org/10.1145/1951365.1951407

Published: 21 March 2011

ABSTRACT

Graph clustering has generally concerned itself with undirected graphs; however, the graphs in a number of important domains are inherently directed, e.g., networks of web pages, research papers, and Twitter users. This paper investigates various ways of symmetrizing a directed graph into an undirected graph so that previous work on clustering undirected graphs can be leveraged. Recent work on clustering directed graphs has generalized objective functions such as conductance to directed graphs and minimized them using spectral methods. We show that more meaningful clusters (as measured by an external ground-truth criterion) can be obtained by symmetrizing the graph using measures that capture in-link and out-link similarity, such as bibliographic coupling and co-citation strength. However, applying these similarity measures directly to modern large-scale power-law networks is problematic because of hub nodes, which become connected to the vast majority of the network in the transformed undirected graph. We analyze this problem and propose a Degree-discounted similarity measure that is much better suited to large-scale networks, and we validate it with extensive experiments.
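
The abstract names the symmetrizations but does not give their formulas, so the following is only a minimal sketch under stated assumptions. For an adjacency matrix A with A[i, j] = 1 when node i links to node j, bibliographic coupling is conventionally A·Aᵀ (shared out-links) and co-citation is Aᵀ·A (shared in-links). The `degree_discounted` function, its name, and its default exponents of 0.5 are assumptions chosen for illustration: it scales each shared link by inverse powers of the in- and out-degrees involved so that hub nodes contribute less. It should not be read as the authors' exact definition.

```python
import numpy as np

def bibliographic_coupling(A):
    """Entry (i, j) counts the nodes that both i and j link to."""
    A = np.asarray(A, dtype=float)
    return A @ A.T

def co_citation(A):
    """Entry (i, j) counts the nodes that link to both i and j."""
    A = np.asarray(A, dtype=float)
    return A.T @ A

def degree_discounted(A, alpha=0.5, beta=0.5):
    """Assumed form: coupling plus co-citation, discounted by degree.

    alpha discounts out-degrees and beta discounts in-degrees; both
    defaults are illustrative assumptions, not values from the abstract.
    """
    A = np.asarray(A, dtype=float)
    out_deg = A.sum(axis=1)
    in_deg = A.sum(axis=0)
    # Zero-degree nodes have all-zero rows/columns, so any finite scale works.
    d_out = np.diag(np.where(out_deg > 0, out_deg, 1.0) ** -alpha)
    d_in = np.diag(np.where(in_deg > 0, in_deg, 1.0) ** -beta)
    coupling = d_out @ A @ d_in @ A.T @ d_out
    cocitation = d_in @ A.T @ d_out @ A @ d_in
    U = coupling + cocitation
    np.fill_diagonal(U, 0.0)  # drop self-similarity before clustering
    return U

# Toy graph: nodes 0 and 1 both link to nodes 2 and 3.
A_example = np.zeros((4, 4))
for src, dst in [(0, 2), (0, 3), (1, 2), (1, 3)]:
    A_example[src, dst] = 1.0

U_example = degree_discounted(A_example)
```

In the toy graph, nodes 0 and 1 share no direct edge, so the simple symmetrization A + Aᵀ leaves them unconnected, while the similarity-based variants link them through their shared out-neighbours. The resulting symmetric matrix can then be passed to any clustering algorithm for undirected weighted graphs.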

Index Terms

Computing methodologies → Machine learning → Learning paradigms → Unsupervised learning → Cluster analysis
Mathematics of computing → Discrete mathematics → Graph theory → Graph algorithms

Published in
EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology
March 2011
587 pages
ISBN: 9781450305280
DOI: 10.1145/1951365
Editors: Anastasia Ailamaki (EPFL, Switzerland), Sihem Amer-Yahia (Yahoo! Research), Jignesh Patel (University of Wisconsin-Madison), Tore Risch (Uppsala University, Sweden), Pierre Senellart (Télécom ParisTech, France), Julia Stoyanovich (University of Pennsylvania)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags: clustering, directed graphs, graph transformations
Acceptance Rates
Overall Acceptance Rate: 7 of 10 submissions, 70%