DOI: 10.1145/1951365.1951407
research-article

Symmetrizations for clustering directed graphs

Published: 21 March 2011

ABSTRACT

Graph clustering has generally concerned itself with clustering undirected graphs; however, the graphs from a number of important domains are essentially directed, e.g. networks of web pages, research papers and Twitter users. This paper investigates various ways of symmetrizing a directed graph into an undirected graph so that previous work on clustering undirected graphs may subsequently be leveraged. Recent work on clustering directed graphs has looked at generalizing objective functions such as conductance to directed graphs and minimizing such objective functions using spectral methods. We show that more meaningful clusters (as measured by an external ground truth criterion) can be obtained by symmetrizing the graph using measures that capture in- and out-link similarity, such as bibliographic coupling and co-citation strength. However, direct application of these similarity measures to modern large-scale power-law networks is problematic because of the presence of hub nodes, which become connected to the vast majority of the network in the transformed undirected graph. We carefully analyze this problem and propose a Degree-discounted similarity measure, which is much more suitable for large-scale networks. We validate these claims with extensive experiments.
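The abstract names the symmetrizations but gives no formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation. It takes the usual matrix definitions for an adjacency matrix A with A[i, j] = 1 for an edge i -> j: bibliographic coupling as A Aᵀ (shared out-links) and co-citation as Aᵀ A (shared in-links). The degree-discounted variant scales each shared link by inverse powers of the out- and in-degrees so that hub nodes contribute less. The function name `symmetrize`, the helper `_inv_power`, and the default exponents `alpha = beta = 0.5` are illustrative assumptions.

```python
import numpy as np


def _inv_power(deg, p):
    """Return deg ** -p elementwise, using 0 for zero-degree nodes."""
    scale = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    scale[nonzero] = deg[nonzero] ** -p
    return scale


def symmetrize(A, method="degree_discounted", alpha=0.5, beta=0.5):
    """Turn a directed adjacency matrix A into a symmetric weight matrix.

    A[i, j] != 0 means there is an edge i -> j.
    """
    A = np.asarray(A, dtype=float)
    if method == "sum":
        # A + A^T: simply drops edge direction.
        return A + A.T
    if method == "bibliographic_coupling":
        # Entry (i, j) counts the out-links that i and j share.
        return A @ A.T
    if method == "co_citation":
        # Entry (i, j) counts the in-links that i and j share.
        return A.T @ A
    if method == "degree_discounted":
        # Assumed form: each shared link is down-weighted by the degrees
        # of the two endpoints and of the shared neighbour.
        d_out = _inv_power(A.sum(axis=1), alpha)  # out-degree discount
        d_in = _inv_power(A.sum(axis=0), beta)    # in-degree discount
        # Discounted coupling: Do A Di A^T Do
        coupling = (d_out[:, None] * A * d_in[None, :]) @ (A.T * d_out[None, :])
        # Discounted co-citation: Di A^T Do A Di
        cocitation = (d_in[:, None] * A.T * d_out[None, :]) @ (A * d_in[None, :])
        return coupling + cocitation
    raise ValueError(f"unknown method: {method}")


# Toy graph: nodes 0 and 1 both link to node 2.
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [0, 0, 0]])
U = symmetrize(A)
```

In the toy graph, nodes 0 and 1 share a single out-link to node 2, whose in-degree is 2, so this sketch weights their similarity by 2 ** -0.5 rather than 1. Any undirected clustering algorithm can then be run on the resulting matrix; the diagonal holds self-similarities and may need to be zeroed first, depending on the algorithm.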


Published in

EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology
March 2011
587 pages
ISBN: 9781450305280
DOI: 10.1145/1951365

Copyright © 2011 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 7 of 10 submissions, 70%
