DOI: 10.1145/2505515.2505563
research-article

An efficient MapReduce algorithm for counting triangles in a very large graph

Published: 27 October 2013

ABSTRACT

Triangle counting is a fundamental problem in various domains. It can be used to compute the clustering coefficient, transitivity, triangular connectivity, trusses, etc. The problem has been extensively studied for internal memory, but those algorithms do not scale to enormous graphs. In recent years, MapReduce has emerged as a de facto standard framework for processing large data through parallel computing. A MapReduce algorithm based on graph partitioning was previously proposed for the problem. However, that algorithm redundantly generates a large amount of intermediate data, which causes network overload and prolongs the processing time. In this paper, we propose a new algorithm based on graph partitioning with a novel idea of triangle classification to count the number of triangles in a graph. The algorithm substantially reduces the duplication by classifying triangles into three types and processing each triangle differently according to its type. In the experiments, we compare the proposed algorithm with recent existing algorithms using both synthetic and real-world datasets composed of millions of nodes and billions of edges. The proposed algorithm outperforms the other algorithms in most cases. In particular, for a Twitter dataset, the proposed algorithm is more than twice as fast as existing MapReduce algorithms. Moreover, the performance gap increases as the graph becomes larger and denser.
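
To make the terms in the abstract concrete, the following is a minimal single-machine sketch in Python, not the paper's MapReduce algorithm. It lists the triangles of a small undirected graph, computes transitivity from the triangle count, and groups triangles by how many partitions their vertices fall into. The abstract does not define its three triangle types, so grouping by the number of distinct partitions (1, 2, or 3) is an assumption made here for illustration, as are the modulo partitioning and all function names.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicate edges."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def list_triangles(adj):
    """Yield each triangle exactly once as a tuple (u, v, w) with u < v < w."""
    for u in adj:
        # Only look at neighbours with a larger id so no triangle is repeated.
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                yield (u, v, w)


def classify_by_partition(triangles, num_partitions):
    """Count triangles by the number of distinct partitions their vertices touch.

    Assumption for illustration: a node's partition is its id modulo num_partitions.
    """
    counts = Counter()
    for tri in triangles:
        counts[len({node % num_partitions for node in tri})] += 1
    return counts


def transitivity(adj, triangle_count):
    """Global transitivity: 3 * triangles / number of connected triples (wedges)."""
    wedges = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    return 3 * triangle_count / wedges if wedges else 0.0


# Toy graph with three triangles: (0, 1, 2), (0, 2, 4), and (2, 3, 4).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (0, 4)]
adj = build_adjacency(edges)
triangles = list(list_triangles(adj))
by_span = classify_by_partition(triangles, num_partitions=2)
print(len(triangles), dict(by_span), transitivity(adj, len(triangles)))
```

In a partition-based distributed setting, a triangle whose vertices lie in a single partition can be found inside any subgraph that contains that partition, which is one plausible source of the duplicated work the abstract describes; treating triangles differently by type is a way to avoid counting them repeatedly.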

    • Published in

      CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
      October 2013
      2612 pages
      ISBN: 9781450322638
      DOI: 10.1145/2505515

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CIKM '13 Paper Acceptance Rate: 143 of 848 submissions, 17%
      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
