
An efficient MapReduce algorithm for counting triangles in a very large graph

Authors:
Ha-Myung Park, KAIST, Daejeon, South Korea
Chin-Wan Chung, KAIST, Daejeon, South Korea

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management, October 2013, Pages 539–548. https://doi.org/10.1145/2505515.2505563

Published: 27 October 2013


ABSTRACT

Triangle counting is a fundamental problem in various domains. It can be used to compute the clustering coefficient, transitivity, triangular connectivity, trusses, and so on. The problem has been studied extensively for internal memory, but those algorithms do not scale to enormous graphs. In recent years, MapReduce has emerged as a de facto standard framework for processing large data through parallel computing. A MapReduce algorithm based on graph partitioning was proposed for the problem. However, that algorithm redundantly generates a large amount of intermediate data, which overloads the network and prolongs the processing time. In this paper, we propose a new algorithm based on graph partitioning with a novel idea of triangle classification to count the number of triangles in a graph. The algorithm substantially reduces the duplication by classifying triangles into three types and processing each triangle differently according to its type. In the experiments, we compare the proposed algorithm with recent existing algorithms using both synthetic and real-world datasets composed of millions of nodes and billions of edges. The proposed algorithm outperforms the other algorithms in most cases. In particular, for a Twitter dataset, the proposed algorithm is more than twice as fast as existing MapReduce algorithms. Moreover, the performance gap increases as the graph becomes larger and denser.
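To make the problem statement concrete, the following is a minimal single-machine sketch in Python. It is not the MapReduce algorithm proposed in the paper. It lists triangles in a small undirected graph and then groups them by how many distinct partitions their nodes fall into. The abstract does not define the three triangle types, so treating them as "one, two, or three partitions touched" is an assumption made for illustration only, as are the function names and the modulo-based partitioning of integer node ids.

```python
from collections import Counter
from itertools import combinations


def list_triangles(edges):
    """Return every triangle (u, v, w) with u < v < w in an undirected simple graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    triangles = []
    for u in adj:
        # Only look at neighbours larger than u so each triangle is found once.
        higher = sorted(x for x in adj[u] if x > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                triangles.append((u, v, w))
    return triangles


def classify_by_partition(triangles, num_partitions):
    """Count triangles by the number of distinct partitions their nodes touch.

    Assumption for illustration: a node's partition is its integer id modulo
    num_partitions. The paper's actual partitioning scheme and type
    definitions are not given in the abstract.
    """
    counts = Counter()
    for tri in triangles:
        parts = {node % num_partitions for node in tri}
        counts[len(parts)] += 1
    return counts


edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (0, 4)]
triangles = list_triangles(edges)
by_type = classify_by_partition(triangles, num_partitions=2)
print(len(triangles), dict(by_type))
```

In a partition-based scheme, a triangle whose nodes lie in a single partition can be counted locally, while one spanning several partitions needs edges from more than one partition. That is why treating the cases separately can plausibly reduce duplicated intermediate data.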


Index Terms

1. Information systems
  1. Information retrieval


Published in
CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN: 9781450322638
DOI: 10.1145/2505515
General Chairs: Qi He (LinkedIn, USA), Arun Iyengar (IBM T.J. Watson Research Center, USA)
Program Chairs: Wolfgang Nejdl (L3S Research Center, Germany), Jian Pei (Simon Fraser University, Canada), Rajeev Rastogi (Amazon, India)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph
mapreduce
triangle
Qualifiers
- research-article
Acceptance Rates
CIKM '13 Paper Acceptance Rate: 143 of 848 submissions, 17%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
