ABSTRACT
The triangle counting problem is one of the fundamental problems in various domains. It can be used to compute the clustering coefficient, transitivity, triangular connectivity, trusses, etc. The problem has been extensively studied in internal memory, but those algorithms do not scale to enormous graphs. In recent years, MapReduce has emerged as a de facto standard framework for processing large data through parallel computing. A MapReduce algorithm based on graph partitioning was proposed for the problem. However, that algorithm redundantly generates a large amount of intermediate data, which causes network overload and prolongs the processing time. In this paper, we propose a new algorithm based on graph partitioning with a novel idea of triangle classification to count the number of triangles in a graph. The algorithm substantially reduces the duplication by classifying triangles into three types and processing each triangle differently according to its type. In the experiments, we compare the proposed algorithm with recent existing algorithms using both synthetic and real-world datasets composed of millions of nodes and billions of edges. The proposed algorithm outperforms the other algorithms in most cases. In particular, for a Twitter dataset, the proposed algorithm is more than twice as fast as existing MapReduce algorithms. Moreover, the performance gap increases as the graph becomes larger and denser.
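To make the ideas of partitioning and triangle classification concrete, here is a minimal in-memory sketch in Python. It is not the paper's MapReduce algorithm. The abstract does not define the three triangle types, so the sketch assumes they correspond to whether a triangle's nodes fall into one, two, or three partitions. The partitioning rule (`node % num_partitions`) and the function names `list_triangles` and `classify_by_partition` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def list_triangles(edges):
    """Yield each triangle (u, v, w) exactly once, with u < v < w."""
    # Keep only higher-numbered neighbors for every node, so that each
    # triangle is discovered from its smallest node and never repeated.
    higher = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        lo, hi = (u, v) if u < v else (v, u)
        higher.setdefault(lo, set()).add(hi)
    for u, nbrs in higher.items():
        for v, w in combinations(sorted(nbrs), 2):
            # (u, v) and (u, w) are edges; the triangle closes if (v, w) is.
            if w in higher.get(v, ()):
                yield (u, v, w)


def classify_by_partition(edges, num_partitions):
    """Tally triangles by how many distinct partitions their nodes span."""
    # Assumption: integer node ids and a simple modulo partitioning rule.
    def part(node):
        return node % num_partitions

    tally = Counter()
    for tri in list_triangles(edges):
        tally[len({part(n) for n in tri})] += 1
    return tally


if __name__ == "__main__":
    edges = [(0, 2), (2, 4), (0, 4), (0, 1), (1, 2)]
    print(sorted(list_triangles(edges)))
    print(dict(classify_by_partition(edges, 2)))
```

With two partitions, the triangle (0, 2, 4) lies entirely inside one partition while (0, 1, 2) spans two. Under the stated assumption, a partition-based algorithm could count the first kind locally and would need extra care only for triangles that cross partitions, which is where duplicated intermediate data would arise.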
Index Terms
- An efficient MapReduce algorithm for counting triangles in a very large graph