Abstract
Given a graph stream, how can we estimate the number of triangles in it using multiple machines with limited storage?
Counting triangles (i.e., cycles of length three) is a classical graph problem whose importance has been recognized in diverse fields, including data mining, social network analysis, and databases. Recently, two approaches to triangle counting in massive graphs have been intensively studied. One is streaming algorithms, which incrementally estimate the count of triangles in time-evolving graphs or in graphs too large to be stored in full. The other is distributed algorithms, which exploit the computational power and storage of multiple machines.
Can we have the best of both worlds? We propose Tri-Fly, the first distributed streaming algorithm for approximate triangle counting. Making one pass over a graph stream, Tri-Fly rapidly and accurately estimates the counts of global triangles and local triangles incident to each node. Compared to state-of-the-art single-machine streaming algorithms, Tri-Fly is (a) Accurate: yields up to 4.5× smaller estimation error, (b) Fast: runs up to 8.8× faster with linear scalability, and (c) Theoretically sound: gives unbiased estimates with smaller variances.
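To make the notion of one-pass estimation of global and local triangle counts concrete, the following is a minimal single-machine sketch. It is not Tri-Fly itself, whose details the abstract does not give. It assumes a simple edge-sampling estimator in the spirit of the streaming methods cited in the references: each arriving edge is first checked against the sampled graph, and every triangle it closes is credited with weight 1/p², since both of its earlier edges were kept independently with probability p. The function name `estimate_triangles`, its parameters, and the assumption that the stream contains no duplicate edges are all illustrative choices.

```python
import random
from collections import defaultdict


def estimate_triangles(edge_stream, p, seed=None):
    """One-pass estimate of global and per-node (local) triangle counts.

    Illustrative sketch only: keeps each edge with probability p and
    assumes the stream has no duplicate edges.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    adj = defaultdict(set)          # adjacency lists of the sampled graph
    global_est = 0.0
    local_est = defaultdict(float)
    weight = 1.0 / (p * p)          # inverse probability that both earlier edges were kept

    for u, v in edge_stream:
        if u == v:
            continue                # self-loops cannot form triangles
        # Triangles closed by (u, v) in the sampled graph.
        common = adj.get(u, set()) & adj.get(v, set())
        for w in common:
            global_est += weight
            local_est[u] += weight
            local_est[v] += weight
            local_est[w] += weight
        # Decide whether to store the new edge.
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)

    return global_est, dict(local_est)


if __name__ == "__main__":
    # Complete graph on 4 nodes: 4 triangles, each node lies in 3 of them.
    k4 = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]
    print(estimate_triangles(k4, p=1.0))
```

With p = 1 every edge is stored and the counts are exact; with p < 1 the estimates stay unbiased under the stated assumptions but become noisier as memory use shrinks. A distributed variant would additionally have to decide how edges are routed to machines and how the per-machine estimates are combined, which this sketch does not attempt.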
Notes
1. E.g., WRS [15] can be used instead if edges are streamed in chronological order.
References
Supplementary document (2018). http://www.cs.cmu.edu/~kijungs/codes/trifly/supple.pdf
Ahmed, N.K., Duffield, N., Neville, J., Kompella, R.: Graph sample and hold: a framework for big-graph analytics. In: KDD (2014)
Arifuzzaman, S., Khan, M., Marathe, M.: PATRIC: a parallel algorithm for counting triangles in massive networks. In: CIKM (2013)
Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: SODA (2002)
Becchetti, L., Boldi, P., Castillo, C., Gionis, A.: Efficient algorithms for large-scale local triangle counting. TKDD 4(3), 13 (2010)
Cohen, J.: Graph twiddling in a MapReduce world. Comput. Sci. Eng. 11(4), 29–41 (2009)
De Stefani, L., Epasto, A., Riondato, M., Upfal, E.: TRIEST: counting local and global triangles in fully-dynamic streams with fixed memory size. In: KDD (2016)
Eckmann, J.P., Moses, E.: Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS 99(9), 5825–5829 (2002)
Jha, M., Seshadhri, C., Pinar, A.: A space efficient streaming algorithm for triangle counting using the birthday paradox. In: KDD (2013)
Kutzkov, K., Pagh, R.: On the streaming complexity of computing local clustering coefficients. In: WSDM (2013)
Lim, Y., Kang, U.: MASCOT: memory-efficient and accurate sampling for counting local triangles in graph streams. In: KDD (2015)
Park, H.M., Myaeng, S.H., Kang, U.: PTE: enumerating trillion triangles on distributed systems. In: KDD (2016)
Pavan, A., Tangwongsan, K., Tirthapura, S.: Parallel and distributed triangle counting on graph streams. Technical report, IBM (2013)
Pavan, A., Tangwongsan, K., Tirthapura, S., Wu, K.L.: Counting and sampling triangles from a graph stream. PVLDB 6(14), 1870–1881 (2013)
Shin, K.: WRS: waiting room sampling for accurate triangle counting in real graph streams. In: ICDM (2017)
Shin, K., Eliassi-Rad, T., Faloutsos, C.: Patterns and anomalies in k-cores of real-world graphs with applications. Knowl. Inf. Syst. 54(3), 677–710 (2018)
Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: WWW (2011)
Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: DOULION: counting triangles in massive graphs with a coin. In: KDD (2009)
Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012)
Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications, vol. 8. Cambridge University Press, Cambridge (1994)
Acknowledgements
This material is based upon work supported by the National Science Foundation under Grants No. CNS-1314632 and IIS-1408924. Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053. This publication was made possible by NPRP grant # 7-1330-2-483 from the Qatar National Research Fund (a member of Qatar Foundation). Shin was supported by KFAS Scholarship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding parties. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Shin, K., Hammoud, M., Lee, E., Oh, J., Faloutsos, C. (2018). Tri-Fly: Distributed Estimation of Global and Local Triangle Counts in Graph Streams. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_51
DOI: https://doi.org/10.1007/978-3-319-93040-4_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer Science, Computer Science (R0)