Abstract
Recent results [5, 9, 23] prove that edge partitioning approaches (also known as vertex-cut) outperform vertex partitioning (edge-cut) approaches for computations on large and skewed graphs like social networks. These vertex-cut approaches generally avoid unbalanced computation due to the power-law degree distribution problem. However, these methods, like evenly random assigning [23] or greedy assignment strategy [9], are generic and do not consider any computation pattern for specific graph algorithm. We propose in this paper a vertex-cut partitioning dedicated to random walks algorithms which takes advantage of graph topological properties. It relies on a blocks approach which captures local communities. Our split and merge algorithms allow to achieve load balancing of the workers and to maintain it dynamically. Our experiments illustrate the benefit of our partitioning since it significantly reduce the communication cost when performing random walks-based algorithms compared with existing approaches.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
- 2.
See details and implementations at http://spark.apache.org/docs/latest/api/scala/index.html.
References
Andersen, R., Chung, F., Lang, K.: Local graph partitioning using PageRank vectors. In: FOCS, pp. 475–486 (2006)
Apache. Giraph. http://giraph.apache.org
Bahmani, B., Chakrabarti, K., Xin, D.: Fast personalized PageRank on MapReduce. In: SIGMOD, pp. 973–984 (2011)
Bahmani, B., Chowdhury, A., Goel, A.: Fast incremental and personalized PageRank. Proc. VLDB Endow. 4(3), 173–184 (2010)
Bourse, F., Lelarge, M., Vojnovic, M.: Balanced graph edge partition. In: SIGKDD, pp. 1456–1465 (2014)
Chierichetti, F., Kumar, R., Lattanzi, S., Mitzenmacher, M., Panconesi, A., Raghavan, P.: On compressing social networks. In: SIGKDD, pp. 219–228 (2009)
Dahimene, R., Constantin, C., du Mouza, C.: RecLand: a recommender system for social networks. In: CIKM, pp. 2063–2065 (2014)
Gleich, D.F., Seshadhri, C.: Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In: SIGKDD, pp. 597–605 (2012)
Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: OSDI, pp. 17–30 (2012)
Jeh, G., Widom, J.: Scaling personalized web search. In: WWW, pp. 271–279 (2003)
Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)
Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Techn. J. 49(2), 291–307 (1970)
Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6, 29–123 (2008)
Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. VLDB Endow. 5(8), 716–727 (2012)
Lubos Takac, M.Z.: Data analysis in public social networks. In: Present Day Trends of Innovations, pp. 1–6 (2012)
Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010)
Newman, M., Barabasi, A.-L., Watts, D.J., Structure, T.: Dynamics of Networks: (Princeton Studies in Complexity). Princeton University Press, Princeton (2006)
Roy, A., Bindschaedler, L., Malicevic, J., Zwaenepoel, W.: Chaos: scale-out graph processing from secondary storage. In: SOSP, pp. 410–424 (2015)
Salihoglu, S., Widom, J.: GPS: a graph processing system. In: SSDBM, pp. 22:1–22:12 (2013)
Sarkar, P., Moore, A.W.: Fast nearest-neighbor search in disk-resident graphs. In: SIGKDD, pp. 513–522 (2010)
Valiant, L.G.: A bridging model for multi-core computing. J. Comput. Syst. Sci. 77(1), 154–166 (2011)
Whang, J.J., Gleich, D.F., Dhillon, I.S.: Overlapping community detection using seed set expansion. In: CIKM, pp. 2099–2108 (2013)
Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: a resilient distributed graph system on spark. In: GRADES, pp. 1–6 (2013)
Yang, S., Yan, X., Zong, B., Khan, A.: Towards effective partition management for large graphs. In: SIGMOD, pp. 517–528 (2012)
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI, p. 2 (2012)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Li, Y., Constantin, C., du Mouza, C. (2016). A Block-Based Edge Partitioning for Random Walks Algorithms over Large Social Graphs. In: Cellary, W., Mokbel, M., Wang, J., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2016. WISE 2016. Lecture Notes in Computer Science(), vol 10042. Springer, Cham. https://doi.org/10.1007/978-3-319-48743-4_22
Download citation
DOI: https://doi.org/10.1007/978-3-319-48743-4_22
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48742-7
Online ISBN: 978-3-319-48743-4
eBook Packages: Computer ScienceComputer Science (R0)