DOI: 10.1145/2433396.2433424
research-article

Sharding social networks

Published: 04 February 2013

Abstract

Online social networking platforms regularly support hundreds of millions of users, who in aggregate generate substantially more data than can be stored on any single physical server. As such, user data are distributed, or sharded, across many machines. A key requirement in this setting is rapid retrieval not only of a given user's information, but also of all data associated with his or her social contacts, suggesting that one should consider the topology of the social network in selecting a sharding policy. In this paper we formalize the problem of efficiently sharding large social network databases, and evaluate several sharding strategies, both analytically and empirically. We find that random sharding, the de facto standard, results in provably poor performance even when frequently accessed nodes are replicated to many shards. By contrast, we demonstrate that one can substantially reduce querying costs by identifying and assigning tightly knit communities to shards. In particular, our theoretical analysis motivates a novel, scalable sharding algorithm that outperforms both random and location-based sharding schemes.
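To make the contrast in the abstract concrete, the following toy sketch counts how many distinct shards must be read to load one user together with all of that user's contacts. It is an illustration only, not the algorithm proposed in the paper: the two-clique network, the function names, the use of "shards touched" as the query cost, and the id-modulo placement standing in for random sharding are all assumptions made for this example.

```python
from typing import Dict, Set


def shards_touched(user: int, friends: Dict[int, Set[int]], shard_of: Dict[int, int]) -> int:
    """Distinct shards read to load `user` plus all of the user's contacts."""
    return len({shard_of[user]} | {shard_of[f] for f in friends[user]})


def average_query_cost(friends: Dict[int, Set[int]], shard_of: Dict[int, int]) -> float:
    """Mean number of shards touched per user query (hypothetical cost measure)."""
    return sum(shards_touched(u, friends, shard_of) for u in friends) / len(friends)


def two_cliques(size: int) -> Dict[int, Set[int]]:
    """Toy network: two fully connected groups of `size` users joined by one edge."""
    friends: Dict[int, Set[int]] = {u: set() for u in range(2 * size)}
    for start in (0, size):
        group = range(start, start + size)
        for u in group:
            friends[u] = {v for v in group if v != u}
    # A single friendship bridges the two communities.
    friends[size - 1].add(size)
    friends[size].add(size - 1)
    return friends


friends = two_cliques(4)

# Stand-in for random sharding: placement by user id, ignoring the graph.
scattered = {u: u % 2 for u in friends}

# Community-aware placement: each tightly knit group lives on its own shard.
by_community = {u: 0 if u < 4 else 1 for u in friends}

scattered_cost = average_query_cost(friends, scattered)
community_cost = average_query_cost(friends, by_community)
print(scattered_cost, community_cost)
```

In this toy network every query touches both shards under the graph-oblivious placement, while under the community-aware placement only the two users on the bridging edge need a second shard. Real networks have far less clean community structure, so the gap here should not be read as indicative of the paper's measured results.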

Published In

WSDM '13: Proceedings of the sixth ACM international conference on Web search and data mining
February 2013
816 pages
ISBN: 9781450318693
DOI: 10.1145/2433396
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. community detection
  2. sharding
  3. social networks

Qualifiers

  • Research-article

Conference

WSDM 2013

Acceptance Rates

Overall Acceptance Rate 431 of 2,495 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 14
  • Downloads (last 6 weeks): 2

Reflects downloads up to 10 Feb 2025.

Cited By

  • (2024) Exact Vertex Migration Model of Graph Partitioning Based on Mixed 0–1 Linear Programming and Iteration Algorithm. Journal of the Operations Research Society of China. DOI: 10.1007/s40305-023-00534-9. Online publication date: 24-Jan-2024.
  • (2023) Memcached vs Redis Caching Optimization Comparison using Machine Learning. 2023 2nd International Conference on Automation, Computing and Renewable Systems (ICACRS), pages 1153-1159. DOI: 10.1109/ICACRS58579.2023.10404339. Online publication date: 11-Dec-2023.
  • (2021) SDP: Scalable Real-time Dynamic Graph Partitioner. IEEE Transactions on Services Computing, pages 1-1. DOI: 10.1109/TSC.2021.3137932. Online publication date: 2021.
  • (2021) Optimizing Iterative Algorithms for Social Network Sharding. 2021 IEEE International Conference on Big Data (Big Data), pages 400-408. DOI: 10.1109/BigData52589.2021.9671621. Online publication date: 15-Dec-2021.
  • (2020) Prioritized Restreaming Algorithms for Balanced Graph Partitioning. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1877-1887. DOI: 10.1145/3394486.3403239. Online publication date: 23-Aug-2020.
  • (2020) Towards quantum computing based community detection. Computer Science Review 38 (100313). DOI: 10.1016/j.cosrev.2020.100313. Online publication date: Nov-2020.
  • (2019) Henosis. Proceedings of the ACM International Conference on Supercomputing, pages 392-402. DOI: 10.1145/3330345.3330380. Online publication date: 26-Jun-2019.
  • (2019) Capacity Aware Consistent Hashing on the Cloud Using Cryptographic Hashes. Proceedings of ICETIT 2019, pages 561-574. DOI: 10.1007/978-3-030-30577-2_49. Online publication date: 24-Sep-2019.
  • (2018) REHDFS. Journal of Network and Computer Applications 103:C, pages 85-100. DOI: 10.1016/j.jnca.2017.11.017. Online publication date: 1-Feb-2018.
  • (2017) A Pareto Framework for Data Analytics on Heterogeneous Systems: Implications for Green Energy Usage and Performance. 2017 46th International Conference on Parallel Processing (ICPP), pages 533-542. DOI: 10.1109/ICPP.2017.62. Online publication date: Aug-2017.
