Estimating Clustering Coefficients and Size of Social Networks via Random Walk

Authors:
Liran Katzir
Microsoft Research, Advanced Technology Labs, Herzliya, Israel

Stephen J. Hardiman
Research was conducted while the author was unaffiliated, Paris, France

ACM Transactions on the Web, Volume 9, Issue 4, Article No. 19, pp. 1–20. https://doi.org/10.1145/2790304

Published: 28 September 2015

Abstract

This work addresses the problem of estimating social network measures. Specifically, the measures considered are the network average and global clustering coefficients and the number of registered users. The algorithms (1) assume no prior knowledge about the network and (2) access the network using only the publicly available interface. More precisely, this work provides (a) a unified approach for estimating clustering coefficients and (b) a new network size estimator. The unified approach yields the first external-access algorithm for estimating the global clustering coefficient. The new network size estimator offers improved accuracy compared to prior-art estimators.
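For reference, the two clustering coefficients named above are usually defined as follows. The notation is an assumption introduced here and does not come from this page: $V$ is the node set with $n = |V|$, $d_v$ is the degree of node $v$, and $t_v$ is the number of triangles containing $v$, with $c_v = 0$ by convention when $d_v < 2$.

```latex
c_v = \frac{2\,t_v}{d_v (d_v - 1)} \quad (d_v \ge 2), \qquad
c_{\mathrm{avg}} = \frac{1}{n} \sum_{v \in V} c_v, \qquad
c_{\mathrm{glob}} = \frac{\sum_{v \in V} 2\,t_v}{\sum_{v \in V} d_v (d_v - 1)}
```

The average coefficient weights every node equally, while the global coefficient (also called transitivity) weights every connected triple equally, so the two can differ substantially on graphs with skewed degree distributions.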

Our approach is to view a social network as an undirected graph and use the public interface to retrieve a random walk. To estimate the clustering coefficient, the connectivity of each node in the random walk sequence is tested in turn. We show that the error drops exponentially in the number of random walk steps. For network size estimation, we offer a generalized view of prior-art estimators that in turn yields an improved estimator. All algorithms are validated on several publicly available social network datasets.
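The random-walk idea described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not confirmed by this page: the adjacency-list `graph` stands in for the public interface, the function names and the `gap` parameter are hypothetical, and the reweighting is derived from the standard fact that a simple random walk on a connected undirected graph visits node $v$ with stationary probability proportional to its degree. The size function is a collision-counting estimator in the style of prior art, not necessarily the improved estimator of the article.

```python
import random


def random_walk(graph, start, steps, rng):
    """Return a simple random walk visiting `steps` nodes of `graph`."""
    walk = [start]
    while len(walk) < steps:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


def estimate_clustering(graph, walk):
    """Estimate (average, global) clustering coefficients from one walk."""
    r = len(walk)
    if r < 3:
        raise ValueError("need at least three walk steps")
    deg = [len(graph[v]) for v in walk]
    phi_avg = phi_glob = 0.0
    for k in range(1, r - 1):
        # Connectivity test: are the nodes visited just before and just
        # after step k adjacent to each other?
        if walk[k + 1] in graph[walk[k - 1]]:
            # deg[k] >= 2 here: in a simple graph a degree-1 node forces
            # walk[k-1] == walk[k+1], and no node is adjacent to itself.
            phi_avg += 1.0 / (deg[k] - 1)
            phi_glob += deg[k]
    phi_avg /= r - 2
    phi_glob /= r - 2
    # Normalizers that cancel the degree bias of the walk (assumed weights).
    psi_avg = sum(1.0 / d for d in deg) / r
    psi_glob = sum(d - 1 for d in deg) / r
    avg = phi_avg / psi_avg
    glob = phi_glob / psi_glob if psi_glob > 0 else 0.0
    return avg, glob


def estimate_size(graph, walk, gap):
    """Collision-based size estimate from pairs at least `gap` steps apart."""
    deg = [len(graph[v]) for v in walk]
    weight = 0.0
    collisions = 0
    for i in range(len(walk)):
        for j in range(i + gap, len(walk)):
            weight += deg[i] / deg[j]
            collisions += walk[i] == walk[j]
    if collisions == 0:
        raise ValueError("no collisions observed; the walk is too short")
    return weight / collisions
```

Skipping pairs closer than `gap` steps is one simple way to reduce the dependence between consecutive walk positions. The quadratic pair loop keeps the sketch short, but it is only practical for walks of a few thousand steps.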

Index Terms

1. Theory of computation
  1. Design and analysis of algorithms

Published in
ACM Transactions on the Web Volume 9, Issue 4
October 2015
114 pages
ISSN:1559-1131
EISSN:1559-114X
DOI:10.1145/2830542
Editors:
Brian D. Davison, Lehigh University, USA
Marianne Winslett, University of Illinois at Urbana-Champaign
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 28 September 2015
- Accepted: 1 June 2015
- Revised: 1 April 2015
- Received: 1 November 2014
Author Tags
Estimation
clustering coefficient
sampling
social network
Qualifiers
- research-article
- Research
- Refereed