Abstract
This work addresses the problem of estimating social network measures, specifically the average clustering coefficient, the global clustering coefficient, and the number of registered users. The algorithms (1) assume no prior knowledge about the network and (2) access the network only through its publicly available interface. The work provides (a) a unified approach to estimating clustering coefficients and (b) a new network size estimator. The unified approach yields the first external-access algorithm for estimating the global clustering coefficient, and the new network size estimator is more accurate than prior estimators.
Our approach is to view a social network as an undirected graph and use the public interface to retrieve a random walk. To estimate the clustering coefficients, the connectivity of each node in the random walk sequence is tested in turn. We show that the estimation error drops exponentially in the number of random walk steps. For network size estimation, we offer a generalized view of prior estimators that in turn yields an improved estimator. All algorithms are validated on several publicly available social network datasets.
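The clustering-coefficient step can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the in-memory adjacency dictionary `adj` stands in for the public interface, the function names are hypothetical, and the concrete form of the connectivity test is assumed. Here the test checks whether the nodes visited just before and just after each step are adjacent, and degree weights offset the walk's bias toward high-degree nodes.

```python
import random


def random_walk(adj, start, steps, rng):
    """Return a simple random walk visiting `steps` nodes.

    `adj` maps each node to a list of its neighbors; it is a stand-in
    (assumption) for neighbor queries against a public interface.
    """
    walk = [start]
    for _ in range(steps - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def estimate_clustering(adj, walk):
    """Estimate (average, global) clustering coefficients from one walk.

    Assumed connectivity test: for an interior step k, check whether the
    previous and next nodes of the walk are adjacent to each other.
    """
    r = len(walk)
    if r < 3:
        raise ValueError("the walk needs at least three nodes")

    phi_avg = phi_glob = 0.0  # degree-weighted sums of passed tests
    psi_avg = psi_glob = 0.0  # normalizers that cancel the degree bias
    for k, node in enumerate(walk):
        d = len(adj[node])
        psi_avg += 1.0 / d
        psi_glob += d - 1
        # Degree-1 nodes cannot close a triangle, so they are skipped.
        if 0 < k < r - 1 and d > 1:
            if walk[k + 1] in adj[walk[k - 1]]:
                phi_avg += 1.0 / (d - 1)
                phi_glob += d

    # Interior steps are averaged over r - 2 terms, normalizers over r.
    phi_avg, phi_glob = phi_avg / (r - 2), phi_glob / (r - 2)
    psi_avg, psi_glob = psi_avg / r, psi_glob / r

    avg_cc = phi_avg / psi_avg
    glob_cc = phi_glob / psi_glob if psi_glob > 0 else 0.0
    return avg_cc, glob_cc
```

As a sanity check, a walk on a triangle-free graph such as a cycle never passes the test, so both estimates are exactly zero, while on a complete graph both estimates approach one as the walk grows. On a real network the walk would also need enough steps to mix before the estimates become reliable.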