
Estimating Clustering Coefficients and Size of Social Networks via Random Walk

Published: 28 September 2015

Abstract

This work addresses the problem of estimating social network measures. Specifically, the measures at hand are the network average and global clustering coefficients and the number of registered users. The algorithms at hand (1) assume no prior knowledge about the network and (2) access the network using only the publicly available interface. More precisely, this work provides (a) a unified approach for clustering coefficient estimation and (b) a new network size estimator. The unified approach for the clustering coefficients yields the first external-access algorithm for estimating the global clustering coefficient. The new network size estimator offers improved accuracy compared to prior-art estimators.

Our approach is to view a social network as an undirected graph and use the public interface to retrieve a random walk. To estimate the clustering coefficient, the connectivity of each node in the random walk sequence is tested in turn. We show that the error drops exponentially in the number of random walk steps. For network size estimation, we offer a generalized view of prior-art estimators that in turn yields an improved estimator. All algorithms are validated on several publicly available social network datasets.
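The connectivity test described above can be illustrated with a minimal Python sketch. It is not taken from the article: the function names `random_walk` and `estimate_clustering`, the adjacency-dictionary representation, and the particular degree weights are assumptions made for illustration. The sketch assumes a connected, simple, undirected graph and a walk that is close to its stationary distribution, in which a node is visited with probability proportional to its degree. For each interior step it checks whether the nodes visited just before and just after are adjacent, then reweights by degree to offset the walk's bias toward high-degree nodes.

```python
import random


def random_walk(neighbors, start, steps, rng):
    """Simple random walk: each step moves to a uniformly chosen neighbor."""
    walk = [start]
    while len(walk) < steps:
        walk.append(rng.choice(neighbors[walk[-1]]))
    return walk


def estimate_clustering(neighbors, walk):
    """Return (average, global) clustering coefficient estimates.

    `neighbors` maps each node to a list of its neighbors (hypothetical
    stand-in for a public "friends list" interface).
    """
    r = len(walk)
    if r < 3:
        raise ValueError("need at least three walk steps")
    degree = [len(neighbors[v]) for v in walk]

    phi_avg = 0.0   # weighted count of closed steps, average coefficient
    phi_glob = 0.0  # weighted count of closed steps, global coefficient
    for k in range(1, r - 1):
        # Connectivity test: are the nodes visited just before and just
        # after walk[k] adjacent? If so, walk[k] has degree >= 2.
        if walk[k + 1] in neighbors[walk[k - 1]]:
            phi_avg += 1.0 / (degree[k] - 1)
            phi_glob += degree[k]
    phi_avg /= r - 2
    phi_glob /= r - 2

    # Normalizers that cancel the degree bias of the stationary walk.
    psi_avg = sum(1.0 / d for d in degree) / r
    psi_glob = sum(d - 1 for d in degree) / r

    avg_cc = phi_avg / psi_avg
    glob_cc = phi_glob / psi_glob if psi_glob else 0.0
    return avg_cc, glob_cc


if __name__ == "__main__":
    # Complete graph on 6 nodes: both coefficients equal 1.
    k6 = {i: [j for j in range(6) if j != i] for i in range(6)}
    walk = random_walk(k6, 0, 20000, random.Random(1))
    print(estimate_clustering(k6, walk))
```

On a triangle-free graph the test never succeeds, so both estimates are exactly zero; on a complete graph both should approach one as the walk grows. The adjacency check only needs the neighbor list of a node the walk has already visited, which matches the external-access setting of the abstract.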


    • Published in

      ACM Transactions on the Web  Volume 9, Issue 4
      October 2015
      114 pages
      ISSN:1559-1131
      EISSN:1559-114X
      DOI:10.1145/2830542

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 September 2015
      • Accepted: 1 June 2015
      • Revised: 1 April 2015
      • Received: 1 November 2014

      Qualifiers

      • research-article
      • Research
      • Refereed
