Benefits of bias: towards better characterization of network sampling

Published: 21 August 2011
DOI: 10.1145/2020408.2020431

Abstract

From social networks to P2P systems, network sampling arises in many settings. We present a detailed study on the nature of biases in network sampling strategies to shed light on how best to sample from networks. We investigate connections between specific biases and various measures of structural representativeness. We show that certain biases are, in fact, beneficial for many applications, as they "push" the sampling process towards inclusion of desired properties. Finally, we describe how these sampling biases can be exploited in several real-world applications, including disease outbreak detection and market research.
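
As a rough illustration of what a sampling bias can look like, the sketch below compares two ways of drawing a node from an undirected graph: uniformly at random, and by picking an endpoint of a random edge, which favors high-degree nodes. This is not taken from the paper, whose sampling strategies and bias measures are not described on this page. The toy graph and all function names are assumptions made up for the example.

```python
from collections import defaultdict


def degrees(edges):
    """Return a dict mapping each node to its degree in an undirected graph."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def expected_degree_uniform(deg):
    """Expected degree of a node drawn uniformly at random."""
    return sum(deg.values()) / len(deg)


def expected_degree_edge_endpoint(deg):
    """Expected degree of a node drawn as the endpoint of a random edge.

    Each node is drawn with probability proportional to its degree. This is
    also the stationary distribution of a simple random walk on a connected,
    non-bipartite undirected graph.
    """
    total = sum(deg.values())
    return sum(d * d for d in deg.values()) / total


# Hypothetical toy graph: hub 0 linked to nodes 1..5, plus one edge (1, 2).
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]

if __name__ == "__main__":
    deg = degrees(EDGES)
    print("uniform node sampling:", expected_degree_uniform(deg))
    print("edge-endpoint sampling:", expected_degree_edge_endpoint(deg))
```

On this toy graph the degree-proportional draw has a higher expected degree than the uniform draw, because the hub is selected more often. Whether such a skew helps or hurts depends on the application, which is the kind of question the abstract raises.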


    Published In

    KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2011
    1446 pages
    ISBN: 9781450308137
    DOI: 10.1145/2020408

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. bias
    2. complex networks
    3. crawling
    4. graph mining
    5. link mining
    6. online sampling
    7. sampling
    8. social network analysis

    Qualifiers

    • Research-article

    Conference

    KDD '11

    Acceptance Rates

    Overall Acceptance Rate 605 of 4,597 submissions, 13%

    Cited By

    • (2024) Per-Packet Traffic Measurement in Storage, Computation and Bandwidth Limited Data Plane. IEEE/ACM Transactions on Networking 32:5 (3730-3742). DOI: 10.1109/TNET.2024.3404011. Online publication date: Oct-2024
    • (2024) A spanning tree approach to social network sampling with degree constraints. Social Network Analysis and Mining 14:1. DOI: 10.1007/s13278-024-01247-4. Online publication date: 18-May-2024
    • (2023) Theoretical bounds on the network community profile from low-rank semi-definite programming. Proceedings of the 40th International Conference on Machine Learning (13976-13992). DOI: 10.5555/3618408.3618976. Online publication date: 23-Jul-2023
    • (2023) The Two Sides of the Environmental Kuznets Curve: A Socio-Semantic Analysis. OEconomia (279-321). DOI: 10.4000/oeconomia.15729. Online publication date: 1-Jun-2023
    • (2023) Subnetwork estimation for spatial autoregressive models in large-scale networks. Electronic Journal of Statistics 17:1. DOI: 10.1214/23-EJS2139. Online publication date: 1-Jan-2023
    • (2023) Randomness in Local Optima Network Sampling. Proceedings of the Companion Conference on Genetic and Evolutionary Computation (2099-2107). DOI: 10.1145/3583133.3596309. Online publication date: 15-Jul-2023
    • (2023) Partitioning Communication Streams Into Graph Snapshots. IEEE Transactions on Network Science and Engineering 10:2 (809-826). DOI: 10.1109/TNSE.2022.3223614. Online publication date: 1-Mar-2023
    • (2023) Analyzing Effects of Social Media User’s Influence on Contents Caching in ICN. IEEE Access 11 (127679-127688). DOI: 10.1109/ACCESS.2023.3330850. Online publication date: 2023
    • (2022) Color Image Contrast Enhancement Using Modified Firefly Algorithm. International Journal of Information Retrieval Research 12:2 (1-18). DOI: 10.4018/IJIRR.299944. Online publication date: 8-Jul-2022
    • (2022) Credit Card Fraud Prediction Using XGBoost. International Journal of Information Retrieval Research 12:2 (1-17). DOI: 10.4018/IJIRR.299940. Online publication date: 1-Apr-2022
