skip to main content
research-article

On the bias of traceroute sampling: Or, power-law degree distributions in regular graphs

Published: 02 July 2009 Publication History

Abstract

Understanding the graph structure of the Internet is a crucial step for building accurate network models and designing efficient algorithms for Internet applications. Yet, obtaining this graph structure can be a surprisingly difficult task, as edges cannot be explicitly queried. For instance, empirical studies of the network of Internet Protocol (IP) addresses typically rely on indirect methods like traceroute to build what are approximately single-source, all-destinations, shortest-path trees. These trees only sample a fraction of the network's edges, and a paper by Lakhina et al. [2003] found empirically that the resulting sample is intrinsically biased. Further, in simulations, they observed that the degree distribution under traceroute sampling exhibits a power law even when the underlying degree distribution is Poisson.
In this article, we study the bias of traceroute sampling mathematically and, for a very general class of underlying degree distributions, explicitly calculate the distribution that will be observed. As example applications of our machinery, we prove that traceroute sampling finds power-law degree distributions in both δ-regular and Poisson-distributed random graphs. Thus, our work puts the observations of Lakhina et al. on a rigorous footing, and extends them to nearly arbitrary degree distributions.

References

[1]
Alderson, D., Li, L., Willinger, W., and Doyle, J. 2005. Understanding Internet topology: Principles, models, and validation. IEEE/ACM Trans. Netw. 13, 1205--1218.
[2]
Amini, L., Shaikh, A., and Schulzrinne, H. 2002. Issues with inferring Internet topological attributes. In SPIE IT-Com.
[3]
Barford, P., Bestavros, A., Byers, J., and Crovella, M. 2001. On the marginal utility of network topology measurements. In SIGCOMM Internet Measurement Workshop.
[4]
Blondel, V., Guillaume, J.-L., Hendrickx, J., and Jungers, R. 2007. Distance distribution in random graphs and application to network exploration. Phys. Rev. E 76, 066101.
[5]
Bollobás, B. 2001. Random Graphs, 2nd ed. Cambridge University Press, Cambridge.
[6]
Bollobás, B., and Chung, F. 1988. The diameter of a cycle plus a random matching. SIAM J. Disc. Math. 1, 328--333.
[7]
Chen, Q., Chang, H., Govindan, R., Jamin, S., Shenker, S., and Willinger, W. 2002. The origin of power laws in Internet topologies revisited. In Proceedings of the 21st ACM SIGCOMM Conference. ACM, New York.
[8]
Clauset, A., and Moore, C. 2005. Accuracy and scaling phenomena in Internet mapping. Phys. Rev. Lett. 94, 018701.
[9]
Cohen, R., Gonen, M., and Wool, A. 2007. Bounding the bias of tree-like sampling in ip topologies. In Proceedings of the European Conference on Complex Systems (ECCS).
[10]
Dall'Asta, L. 2007. Dynamic exploration of networks: From general principles to the traceroute process. Europ. Phys. J. B 60, 123--133.
[11]
Dall'Asta, L., Alvarez-Hamelin, I., Barrat, A., Vázquez, A., and Vespignani, A. 2006. Exploring networks with traceroute-like probes: Theory and simulations. Theoret. Comput. Sci. 355, 6--24.
[12]
Erdős, P., and Rényi, A. 1960. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17--61.
[13]
Fabrikant, A., Koutsoupias, E., and Papadimitriou, C. 2002. Heuristically optimized trade-offs: A new paradigm for power laws in the internet. In Proceedings of the 29th International Colloguium on Automata, Languages and Programming. 110--122.
[14]
Faloutsos, M., Faloutsos, P., and Faloutsos, C. 1999. On power-law relationships of the Internet topology. In Proceedings of the 18th ACM SIGCOMM Conference. ACM, New York. 251--262.
[15]
Flaxman, A., and Vera, J. 2007. Bias reduction in traceroute sampling: Towards a more accurate map of the internet. In Proceedings of the 5th Workshop on Algorithms and Models for the Web-Graph (WAW2007).
[16]
Govindan, R., and Tangmunarunkit, H. 2000. Heuristics for Internet map discovery. In Proceedings of the 19th ACM SIGCOMM Conference. ACM, New York.
[17]
Guillaume, J.-L., Latapy, M., and Magoni, D. 2006. Relevance of massively distributed explorations of the internet topology: Qualitative results. Comput. Netw. 50, 3197--3224.
[18]
Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. J. ASA 58, 13--30.
[19]
Kim, J. H. 2006. Poisson cloning model for random graphs. Preprint.
[20]
Kleinberg, J., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. 1999. The web as a graph: Measurements, models and methods. In Proceedings of the International Conference on Combinatorics and Computing.
[21]
Lakhina, A., Byers, J., Crovella, M., and Xie, P. 2003. Sampling biases in IP topology measurements. In Proceedings of the 22nd IEEE INFOCOM Conference. IEEE Computer Society Press, Los Alamitos.
[22]
Leguay, J., Latapy, M., Friedman, T., and Salamatian, K. 2007. Describing and simulating Internet routes. Comput. Netw. 51, 2067--2087.
[23]
McDiarmid, C. 1998. Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics, M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, Eds. Springer-Verlag, Berlin, Germany, 195--247.
[24]
Mihail, M., Papadimitriou, C., and Saberi, A. 2003. On certain connectivity properties of the Internet topology. In Proceedings of the 35th ACM Symposium on Theory of Computing. ACM, New York.
[25]
Molloy, M., and Reed, B. 1995. A critical point for random graphs with a given degree sequence. Rand. Struct. Algor. 6, 161--180.
[26]
Molloy, M., and Reed, B. 1998. The size of the largest component of a random graph on a fixed degree sequence. Combin. Probab. Comput. 7, 295--306.
[27]
Motwani, R., and Raghavan, P. 1990. Randomized Algorithms. Cambridge University Press. Cambridge.
[28]
Pansiot, J.-J., and Grad, D. 1998. On routers and multicast trees in the Internet. ACM SIGCOMM Commun. Rev. 28, 41--50.
[29]
Petermann, T., and De Los Rios, P. 2004. Exploration of scale-free networks: Do we measure the real exponents? Euro. Phys. J. B 38, 201--204.
[30]
Seaborn, J. 1991. Hypergeometric Functions and Their Applications. Springer-Verlag, Berlin, Germany.
[31]
Shavitt, Y., and Shir, E. 2005. Dimes: Let the Internet measure itself. ACM SIGCOMM Comput. Commun. Rev. 35, 71--74.
[32]
Spanier, J., and Oldham, K. 1987. The exponential integral ei(x) and related functions. In An Atlas of Functions. Hemisphere, Chapter 37, 351--360.
[33]
Spring, N., Mahajan, R., Wetherall, D., and Anderson, T. 2004. Measuring ISP topologies with rocketfuel. IEEE/ACM Trans. Netw. 12, 2--16.
[34]
Tangmunarunkit, H., Govindan, R., Shenker, S., and Estrin, D. 2001. The impact of policy on Internet paths. In Proceedings of the 20th IEEE INFOCOM Conference. IEEE Computer Society Press, Los Alamitos.
[35]
Viger, F., Augustin, B., Cuvellier, X., Magnien, C., Latapy, M., Friedman, T., and Teixeira, R. 2006. Detection, understanding, and prevention of traceroute measurement artifacts. In Proceedings of the 6th Internet Measurement Conference (IMC'06).
[36]
Wilf, H. 1994. Generatingfunctionology. Academic Press, Orlando, FL.
[37]
Wormald, N. 1999. Models of random regular graphs. In London Math. Soc. Lecture Note Series, J. Lamb and D. Preece, Eds. Vol. 276. Cambridge University Press, Cambridge. 239--298.

Cited By

View all
  • (2024)Compression-based inference of network motif setsPLOS Computational Biology10.1371/journal.pcbi.101246020:10(e1012460)Online publication date: 10-Oct-2024
  • (2024)QClique: Optimizing Performance and Accuracy in Maximum Weighted CliqueEuro-Par 2024: Parallel Processing10.1007/978-3-031-69583-4_7(88-102)Online publication date: 26-Aug-2024
  • (2023)Partitioning Communication Streams Into Graph SnapshotsIEEE Transactions on Network Science and Engineering10.1109/TNSE.2022.322361410:2(809-826)Online publication date: 1-Mar-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Journal of the ACM
Journal of the ACM  Volume 56, Issue 4
June 2009
195 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/1538902
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 July 2009
Accepted: 01 February 2008
Revised: 01 January 2008
Received: 01 July 2007
Published in JACM Volume 56, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Internet topology
  2. sampling bias
  3. traceroute

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)21
  • Downloads (Last 6 weeks)2
Reflects downloads up to 16 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Compression-based inference of network motif setsPLOS Computational Biology10.1371/journal.pcbi.101246020:10(e1012460)Online publication date: 10-Oct-2024
  • (2024)QClique: Optimizing Performance and Accuracy in Maximum Weighted CliqueEuro-Par 2024: Parallel Processing10.1007/978-3-031-69583-4_7(88-102)Online publication date: 26-Aug-2024
  • (2023)Partitioning Communication Streams Into Graph SnapshotsIEEE Transactions on Network Science and Engineering10.1109/TNSE.2022.322361410:2(809-826)Online publication date: 1-Mar-2023
  • (2023)Distributed Data-Driven Control of Network SystemsIEEE Open Journal of Control Systems10.1109/OJCSYS.2023.32592282(93-107)Online publication date: 2023
  • (2023)Web MiningMachine Learning for Data Science Handbook10.1007/978-3-031-24628-9_20(447-467)Online publication date: 26-Feb-2023
  • (2023)A simple algorithm for graph reconstructionRandom Structures & Algorithms10.1002/rsa.2114363:2(512-532)Online publication date: 16-Feb-2023
  • (2022)Exposure theory for learning complex networks with random walksJournal of Complex Networks10.1093/comnet/cnac02910:5Online publication date: 27-Aug-2022
  • (2022)Maximum entropy networks for large scale social network node analysisApplied Network Science10.1007/s41109-022-00506-77:1Online publication date: 28-Sep-2022
  • (2021)Bootstrap estimators for the tail-index and for the count statistics of graphex processesElectronic Journal of Statistics10.1214/20-EJS178915:1Online publication date: 1-Jan-2021
  • (2021)Trajectory Outlier DetectionACM Transactions on Knowledge Discovery from Data10.1145/342586715:2(1-28)Online publication date: 10-Feb-2021
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media