Computer Communications

Volume 33, Issue 5, 15 March 2010, Pages 533-550

Review

Passive internet measurement: Overview and guidelines based on experiences

https://doi.org/10.1016/j.comcom.2009.10.021

Abstract

Due to its versatility, flexibility and fast development, the modern Internet is far from being well understood in its entirety. A good way to learn more about how the Internet functions is to collect and analyze real Internet traffic. This paper addresses several major challenges of Internet traffic monitoring, which is a prerequisite for performing traffic analysis. The issues discussed will inevitably arise when planning to conduct passive measurements on high-speed network connections, such as Internet backbone links. After a brief summary of general network measurement approaches, a detailed overview of different design options and important considerations for backbone measurements is given. The challenges are discussed in the order in which they appear: first, a number of legal and ethical issues have to be sorted out with legislators and network operators, followed by operational difficulties that need to be solved. Once these legal and operational obstacles have been overcome, a third challenge lies in the various technical difficulties of actually measuring high-speed links, ranging from handling the vast amounts of network data to timing and synchronization issues. Once data are successfully collected, policies regarding their public availability need to be established. Finally, a successful Internet measurement project is described in terms of the aforementioned issues, providing concrete lessons learned. As a result, the paper presents tutorial guidelines for setting up and performing passive Internet measurements.

Introduction

The usage of the Internet has changed dramatically since its initial operation in the early 1980s, when it was a research project connecting a handful of computers and facilitating a small set of remote operations. Nowadays (2009), the Internet serves as the data backbone for all kinds of protocols, making it possible to exchange not only text but also voice, audio, video and various other forms of digital data between hundreds of millions of nodes, ranging from traditional desktop computers, servers and supercomputers to all kinds of wireless devices, embedded systems, sensors and even home equipment.

Traditionally, an illustration of the protocol layers of the Internet has the shape of an hourglass, with the single Internet Protocol (IP) on the central network layer and an increasingly wider spectrum of protocols above and below. Since the introduction of IP in 1981, which remains essentially unchanged, technology and protocols have developed significantly. Underlying transmission media evolved from copper to fiber optics and Wi-Fi, routers and switches became more and more intelligent and are now able to handle Gbit/s instead of kbit/s, and additional middleboxes have been introduced (e.g. NATs and firewalls). Above the network layer, too, new applications have constantly been added, ranging from basic services such as DNS and HTTP to recent, complex P2P protocols enabling applications such as file-sharing, video streaming and telephony. With IPv6, even the foundation of the Internet is finally about to be replaced. This multiplicity of protocols and technologies leads to an ongoing increase in the complexity of the Internet as a whole. Of course, individual network protocols and infrastructures are usually well understood when tested in isolated lab environments or network simulations. However, their behavior when interacting with the vast diversity of applications and technologies in the hostile Internet environment is often unclear, especially on a global scale.

This lack of understanding is further amplified by the fact that the topology of the Internet was not planned in advance. It is the result of an uncontrolled extension process, in which heterogeneous networks of independent organizations have been connected one by one to the main Internet (INTERconnected NETworks). This means that each autonomous system (AS) has its own set of usage and pricing policies, QoS measures and resulting traffic mix. Usage of Internet protocols and applications thus changes not only with time, but also with geographical location. As an example, Nelson et al. [1] reported an unusual application mix on a campus uplink in New Zealand due to a restrictive pricing policy, probably caused by the high price of trans-Pacific network capacity at that time.

Finally, higher connection bandwidths and growing numbers of Internet users also lead to increased misuse and anomalous behavior [2]. Not only do the numbers of malicious incidents keep rising; the level of sophistication of attack methods and tools has also increased. Today, automated attack tools employ more and more advanced attack patterns and react to the deployment of firewalls and intrusion detection systems by cleverly obfuscating their malicious intentions. Malicious activities range from scanning to more advanced attack types such as worms and various denial-of-service attacks. Even well-known or anticipated attack types reappear in modified variants, as shown by the recent renaissance of cache poisoning attacks [3]. Unfortunately, the Internet, initially meant to be a friendly place, has become a very hostile environment that needs to be studied continuously in order to develop suitable counter-strategies.

Overall, this means that even though the Internet may be considered to be the most important modern communication platform, its behavior is not well understood. It is therefore crucial that the Internet community understands the nature and detailed behavior of modern Internet traffic, in order to be able to improve network applications, protocols and devices and protect its users.

The best way to acquire a better and more detailed understanding of the modern Internet is to monitor and analyze real Internet traffic. Unfortunately, the rapid development described above has left little time or resources to integrate measurement and analysis capabilities into Internet infrastructure, applications and protocols. To compensate for this shortcoming, the research community has started to launch dedicated Internet measurement projects, usually associated with considerable investments of both time and money. However, experience from a successful measurement project shows that measuring large-scale Internet traffic is not simple and involves a number of challenging tasks. To help future measurement projects save some of these initial time expenses, this paper gives an overview of the major challenges that will inevitably arise when planning to conduct measurements on high-speed network connections. Experiences from the MonNet project then provide guidelines based on lessons learned (Section 8).

Section 2 starts by giving an overview of different network traffic measurement approaches and methodologies. Sections 3-7 then address the main challenges encountered while conducting passive Internet measurements, discussed in the order in which they appear: first, a number of legal and ethical issues have to be sorted out with legislators and network operators before data collection can start (Sections 3 and 4). Second, operational difficulties need to be solved (Section 5), such as access privileges to the network operator's premises. Once legal and operational obstacles are overcome, a third challenge lies in the various technical difficulties of actually measuring high-speed links (Section 6), ranging from handling vast amounts of data to timing issues. Next, public availability of network data is discussed, which should be considered once data are successfully collected (Section 7). Afterwards, Section 8 outlines the MonNet project, which provides the experience underlying the present paper. Each point from Sections 3-7 is revisited, and the specific problems and solutions as experienced in the MonNet project are presented. These considerations are then summarized as the most important lessons learned regarding each particular issue, providing a quick guide for future measurement projects. Finally, Section 9 discusses future challenges of Internet measurement and concludes the paper.

Section snippets

Overview of network measurement methodologies

This section gives an overview of general network measurement approaches. The basic approaches are categorized along different axes, and the most suitable methods for passive Internet measurements according to current best practice are pointed out.

The most common way to classify traffic measurement methods is to distinguish between active and passive approaches. Active measurement involves injecting traffic into the network in order to probe certain network devices (e.g. PING) or to measure network properties such as round-trip times. Passive measurement, in contrast, observes existing traffic without injecting any data into the network.
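To make the distinction concrete, the following minimal Python sketch illustrates an active measurement: it estimates the round-trip time to a host by timing a TCP handshake. This example is ours, not part of the paper or the MonNet tooling; it times a TCP connection rather than an ICMP echo because raw ICMP sockets require elevated privileges, and the probed host is a placeholder.

    import socket
    import time

    def tcp_rtt_probe(host: str, port: int = 80, timeout: float = 2.0) -> float:
        """Estimate the round-trip time by timing a TCP three-way handshake.

        This is an *active* measurement: a connection attempt is injected
        into the network, in contrast to passive monitoring, which only
        observes traffic that is already there.
        """
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # handshake completed; connection is closed on exit
        return (time.perf_counter() - start) * 1000.0  # milliseconds

    if __name__ == "__main__":
        # Probe a placeholder host a few times and report the median RTT.
        samples = sorted(tcp_rtt_probe("example.com") for _ in range(5))
        print(f"median RTT: {samples[len(samples) // 2]:.1f} ms")

Note that the handshake time includes kernel processing on both ends, so it slightly overestimates the pure network round-trip time.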

Legal background

In this section, the legal background of Internet measurement is presented, which stands somewhat in contrast to current political developments and common academic practice. Current laws and regulations on electronic communication rarely explicitly consider or mention the recording or measurement of traffic for research purposes, which leaves scientific Internet measurement in a kind of legal limbo. In the following paragraphs, the existing regulations for the EU and the US are briefly outlined.

Ethical and moral considerations

Besides potential conflicts with legal regulations and directives, Internet measurement activities also raise moral and ethical questions when it comes to the privacy and security concerns of individual users or organizations using the networks. These considerations include discussions about what to store, how long to store it, and in which ways to modify stored data. The goal is to fulfill the privacy and security requirements of individuals and organizations while still keeping scientifically relevant information.
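One widely used way to reconcile these requirements before traces are stored is to pseudonymize IP addresses. The sketch below illustrates that general idea and is not the procedure used in the paper: it maps each address to a stable pseudonym with a keyed HMAC, so flows remain correlatable while original addresses cannot be recovered without the key (the key value shown is a placeholder). Prefix-preserving schemes such as Crypto-PAn, cited in the references, additionally retain subnet structure.

    import hmac
    import hashlib
    import ipaddress

    # Placeholder secret; in practice generated per trace and never published.
    KEY = b"replace-with-a-random-per-trace-secret"

    def pseudonymize_ip(addr: str) -> str:
        """Map an IP address to a stable pseudonym via a keyed HMAC.

        The same input always yields the same output, so flow-level
        analysis still works, but the mapping cannot be reversed without
        the key. Unlike prefix-preserving anonymization (e.g. Crypto-PAn),
        subnet structure is destroyed.
        """
        ip = ipaddress.ip_address(addr)
        digest = hmac.new(KEY, ip.packed, hashlib.sha256).digest()
        # Re-encode the digest as an address of the same family so that
        # the pseudonymized trace remains syntactically valid.
        return str(ipaddress.ip_address(digest[:len(ip.packed)]))

    print(pseudonymize_ip("192.0.2.17"))  # deterministic pseudonym
    print(pseudonymize_ip("192.0.2.17"))  # identical output for identical input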

Operational difficulties

Data centers and similar facilities housing networking equipment are usually well secured, and access rights are not granted easily, especially to external, non-operational staff such as researchers. Often, authorized personnel must be present when access to certain premises is needed. This dependency makes planning and coordination difficult and reduces flexibility and time-efficiency. Flexibility constraints are further exacerbated by the geographic location of the measurement equipment.

Technical aspects

Measurement and analysis of Internet traffic is not only challenging in terms of legal and operational issues; it is above all a technical challenge. In the following subsections, we first discuss methods to physically access and collect network traffic. We then discuss other important aspects of Internet measurement, including strategies to cope with the tremendous amounts of data and some considerations on how to gain confidence in the measured data.
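As a flavor of what accessing and collecting traffic means in software, the sketch below captures packets on a commodity Linux machine and keeps only the first bytes of each frame, a common trick to reduce data volume in header-only traces. This is our illustration under stated assumptions, not the paper's setup: AF_PACKET sockets are Linux-specific, require root privileges, and cannot keep up with backbone line rates, which is why dedicated capture cards (e.g. Endace DAG, mentioned in the references) are used in practice.

    import socket
    import struct

    ETH_P_ALL = 0x0003  # capture frames of every protocol
    SNAP_LEN = 96       # keep only leading bytes (headers), discard payload

    # Linux-only; requires root privileges to open a raw AF_PACKET socket.
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_ALL))

    for _ in range(100):                      # capture a small sample
        frame = sock.recv(65535)[:SNAP_LEN]   # truncate: header-only trace
        if len(frame) < 34:                   # Ethernet (14) + minimal IPv4 (20)
            continue
        (eth_type,) = struct.unpack("!H", frame[12:14])
        if eth_type != 0x0800:                # IPv4 only, for brevity
            continue
        proto = frame[23]                     # transport protocol number
        src = socket.inet_ntoa(frame[26:30])  # source IPv4 address
        dst = socket.inet_ntoa(frame[30:34])  # destination IPv4 address
        print(f"{src} -> {dst} proto={proto}")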

Data sharing

The discussions about all the legal, operational and technical difficulties involved in conducting Internet measurement clearly show that proper network traces are the result of a laborious and costly process. This explains why currently only a few researchers and research groups are in a position to collect Internet backbone data, which makes proper traces a scarce resource. The Internet measurement community has therefore repeatedly been encouraged to share its valuable datasets and make them available to other researchers.

Experiences from the MonNet project

This section provides a description and lessons learned from a project for passive Internet traffic monitoring and analysis conducted at Chalmers University of Technology: the MonNet project. The goal of the project is to provide a better understanding of Internet traffic characteristics based on empirical data, i.e. passive measurements on backbone links.

Summary and conclusions

The development of the Internet has without doubt not yet come to an end. In the coming years, we have to expect continuing growth in the number of users and the amount of traffic. Traffic will exhibit an even higher diversity, with the Internet becoming a more and more unified backbone for all forms of communication and content (e.g. VoIP, IPTV, etc.). As a consequence, network bandwidths will continue to increase at least at the same pace as computer processing and storage capacities.

Acknowledgements

This work was supported by SUNET, the Swedish University Computer Network.

References (84)

  • J. Wang et al., Clock synchronization for internet measurements: a clustering algorithm, Computer Networks (2004)
  • H. Khlifi et al., Low-complexity offline and online clock skew estimation and removal, Computer Networks (2006)
  • R. Nelson et al., Analysis of long duration traces, SIGCOMM Computer Communication Review (2005)
  • A. Householder et al., Computer attack trends challenge Internet security, Computer (2002)
  • RIPE NCC, YouTube Hijacking: A RIPE NCC RIS case study. <http://www.ripe.net/news/study-youtube-hijacking.html>...
  • S. McCanne, V. Jacobson, The BSD packet filter: a new architecture for user-level packet capture, in: USENIX Winter,...
  • S. Ubik, P. Zejdl, Passive monitoring of 10 Gb/s lines with PC hardware, in: TNC'08: Terena Networking Conference,...
  • R. Braden, Requirements for Internet Hosts - Communication Layers, RFC 1122 (Standard),...
  • J. Case, M. Fedor, M. Schoffstall, J. Davin, Simple Network Management Protocol (SNMP), RFC 1157 (Historic),...
  • B. Claise, Cisco Systems NetFlow Services Export Version 9, RFC 3954 (Informational),...
  • W. John, S. Tafvelin, Differences between in- and outbound Internet backbone traffic, in: TNC'07: Terena Networking...
  • W. John, S. Tafvelin, Heuristics to classify Internet backbone traffic based on connection patterns, in: ICOIN'08:...
  • W. John, S. Tafvelin, T. Olovsson, Trends and differences in connection-behavior within classes of Internet backbone...
  • K. Keys, D. Moore, R. Koga, E. Lagache, M. Tesch, K. Claffy, The architecture of CoralReef: an Internet traffic...
  • Directive 95/46/EC of the European Parliament and of the Council, 1995. Available from:...
  • Directive 2002/58/EC of the European Parliament and of the Council, 2002. Available from:...
  • Directive 2006/24/EC of the European Parliament and of the Council, 2006. Available from:...
  • AK-Vorrat, Overview of national data retention policies. <http://wiki.vorratsdatenspeicherung.de/Transposition>...
  • E.E. Kenneally, K.C. Claffy, An internet data sharing framework for balancing privacy and utility, in: Proceedings of...
  • D.C. Sicker, P. Ohm, D. Grunwald, Legal issues surrounding monitoring during network research, in: IMC'07: Proceedings...
  • 18 United States Code §2511....
  • 18 United States Code §3127....
  • 18 United States Code §2701....
  • 18 United States Code §2702....
  • 18 United States Code §2703....
  • K.C. Claffy, Ten things lawyers should know about Internet research, Technical Report, CAIDA, SDSC, UCSD....
  • K.C. Claffy, Internet as emerging critical infrastructure: what needs to be measured?, in: JCC'08: Chilean Computing...
  • T. Karagiannis, A. Broido, N. Brownlee, K. Claffy, M. Faloutsos, Is P2P dying or just hiding?, in: GLOBECOM'04. IEEE...
  • W. John et al., Detection of malicious traffic on backbone links via packet header analysis, Campus Wide Information Systems (2008)
  • S. Coull, C. Wright, F. Monrose, M. Collins, M. Reiter, Playing devil's advocate: inferring sensitive information from...
  • R. Pang et al., The devil and packet trace anonymization, SIGCOMM Computer Communication Review (2006)
  • J. Xu, J. Fan, M.H. Ammar, S.B. Moon, Prefix-preserving IP address anonymization: measurement-based security evaluation...
  • T. Ylonen, Thoughts on how to mount an attack on tcpdpriv's '-A50' option, Web White Paper....
  • T. Kohno et al., Remote physical device fingerprinting, IEEE Transactions on Dependable and Secure Computing (2005)
  • M. Allman, V. Paxson, Issues and etiquette concerning use of shared measurement data, in: IMC'07: Proceedings of the...
  • ACM Workshop on Network Data Anonymization. <http://www.ics.forth.gr/antonat/nda08.html> (accessed...
  • G. Minshall, Tcpdpriv: program for eliminating confidential information from traces....
  • A. Slagell, J. Wang, W. Yurcik, Network log anonymization: application of Crypto-PAn to Cisco NetFlows, in: SKM'04:...
  • R. Ramaswamy, N. Weng, T. Wolf, An IXA-based network measurement node, in: Proceedings of Intel IXA University Summit,...
  • T. Brekne, A. Årnes, Circumventing IP-address pseudonymization, in: Proceedings of the Third IASTED International...
  • Endace, DAG network monitoring cards. <http://www.endace.com/our-products/dag-network-monitoring-cards/> (accessed...
  • Napatech, Napatech protocol and traffic analysis network adapter. <http://www.napatech.com> (accessed...