Review
Passive internet measurement: Overview and guidelines based on experiences
Introduction
The usage of the Internet has changed dramatically since its initial operation in the early 1980s, when it was a research project connecting a handful of computers and facilitating a small set of remote operations. Nowadays (2009), the Internet serves as the data backbone for all kinds of protocols, making it possible to exchange not only text, but also voice, audio, video and various other forms of digital data between hundreds of millions of nodes, ranging from traditional desktop computers, servers and supercomputers to all kinds of wireless devices, embedded systems, sensors and even home equipment.
Traditionally, an illustration of the protocol layers of the Internet has the shape of an hourglass, with a single Internet Protocol (IP) on the central network layer and an increasingly wider spectrum of protocols above and below. Since the introduction of IP in 1981, which is basically still unchanged, technology and protocols have developed significantly. Underlying transmission media evolved from copper to fiber optics and Wi-Fi, routers and switches became more and more intelligent and are able to handle Gbit/s instead of kbit/s, and additional middleboxes have been introduced (e.g. NATs and firewalls). Above the network layer, new applications have also constantly been added, ranging from basic services such as DNS and HTTP to recent, complex P2P protocols enabling applications such as file-sharing, video streaming and telephony. With IPv6, even the foundation of the Internet is finally about to be substituted. This multiplicity of protocols and technologies leads to an ongoing increase in the complexity of the Internet as a whole. Of course, individual network protocols and infrastructures are usually well understood when tested in isolated lab environments or network simulations. However, their behavior when observed while interacting with the vast diversity of applications and technologies in the hostile Internet environment is often unclear, especially on a global scale.
This lack of understanding is further amplified by the fact that the topology of the Internet was not planned in advance. It is the result of an uncontrolled extension process, in which heterogeneous networks of independent organizations have been connected one by one to the main Internet (INTERconnected NETworks). This means that each autonomous system (AS) has its own set of usage and pricing policies, QoS measures and resulting traffic mix. Thus the usage of Internet protocols and applications changes not only with time, but also with geographical location. As an example, Nelson et al. [1] reported an unusual application mix on a campus uplink in New Zealand due to a restrictive pricing policy, probably caused by higher prices for trans-Pacific network capacity at that time.
Finally, higher connectivity bandwidths and growing numbers of Internet users also lead to increased misuse and anomalous behavior [2]. Not only do the numbers of malicious incidents keep rising, but the level of sophistication of attack methods and tools has also increased. Today, automated attack tools employ increasingly advanced attack patterns and react to the deployment of firewalls and intrusion detection systems by cleverly obfuscating their malicious intentions. Malicious activities range from scanning to more advanced attack types such as worms and various denial of service attacks. Even well-known or anticipated attack types reappear in modified variants, as shown by the recent renaissance of cache poisoning attacks [3]. Unfortunately, the Internet, initially meant to be a friendly place, eventually became a very hostile environment that needs to be studied continuously in order to develop suitable counter strategies.
Overall, this means that even though the Internet may be considered to be the most important modern communication platform, its behavior is not well understood. It is therefore crucial that the Internet community understands the nature and detailed behavior of modern Internet traffic, in order to be able to improve network applications, protocols and devices and protect its users.
The best way to acquire a better and more detailed understanding of the modern Internet is to monitor and analyze real Internet traffic. Unfortunately, the rapid development described above has left little time or resources to integrate measurement and analysis capabilities into Internet infrastructure, applications and protocols. To compensate for this lack, the research community has started to launch dedicated Internet measurement projects, usually associated with considerable investments of both time and money. However, experience from successful measurement projects has shown that measuring large-scale Internet traffic is not simple and involves a number of challenging tasks. In order to help future measurement projects save some of their initial time expenses, this paper gives an overview of the major challenges that will appear when planning to conduct measurements on high-speed network connections. Experiences from the MonNet project then provide guidelines based on lessons learned (Section 8).
Section 2 starts by giving an overview of different network traffic measurement approaches and methodologies. Sections 3–7 then address the main challenges encountered while conducting passive Internet measurements, discussed in the order of their chronological appearance: First, a number of legal and ethical issues have to be sorted out with legislators and network operators before data collection can be started (Sections 3 and 4). Second, operational difficulties need to be solved (Section 5), such as access privileges to the network operator's premises. Once legal and operational obstacles are overcome, a third challenge is posed by various technical difficulties when actually measuring high-speed links (Section 6), ranging from handling vast amounts of data to timing issues. Next, the public availability of network data is discussed, which should be considered once data have been successfully collected (Section 7). Afterwards, Section 8 outlines the MonNet project, which is the measurement project providing the experience for the present paper. Each point from Sections 3–7 is revisited, and the specific problems and solutions as experienced in the MonNet project are presented. These considerations are then summarized into the most important lessons learned regarding each particular issue, providing a quick guide for future measurement projects. Finally, Section 9 discusses future challenges of Internet measurement and concludes the paper.
Overview of network measurement methodologies
This section gives an overview of general network measurement approaches. The basic approaches are categorized among different axes and the most suitable methods for passive Internet measurements according to current best practice are pointed out.
The most common way to classify traffic measurement methods is to distinguish between active and passive approaches. Active measurement involves injection of traffic into the network in order to probe certain network devices (e.g. PING) or to measure
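The active approach mentioned above (probing the network with injected traffic, as PING does) can be illustrated with a minimal sketch. The code below estimates round-trip time by timing a TCP handshake rather than sending ICMP echo requests, since raw ICMP sockets require elevated privileges; the host and port are placeholders, not part of any tool described in this paper.

```python
# Active measurement sketch: estimate round-trip time by timing a TCP
# handshake. This approximates what ICMP-based tools such as ping measure,
# without needing the raw-socket privileges that ICMP requires.
import socket
import time

def tcp_rtt(host: str, port: int = 80, timeout: float = 2.0) -> float:
    """Return the TCP connect (handshake) time to host:port in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # handshake completed; close the connection immediately
    return (time.perf_counter() - start) * 1000.0
```

Passive measurement, by contrast, would observe existing traffic on a link without injecting any packets of its own.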
Legal background
In this section the legal background of Internet measurement is presented, which is somewhat in contrast to actual political developments and common academic practice. Current laws and regulations on electronic communication rarely explicitly consider or mention the recording or measurement of traffic for research purposes, which leaves scientific Internet measurement in some kind of legal limbo. In the following paragraphs the existing regulations for the EU and the US are briefly outlined in
Ethical and moral considerations
Besides potential conflicts with legal regulations and directives, Internet measurement activities also raise moral and ethical questions when it comes to the privacy and security concerns of individual users or organizations using the networks. These considerations include discussions about what to store, how long to store it and in which ways to modify stored data. The goal is to fulfill the privacy and security requirements of individuals and organizations, while still keeping scientifically relevant
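A common way to modify stored data along these lines is to pseudonymize IP addresses before traces are archived. The sketch below is illustrative only (it is not the scheme used by the MonNet project): it maps each IPv4 address to a pseudonym with a keyed hash, so flow structure survives while real endpoints are hidden. Production projects often prefer prefix-preserving schemes such as Crypto-PAn, which additionally keep subnet relationships intact.

```python
# Sketch: keyed pseudonymization of IPv4 addresses before a trace is stored.
# Deterministic per key: the same address always maps to the same pseudonym,
# so per-host traffic patterns remain analyzable without revealing endpoints.
import hashlib
import hmac
import ipaddress

def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address to a pseudonymous IPv4 address under the given key."""
    digest = hmac.new(key, ipaddress.IPv4Address(addr).packed, hashlib.sha256).digest()
    return str(ipaddress.IPv4Address(digest[:4]))
```

Note that this simple scheme destroys subnet structure, which is exactly the trade-off between privacy protection and keeping data scientifically useful that such considerations must weigh.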
Operational difficulties
Data centers and similar facilities housing networking equipment are usually well secured, and access rights are not granted easily, which is especially true for external, non-operational staff such as researchers. Often it is required that authorized personnel are present when access to certain premises is needed. This dependency makes planning and coordination difficult and reduces flexibility and time-efficiency. Flexibility constraints are further exacerbated by the geographic location of
Technical aspects
Measurement and analysis of Internet traffic is not only challenging in terms of legal and operational issues, but is above all a technical challenge. In the following subsections we first discuss methods to physically access and collect network traffic. We then discuss other important aspects of Internet measurement, including strategies to cope with the tremendous amounts of data and some considerations on how to gain confidence in the measured data.
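As a minimal example of one such technical step, checking what was actually captured, the sketch below parses the global header of a classic libpcap trace file to determine byte order, timestamp precision, snap length and link type. The magic numbers come from the libpcap file format; the function itself is an illustration, not part of any specific measurement tool discussed here.

```python
# Sketch: sanity-check a pcap trace by parsing its 24-byte global header.
# Classic libpcap layout: magic, version major/minor, thiszone, sigfigs,
# snaplen, linktype. The magic number encodes byte order and timestamp unit.
import struct

PCAP_MAGIC_US = 0xA1B2C3D4   # microsecond timestamps
PCAP_MAGIC_NS = 0xA1B23C4D   # nanosecond timestamps

def read_pcap_header(raw: bytes) -> dict:
    """Parse a pcap global header, handling both little- and big-endian files."""
    magic = struct.unpack("<I", raw[:4])[0]
    if magic in (PCAP_MAGIC_US, PCAP_MAGIC_NS):
        endian = "<"
    else:
        magic = struct.unpack(">I", raw[:4])[0]
        if magic not in (PCAP_MAGIC_US, PCAP_MAGIC_NS):
            raise ValueError("not a pcap file")
        endian = ">"
    vmaj, vmin, _tz, _sig, snaplen, linktype = struct.unpack(endian + "HHiIII", raw[4:24])
    return {
        "nanosecond": magic == PCAP_MAGIC_NS,
        "version": (vmaj, vmin),
        "snaplen": snaplen,
        "linktype": linktype,
    }
```

A low snaplen in the header, for instance, immediately reveals that only packet headers (not payloads) were captured, which matters for both data volume and the privacy considerations above.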
Data sharing
The discussions about all the legal, operational and technical difficulties involved in conducting Internet measurement clearly show that proper network traces are the result of a laborious and costly process. This explains why currently only a few researchers and research groups have the possibility to collect Internet backbone data, which makes proper traces a scarce resource. Therefore, the Internet measurement community has repeatedly been encouraged to share their valuable datasets and
Experiences from the MonNet project
This section provides a description and lessons learned from a project for passive Internet traffic monitoring and analysis conducted at Chalmers University of Technology: the MonNet project. The goal of the project is to provide a better understanding of Internet traffic characteristics based on empirical data, i.e. passive measurements on backbone links.
Summary and conclusions
The development of the Internet has without doubt not yet come to an end. In the coming years, we have to expect continuing growth in the number of users and the amount of traffic. Traffic will exhibit an even higher diversity, with the Internet becoming a more and more unified backbone for all forms of communication and content (e.g. VoIP, IPTV, etc.). As a consequence, network bandwidths will continue to increase at least at the same pace as computer processing and storage capacities. However,
Acknowledgements
This work was supported by SUNET, the Swedish University Computer Network.
References (84)
- et al., Clock synchronization for internet measurements: a clustering algorithm, Computer Networks (2004)
- et al., Low-complexity offline and online clock skew estimation and removal, Computer Networks (2006)
- et al., Analysis of long duration traces, SIGCOMM Computer Communication Review (2005)
- et al., Computer attack trends challenge internet security, Computer (2002)
- RIPE NCC, YouTube Hijacking: A RIPE NCC RIS case study. <http://www.ripe.net/news/study-youtube-hijacking.html>...
- S. McCanne, V. Jacobson, The BSD packet filter: a new architecture for user-level packet capture, in: USENIX Winter,...
- S. Ubik, P. Zejdl, Passive monitoring of 10 Gb/s lines with PC hardware, in: TNC'08: Terena Networking Conference,...
- R. Braden, Requirements for Internet Hosts - Communication Layers, RFC 1122 (Standard),...
- J. Case, M. Fedor, M. Schoffstall, J. Davin, Simple Network Management Protocol (SNMP), RFC 1157 (Historic),...
- B. Claise, Cisco Systems NetFlow Services Export Version 9, RFC 3954 (Informational),...
- Detection of malicious traffic on backbone links via packet header analysis, Campus Wide Information Systems
- The devil and packet trace anonymization, SIGCOMM Computer Communication Review
- Remote physical device fingerprinting, IEEE Transactions on Dependable and Secure Computing