A probability-guaranteed adaptive timeout algorithm for high-speed network flow detection

doi:10.1016/j.comnet.2004.11.005

Computer Networks

Volume 48, Issue 2, 6 June 2005, Pages 215-233

Abstract

Collecting network traffic is becoming increasingly challenging in passive network measurement because of the rapid growth of link speeds. Flow-based network traffic capture and storage provides an efficient approach to high-speed network measurement. This paper concentrates on flow detection, which is also the premise for further flow-based traffic analysis and modeling in such a challenging environment. Based on a statistical investigation of the correlation between flow size and the maximum packet interarrival time within a flow, we obtain empirical conditional distribution functions for some popular TCP-based application flows, and then propose a Probability-Guaranteed Adaptive Timeout (PGAT) algorithm for the flow termination decision. Assessment criteria for flow termination decision algorithms are systematically developed. Comparisons of the flow generation ratio, flow intact ratio, and mean flow extra retaining time indicate that the PGAT algorithm achieves better performance than related work.

Introduction

Passive network measurement is one of the main network performance measurement methodologies. Passive measurement makes it easy to characterize network traffic and to gain continuous insight into network performance, such as link utilization or even end-to-end performance metrics [1], [2]. It also facilitates research on backbone traffic modelling [3], flow-based accounting [4] and bandwidth sharing [5], among others. This work is of great importance for network design, maintenance and performance improvement.

To estimate network metrics by passive measurement, network traffic is captured via port mirroring or an optical splitter. A passive measurement system is composed of three elements, i.e., monitoring, an analysis engine and a control plane. The monitoring part is responsible for network traffic collection. The analysis engine mines network performance metrics from the captured traffic traces and presents them to users in a comprehensible form. The control plane manages the traffic trace collection strategies and metrics measurement methodologies, as well as other general system configuration tasks. The processing flow of a classical passive measurement system can therefore be summarized as follows [6]:

1. Capture part or all of the packets that pass through the packet capture card, such as a DAG card [7], by means of an optical splitter or in other similar ways. The captured information of a packet includes the Data Link Layer header, Network Layer header and Transport Layer header (if any), while the content of a packet is excluded for privacy reasons, i.e., only the packet header is captured.
2. Timestamp each collected packet to indicate when it was captured. If packets are captured at multiple measurement points for further cooperative analysis, clock synchronization must be maintained among the measurement points.
3. Transfer the captured packet headers, temporarily held on the capture board, to the main memory of the system for further processing, such as trace anonymization and compression [8] or performance metrics calculation.
4. Transfer packet traces from system main memory to hard disks or tapes for long-term network performance evaluation or backup.
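
The four steps above can be sketched in a few lines of Python. This is only an illustration of packet-based collection, not the authors' system: the function names are hypothetical, the 56-byte header length and 8-byte timestamp follow the 64-byte record described later in the paper, and the field order inside the record is an assumption.

```python
import struct

HEADER_BYTES = 56                 # assumption: link + network + transport headers kept per packet
RECORD_BYTES = 8 + HEADER_BYTES   # 8-byte timestamp followed by the headers (assumed layout)

def make_record(frame: bytes, ts_ns: int) -> bytes:
    """Steps 1-2: keep only the leading header bytes and prepend a timestamp."""
    headers = frame[:HEADER_BYTES].ljust(HEADER_BYTES, b"\x00")  # pad short frames with zeros
    return struct.pack("!Q", ts_ns) + headers

def collect(frames):
    """Step 3: buffer one fixed-size record per packet in main memory.

    `frames` is an iterable of (timestamp_ns, raw_frame_bytes) pairs.
    """
    return [make_record(frame, ts) for ts, frame in frames]

def to_trace(records) -> bytes:
    """Step 4: serialise the buffered records into the bytes written to disk."""
    return b"".join(records)
```

Every packet produces one record regardless of which flow it belongs to, which is why the stored volume grows linearly with the packet rate.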

In the above procedure, packets are captured and stored individually; this form of packet collection is referred to as the packet-based process. With the drastic increase in link bandwidth (the bandwidth of Internet backbone links has now reached 2.5 Gbps/10 Gbps or higher) and the advent of bandwidth-intensive real-time network services, traditional passive measurement systems face more pressing challenges, and some measurement technologies and methods are no longer suitable for such circumstances. As this paper is concerned with packet trace collection, we list the main challenges that the traffic collection process faces or will face:

1. Challenge on PCI bus throughput
   In most systems, the PCI bus is crossed twice for traffic capture and storage, once from the capture card to the system main memory and once from the main memory to disk [9]. Therefore, current PCI bus technologies will not meet the throughput requirement at 10 Gbps or above.
2. Challenge on storage capacity
   When the packet-based process is adopted for high-speed network trace collection, the volume of data will be on the order of terabytes per day. The storage, transformation, management and analysis of such vast data sets are even harder.
3. Challenge on access speed
   The speeds of storage devices and memory have not kept pace with the explosion of link bandwidth. In fact, memory access and disk array speeds have not increased much in the past several years, and there is no apparent evidence that they will increase considerably in the near future.
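
A rough back-of-the-envelope calculation shows where the "terabytes per day" figure can come from. The link utilization and mean packet size below are illustrative assumptions, not values from the paper; the 64-byte record size is the per-packet header block described in the related-works snippet.

```python
def header_trace_bytes_per_day(link_bps, utilization, mean_packet_bytes, record_bytes=64):
    """Estimate the daily volume of a packet-header trace.

    link_bps          -- link capacity in bits per second
    utilization       -- average fraction of the capacity in use (assumed)
    mean_packet_bytes -- average packet size on the link (assumed)
    record_bytes      -- bytes stored per captured packet
    """
    packets_per_second = link_bps * utilization / (8 * mean_packet_bytes)
    return packets_per_second * record_bytes * 86_400  # seconds per day

# Hypothetical 10 Gbps link at 50% utilization with 500-byte packets on average:
# 1.25 million packets/s, i.e. 80 MB/s of header records.
volume = header_trace_bytes_per_day(10_000_000_000, 0.5, 500)
print(f"{volume / 1e12:.3f} TB/day")  # prints "6.912 TB/day"
```

Under these assumptions the header records alone amount to about 80 MB/s, and that stream has to cross the PCI bus twice on its way to disk.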

Therefore, with the current computer architecture, the packet-based process will run into more and more problems in high-speed network measurement. It is crucial to develop new technologies, especially new traffic capture methodologies, for passive measurement in high-speed network environments.

In this paper, based on a statistical investigation of the correlation between flow size and the maximum packet interarrival time (MPIT) of consecutive packets within an application flow, we propose a Probability-Guaranteed Adaptive Timeout (PGAT) strategy to determine flow termination for efficient flow-based traffic collection. Experiments on real Internet traffic traces show that the PGAT algorithm keeps more application flows intact, while the time for which flow state is retained on the capture board is shorter than in related work. Further analysis indicates that the PGAT algorithm is also scalable and flexible.

This paper is organized as follows. In Section 2, we give some background on flow-based traffic collection and review previous progress on the flow termination decision. Section 3 analyzes the correlation between flow size and the MPIT of consecutive packets within popular types of application flows. Section 4 proposes the PGAT algorithm. In Section 5, we compare the performance of PGAT with that of the Measurement-based Binary Exponential Timeout (MBET) algorithm. Finally, we conclude the paper in Section 6.

Section snippets

Related works

In the packet-based process style, the packet is the minimum unit of capture and storage. Generally, the information collected for a packet is a 64-byte block, often called the packet header, comprising 12 bytes of Data Link Layer header, 44 bytes of IP and TCP/UDP header (for IPv4), and an 8-byte timestamp. Analysis of the packet headers belonging to a session (an HTTP or Telnet session, for instance) shows that most fields of the packet headers remain the same during entire

The correlations of some flow characteristics

To address the disadvantages of fixed and adaptive timeout algorithms, we propose a Probability-Guaranteed Adaptive Timeout algorithm for flow termination decision based on the correlation analysis of some flow characteristics on actual network traces. The algorithm not only preserves the flow integrity with a given probability, but also shortens the average extra time for holding flow state compared with related works.
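
For orientation, the fixed-timeout baseline that adaptive schemes try to improve on can be sketched as follows. This is a generic illustration rather than the authors' algorithm or MBET: the flow key, the function name and the lazy expiry (a flow is only closed when a later packet with the same key arrives, or at the end of the trace) are all simplifying assumptions.

```python
def split_flows_fixed_timeout(packets, timeout):
    """Group packets into flows using one fixed idle timeout.

    packets -- iterable of (timestamp, flow_key) pairs sorted by timestamp;
               flow_key is any hashable value, e.g. a 5-tuple (assumed)
    timeout -- idle time in seconds after which a flow is considered finished
    Returns a list of (flow_key, first_ts, last_ts, n_packets) tuples.
    """
    active = {}    # flow_key -> [first_ts, last_ts, n_packets]
    finished = []
    for ts, key in packets:
        state = active.get(key)
        if state is not None and ts - state[1] > timeout:
            # The gap exceeds the timeout: close the old flow and start a new one.
            finished.append((key, *state))
            state = None
        if state is None:
            active[key] = [ts, ts, 1]
        else:
            state[1] = ts
            state[2] += 1
    # End of trace: flush whatever is still held in the flow table.
    for key, state in active.items():
        finished.append((key, *state))
    return finished
```

A timeout that is too short splits one application flow into several records, while a timeout that is too long keeps finished flows in the table longer than necessary. These are the two effects that the flow integrity and extra holding time mentioned above refer to.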

Probability-Guaranteed Adaptive Timeout algorithm (PGAT) for flow termination decision

From the discrete joint probability function p_ij, define the conditional probability function p_i∣j as Eq. (2) $p_{i \mid j} = P(a_{i} < X \leq b_{i} \mid Y \geq c_{j}) = \frac{\sum_{m=j}^{4} P(a_{i} < X \leq b_{i},\ c_{m} \leq Y < d_{m})}{\sum_{l=1}^{7} \sum_{m=j}^{4} P(a_{l} < X \leq b_{l},\ c_{m} \leq Y < d_{m})},$ where i = 1, 2, …, 7; j = 1, 2, …, 4.

Here p_i∣j is the probability that the MPIT of a flow (X) falls in the interval (a_i, b_i], given that at least c_j packets of the flow (Y ⩾ c_j) have been observed.
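
Eq. (2) amounts to summing the joint table over the flow-size bins m ⩾ j and normalizing. A minimal sketch follows, assuming the joint probabilities are held in a nested list indexed as `joint[i][m]` (MPIT bin i, flow-size bin m); the function name and the storage layout are illustrative choices, not taken from the paper.

```python
def conditional_mpit(joint):
    """Compute p_{i|j} = P(MPIT in bin i | flow size >= c_j) from a joint table.

    joint -- list of rows; joint[i][m] = P(MPIT in bin i, flow size in bin m)
    Returns cond with cond[i][j] = p_{i|j} (0-based indices).
    """
    n_i, n_j = len(joint), len(joint[0])
    cond = [[0.0] * n_j for _ in range(n_i)]
    for j in range(n_j):
        # Numerator of Eq. (2): sum over size bins m >= j, one value per MPIT bin.
        tail = [sum(row[j:]) for row in joint]
        # Denominator of Eq. (2): the same tail sums added up over all MPIT bins.
        total = sum(tail)
        for i in range(n_i):
            cond[i][j] = tail[i] / total if total > 0 else 0.0
    return cond
```

In the paper the table has 7 MPIT bins and 4 flow-size bins; the function itself accepts any rectangular table, and each column of the result sums to 1 whenever the corresponding tail mass is non-zero.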

Let F(x∣c_j), j = 1, 2, …, 4, denote the MPIT conditional distribution function after c_j packets have been observed for a flow; then we get Eq. (3): $F(x \mid c_{j}) = \sum_{i=1}^{k} P(a_{i} < X \leq b_{i} \mid Y \geq c_{j}), \quad x \leq b_{k},$
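
One way such a conditional distribution could be turned into a timeout is to pick the smallest bin boundary b_k at which F(b_k∣c_j) reaches a target probability Q. The sketch below works under that assumption; the selection rule, the function name and the example numbers are hypothetical and are not taken from the paper.

```python
def pick_timeout(upper_bounds, cond_column, q):
    """Return the smallest b_k whose cumulative probability reaches q.

    upper_bounds -- bin upper bounds b_1 < b_2 < ... (e.g. in seconds)
    cond_column  -- p_{i|j} for one fixed j, in the same bin order
    q            -- target probability that the MPIT does not exceed the timeout
    Falls back to the largest bound if q is never reached.
    """
    cumulative = 0.0
    for bound, p in zip(upper_bounds, cond_column):
        cumulative += p          # running sum of Eq. (3) evaluated at x = b_k
        if cumulative >= q:
            return bound
    return upper_bounds[-1]

# Made-up numbers: three MPIT bins with upper bounds of 1 s, 4 s and 16 s.
print(pick_timeout([1, 4, 16], [0.5, 0.3, 0.2], 0.7))  # prints 4
```

Because the column depends on j, the timeout chosen this way changes as more packets of a flow are observed, which is what makes such a scheme adaptive.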

Metrics definition

Ryu et al. compare the MBET algorithm with the fixed timeout scheme in terms of thrashing and shortening, and conclude that MBET is superior to the fixed timeout scheme [12]. In this paper, we therefore evaluate PGAT systematically against the MBET algorithm only, on three metrics: flow generation ratio, flow intact ratio and mean flow extra retaining time, which characterize the performance of PGAT more comprehensively.

Definition 1

For a packet header trace, when a flow termination decision algorithm is employed, the

Conclusions and remarks

Flow-based traffic collection is an efficient approach to high-speed network measurement. It not only compresses captured packet headers losslessly, but also facilitates research on flow-based network activity. In this paper, based on the correlation analysis between MPIT and flow size for different types of TCP-based application flows, we propose a Probability-Guaranteed Adaptive Timeout (PGAT) algorithm for the flow termination decision. Under a proper probability Q, PGAT’s performance overcomes

Acknowledgements

The authors would like to thank Professor Nelson L.S. Fonseca and the anonymous reviewers for their helpful suggestions and insightful comments on the paper. We are also indebted to the PMA project for publishing the packet header traces used in the paper.

Junfeng Wang received the M.S. degree in Computer Application Technology from Chongqing University of Posts and Telecommunications, Chongqing, in 2001 and the Ph.D. degree (with honors) in Computer Science from University of Electronic Science and Technology of China, Chengdu, in 2004. Since July 2004, he has held a postdoctoral position in the Institute of Software, the Chinese Academy of Sciences. His recent research interests include satellite and sensor network routing design, protocol modeling and

References (25)

[1] H.S. Martin, A.J. McGregor, J.G. Cleary, Analysis of Internet delay times, in: Proceedings of the Passive and Active...
[2] J.G. Cleary, H.S. Martin, Estimating bandwidth from passive measurement traces, in: Proceedings of the Passive and...
[3] C. Barakat, P. Thiran, G. Iannaccone, et al., A flow-based model for Internet backbone traffic, in: Proceedings of the...
[4] C. Estan, G. Varghese, New directions in traffic measurement and accounting, in: Proceedings of the First ACM SIGCOMM...
[5] S. Ben Fredj, T. Bonald, A. Proutiere, et al., Statistical bandwidth sharing: a study of congestion at flow level, in:...
[6] G. Iannaccone, C. Diot, I. Graham, et al., Monitoring very high speed links, in: Proceedings of the First ACM SIGCOMM...
[7] Homepage of the DAG project. http://dag.cs.waikato.ac.nz, February...
[8] M. Peuhkuri, A method to compress and anonymize packet traces, in: Proceedings of the First ACM SIGCOMM Workshop on...
[9] C. Fraleigh, C. Diot, B. Lyles, et al., Design and deployment of a passive monitoring infrastructure, in: Proceedings...
[10] K.C. Claffy et al., A parameterizable methodology for Internet traffic flow profiling, IEEE Journal on Selected Areas in Communications (1995)
[11] K. Thompson et al., Wide-area Internet traffic patterns and characteristics, IEEE Network (1997)
[12] B. Ryu, D. Cheney, H.-W. Braun, Internet flow characterization: adaptive timeout strategy and statistical modeling, in:...

Lei Li received the B.S. and Ph.D. degrees from National University of Defense Technology, Changsha, China, in 1994 and 1999, respectively. He received another Ph.D. degree in Communication and Electronic System from Harbin Industry University. From 1999 to 2002, he held a postdoctoral position in the Academy of the General Administration at Tsinghua University. Since June 2003, he has served as the vice director of the Institute of Software, the Chinese Academy of Sciences. His research interests include E-government, network control and management, management science, etc.

Fuchun Sun received the B.S. and M.S. degrees from Naval Aeronautical Engineering Academy, Yantai, China, in 1986 and 1989, respectively, and the Ph.D. degree from the Department of Computer Science and Technology, Tsinghua University, Beijing, China, in 1998. He worked for over four years in the Department of Automatic Control at Naval Aeronautical Engineering Academy. From 1998 to 2000 he was a Postdoctoral Fellow in the Department of Automation at Tsinghua University, Beijing, China. He is now a professor in the Department of Computer Science and Technology, Tsinghua University, Beijing, China. His research interests include intelligent control, network control and management, neural networks, fuzzy systems, variable structure control, nonlinear systems and robotics. He is a member of the IEEE Control System Society. He received the Excellent Doctoral Dissertation Prize of China in 2000.

Mingtian Zhou was born in 1939. He is a professor at the College of Computer Science and Engineering, University of Electronic Science and Technology of China. He is a senior member of the IEEE Computer Society and serves on the Editorial Boards of the Chinese Journal of Electronics and Acta Electronica Sinica. His research interests include computer networking information system, open distributed processing system, computer system and software, and computer supported collaboration work.
