Two tools for network traffic analysis

https://doi.org/10.1016/S1389-1286(00)00188-2

Abstract

This paper explores two tools for evaluating irregular sequences of numbers occurring in the operation of computer networks. The time series might be interarrival times, packet lengths, or IP destination addresses. The tools were developed because there is a need for network designers and administrators to understand network traffic, an understanding not provided by conventional statistical methods. An application of the tools would be comparison of real and synthetic sequences; should a synthetic sequence not yield the same outputs as a real sequence, then use of the synthetic sequence in modeling would be questionable. The first tool is a modification of fractal dimension work in other fields, notably radar research. Its application is limited to data for which a numerical comparison makes sense, such as time lengths. The second tool essentially discovers the same thing as the first, the existence of data points that are visited frequently. However, the second tool does not rely upon a metric and so can be applied to data such as addresses.

Introduction

The dream of every network engineer must be synthetic sources of traffic of all types which could be called upon to generate “typical” and “reasonable worst case” traffic streams. Design and administrative decisions based upon the reaction of a network model to such traffic would be sound and defensible. The engineer would be confident of average performance and secure in the knowledge of how bad the worst cases could really be – no surprises upon installation. More precisely, for a given investment and given profile of traffic sources, the failure rate could be predicted; then the cost of system failure could be used to evaluate the investment.

Alas, the arrival of such synthetic sources is not likely to occur soon. The reasons are illustrated in Fig. 1, Fig. 2, Fig. 3. The figures depict data for 1000 Ethernet frames transmitted over about 2 s. The trace is one of many available from the North Carolina State University web site http://www.ece.ncsu.edu/cacc/traffic.

Note, for example, that the interarrival times in Fig. 1 from about the 35th to about the 160th frame are anomalously small. A standard statistical summary of all 1000 points would assimilate and mask this interval, possibly with dire consequences for network design. More to the point, the interarrival times in Fig. 1 and the frame sizes in Fig. 2 (up to about 1500 bytes, the largest Ethernet payload) conspire to produce the very irregular send rates in Fig. 3. Such repeated instances of short interarrival times and high send rates could overflow a buffer and cause packet loss.

Furthermore, even during epochs of apparently stable send rate, such as from the 160th to the 900th frames, a full description of traffic remains elusive. Other than basic statistical parameters, how should the time series be characterized for the purpose of network testing?

Modern thinking on the characterization of traffic has been largely based upon the important discovery of self-similarity over a range of scales in real traces, as documented in [1], [2]. Real traffic, with long-range dependencies, is far burstier than any Poisson model would be. This discovery marked the beginning of attempts to understand the performance implications of chaos in traffic patterns for computer networks and engendered a vast literature. Aside from the interarrival times, frame sizes, and bandwidth measurements shown in this paper, the chaotic nature of Internet traffic can be sought and studied in many other types of data, for example aggregated TCP session statistics as in [3].

In particular, much progress in understanding network traffic has been made using correlation measurements. For example, Morris and Lin [4] concluded from conventional correlation functions applied to real bandwidth data for a certain link that the relationship between the change of mean bandwidth over time and its variance is the same as for Poisson traffic, namely linear. Furthermore, the relationship between the change of mean bandwidth with aggregation scale and its variance is also the same as for Poisson traffic. So to speak, Web traffic bandwidth smooths out under aggregation as rapidly as a Poisson model would.
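The following sketch illustrates the kind of variance-versus-aggregation-scale analysis described above; it is not the procedure of [4], and the function name and the synthetic Poisson rates are stand-ins.

```python
import numpy as np

def aggregated_variance(rates, scales):
    """For each aggregation scale m, average the per-interval rates over
    non-overlapping blocks of length m and return the sample variance of
    the block means.  For Poisson-like traffic this variance falls roughly
    as 1/m; long-range-dependent traffic decays more slowly."""
    results = {}
    rates = np.asarray(rates, dtype=float)
    for m in scales:
        n_blocks = len(rates) // m
        block_means = rates[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        results[m] = block_means.var()
    return results

# Hypothetical usage with synthetic Poisson-like per-interval send rates.
rng = np.random.default_rng(0)
rates = rng.poisson(lam=20, size=10_000)
for m, v in aggregated_variance(rates, [1, 10, 100, 1000]).items():
    print(f"scale {m:5d}: variance of aggregated rate = {v:.3f}")
```

A log–log plot of this variance against the scale m is one common way to distinguish Poisson-like smoothing from the slower decay associated with self-similar traffic.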

Other comparisons of models and real traffic have not led immediately to successful traffic prediction. For example, straightforward predictive models such as auto-regressive moving average (ARMA) and Markov-modulated Poisson process (MMPP) have been proposed in [5]. According to Sang and Li [6], such models apparently work best when short-term traffic variations have been filtered out. However, short-term variations may be of high interest to designers of networking hardware because the network infrastructure is often assessed as much on the basis of its reaction to surge conditions as “normal” conditions. This is especially true in electronic commerce.

The present paper introduces two new tools for traffic correlation studies: one that measures correlation with a norm that makes use of all data, not just data near a mean, and another that measures redundancy based upon Least Recently Used (LRU) cache methods [7, p. 378]. These tools could also be used to compare real traffic characteristics to conventional models. The present goal is thus to detect and measure self-similarity in real and synthetic traces, in part for the purpose of validating synthetic generation schemes. That is, the purpose of this paper is to present two tools for obtaining numbers from irregular data streams such as the above Ethernet trace; the goal of both is to detect a degree of structure in time series that is not apparent through conventional stochastic analysis. The measures of structure themselves might change over epochs within a time series. Thus, stochastic analysis of the measures, as opposed to stochastic analysis of the traffic itself, might be a key to synthetic traffic generation.

Finally, efforts to understand Internet statistics are not uniformly regarded as successful or conclusive. According to [8], “Internet traffic behavior has been resistant to modeling. The reasons derive from the Internet's evolution as a composition of independently developed and deployed (and by no means synergistic) protocols, technologies, and core applications. Moreover, this evolution, though “punctuated” by new technologies, has experienced no equilibrium thus far. … while the core of the Internet continues its rapid evolution, measurement and modeling of it progress at a leisurely pace.”


A correlation function and its approximation

In the context of analyzing signals from a radar staring at ocean waves, Leung and Haykin [9] and Haykin and Puthusserypady [10] have explored the concept of minimizing the degrees of freedom necessary to describe highly irregular time series. Their work, which inspires ours, suggests that a deterministic model with a low number of degrees of freedom, five or six, can account for a type of radar clutter. That is, signals observed as clutter in ocean radar surveillance can in principle be generated by low-dimensional deterministic dynamics rather than pure noise.

Results for computer traffic

The purpose of this section is to use real Ethernet interarrival time data and fit C(r) for all 0⩽r<1. The fit is accomplished with the approximating function r^μ. The goodness of fit is defined in terms of the previous section.
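The definition of C(r) is not reproduced in this excerpt; a plausible reading, following the correlation-integral approach of Grassberger et al. and the radar work of [9], [10], is the standard correlation integral over n-dimensional delay vectors built from the series with lag T:

```latex
% Assumed form (not quoted from the paper): Grassberger–Procaccia correlation
% integral over delay vectors x_k = (s_k, s_{k+T}, \dots, s_{k+(n-1)T}).
C(r) \;=\; \frac{2}{N(N-1)} \sum_{1 \le i < j \le N}
           \Theta\bigl(r - \lVert x_i - x_j \rVert\bigr),
\qquad C(r) \;\approx\; r^{\mu}\ \text{for small } r.
```

Here Θ is the Heaviside step function and pairwise distances are normalized so that 0 ⩽ r < 1.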

For the case n=10 and T=5, the μ̂ values for the first and second halves of the data are 5.98 and 5.86 with a test statistic of t=1.3 (insignificant difference).

Data points (Ethernet interarrival times) from Fig. 1 can be approximated by C(r) = r^5.97, as shown in Fig. 4.
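A minimal computational sketch of this fit, assuming the correlation integral given above and estimating μ by a simple log–log regression rather than the paper's own goodness-of-fit criterion; the helper names (delay_embed, correlation_integral, fit_mu) and the exponential stand-in data are hypothetical.

```python
import numpy as np

def delay_embed(series, n, T):
    """Build n-dimensional delay vectors x_k = (s_k, s_{k+T}, ..., s_{k+(n-1)T})."""
    s = np.asarray(series, dtype=float)
    m = len(s) - (n - 1) * T
    return np.stack([s[i * T : i * T + m] for i in range(n)], axis=1)

def correlation_integral(vectors, radii):
    """Fraction of distinct vector pairs closer than r, with pairwise
    distances normalized to [0, 1) by the maximum distance."""
    diffs = vectors[:, None, :] - vectors[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    iu = np.triu_indices(len(vectors), k=1)
    d = dists[iu] / dists[iu].max()
    return np.array([(d < r).mean() for r in radii])

def fit_mu(radii, C):
    """Slope of log C(r) versus log r, i.e. the exponent in C(r) ~ r^mu."""
    mask = C > 0
    return np.polyfit(np.log(radii[mask]), np.log(C[mask]), 1)[0]

# Hypothetical usage on 1000 interarrival times (the real trace would come
# from the Ethernet data of Fig. 1); n=10 and T=5 follow the text above.
rng = np.random.default_rng(1)
interarrivals = rng.exponential(scale=0.002, size=1000)
radii = np.linspace(0.01, 0.99, 50)
C = correlation_integral(delay_embed(interarrivals, n=10, T=5), radii)
print(f"estimated mu = {fit_mu(radii, C):.2f}")
```

On the actual trace of Fig. 1 one would look for an estimate near the reported value of about 5.97; the exponential samples here are only a placeholder.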

List management with least recently used algorithm

The second traffic tool studied in this paper comes from the theory of cache management. In computer memory design, it is desirable to move frequently used information from main storage to a small, fast, expensive cache. The problem is to choose which information truly deserves inclusion in the cache. For a discussion see [7, p. 378].

A common algorithm for cache maintenance is that in which new block labels enter a stack and the “least recently used” label is dropped. That is, if a label already in the stack recurs, it is moved to the top; a label not in the stack is pushed onto the top and, if the stack is full, the bottom (least recently used) label is dropped.
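The excerpt does not show which statistic the authors derive from the stack; as a minimal sketch under that caveat, the following records, for each recurring label, the depth at which it is found before being moved to the top (a reuse-distance histogram), which captures how often recently seen values recur. The function name and the sample addresses are hypothetical.

```python
from collections import Counter

def lru_stack_depths(labels, max_depth=None):
    """Maintain an LRU stack of labels.  For each incoming label, record the
    depth (1 = top of stack) at which it is found before moving it to the top;
    labels not currently in the stack count as misses and are pushed on top.
    If max_depth is given, the least recently used label is dropped whenever
    the stack overflows."""
    stack = []            # stack[0] is the most recently used label
    depths = Counter()    # histogram of hit depths
    misses = 0
    for label in labels:
        try:
            i = stack.index(label)
            depths[i + 1] += 1
            stack.pop(i)
        except ValueError:
            misses += 1
        stack.insert(0, label)
        if max_depth is not None and len(stack) > max_depth:
            stack.pop()   # drop the least recently used label
    return depths, misses

# Hypothetical usage on a sequence of IP destination addresses.
addresses = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2", "10.0.0.1"]
depths, misses = lru_stack_depths(addresses, max_depth=64)
print(dict(depths), misses)
```

A trace in which a few labels keep recurring at shallow depths points to the same frequently visited data points that the correlation tool detects, but without requiring a metric on the labels, which is what allows the method to be applied to data such as addresses.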

Conclusion

Computer traffic modeling is both difficult and important. Before network designers can be confident in their recommendations, synthetic sources that somehow capture the critical aspects of real data are needed. Two tests have been presented for proposed synthetic sources. The first is a computational tool from chaos theory that looks for a fractal dimension in epochs of apparently steady traffic. Determination of the dimension raises the titillating but as yet unrealized prospect of reverse engineering the deterministic dynamics that generate real traffic.


References (19)

  • P. Grassberger et al., Measuring the strangeness of strange attractors, Physica D (1983).
  • W. Leland, M. Taqqu, W. Willinger, D. Wilson, On the self-similar nature of Ethernet traffic, in: Proceedings of the...
  • A. Erramilli, W. Willinger, T. Lakshman, D. Heyman, A. Mukherjee, S. Li, O. Narayan, Performance impacts of...
  • R. Morris, TCP behavior with many flows, in: Proceedings of the IEEE International Conference on Network Protocols...
  • R. Morris, D. Lin, Variance of aggregated Web traffic, in: Proceedings of the IEEE Infocom'2000 Conference, March 2000,...
  • L. Kulkarni et al., Measurement-based traffic modeling, Journal of Stochastic Model (1998).
  • A. Sang, S. Li, A predictability analysis of network traffic, in: Proceedings of the IEEE Infocom'2000 Conference,...
  • J. Hennessy et al., Computer Architecture: A Quantitative Approach (1990).
  • K. Claffy, Measuring the internet, IEEE Internet Computing (January–February 2000).


Marie Coffin, Ph.D., Iowa State University. She is a statistician with Paradigm Genetics. Her research interests include nonparametric statistics, experimental design, exploratory data analysis, and statistical computing.

Clark Jeffries, Ph.D., University of Toronto. He is a designer with a computer company. His research interests include hash functions, flow control, classification, memory management, bandwidth allocation, and network management.

Peter C. Kiessler, Ph.D., Virginia Polytechnic Institute and State University. He is Associate Professor in the Department of Mathematical Sciences at Clemson University. His research interests are in applications of probability to queueing models, decision analysis, and reliability.
