ABSTRACT
Entropy has recently gained significance as a metric for network measurement. Previous research has shown its utility in clustering traffic and detecting traffic anomalies. While measuring the entropy of the traffic observed at a single point has already been studied, an interesting open problem is to measure the entropy of the traffic between every origin-destination (OD) pair. In this paper, we propose the first solution to this challenging problem. Our sketch builds upon and extends the Lp sketch of Indyk with significant additional innovations. We present calculations showing that our data streaming algorithm is feasible for high link speeds using commodity CPU/memory at a reasonable cost. Simulations using traffic traces collected at a tier-1 ISP backbone link show that our algorithm is very accurate in practice.
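To make the measured quantity concrete, here is a minimal sketch of an exact, offline baseline: it groups packets by origin-destination pair and computes the empirical entropy of the flow-size distribution within each pair. The abstract does not give a formula, so the definition used here (the standard empirical entropy, -sum (a_i/s) log2(a_i/s) over flow sizes a_i with total s) is an assumption. The packet tuple layout and the function names are hypothetical. This is not the streaming sketch the paper proposes; it only illustrates what such a sketch would approximate.

```python
import math
from collections import Counter, defaultdict


def empirical_entropy(counts):
    """Empirical entropy, in bits, of a collection of flow sizes (packet counts)."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    # Each flow contributes -(p * log2 p), where p is its share of the traffic.
    return sum(-(c / total) * math.log2(c / total) for c in counts)


def od_flow_entropies(packets):
    """Exact per-OD-pair entropy.

    packets: iterable of (origin, destination, flow_id) tuples.
    This tuple layout is a hypothetical one chosen for illustration.
    """
    per_od = defaultdict(Counter)
    for origin, destination, flow_id in packets:
        per_od[(origin, destination)][flow_id] += 1
    return {od: empirical_entropy(flows.values()) for od, flows in per_od.items()}


# Toy trace: two equally sized flows from A to B, one flow from A to C.
packets = [
    ("A", "B", "f1"), ("A", "B", "f2"), ("A", "B", "f1"), ("A", "B", "f2"),
    ("A", "C", "f3"), ("A", "C", "f3"),
]
entropies = od_flow_entropies(packets)
```

This baseline keeps one counter per flow per OD pair, so its memory grows with the number of flows, and it assumes each packet is already labelled with both its origin and its destination. A streaming approach aims to avoid exactly that per-flow state.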
Index Terms
- A data streaming algorithm for estimating entropies of OD flows