DOI: 10.1145/1298306.1298345
Article

A data streaming algorithm for estimating entropies of OD flows

Published: 24 October 2007

ABSTRACT

Entropy has recently gained considerable significance as an important metric for network measurement. Previous research has shown its utility in clustering traffic and detecting traffic anomalies. While measuring the entropy of the traffic observed at a single point has already been studied, an interesting open problem is to measure the entropy of the traffic between every origin-destination pair. In this paper, we propose the first solution to this challenging problem. Our sketch builds upon and extends the Lp sketch of Indyk with significant additional innovations. We present calculations showing that our data streaming algorithm is feasible for high link speeds using commodity CPU/memory at a reasonable cost. Our algorithm is shown to be very accurate in practice via simulations, using traffic traces collected at a tier-1 ISP backbone link.
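For orientation, the quantity being estimated can be computed exactly when every packet's origin, destination, and flow are known. The following is a minimal sketch of that naive baseline, not the streaming algorithm the paper proposes. It assumes the standard empirical Shannon entropy over per-flow packet counts, H = -Σ (a_i/s) log2(a_i/s) with s = Σ a_i, and the names `empirical_entropy`, `od_entropies`, and the `(origin, destination, flow_id)` record layout are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def empirical_entropy(counts):
    """Shannon entropy in bits of the distribution given by per-flow counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts)


def od_entropies(packets):
    """Return {(origin, destination): entropy} for a sequence of packets.

    Assumption: each packet is a tuple (origin, destination, flow_id).
    This keeps one counter per flow, which is exactly the per-flow state
    a streaming sketch is meant to avoid.
    """
    per_pair = defaultdict(Counter)
    for origin, destination, flow_id in packets:
        per_pair[(origin, destination)][flow_id] += 1
    return {
        pair: empirical_entropy(flows.values())
        for pair, flows in per_pair.items()
    }


# Toy input: four equally sized flows from A to B, one flow from A to C.
packets = [
    ("A", "B", "f1"), ("A", "B", "f2"), ("A", "B", "f3"), ("A", "B", "f4"),
    ("A", "C", "f5"), ("A", "C", "f5"), ("A", "C", "f5"),
]
result = od_entropies(packets)
```

In this toy input the A to B pair carries four flows of equal size, so its entropy is log2(4) = 2 bits, while the A to C pair carries a single flow and has zero entropy. Memory here grows with the number of distinct flows per pair, and labeling each packet with both endpoints presumes information that a single observation point does not have.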

References

  1. A. Chakrabarti, K. Do Ba, and S. Muthukrishnan. Estimating entropy and entropy norm on data streams. In STACS, 2006.
  2. L. Bhuvanagiri and S. Ganguly. Estimating entropy over data streams. In ESA, 2006.
  3. D. Brauckhoff, B. Tellenbach, A. Wagner, M. May, and A. Lakhina. Impact of packet sampling on anomaly detection metrics. In IMC, 2006.
  4. G. Casella and R. L. Berger. Statistical Inference. Duxbury, 2nd edition, 2002.
  5. A. Chakrabarti and G. Cormode. A near-optimal algorithm for computing the entropy of a stream. In SODA, 2007.
  6. J. M. Chambers, C. L. Mallows, and B. W. Stuck. A method for simulating stable random variables. Journal of the American Statistical Association, 71(354), 1976.
  7. G. Cormode. Stable distributions for stream computations: It's as easy as 0,1,2. In Workshop on Management and Processing of Data Streams, 2003.
  8. G. Cormode, P. Indyk, N. Koudas, and S. Muthukrishnan. Fast mining of massive tabular data via approximate distance computations. In ICDE, 2002.
  9. M. Durand and P. Flajolet. Loglog counting of large cardinalities. In ESA, 2003.
  10. C. Estan and G. Varghese. New Directions in Traffic Measurement and Accounting. In SIGCOMM, Aug. 2002.
  11. L. Feinstein, D. Schnackenberg, R. Balupari, and D. Kindred. Statistical approaches to DDoS attack detection and response. In Proceedings of the DARPA Information Survivability Conference and Exposition, 2003.
  12. P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In FOCS, 2000.
  13. P. Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 53(3):307–323, 2006.
  14. A. Kuzmanovic and E. W. Knightly. Low-rate TCP targeted denial of service attacks (the shrew vs. the mice and elephants). In SIGCOMM, 2003.
  15. A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. In SIGCOMM, 2005.
  16. A. Lall, V. Sekar, M. Ogihara, J. Xu, and H. Zhang. Data streaming algorithms for estimating entropy of network traffic. In SIGMETRICS, 2006.
  17. G. S. Manku and R. Motwani. Approximate frequency counts over data streams. In Proceedings of the 28th International Conference on Very Large Data Bases, 2002.
  18. A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya, and C. Diot. Traffic matrix estimation: existing techniques and new directions. In SIGCOMM, Aug. 2002.
  19. S. Muthukrishnan. Data streams: algorithms and applications. Available at http://athos.rutgers.edu/~muthu/.
  20. J. Nolan. STABLE program. Online at http://academic2.american.edu/~jpnolan/stable/stable.html.
  21. A. Soule, A. Nucci, R. Cruz, E. Leonardi, and N. Taft. How to identify and estimate the largest traffic matrix elements in a dynamic environment. In SIGMETRICS, June 2004.
  22. C. Tebaldi and M. West. Bayesian inference on network traffic using link count data. Journal of the American Statistical Association, pages 557–576, 1998.
  23. Y. Vardi. Internet tomography: estimating source-destination traffic intensities from link data. Journal of the American Statistical Association, pages 365–377, 1996.
  24. A. Wagner and B. Plattner. Entropy Based Worm and Anomaly Detection in Fast IP Networks. In Proceedings of IEEE International Workshop on Enabling Technologies, Infrastructures for Collaborative Enterprises, 2005.
  25. K. Xu, Z.-L. Zhang, and S. Bhattacharyya. Profiling internet backbone traffic: Behavior models and applications. In SIGCOMM, 2005.
  26. Y. Zhang, Z. M. Mao, and J. Wang. Low-rate TCP-targeted DoS attack disrupts internet routing. In Proc. 14th Annual Network & Distributed System Security Symposium, 2007.
  27. Q. Zhao, Z. Ge, J. Wang, and J. Xu. Robust traffic matrix estimation with imperfect information: making use of multiple data sources. In SIGMETRICS, 2006.
  28. Q. Zhao, A. Kumar, J. Wang, and J. Xu. Data streaming algorithms for accurate and efficient measurement of traffic and flow matrices. In SIGMETRICS, June 2005.
  29. V. M. Zolotarev. One-Dimensional Stable Distributions, volume 65 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI, 1986.

Published in

IMC '07: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement
October 2007
390 pages
ISBN: 9781595939081
DOI: 10.1145/1298306

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 277 of 1,083 submissions, 26%
