DOI: 10.1145/3098593.3098598
research-article
Hierarchical IP flow clustering

Published: 07 August 2017

ABSTRACT

The analysis of flow traces can help to understand a network's usage patterns. We present a hierarchical clustering algorithm for network flow data that can summarize terabytes of IP traffic into a parsimonious tree model. The method automatically finds an appropriate scale of aggregation, so that each cluster represents a local maximum of the traffic density from a block of source addresses to a block of destination addresses. We apply this clustering method to NetFlow data from an enterprise network, find the largest traffic clusters, and analyze their stationarity over time. Heavy-volume clusters that persist over long time scales can help network operators perform usage-based accounting, capacity provisioning, and traffic engineering. In addition, changes in the layout of the hierarchical clusters can facilitate the detection of anomalies and of significant changes in the network workload.
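
To make the notion of "traffic density from a block of source addresses to a block of destination addresses" concrete, the following is a minimal sketch, not the authors' algorithm. It assumes IPv4 flows given as (source, destination, bytes) tuples, uses the same prefix length for source and destination blocks, and defines density as bytes per address pair in the block; the function names, the prefix lengths, and the sample flows are all hypothetical.

```python
import ipaddress
from collections import defaultdict


def block(addr, prefix_len):
    """Return the CIDR block of the given prefix length that contains addr."""
    return ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)


def density_by_scale(flows, prefix_lens=(8, 16, 24, 32)):
    """Aggregate flows into (source block, destination block) pairs.

    Returns {prefix_len: {(src_block, dst_block): bytes per address pair}}.
    This only tabulates densities at a few fixed scales; choosing the scale
    at which density peaks is the part the paper's method automates.
    """
    result = {}
    for p in prefix_lens:
        volume = defaultdict(int)
        for src, dst, nbytes in flows:
            volume[(block(src, p), block(dst, p))] += nbytes
        # Number of (source, destination) address pairs covered by one block pair.
        pairs = 2 ** (2 * (32 - p))
        result[p] = {key: total / pairs for key, total in volume.items()}
    return result


# Hypothetical sample flows: (source address, destination address, bytes).
flows = [
    ("10.0.0.1", "192.168.1.5", 1000),
    ("10.0.0.2", "192.168.1.9", 3000),
    ("10.0.1.7", "192.168.1.5", 500),
]
densities = density_by_scale(flows)
```

Comparing the entries of `densities` for nested blocks shows how the density of a region changes with the scale of aggregation, which is the quantity a density-based hierarchical clustering would examine when deciding where to place a cluster.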

        Published in

          Big-DAMA '17: Proceedings of the Workshop on Big Data Analytics and Machine Learning for Data Communication Networks
          August 2017
          58 pages
          ISBN: 9781450350549
          DOI: 10.1145/3098593

          Copyright © 2017 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 7 of 11 submissions, 64%
