Hierarchical IP flow clustering

Authors:
Kamal Shadi, Georgia Institute of Technology, Atlanta, Georgia
Preethi Natarajan, Chief Technology and Architecture Office, Cisco, San Jose, CA, USA
Constantine Dovrolis, Georgia Institute of Technology, Atlanta, Georgia

Big-DAMA '17: Proceedings of the Workshop on Big Data Analytics and Machine Learning for Data Communication Networks, August 2017, Pages 25–30, https://doi.org/10.1145/3098593.3098598

Published: 07 August 2017

ABSTRACT

The analysis of flow traces can help to understand a network's usage patterns. We present a hierarchical clustering algorithm for network flow data that can summarize terabytes of IP traffic into a parsimonious tree model. The method automatically finds an appropriate scale of aggregation so that each cluster represents a local maximum of the traffic density from a block of source addresses to a block of destination addresses. We apply this clustering method to NetFlow data from an enterprise network, find the largest traffic clusters, and analyze their stationarity across time. The existence of heavy-volume clusters that persist over long time scales can help network operators perform usage-based accounting, capacity provisioning, and traffic engineering. In addition, changes in the layout of hierarchical clusters can facilitate the detection of anomalies and significant changes in the network workload.
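
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the underlying idea of measuring traffic density over blocks of source and destination addresses at several scales of aggregation. The flow records, the function names `block_of` and `block_densities`, and the definition of density as bytes per address pair in a block are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict
from ipaddress import ip_network


def block_of(addr: str, prefix_len: int) -> str:
    """Return the CIDR block of the given prefix length that contains addr."""
    return str(ip_network(f"{addr}/{prefix_len}", strict=False))


def block_densities(flows, src_len, dst_len):
    """Aggregate (src, dst, bytes) flows into src-block x dst-block cells.

    Returns {(src_block, dst_block): (bytes, density)}. Density is bytes per
    address pair in the cell; this definition is an assumption, not the
    paper's.
    """
    volume = defaultdict(int)
    for src, dst, nbytes in flows:
        cell = (block_of(src, src_len), block_of(dst, dst_len))
        volume[cell] += nbytes
    # Number of (source, destination) IPv4 address pairs covered by one cell.
    pairs = 2 ** ((32 - src_len) + (32 - dst_len))
    return {cell: (vol, vol / pairs) for cell, vol in volume.items()}


# Hypothetical flow records: (source IP, destination IP, bytes).
flows = [
    ("10.0.1.5", "192.168.7.9", 4000),
    ("10.0.1.77", "192.168.7.200", 6000),
    ("10.0.2.3", "192.168.7.1", 500),
    ("172.16.0.4", "192.168.9.9", 100),
]

# Walk from fine to coarse aggregation and report each cell.
for length in (24, 16, 8):
    print(f"/{length} x /{length}")
    cells = block_densities(flows, length, length)
    for (src_block, dst_block), (vol, dens) in sorted(cells.items()):
        print(f"  {src_block} -> {dst_block}: {vol} bytes, density {dens:.3e}")
```

Comparing a cell's density with that of the coarser block containing it is one way to see whether further aggregation concentrates or dilutes traffic. How the paper actually selects the scale of aggregation is not described in the abstract.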

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        1. Cluster analysis
2. Networks
  1. Network performance evaluation
    1. Network measurement
    2. Network performance analysis

Published in

Big-DAMA '17: Proceedings of the Workshop on Big Data Analytics and Machine Learning for Data Communication Networks
August 2017
58 pages
ISBN:9781450350549
DOI:10.1145/3098593

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Flow clustering
Hierarchical clustering
NetFlow
Unsupervised Machine Learning
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 7 of 11 submissions, 64%