ABSTRACT
The analysis of flow traces can help in understanding a network's usage patterns. We present a hierarchical clustering algorithm for network flow data that can summarize terabytes of IP traffic into a parsimonious tree model. The method automatically finds an appropriate scale of aggregation so that each cluster represents a local maximum of the traffic density from a block of source addresses to a block of destination addresses. We apply this clustering method to NetFlow data from an enterprise network, find the largest traffic clusters, and analyze their stationarity over time. The existence of heavy-volume clusters that persist over long time scales can help network operators perform usage-based accounting, capacity provisioning, and traffic engineering. In addition, changes in the layout of the hierarchical clusters can facilitate the detection of anomalies and significant changes in the network workload.
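The idea of traffic density between a block of source addresses and a block of destination addresses can be illustrated with a minimal sketch. This is not the algorithm of the paper: the prefix lengths are fixed by hand rather than found automatically, and the helper names (`block`, `aggregate`, `density`), the flow tuple layout, and the sample addresses are all assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict


def block(addr: str, prefix_len: int) -> str:
    """Return the IPv4 CIDR block of the given length that contains addr."""
    return str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))


def aggregate(flows, src_len: int, dst_len: int) -> dict:
    """Sum bytes per (source block, destination block) pair.

    flows is an iterable of (src_ip, dst_ip, bytes) tuples; this layout is
    a hypothetical simplification of a NetFlow record.
    """
    totals = defaultdict(int)
    for src, dst, nbytes in flows:
        totals[(block(src, src_len), block(dst, dst_len))] += nbytes
    return dict(totals)


def density(nbytes: int, src_len: int, dst_len: int) -> float:
    """Bytes per address pair covered by a block pair (IPv4 only)."""
    pairs = 2 ** (32 - src_len) * 2 ** (32 - dst_len)
    return nbytes / pairs


# Made-up sample flows, for illustration only.
flows = [
    ("10.0.1.5", "192.168.7.20", 4000),
    ("10.0.1.9", "192.168.7.21", 6000),
    ("10.0.2.3", "172.16.0.8", 500),
]

# Aggregate at /24 on both sides; the first two flows fall in one cluster.
clusters = aggregate(flows, 24, 24)
for (src_block, dst_block), nbytes in clusters.items():
    print(src_block, dst_block, nbytes, density(nbytes, 24, 24))
```

Repeating the aggregation at several prefix lengths and comparing the resulting densities gives a rough sense of what choosing a scale of aggregation involves, although the paper's method selects that scale automatically.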
Index Terms
- Hierarchical IP flow clustering