DOI: 10.1145/1162678.1162679

Traffic classification using clustering algorithms

Published: 11 September 2006

Abstract

Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult, as many peer-to-peer (P2P) applications use dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of applications when they communicate on a network. We pursue this latter approach and demonstrate how cluster analysis can be used to effectively identify groups of similar traffic using only transport-layer statistics. Our work considers two unsupervised clustering algorithms, K-Means and DBSCAN, that have not previously been used for network traffic classification. We evaluate these two algorithms and compare them to the previously used AutoClass algorithm, using empirical Internet traces. The experimental results show that both K-Means and DBSCAN work very well and much more quickly than AutoClass. Our results indicate that although DBSCAN has lower accuracy compared to K-Means and AutoClass, DBSCAN produces better clusters.
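
To make the two algorithms named in the abstract concrete, the following is a minimal, standard-library-only sketch of textbook K-Means and DBSCAN applied to a handful of synthetic flows. It is not the authors' implementation: the two features (log of total bytes, mean packet size), the flow values, and the parameters `k`, `eps`, and `min_pts` are all assumptions chosen for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's K-Means: returns one cluster index (0..k-1) per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def dbscan(points, eps, min_pts):
    """DBSCAN: returns a cluster index per point, or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nb)
    return labels


# Hypothetical flows: (log10 of total bytes, mean packet size in hundreds of bytes).
small_flows = [(3.0 + 0.05 * i, 4.0 + 0.05 * i) for i in range(5)]
bulk_flows = [(6.0 + 0.05 * i, 14.0 + 0.05 * i) for i in range(5)]
isolated_flow = [(9.0, 1.0)]
flows = small_flows + bulk_flows + isolated_flow

km_labels = kmeans(flows, k=2)                  # every flow gets a cluster
db_labels = dbscan(flows, eps=0.5, min_pts=3)   # the isolated flow is labelled -1
print("K-Means:", km_labels)
print("DBSCAN: ", db_labels)
```

One structural difference is visible even on this toy data: K-Means assigns every flow to one of the `k` clusters, whereas DBSCAN can leave a flow unclustered as noise. In practice the features would typically be normalised first, since both algorithms here rely on Euclidean distance.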

    Published In

    MineNet '06: Proceedings of the 2006 SIGCOMM workshop on Mining network data
    September 2006
    66 pages
    ISBN:159593569X
    DOI:10.1145/1162678
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. classification
    2. machine learning
    3. unsupervised clustering

    Qualifiers

    • Article

    Conference

    SIGCOMM06: ACM SIGCOMM 2006 Conference
    September 11 - 15, 2006
    Pisa, Italy
