
Network traffic classification based on ensemble learning and co-training

Science in China Series F: Information Sciences

Abstract

Network traffic classification is an essential step in much network research. However, with the rapid evolution of Internet applications, the effectiveness of port-based and payload-based identification approaches has greatly diminished in recent years, and many researchers have turned their attention to alternative methods based on machine learning. This paper presents a novel machine learning-based classification model that combines the ensemble learning paradigm with co-training techniques. Whereas most previous approaches employed only a single classifier, our method applies multiple classifiers and semi-supervised learning, which mainly helps to overcome three shortcomings: limited flow accuracy, weak adaptability, and a large demand for labeled training data. Statistical characteristics of IP flows are extracted from packet-level traces to build the feature set; the classification model is then created and tested, and the empirical results demonstrate its feasibility and effectiveness.
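The abstract does not spell out the algorithm, so what follows is only a minimal Python sketch of the general idea of pairing an ensemble with co-training: each ensemble member is retrained on unlabeled flows that the remaining members label with high agreement. All names (`flow_features`, `NearestCentroid`, `stratified_bootstrap`, `ensemble_cotrain`), the toy per-flow statistics, the nearest-centroid base learner, the 0.75 agreement threshold and the fixed number of rounds are illustrative assumptions, not the authors' design.

```python
# Illustrative sketch only: the features, base learner, threshold and round count
# are assumptions chosen for clarity, not the method described in the paper.
import random
from collections import Counter
from statistics import mean, pstdev


def flow_features(packet_sizes, timestamps):
    """Hypothetical per-flow statistics (mean/std packet size, packet count, mean gap)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])] or [0.0]
    return (mean(packet_sizes), pstdev(packet_sizes), len(packet_sizes), mean(gaps))


class NearestCentroid:
    """Deliberately simple stand-in base learner."""

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # One centroid per class: the per-dimension mean of its feature vectors.
        self.centroids = {lab: tuple(map(mean, zip(*rows))) for lab, rows in groups.items()}
        return self

    def predict(self, x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(self.centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab])))


def vote(models, x):
    """Majority label over the models and the fraction of models agreeing with it."""
    label, count = Counter(m.predict(x) for m in models).most_common(1)[0]
    return label, count / len(models)


def stratified_bootstrap(X, y, rng):
    """Resample with replacement inside each class so every member sees every class."""
    Xb, yb = [], []
    for lab in sorted(set(y)):
        idx = [i for i, l in enumerate(y) if l == lab]
        for _ in idx:
            i = rng.choice(idx)
            Xb.append(X[i])
            yb.append(lab)
    return Xb, yb


def ensemble_cotrain(X_lab, y_lab, X_unlab, n_models=5, threshold=0.75, rounds=3, seed=0):
    """Committee-style co-training over a bagged ensemble (requires n_models >= 2)."""
    rng = random.Random(seed)
    boots = [stratified_bootstrap(X_lab, y_lab, rng) for _ in range(n_models)]
    models = [NearestCentroid().fit(Xb, yb) for Xb, yb in boots]
    for _ in range(rounds):
        retrained = []
        for i, (Xb, yb) in enumerate(boots):
            # The other members act as a committee that labels unlabeled flows for member i.
            others = models[:i] + models[i + 1:]
            X_new, y_new = list(Xb), list(yb)
            for x in X_unlab:
                label, agreement = vote(others, x)
                if agreement >= threshold:  # keep only confident pseudo-labels
                    X_new.append(x)
                    y_new.append(label)
            retrained.append(NearestCentroid().fit(X_new, y_new))
        models = retrained  # all members are updated together at the end of a round
    return models
```

Two design choices are worth noting. The stratified bootstrap keeps every member aware of every class, which matters when the labeled set is small. Pseudo-labels are re-selected from scratch in each round rather than accumulated, and the loop stops after a fixed number of rounds; a practical system would more likely stop once the committee's estimated error stops improving.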

Author information

Corresponding author

Correspondence to JianMin Wang.

Additional information

Supported by the National Natural Science Foundation of China (Grant Nos. 60525213 and 60776096), the National Basic Research Program of China (Grant No. 2006CB303106), the National High-Tech Research & Development Program of China (Grant Nos. 2007AA01Z236 and 2007AA01Z449), the Joint Funds of NSFC-Guangdong (Grant No. U0735001), and the National Project of Scientific and Technical Supporting Programs (Grant No. 2007BAH13B01)

About this article

Cite this article

He, H., Luo, X., Ma, F. et al. Network traffic classification based on ensemble learning and co-training. Sci. China Ser. F-Inf. Sci. 52, 338–346 (2009). https://doi.org/10.1007/s11432-009-0050-8

  • DOI: https://doi.org/10.1007/s11432-009-0050-8