On Addressing the Imbalance Problem: A Correlated KNN Approach for Network Traffic Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 8792)

Abstract

With the arrival of the big data era, Internet traffic is growing exponentially. A wide variety of applications has emerged on the Internet, and traffic classification was introduced to help manage them for security monitoring and quality-of-service purposes. A large number of Machine Learning (ML) algorithms have been applied to traffic classification. A significant challenge to classification performance comes from the imbalanced distribution of data in traffic classification systems. In this paper, we propose an Optimised Distance-based Nearest Neighbor (ODNN) approach, which is capable of improving classification performance on imbalanced traffic data. We analyze the proposed ODNN approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments were conducted on a real-world traffic dataset. The results show that the performance of “small classes” can be improved significantly, even with only a small amount of training data, while the performance of “large classes” remains stable.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wu, D., Chen, X., Chen, C., Zhang, J., Xiang, Y., Zhou, W. (2014). On Addressing the Imbalance Problem: A Correlated KNN Approach for Network Traffic Classification. In: Au, M.H., Carminati, B., Kuo, C.-C.J. (eds) Network and System Security. NSS 2014. Lecture Notes in Computer Science, vol 8792. Springer, Cham. https://doi.org/10.1007/978-3-319-11698-3_11

  • DOI: https://doi.org/10.1007/978-3-319-11698-3_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11697-6

  • Online ISBN: 978-3-319-11698-3

  • eBook Packages: Computer Science, Computer Science (R0)
