
Effectiveness of Statistical Features for Early Stage Internet Traffic Identification

International Journal of Parallel Programming

Abstract

Accurately identifying network traffic at its early stage is very important for applications of traffic identification. In recent years, a growing number of studies have tried to build effective machine learning models that identify traffic from the first few packets of a flow. Packet sizes and statistical features have both proven effective and are widely used in early stage traffic identification. However, one important question remains unaddressed: whether there are essential differences in effectiveness between the two kinds of features. In this paper, we evaluate the effectiveness of statistical features in comparison with packet sizes. We first extract the packet sizes of the first six packets, together with their statistical features, on three traffic data sets. We then compute the mutual information between each feature and the corresponding traffic type label to quantify the effectiveness of that feature. Finally, we run crossover identification experiments with different feature sets using ten well-known machine learning classifiers. Our experimental results show that most classifiers achieve almost the same performance with packet sizes as with statistical features for early stage traffic identification, and that most classifiers can achieve high identification accuracy using only two statistical features.
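The feature-scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of statistics (minimum, maximum, mean, standard deviation), the function names `statistical_features` and `mutual_information`, and the toy flows and labels are all assumptions made for illustration. Feature values are treated as discrete symbols, so real-valued features would first need to be binned.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def statistical_features(packet_sizes):
    """Summary statistics of a flow's first packet sizes.

    The set of statistics here is an assumption; the abstract does not
    list which statistical features the paper uses.
    """
    return {
        "min": min(packet_sizes),
        "max": max(packet_sizes),
        "mean": mean(packet_sizes),
        "std": pstdev(packet_sizes),
    }


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Toy data (hypothetical): sizes of the first six packets and a traffic label.
flows = [
    ([60, 60, 1500, 1500, 1500, 1500], "bulk"),
    ([60, 60, 1500, 1400, 1500, 1500], "bulk"),
    ([60, 60, 120, 90, 110, 100], "interactive"),
    ([60, 60, 100, 80, 130, 90], "interactive"),
]
labels = [label for _, label in flows]

# One statistical feature (maximum size) and one raw packet size (first packet).
max_sizes = [statistical_features(sizes)["max"] for sizes, _ in flows]
first_sizes = [sizes[0] for sizes, _ in flows]

mi_max = mutual_information(max_sizes, labels)      # feature separates the labels
mi_first = mutual_information(first_sizes, labels)  # constant feature, no information
```

On this toy data the maximum packet size fully determines the label, so its mutual information equals the label entropy (1 bit), while the first packet size is constant across flows and scores 0 bits.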

Acknowledgments

This research was partially supported by the National Natural Science Foundation of China under Grant Nos. 61472164, 61173078, 61203105, 61173079, and 61373054, the Provincial Natural Science Foundation of Shandong under Grant Nos. ZR2012FM010, ZR2011FZ001, ZR2013FL002 and ZR2012FQ016.

Author information

Correspondence to Bo Yang.

Appendix: Detailed Results of the Experimental Study

See Tables 3, 4, 5 and 6.

Table 3 Mutual information of all features (the best value in each column is shown in bold)
Table 4 Accuracy results for the Auckland II data set (the best value in each row is shown in bold)
Table 5 Accuracy results for the UNIBS data set (the best value in each row is shown in bold)
Table 6 Accuracy results for the UJN data set (the best value in each row is shown in bold)

Cite this article

Peng, L., Yang, B., Chen, Y. et al. Effectiveness of Statistical Features for Early Stage Internet Traffic Identification. Int J Parallel Prog 44, 181–197 (2016). https://doi.org/10.1007/s10766-014-0337-2

  • DOI: https://doi.org/10.1007/s10766-014-0337-2
