Effectiveness of Statistical Features for Early Stage Internet Traffic Identification

Peng, Lizhi; Yang, Bo; Chen, Yuehui; Chen, Zhenxiang

doi:10.1007/s10766-014-0337-2

Published: 18 January 2015

Volume 44, pages 181–197, (2016)
International Journal of Parallel Programming

Abstract

Accurately identifying network traffic at its early stage is very important for practical applications of traffic identification. In recent years, a growing number of studies have tried to build effective machine learning models that identify traffic using only the few packets available at the early stage. Packet sizes and statistical features have been shown to be effective and are widely used in early stage traffic identification. However, an important question remains unaddressed: whether there are essential differences in effectiveness between the two kinds of features. In this paper, we evaluate the effectiveness of statistical features in comparison with packet sizes. We first extract the packet sizes of the first six packets, together with their statistical features, on three traffic data sets. We then compute the mutual information between each feature and the corresponding traffic type label to measure the effectiveness of that feature. Finally, we carry out crossover identification experiments with different feature sets using ten well-known machine learning classifiers. Our experimental results show that most classifiers achieve almost the same performance with packet sizes as with statistical features for early stage traffic identification, and that most classifiers can achieve high identification accuracy using only two statistical features.
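The feature evaluation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the particular statistics (mean, standard deviation, minimum, maximum), the fixed-width binning, the helper names, and the toy flows are all assumptions made for illustration, since the abstract does not specify them. It only shows how the mutual information I(X;Y) between a discretized feature X and the traffic label Y could be computed.

```python
import math
from collections import Counter
from statistics import mean, pstdev

N_PACKETS = 6  # the abstract uses the first six packets


def statistical_features(packet_sizes):
    """Summary statistics of the first N_PACKETS packet sizes.

    The choice of statistics here is an assumption, not taken from the paper.
    """
    head = packet_sizes[:N_PACKETS]
    return {
        "mean": mean(head),
        "std": pstdev(head),
        "min": min(head),
        "max": max(head),
    }


def discretize(values, bin_width):
    """Map continuous values to integer bin indices (hypothetical fixed-width binning)."""
    return [int(v // bin_width) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy flows (made-up data): packet sizes in bytes and a traffic type label.
flows = [
    ([60, 1500, 1500, 1500, 1500, 1500], "bulk"),
    ([60, 1400, 1500, 1500, 1460, 1500], "bulk"),
    ([60, 80, 70, 90, 60, 75], "interactive"),
    ([64, 72, 66, 80, 70, 68], "interactive"),
]
labels = [label for _, label in flows]
features = [statistical_features(sizes) for sizes, _ in flows]

for name in ("mean", "std", "min", "max"):
    binned = discretize([f[name] for f in features], bin_width=100)
    print(name, round(mutual_information(binned, labels), 3))
```

A feature whose bins separate the labels perfectly reaches the label entropy, while a feature that takes the same bin for every flow scores zero. On real traces the bin width would need tuning, because the estimate depends on the discretization.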

Acknowledgments

This research was partially supported by the National Natural Science Foundation of China under Grant Nos. 61472164, 61173078, 61203105, 61173079, and 61373054, and by the Provincial Natural Science Foundation of Shandong under Grant Nos. ZR2012FM010, ZR2011FZ001, ZR2013FL002, and ZR2012FQ016.

Author information

Authors and Affiliations

Shandong Provincial Key Laboratory for Network Based Intelligent Computing, University of Jinan, Jinan, 250022, People’s Republic of China
Lizhi Peng, Bo Yang, Yuehui Chen & Zhenxiang Chen

Corresponding author

Correspondence to Bo Yang.

Appendix: Detailed Results of the Experimental Study

See Tables 3, 4, 5 and 6.

Table 3 Mutual information of all features (the best value in each column is shown in bold)

Table 4 Accuracy results for the Auckland II data set (the best value in each row is shown in bold)

Table 5 Accuracy results for the UNIBS data set (the best value in each row is shown in bold)

Table 6 Accuracy results for the UJN data set (the best value in each row is shown in bold)

About this article

Cite this article

Peng, L., Yang, B., Chen, Y. et al. Effectiveness of Statistical Features for Early Stage Internet Traffic Identification. Int J Parallel Prog 44, 181–197 (2016). https://doi.org/10.1007/s10766-014-0337-2

Received: 03 July 2014
Accepted: 09 October 2014
Published: 18 January 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s10766-014-0337-2
