ABSTRACT
Selecting features for network traffic analysis and anomaly detection is a challenge for experts who aim to build systems that discover traffic patterns, characterize networks, and improve security. There are no major guidelines or best practices for feature selection in the field. The literature contains many different proposals that ultimately depend on feature availability, the types of known traffic, tool limitations, specific goals, and, fundamentally, the experts' knowledge and intuition. In this work we revisit 71 principal publications in the field of network traffic analysis from 2005 to 2017. Relevant information has been curated according to formalized data structures and stored in JSON format, creating a database for the smart retrieval of network traffic analysis research. A meta-analysis of the explored publications revealed a set of main features that are common to a considerable number of works and could be used as a baseline for future research. Moreover, such meta-analysis environments are highly valuable for validation and generalization in network traffic research: they help homogenize and unify criteria for the design of experiments, so that studies do not get lost or become irrelevant amid the high complexity and variability that network traffic analysis involves.
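As a rough illustration of the workflow the abstract describes (curated JSON records per publication, then a count of which features recur across works), the following is a minimal sketch in Python. The record schema (`paper_id`, `year`, `features`), the toy records, and the 50% threshold are all assumptions made for this example; they are not the authors' actual data structures or results. The feature names are borrowed from IPFIX naming purely for readability.

```python
import json
from collections import Counter

# Hypothetical curated records. The schema and every value below are
# invented for illustration; they do not come from the paper's database.
RECORDS_JSON = """
[
  {"paper_id": "toy-1", "year": 2009,
   "features": ["packetDeltaCount", "octetDeltaCount", "protocolIdentifier"]},
  {"paper_id": "toy-2", "year": 2013,
   "features": ["octetDeltaCount", "flowDurationMilliseconds"]},
  {"paper_id": "toy-3", "year": 2016,
   "features": ["packetDeltaCount", "octetDeltaCount", "octetDeltaCount"]}
]
"""


def feature_support(records):
    """Count in how many papers each feature appears (at most once per paper)."""
    counts = Counter()
    for record in records:
        # set() prevents a feature listed twice in one paper from counting twice
        counts.update(set(record["features"]))
    return counts


def baseline_features(records, min_share=0.5):
    """Return, in sorted order, features used by at least min_share of the papers."""
    total = len(records)
    support = feature_support(records)
    return sorted(name for name, n in support.items() if n / total >= min_share)


records = json.loads(RECORDS_JSON)
support = feature_support(records)
baseline = baseline_features(records, min_share=0.5)
print(baseline)
```

Counting each feature once per paper keeps a single publication from inflating a feature's support. The threshold is a free parameter, and a real analysis would need to justify its choice.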
Index Terms
- A Meta-Analysis Approach for Feature Selection in Network Traffic Research