Toward effective mobile encrypted traffic classification through deep learning
Introduction
In recent years, network operators have experienced tremendous growth in network traffic, mostly generated by mobile devices [1]. To face this challenge, several network players employ sophisticated network monitoring systems that incorporate intelligence through machine learning (ML) [2]. Yet, their success relies on handcrafted features designed by domain experts. Such a process is impractical when facing the fast-paced evolution of mobile traffic, because it can be neither automated nor crowdsourced to non-experts (due to the high specialization required). After a large number of ML-based approaches [3], [4], [5], [6], deep learning (DL) [7], [8], a cutting-edge subset of ML techniques, has recently emerged as a breakthrough toward the automatic design of accurate inference systems able to capture complex dependencies among data, thus limiting the need for human expert intervention.
A pillar of network monitoring services is traffic classification (TC) [9], namely inferring the application that generates the traffic. Indeed, TC is a key prerequisite for security and QoS enforcement, and mobile TC [10], [11], [12], [13] is attracting additional interest due to its potential for valuable profiling information (e.g. for advertisers and security agencies), while also implying privacy downsides (e.g. recognition of health or dating apps, or in bring-your-own-device scenarios). Concurrently, the broad adoption of encrypted protocols (TLS) and dynamic ports hinders accurate TC, defeating traditional deep packet inspection and port-based techniques [9], [14]. This paves the way for DL techniques, here envisioned as the stepping stone toward high performance in challenging encrypted-traffic contexts [11], [15], since they allow classifiers to be trained directly from input data by automatically distilling structured and complex feature representations [7], [16]. Still, DL adoption in network TC is thorny and currently less understood [13]. More importantly, beyond the encrypted-traffic issue, mobile TC is marked by a high number of apps, which possibly generate similar traffic patterns and have complex fingerprints. The latter is due to the scarce number of training samples per app and to device/OS/version diversity. Hence, such a challenging and dynamic scenario justifies the higher complexity and training requirements of DL.
In view of the discussed considerations, the contributions of this work are manifold:
- We give an overview of the key network traffic analysis subjects where DL is foreseen as attractive, since their common intent is to automatically capitalize on raw network-level data to extract valuable information.
- We categorize the state of the art in DL-based TC toward its effective application in mobile and encrypted contexts, also providing a systematic taxonomy of the most closely related literature.
- To pinpoint and overcome the limitations of the literature, we propose a general framework for DL-based mobile and encrypted TC, based on a rigorous definition of its milestones: (i) the choice of the traffic object, (ii) the definition of the input(s), (iii) the simultaneous TC tasks required, and (iv) the corresponding DL architecture. Thanks to this framework, clear guidelines are provided to designers for the judicious choice of relevant segmentation criteria and unbiased (while effective) input(s) in DL-based TC [13], [17]. More importantly, our proposal overcomes the design limitations of current works (limited to either single-modality or single-task learning, e.g. [17], [18], [19], [20]) by envisioning the joint use of multi-modal and multi-task techniques via the “connectionist” approach granted by DL.
- We validate two actual implementations of the proposed framework on three recent human-generated mobile traffic datasets. One instance coincides with the best DL-based baseline on mobile encrypted TC [13], while the other is a novel architecture, drawn from our proposal, that we devise herein to exploit multiple inputs. We show that the latter instance surpasses the former, accurately predicts the app generating the traffic, and beats the state of the art in ML-based mobile TC [11].
- Finally, our framework allows us to surface future perspectives toward effective mobile and encrypted TC by means of advanced DL techniques.
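Milestones (i) and (ii) of the framework listed above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: it assumes, purely for illustration, that the traffic object is a bidirectional flow identified by its endpoints and protocol, and that two input types are selected per object (the first payload bytes and a few per-packet fields). All names (`Packet`, `biflow_key`, `segment`, `payload_input`, `header_input`) and the default sizes are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record used only for this sketch."""
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    payload: bytes
    timestamp: float


def biflow_key(p):
    """Direction-independent key: both directions map to the same traffic object."""
    lo, hi = sorted([(p.src, p.sport), (p.dst, p.dport)])
    return (lo, hi, p.proto)


def segment(packets):
    """Traffic segmentation: group time-ordered packets into bidirectional flows."""
    flows = defaultdict(list)
    for p in sorted(packets, key=lambda pkt: pkt.timestamp):
        flows[biflow_key(p)].append(p)
    return dict(flows)


def payload_input(flow, n_bytes=16):
    """First input type (assumed): first n_bytes of payload, zero-padded, scaled to [0, 1]."""
    data = b"".join(p.payload for p in flow)[:n_bytes].ljust(n_bytes, b"\x00")
    return [byte / 255.0 for byte in data]


def header_input(flow, n_packets=4):
    """Second input type (assumed): direction, payload length, and time offset per packet."""
    first = flow[0]
    rows = []
    for p in flow[:n_packets]:
        # +1 if the packet travels in the direction of the first packet, -1 otherwise.
        direction = 1 if (p.src, p.sport) == (first.src, first.sport) else -1
        rows.append([direction, len(p.payload), p.timestamp - first.timestamp])
    while len(rows) < n_packets:
        rows.append([0, 0, 0.0])  # zero-pad flows shorter than n_packets
    return rows


packets = [
    Packet("10.0.0.2", 40000, "1.2.3.4", 443, "TCP", b"\x16\x03\x01", 0.0),
    Packet("1.2.3.4", 443, "10.0.0.2", 40000, "TCP", b"\xff", 0.5),
    Packet("10.0.0.2", 40001, "5.6.7.8", 443, "TCP", b"", 0.2),
]
flows = segment(packets)
```

In a multi-modal design, each of the two inputs would feed its own branch of the network before the branches are merged; the sketch stops at input selection and makes no claim about the architecture itself.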
The rest of the paper is organized as follows: Section 2 presents a review of the recent success achieved by DL in network traffic analysis; Section 3 provides a categorization of literature background on TC through DL; the proposed general framework for DL-based mobile and encrypted TC is described in Section 4, with Section 5 reporting the experimental validation of its two proposed implementations; finally, Section 6 suggests insights and possible future directions.
To foster manuscript readability, Table 1 summarizes the acronyms used in the main text. Conversely, we report those used only in tables within the corresponding captions.
Section snippets
Deep learning in network traffic analysis
Telecom operators and ISPs have a long history of traffic-data analysis operations, have access to a huge amount of network-level data, and have thus enjoyed decades-long research and applications on the topic. The huge success of DL in several fields has recently ignited global interest in exploiting it in networking as well, where its adoption can leverage this solid know-how and help face the new challenges of mobile network-level data analysis.
To this end, in this section we review the recent
Deep learning in traffic classification
In this section we provide an intuitive categorization, via a systematic taxonomy, of literature on DL-based TI and TC. We point out that a number of works have faced mobile TC in the last five years, under encrypted-traffic assumption, mostly using ML and based on bot-generated traffic [10], [11].
On the other hand, the appeal of DL for TC is confirmed by several recent works providing initial design attempts at DL-based traffic classifiers, either non-mobile or non-encrypted. All these works
A general framework for deep learning-based mobile encrypted traffic classification
In the following, we introduce and dissect our DL framework for mobile and encrypted TC. Fig. 1 illustrates the proposed framework in terms of its workflow, highlighting the key differences with respect to a traditional ML workflow (cf. Fig. 1(a)). Specifically, the mobile traffic flowing over a network device is captured and segmented into defined packet aggregates of traffic (traffic segmentation). Then, from each traffic object, raw input data is selected (input data selection) and used to
Experimental validation
In this section, we test two actual implementations of the proposed DL framework for mobile and encrypted TC based on three recent human-generated mobile traffic datasets. First, we describe the aforementioned datasets and the key performance indicators (KPIs) adopted for evaluation of TC effectiveness (Section 5.1). Secondly, we show and discuss the experimental results obtained (Section 5.2).
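As a hedged illustration of how TC effectiveness might be quantified, the sketch below computes two KPIs that are common in classification studies: accuracy and macro-averaged F-measure. The KPIs are not named in this excerpt, so this choice is an assumption, and the function names `accuracy` and `macro_f_measure` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of traffic objects whose predicted app matches the true app."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-app F-measures, computed over the apps in y_true."""
    scores = []
    for app in sorted(set(y_true)):
        tp = sum(t == app and p == app for t, p in zip(y_true, y_pred))
        fp = sum(t != app and p == app for t, p in zip(y_true, y_pred))
        fn = sum(t == app and p != app for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for two hypothetical apps "a" and "b".
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
acc = accuracy(y_true, y_pred)
f_macro = macro_f_measure(y_true, y_pred)
```

Macro averaging weights every app equally, which matters when the number of training samples per app is uneven.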
Discussion and future perspectives
In this work we envisioned a DL application to the field of network traffic analysis, focusing on the identification and classification of mobile and encrypted traffic. The result of our study is a DL-based TC framework able to capitalize on heterogeneous input data from mobile traffic and solve multiple TC tasks at the same time.
By means of our framework we highlight several shortcomings of previous DL-based attempts at TC, namely: (i) traffic segmentation is often implicit or overlooked; (ii)
CRediT authorship contribution statement
Giuseppe Aceto: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Domenico Ciuonzo: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Antonio Montieri: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Visualization. Antonio Pescapé:
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Giuseppe Aceto is an Assistant Professor at University of Napoli Federico II. He has a PhD in telecommunication engineering from the same University. His work falls in monitoring of network performance and security (focusing on censorship) both in traditional and SDN network environments. He is also working on bioinformatics and ICTs applied to health. He is the recipient of a best paper award at IEEE ISCC 2010, and 2018 Best Journal Paper Award by IEEE CSIM.
References (45)
- et al., A switching delayed PSO optimized extreme learning machine for short-term load forecasting, Neurocomputing (2017)
- et al., Multi-classification approaches for classifying mobile app traffic, J. Network Comput. Appl. (2018)
- et al., A survey of deep neural network architectures and their applications, Neurocomputing (2017)
- et al., An efficient feature generation approach based on deep learning and feature selection techniques for traffic classification, Comput. Netw. (2018)
- et al., MIMETIC: mobile encrypted traffic classification using multimodal deep learning, Comput. Netw. (2019)
(2019) - N. Heuveldop, et al., Ericsson mobility report, Ericsson AB, Technol. Emerg. Business, Stockholm, Sweden, Tech. Rep....
- M. Cooney, Cisco: How AI and machine learning are going to change your...
- A. Callado, C. Kamienski, G. Szabó, B.P. Gero, J. Kelner, S. Fernandes, D. Sadok, A survey on internet traffic...
- T.T. Nguyen, G. Armitage, A survey of techniques for internet traffic classification using machine learning, Commun....
- V. Carela-Español, P. Barlet-Ros, M. Solé-Simó, A. Dainotti, W. de Donato, A. Pescapè, K-Dimensional Trees for...
- Deep belief networks for quantitative analysis of a gold immunochromatographic strip, Cognitive Comput.
- Issues and future directions in Traffic Classification, IEEE Network
- Robust smartphone app identification via encrypted network traffic analysis, IEEE Trans. Inf. Forensics Secur.
- Mobile encrypted traffic classification using deep learning: experimental evaluation, lessons learned, and challenges, IEEE Trans. Network Service Manage.
- Identification of Traffic Flows Hiding behind TCP Port 80
- Deep packet: a novel approach for encrypted traffic classification using deep learning, Soft. Comput.
- The applications of deep learning on traffic identification, Black Hat USA, Las Vegas
- Network traffic classifier with convolutional and recurrent neural networks for internet of things, IEEE Access
- Common knowledge based and one-shot learning enabled multi-task traffic classification, IEEE Access
Domenico Ciuonzo is an Assistant Professor at University of Napoli Federico II. He holds a Ph.D. in Electronic Engineering from the University of Campania “L. Vanvitelli” and, since 2011, he has held several visiting researcher appointments. Since 2014, he has been on the editorial boards of different IEEE, Elsevier, and IET journals. His research concerns data fusion, wireless sensor networks, machine learning, and network analytics. He is an IEEE Senior Member.
Antonio Montieri has been a Postdoctoral Researcher at the Department of Electrical Engineering and Information Technology of the University of Napoli Federico II since 2017. He received a Ph.D. in Information Technology and Electrical Engineering from the same University in 2020. His work concerns network measurements, (encrypted and mobile) traffic classification and modeling, and monitoring of cloud network performance. Antonio has co-authored 22 papers in international journals and conference proceedings.
Antonio Pescapé is a Full Professor of computer engineering at the University of Napoli “Federico II”. His work focuses on Internet technologies and, more precisely, on measurement, monitoring, and analysis of the Internet. Recently, he has been working on bioinformatics and ICTs for smarter health. Antonio has co-authored more than 200 conference and journal papers and is the recipient of a number of research awards.