Elsevier

Neurocomputing

Volume 409, 7 October 2020, Pages 306-315
Toward effective mobile encrypted traffic classification through deep learning

https://doi.org/10.1016/j.neucom.2020.05.036

Abstract

Traffic Classification (TC), i.e. inferring the applications that generate network traffic, is currently the enabler for valuable profiling information, besides being the workhorse for service differentiation/blocking. Further, TC is fostered by the blooming of mobile (mostly encrypted) traffic volumes, fueled by the huge adoption of hand-held devices. While researchers and network operators still rely on machine learning to pursue accurate inference, we envision the Deep Learning (DL) paradigm as the stepping stone toward the design of practical (and effective) mobile traffic classifiers based on automatically-extracted features, able to operate with encrypted traffic, and reflecting complex traffic patterns. In this context, the contribution of the paper is fourfold. First, it provides a taxonomy of the key network traffic analysis subjects where DL is foreseen as attractive. Secondly, it delves into the non-trivial adoption of DL for mobile TC, surfacing potential gains. Thirdly, to capitalize on such gains, it proposes and validates a general framework for DL-based encrypted TC. Two concrete instances originating from our framework are then experimentally evaluated on three mobile datasets of human users’ activity. Lastly, our framework is leveraged to point to future research perspectives.

Introduction

In recent years, network operators have experienced tremendous growth in network traffic, mostly generated by mobile devices [1]. To face this unique challenge, several network players employ sophisticated network monitoring systems that incorporate intelligence through machine learning (ML) [2]. Yet, their success relies on handcrafted features designed by domain experts. Such a process is impractical in the face of the fast-paced evolution of mobile traffic, because it can be neither automated nor crowdsourced to non-experts (due to the high specialization required). After a large number of ML-based approaches [3], [4], [5], [6], deep learning (DL) [7], [8], a cutting-edge subset of ML techniques, has recently emerged as a disruptive breakthrough toward the automatic design of accurate inference systems able to capture complex dependencies among data, thus limiting the intervention of human experts.

A pillar of network monitoring services is traffic classification (TC) [9], namely inferring the application that generates the traffic. Indeed, TC is a key prerequisite for security and QoS enforcement, and mobile TC [10], [11], [12], [13] is gaining additional appeal due to its potential for valuable profiling information (e.g. for advertisers and security agencies), while also implying privacy downsides (e.g. recognition of health or dating apps, or in bring-your-own-device scenarios). Concurrently, the broad adoption of encrypted protocols (TLS) and dynamic ports hinders accurate TC, defeating traditional deep packet inspection and port-based techniques [9], [14]. This paves the way for DL techniques, here envisioned as the stepping stone toward high performance in challenging encrypted-traffic contexts [11], [15], since they allow classifiers to be trained directly from input data by automatically distilling structured and complex feature representations [7], [16]. Still, DL adoption in network TC is thorny and currently less understood [13]. More importantly, beyond the encrypted-traffic issue, mobile TC is marked by a high number of apps, possibly generating similar traffic patterns and with complex fingerprints. The latter is due to the scarce number of training samples per app and to device/OS/version diversity. Hence, such a challenging and dynamic scenario justifies the higher complexity and training requirements of DL.

In view of the discussed considerations, the contributions of this work are manifold:

  • We give an overview of the key network traffic analysis subjects where DL is foreseen as attractive, since their common intent is to automatically capitalize on raw network-level data to extract valuable information.

  • We categorize the state of the art in DL-based TC toward its effective application in the mobile and encrypted context, also providing a systematic taxonomy of the most closely related literature.

  • To pinpoint and overcome the limitations of the literature, we propose a general framework for DL-based mobile and encrypted TC, based on a rigorous definition of its milestones: (i) the choice of the traffic object, (ii) the definition of the input(s), (iii) the simultaneous TC tasks required, and (iv) the corresponding DL architecture. Thanks to this framework, clear guidelines are provided to designers for the judicious choice of relevant segmentation criteria and of unbiased (yet effective) input(s) in DL-based TC [13], [17]. More importantly, our proposal overcomes the design limitations of current works (limited to either single-modality or single-task learning, e.g. [17], [18], [19], [20]), by envisioning the joint use of multi-modal and multi-task techniques via the “connectionist” approach granted by DL.

  • We validate two actual implementations of the proposed framework on three recent human-generated mobile traffic datasets. One instance coincides with the best DL-based baseline on mobile encrypted TC [13], while the other is a novel architecture, drawn from our proposal, we devise herein to exploit multiple inputs. We show that the latter instance surpasses the former, accurately predicts the app generating the traffic, and beats the state-of-the-art in ML-based mobile TC [11].

  • Finally, our framework allows us to surface future perspectives toward effective mobile and encrypted TC by means of advanced DL techniques.
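
Milestones (i) and (ii) of the framework can be illustrated with a minimal sketch. It assumes the bidirectional flow (biflow), keyed by a direction-agnostic 5-tuple, as the traffic object, and two hypothetical inputs: the first `n_bytes` payload bytes and the signed payload lengths of the first `n_pkts` packets. The `Packet` record, the function names, and the default sizes are illustrative assumptions, not the definitions adopted in the paper.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical, already-parsed packet record (not a real capture API)."""
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    payload: bytes


FlowKey = Tuple[Tuple[str, int], Tuple[str, int], str]


def biflow_key(p: Packet) -> FlowKey:
    """Direction-agnostic 5-tuple: both directions map to the same key."""
    a, b = sorted([(p.src, p.sport), (p.dst, p.dport)])
    return (a, b, p.proto)


def segment(packets: List[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Traffic segmentation: group packets into biflows, keeping arrival order."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        flows[biflow_key(p)].append(p)
    return dict(flows)


def payload_bytes_input(flow: List[Packet], n_bytes: int = 16) -> List[float]:
    """Input A (assumed): first n_bytes of payload, scaled to [0, 1], zero-padded."""
    raw = b"".join(p.payload for p in flow)[:n_bytes]
    return [b / 255 for b in raw] + [0.0] * (n_bytes - len(raw))


def packet_fields_input(flow: List[Packet], n_pkts: int = 4) -> List[int]:
    """Input B (assumed): signed payload length of the first n_pkts packets.

    The sign encodes direction relative to the sender of the first packet;
    the vector is zero-padded when the flow has fewer than n_pkts packets.
    """
    initiator = (flow[0].src, flow[0].sport)
    sizes = [
        len(p.payload) if (p.src, p.sport) == initiator else -len(p.payload)
        for p in flow[:n_pkts]
    ]
    return sizes + [0] * (n_pkts - len(sizes))


# Toy capture: two packets of one biflow plus one packet of another.
packets = [
    Packet("10.0.0.2", 40000, "93.184.216.34", 443, "TCP", b"\x16\x03\x01"),
    Packet("93.184.216.34", 443, "10.0.0.2", 40000, "TCP", b"\x16\x03\x03\x00"),
    Packet("10.0.0.2", 40001, "93.184.216.34", 443, "TCP", b"\x17"),
]
flows = segment(packets)
first_flow = flows[biflow_key(packets[0])]
```

Fixing the input lengths with truncation and zero-padding is one simple way to feed variable-size traffic objects to a fixed-size network input; other segmentation criteria or inputs would slot into the same two steps.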

The rest of the paper is organized as follows: Section 2 presents a review of the recent success achieved by DL in network traffic analysis; Section 3 provides a categorization of literature background on TC through DL; the proposed general framework for DL-based mobile and encrypted TC is described in Section 4, with Section 5 reporting the experimental validation of its two proposed implementations; finally, Section 6 suggests insights and possible future directions.

To foster manuscript readability, Table 1 summarizes the acronyms used in the main text. Conversely, we report those used only in tables within the corresponding captions.

Deep learning in network traffic analysis

Telecom operators and ISPs have a long history of traffic-data analysis operations, have access to huge amounts of network-level data, and have thus benefited from decades of research and applications on the topic. The huge success of DL in several fields has recently ignited global interest in exploiting it in networking as well, where its adoption can leverage this solid know-how and help face the new challenges of mobile network-level data analysis.

To this end, in this section we review the recent

Deep learning in traffic classification

In this section we provide an intuitive categorization, via a systematic taxonomy, of the literature on DL-based TI and TC. We point out that a number of works have tackled mobile TC in the last five years, under the encrypted-traffic assumption, mostly using ML and based on bot-generated traffic [10], [11].

On the other hand, the appeal of DL for TC is confirmed by several recent works providing initial design attempts of DL-based traffic classifiers, for either non-mobile or non-encrypted traffic. All these works

A general framework for deep learning-based mobile encrypted traffic classification

In the following, we introduce and dissect our DL framework for mobile and encrypted TC. Fig. 1 illustrates the proposed framework in terms of its workflow, highlighting the key differences with respect to a traditional ML workflow (cf. Fig. 1(a)). Specifically, the mobile traffic flowing over a network device is captured and segmented into defined packet aggregates of traffic (traffic segmentation). Then, from each traffic object, raw input data is selected (input data selection) and used to

Experimental validation

In this section, we test two actual implementations of the proposed DL framework for mobile and encrypted TC based on three recent human-generated mobile traffic datasets. First, we describe the aforementioned datasets and the key performance indicators (KPIs) adopted for evaluation of TC effectiveness (Section 5.1). Secondly, we show and discuss the experimental results obtained (Section 5.2).
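
As an illustration of how TC effectiveness can be quantified, the sketch below computes two common classification KPIs, accuracy and macro-averaged F-measure. The choice of these two metrics, the function names, and the toy app labels are assumptions made here for illustration; the preview does not state which KPIs Section 5.1 adopts.

```python
from typing import List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of traffic objects assigned to the correct app."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_measure(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-app F-measures.

    Every app contributes equally, so apps with few samples weigh as much
    as popular ones.
    """
    scores = []
    for app in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == app and p == app for t, p in pairs)
        fp = sum(t != app and p == app for t, p in pairs)
        fn = sum(t == app and p != app for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical ground truth and predictions for five traffic objects.
y_true = ["maps", "maps", "chat", "chat", "mail"]
y_pred = ["maps", "chat", "chat", "chat", "mail"]
acc = accuracy(y_true, y_pred)
f_macro = macro_f_measure(y_true, y_pred)
```

With many apps and few training samples per app, a macro-averaged score tends to be more informative than accuracy alone, because it is not dominated by the most frequent apps.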

Discussion and future perspectives

In this work we envisioned a DL application to the field of network traffic analysis, focusing on the identification and classification of mobile and encrypted traffic. The result of our study is a DL-based TC framework able to capitalize on heterogeneous input data from mobile traffic and solve multiple TC tasks at the same time.
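
The combination of heterogeneous inputs and simultaneous tasks can be pictured as a network with one branch per input, a shared intermediate layer, and one output head per task. The following is a minimal, untrained forward-pass sketch in plain Python; the layer sizes, the task names (`app`, `category`), and the class `MultiModalMultiTask` are hypothetical and do not reproduce the architecture evaluated in the paper.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def init(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a real model would learn these by backpropagation."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dense(x: Vector, w: Matrix) -> Vector:
    """Linear layer without bias: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


class MultiModalMultiTask:
    """Two input branches -> concatenation -> shared layer -> one head per task."""

    def __init__(self, dim_a: int, dim_b: int, tasks: Dict[str, int], seed: int = 0):
        rng = random.Random(seed)
        self.branch_a = init(8, dim_a, rng)  # modality-specific features, input A
        self.branch_b = init(8, dim_b, rng)  # modality-specific features, input B
        self.shared = init(8, 16, rng)       # shared layer over the merged features
        self.heads = {name: init(n, 8, rng) for name, n in tasks.items()}

    def forward(self, input_a: Vector, input_b: Vector) -> Dict[str, Vector]:
        feat_a = relu(dense(input_a, self.branch_a))
        feat_b = relu(dense(input_b, self.branch_b))
        merged = relu(dense(feat_a + feat_b, self.shared))  # list concatenation
        # Each task reads the same shared representation through its own head.
        return {name: softmax(dense(merged, w)) for name, w in self.heads.items()}


model = MultiModalMultiTask(dim_a=16, dim_b=4, tasks={"app": 5, "category": 3})
out = model.forward([0.1] * 16, [1.0, -1.0, 0.5, 0.0])
```

Each head returns a probability vector over its own label set, so a single pass over one traffic object yields a prediction for every task. In practice such a model would be built and trained with a DL library rather than hand-written layers.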

By means of our framework we highlight several shortcomings of previous DL-based attempts at TC, namely: (i) traffic segmentation is often implicit or overlooked; (ii)

CRediT authorship contribution statement

Giuseppe Aceto: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Domenico Ciuonzo: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Antonio Montieri: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Visualization. Antonio Pescapé:

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Giuseppe Aceto is an Assistant Professor at the University of Napoli Federico II. He has a PhD in telecommunication engineering from the same University. His work concerns the monitoring of network performance and security (focusing on censorship) in both traditional and SDN network environments. He is also working on bioinformatics and ICTs applied to health. He is the recipient of a best paper award at IEEE ISCC 2010 and of the 2018 Best Journal Paper Award by IEEE CSIM.

References (45)

  • I. Goodfellow, Y. Bengio, A. Courville, Deep learning, MIT Press,...
  • N. Zeng et al., Deep belief networks for quantitative analysis of a gold immunochromatographic strip, Cognitive Comput. (2016)
  • A. Dainotti et al., Issues and future directions in Traffic Classification, IEEE Network (2012)
  • T. Stöber, M. Frank, J. Schmitt, I. Martinovic, Who do you sync you are? smartphone fingerprinting via application...
  • V.F. Taylor et al., Robust smartphone app identification via encrypted network traffic analysis, IEEE Trans. Inf. Forensics Secur. (2018)
  • G. Aceto et al., Mobile encrypted traffic classification using deep learning: experimental evaluation, lessons learned, and challenges, IEEE Trans. Network Service Manage. (2019)
  • A. Dainotti et al., Identification of Traffic Flows Hiding behind TCP Port 80
  • B. Saltaformaggio, H. Choi, K. Johnson, Y. Kwon, Q. Zhang, X. Zhang, D. Xu, J. Qian, Eavesdropping on fine-grained user...
  • M. Lotfollahi et al., Deep packet: a novel approach for encrypted traffic classification using deep learning, Soft. Comput. (2017)
  • Z. Wang, The applications of deep learning on traffic identification, Black Hat USA, Las Vegas (2015)
  • M. Lopez-Martin, Network traffic classifier with convolutional and recurrent neural networks for internet of things, IEEE Access (2017)
  • H. Sun et al., Common knowledge based and one-shot learning enabled multi-task traffic classification, IEEE Access (2019)

    Domenico Ciuonzo is an Assistant Professor at the University of Napoli Federico II. He holds a Ph.D. in Electronic Engineering from the University of Campania “L. Vanvitelli” and, since 2011, he has held several visiting researcher appointments. Since 2014, he has been on the editorial boards of several IEEE, Elsevier, and IET journals. His research concerns data fusion, wireless sensor networks, machine learning and network analytics. He is an IEEE Senior Member.

    Antonio Montieri has been a Postdoctoral Researcher at the Department of Electrical Engineering and Information Technology of the University of Napoli Federico II since 2017. He received a Ph.D. in Information Technology and Electrical Engineering from the same University in 2020. His work concerns network measurements, (encrypted and mobile) traffic classification and modeling, and monitoring of cloud network performance. Antonio has co-authored 22 papers in international journals and conference proceedings.

    Antonio Pescapé is a Full Professor of computer engineering at the University of Napoli “Federico II”. His work focuses on Internet technologies and more precisely on measurement, monitoring, and analysis of the Internet. Recently, he has been working on bioinformatics and ICTs for smarter health. Antonio has co-authored more than 200 conference and journal papers and is the recipient of a number of research awards.
