DOI: 10.1145/3534678.3539314

MT-FlowFormer: A Semi-Supervised Flow Transformer for Encrypted Traffic Classification

Published: 14 August 2022

Abstract

With the increasing demand for the protection of personal network metadata, encrypted networks have grown in popularity, and so has the challenge of monitoring and analyzing encrypted network traffic. Several deep learning-based methods have been proposed that leverage statistical features, which are barely affected by encryption techniques, for encrypted traffic classification. However, these works still suffer from two main intrinsic limitations: (1) the feature extraction process lacks a mechanism to take into account correlations between flows in the flow sequence; and (2) a large volume of manually labeled data is required to train an effective deep classifier. In this paper, we propose a novel semi-supervised framework to address these problems. Specifically, an efficient classifier with an attention mechanism is proposed to extract features from flow sequences at low computational cost. Then, a Mean Teacher-style semi-supervised framework is adopted to exploit the unlabeled traffic data, where a spatiotemporal data augmentation method is designed as the key component to explore the spatial and temporal relationships within the unlabeled traffic data. Experimental results on two real-world traffic datasets demonstrate that our method outperforms state-of-the-art methods by a large margin.
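
For readers unfamiliar with the Mean Teacher idea that the abstract refers to, the following is a minimal sketch of its two generic ingredients: a teacher whose weights are an exponential moving average (EMA) of the student's weights, and a consistency loss that penalizes disagreement between student and teacher predictions on differently perturbed copies of the same unlabeled input. It is not the authors' implementation. The function names, the decay value `alpha`, and the Gaussian `jitter` perturbation are all assumptions made for illustration; in particular, `jitter` is only a stand-in and does not reproduce the paper's spatiotemporal augmentation.

```python
import math
import random


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights as an EMA of the student weights.

    alpha is a hypothetical decay value chosen for illustration.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between two predicted class distributions."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


def jitter(features, scale, rng):
    """Placeholder perturbation: additive Gaussian noise.

    This is an assumption for the sketch, NOT the paper's
    spatiotemporal data augmentation.
    """
    return [x + rng.gauss(0.0, scale) for x in features]


if __name__ == "__main__":
    rng = random.Random(0)
    flow_features = [0.2, 1.5, -0.7]           # one unlabeled sample (made up)
    student_view = jitter(flow_features, 0.1, rng)
    teacher_view = jitter(flow_features, 0.1, rng)
    # Treat the perturbed features directly as logits to keep the sketch tiny.
    loss = consistency_loss(softmax(student_view), softmax(teacher_view))
    print("consistency loss:", loss)
    print("teacher after EMA:", ema_update([1.0, 1.0], [0.0, 2.0], alpha=0.9))
```

In a full training loop, the student would be updated by gradient descent on the supervised loss plus this consistency term, and `ema_update` would be applied to the teacher after each student step.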

    Published In

    KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2022
    5033 pages
    ISBN:9781450393850
    DOI:10.1145/3534678
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data augmentation
    2. semi-supervised learning
    3. traffic classification
    4. transformer

    Qualifiers

    • Research-article

    Funding Sources

    • Cyber Security from the National Key Research and Development Program of Shanghai Jiao Tong University under Grant

    Conference

    KDD '22

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Downloads (last 12 months): 388
    • Downloads (last 6 weeks): 46
    Reflects downloads up to 05 Mar 2025

    Cited By

    • (2025) Beyond known threats: A novel strategy for isolating and detecting unknown malicious traffic. Journal of Information Security and Applications 89 (103920). DOI: 10.1016/j.jisa.2024.103920. Online publication date: Mar-2025
    • (2025) A survey on encrypted network traffic: A comprehensive survey of identification/classification techniques, challenges, and future directions. Computer Networks 257 (110984). DOI: 10.1016/j.comnet.2024.110984. Online publication date: Feb-2025
    • (2024) An Encrypted Traffic Classification Approach Based on Path Signature Features and LSTM. Electronics 13:15 (3060). DOI: 10.3390/electronics13153060. Online publication date: 2-Aug-2024
    • (2024) AutoML4ETC: Automated Neural Architecture Search for Real-World Encrypted Traffic Classification. IEEE Transactions on Network and Service Management 21:3 (2715-2730). DOI: 10.1109/TNSM.2023.3324936. Online publication date: Jun-2024
    • (2024) A Novel Self-Supervised Framework Based on Masked Autoencoder for Traffic Classification. IEEE/ACM Transactions on Networking 32:3 (2012-2025). DOI: 10.1109/TNET.2023.3335253. Online publication date: Jun-2024
    • (2024) Fractal: Facilitating Robust Encrypted Traffic Classification Using Data Augmentation and Contrastive Learning. 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (3076-3083). DOI: 10.1109/SMC54092.2024.10831226. Online publication date: 6-Oct-2024
    • (2024) Unidirectional Encrypted Traffic Classification: A Survey. 2024 Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC) (702-708). DOI: 10.1109/IPEC61310.2024.00125. Online publication date: 12-Apr-2024
    • (2024) An Accurate And Lightweight Intrusion Detection Model Deployed on Edge Network Devices. 2024 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN60899.2024.10651457. Online publication date: 30-Jun-2024
    • (2024) Privacy-Preserving Artificial Intelligence on Edge Devices: A Homomorphic Encryption Approach. 2024 IEEE International Conference on Web Services (ICWS) (395-405). DOI: 10.1109/ICWS62655.2024.00061. Online publication date: 7-Jul-2024
    • (2024) Netmamba: Efficient Network Traffic Classification Via Pre-Training Unidirectional Mamba. 2024 IEEE 32nd International Conference on Network Protocols (ICNP) (1-11). DOI: 10.1109/ICNP61940.2024.10858569. Online publication date: 28-Oct-2024