DOI: 10.1145/3097983.3098049
research-article

Effective and Real-time In-App Activity Analysis in Encrypted Internet Traffic Streams

Published: 04 August 2017

Abstract

Mobile in-app service analysis, which aims to classify mobile Internet traffic into different types of service usage, has become an emerging and challenging task for mobile service providers because in-app services increasingly adopt secure protocols. Although some efforts have been made to classify mobile Internet traffic, existing methods rely on complex feature construction and large storage caches, which lead to low processing speed and make them impractical for online, real-time scenarios. To this end, we develop an iterative analyzer that classifies encrypted mobile traffic in real time. Specifically, we first select an optimal set of the most discriminative features from raw features extracted from traffic packet sequences, using a novel Maximizing Inner activity similarity and Minimizing Different activity similarity (MIMD) measurement. To build the online analyzer, we then represent a traffic flow as a series of time windows, each described by the optimal feature vector and updated iteratively at the packet level. Rather than extracting feature elements from a stored series of raw traffic packets, the analyzer updates them whenever a new packet is observed, so raw packets do not need to be stored. Time windows generated by the same service usage activity are grouped by our proposed recursive time continuity constrained KMeans clustering (rCKC). The feature vectors of the cluster centers are then fed into a random forest classifier to identify the corresponding service usages. Finally, we conduct extensive experiments on real-world Internet traffic data from WeChat, WhatsApp, and Facebook to demonstrate the effectiveness and efficiency of our approach. The results show that the proposed analyzer achieves high accuracy in real-world scenarios while requiring little storage cache and processing traffic quickly.
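The abstract describes representing a traffic flow as a series of time windows whose feature vectors are updated each time a packet arrives, so raw packets never have to be stored. The Python sketch below illustrates that idea under stated assumptions: fixed-length windows (`window_seconds`), the hypothetical classes `WindowStats` and `OnlineWindowAnalyzer`, and an illustrative feature set (packet count, byte total, running mean and variance of packet size via Welford's algorithm, minimum and maximum size, uplink fraction). None of these are the MIMD-selected features or the authors' implementation.

```python
import math
from typing import Dict, List, Optional


class WindowStats:
    """Running features of one time window, updated packet by packet.

    Hypothetical feature set (count, bytes, packet-size mean/variance/min/max,
    uplink fraction); not the paper's MIMD-selected features.
    """

    def __init__(self) -> None:
        self.count = 0
        self.total_bytes = 0
        self.mean_size = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean (Welford)
        self.min_size = math.inf
        self.max_size = -math.inf
        self.uplink = 0

    def update(self, size: int, is_uplink: bool) -> None:
        """Fold one packet into the window in O(1) time and memory."""
        self.count += 1
        self.total_bytes += size
        delta = size - self.mean_size
        self.mean_size += delta / self.count
        self.m2 += delta * (size - self.mean_size)
        self.min_size = min(self.min_size, size)
        self.max_size = max(self.max_size, size)
        if is_uplink:
            self.uplink += 1

    def vector(self) -> List[float]:
        """Current feature vector; the variance is the population variance."""
        if self.count == 0:
            return [0.0] * 7
        return [
            float(self.count),
            float(self.total_bytes),
            self.mean_size,
            self.m2 / self.count,
            float(self.min_size),
            float(self.max_size),
            self.uplink / self.count,
        ]


class OnlineWindowAnalyzer:
    """Assigns each packet to a fixed-length time window and updates it in place."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.start_time: Optional[float] = None
        self.windows: Dict[int, WindowStats] = {}

    def observe(self, timestamp: float, size: int, is_uplink: bool) -> None:
        if self.start_time is None:
            self.start_time = timestamp
        index = int((timestamp - self.start_time) // self.window_seconds)
        stats = self.windows.get(index)
        if stats is None:
            stats = self.windows[index] = WindowStats()
        stats.update(size, is_uplink)  # the packet itself is not retained

    def feature_vectors(self) -> List[List[float]]:
        """Feature vectors of all non-empty windows, in time order."""
        return [self.windows[i].vector() for i in sorted(self.windows)]


if __name__ == "__main__":
    analyzer = OnlineWindowAnalyzer(window_seconds=1.0)
    for ts, size, up in [(0.0, 100, True), (0.4, 300, False), (1.2, 1500, False)]:
        analyzer.observe(ts, size, up)
    print(analyzer.feature_vectors())
```

Each packet costs a constant amount of work and each window holds a constant number of counters, which is the property the abstract relies on for low storage requirements and fast processing.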

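The abstract groups time windows from the same activity with a time-continuity constraint (rCKC) and then classifies the cluster centers. Because the exact rCKC procedure is not given here, the Python sketch below substitutes a simpler technique in the same spirit: recursive contiguous 2-means. Each step chooses the single cut point that minimizes the total within-segment sum of squared errors (SSE), which is exactly a 2-means whose two clusters must be contiguous in time, and recursion stops when the SSE reduction falls below a threshold. The function names and the parameters `min_gain` (an absolute SSE reduction) and `min_len` are hypothetical. The returned segment means stand in for the cluster centers that would be passed to a classifier such as a random forest.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sse(points: List[Vector]) -> float:
    """Sum of squared Euclidean distances from each point to the mean."""
    if not points:
        return 0.0
    center = mean(points)
    return sum((p[d] - c) ** 2 for p in points for d, c in enumerate(center))


def contiguous_split(windows: List[Vector], start: int, end: int,
                     min_gain: float, min_len: int) -> List[Tuple[int, int]]:
    """Recursively split windows[start:end] into time-contiguous segments.

    One step is a 2-means restricted to contiguous clusters, which reduces to
    picking the cut that minimizes left SSE + right SSE.
    """
    parent_cost = sse(windows[start:end])
    best_cut: Optional[int] = None
    best_cost = parent_cost
    # Each side of the cut must keep at least `min_len` windows.
    for cut in range(start + min_len, end - min_len + 1):
        cost = sse(windows[start:cut]) + sse(windows[cut:end])
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    # Stop when no cut reduces SSE enough: this segment becomes one group.
    if best_cut is None or parent_cost - best_cost < min_gain:
        return [(start, end)]
    return (contiguous_split(windows, start, best_cut, min_gain, min_len)
            + contiguous_split(windows, best_cut, end, min_gain, min_len))


def group_windows(windows: List[Vector], min_gain: float,
                  min_len: int = 1) -> List[Tuple[int, int, Vector]]:
    """Return (start, end, center) for each contiguous group of windows."""
    if not windows:
        return []
    segments = contiguous_split(windows, 0, len(windows), min_gain, min_len)
    return [(s, e, mean(windows[s:e])) for s, e in segments]


if __name__ == "__main__":
    windows = [[1.0], [1.1], [0.9], [10.0], [10.2], [9.8]]
    for start, end, center in group_windows(windows, min_gain=1.0):
        print(start, end, center)
```

The exhaustive cut search costs O(n²·d) per level for n windows of dimension d, so a streaming deployment would likely need incremental bookkeeping (for example, prefix sums of values and squared values) to stay real-time, and `min_gain` has to be tuned to the scale of the features.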
Supplementary Material

MP4 File (xiong_internet_traffic_streams.mp4)





Information

Published In

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. in-app analytics
  2. internet traffic analysis
  3. service usage classification
  4. time series segmentation

Qualifiers

  • Research-article

Funding Sources

  • Natural Science Foundation of China
  • Futurewei Technologies, Inc.

Conference

KDD '17

Acceptance Rates

KDD '17 paper acceptance rate: 64 of 748 submissions (9%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)


Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 65
  • Downloads (last 6 weeks): 5
Reflects downloads up to 15 Feb 2025.

Cited By

  • (2024) AN-Net: an Anti-Noise Network for Anonymous Traffic Classification. Proceedings of the ACM Web Conference 2024, 4417-4428. DOI: 10.1145/3589334.3645691. Online publication date: 13-May-2024.
  • (2024) Robust App Fingerprinting Over the Air. IEEE/ACM Transactions on Networking 32:6, 5065-5080. DOI: 10.1109/TNET.2024.3448621. Online publication date: Dec-2024.
  • (2024) Representation Learning of Tangled Key-Value Sequence Data for Early Classification. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 1063-1075. DOI: 10.1109/ICDE60146.2024.00086. Online publication date: 13-May-2024.
  • (2023) VT-Scanner: Layout Similarity on Smartphone and Its Application for Robust Scene Recognition. 2023 International Conference on Wireless Communications and Signal Processing (WCSP), 413-419. DOI: 10.1109/WCSP58612.2023.10404697. Online publication date: 2-Nov-2023.
  • (2023) A Novel Multimodal Deep Learning Framework for Encrypted Traffic Classification. IEEE/ACM Transactions on Networking 31:3, 1369-1384. DOI: 10.1109/TNET.2022.3215507. Online publication date: Jun-2023.
  • (2023) BehavSniffer: Sniff User Behaviors from the Encrypted Traffic by Traffic Burst Graphs. 2023 20th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), 456-464. DOI: 10.1109/SECON58729.2023.10287511. Online publication date: 11-Sep-2023.
  • (2023) Identifying Fine-Grained Douyin User Behaviors via Analyzing Encrypted Network Traffic. 2023 19th International Conference on Mobility, Sensing and Networking (MSN), 868-875. DOI: 10.1109/MSN60784.2023.00128. Online publication date: 14-Dec-2023.
  • (2023) A Robust and Accurate Encrypted Video Traffic Identification Method via Graph Neural Network. 2023 26th International Conference on Computer Supported Cooperative Work in Design (CSCWD), 867-872. DOI: 10.1109/CSCWD57460.2023.10152581. Online publication date: 24-May-2023.
  • (2023) Let Model Keep Evolving: Incremental Learning for Encrypted Traffic Classification. Computers & Security, 103624. DOI: 10.1016/j.cose.2023.103624. Online publication date: Dec-2023.
  • (2023) Activity Detection from Encrypted Remote Desktop Protocol Traffic. Innovative Security Solutions for Information Technology and Communications, 240-260. DOI: 10.1007/978-3-031-32636-3_14. Online publication date: 12-May-2023.
